"If there is not folly in the world, then the world itself is folly. You must understand that mistakes are not always regrets." - Paul Tobin, Bandette🤠
This semester I'll teach an undergraduate "intro to RL" course at the UofA. For the first lecture, I collected some exciting, recent, impactful applications of RL. Link to the relevant slides:
I thought this may be worthwhile to share.
Yours truly and his coauthor Tor Lattimore happily present the near-final draft of their upcoming bandit book at The pdf will stay free. In this phase we welcome reader comments. The book will be printed by
#CambridgeUniversityPress
. Please share:)
Interested in hearing about the theoretical foundations of RL from a multidisciplinary perspective (CS, control, stats, OR)? If so, join us at the (all virtual) RL Theory Bootcamp at the Simons Institute next week. Lectures in the morning and the afternoon ==>
Glad to announce the "Theory of RL" program at the Simons Institute in the Fall of 2020. DM me if you are interested!
@SebastienBubeck
@EmmaBrunskill
Alan Malek
@SeanMeyn
Ambuj Tewari and Mengdi Wang are my awesome coorganizers.
Is RL used in real applications? If so, how and where? And if not, why not and how can this be fixed? Join our excellent panelists and speakers at the half-day RL2 workshop organized at
@icmlconf
or submit a paper to present your views.
I feel very much honoured to be selected for this role. To make the best of this job, hive mind of ML people on twitter, if you have any ideas about how to improve ICML, drop me a message (or just respond to this tweet).
Just for counterbalancing, hats off to those reviewers who are still doing a great job! I know that you are out there and while your numbers could be diminishing, we need you to keep doing what you do (post inspired by reading actual good reviews doing my editorial job).
Advice for future reviewers: An important question to ask when figuring out whether to recommend accept or reject is "How difficult is it to fix the issues I found?" If very difficult, the paper can't be saved. If not too difficult, there is no reason to reject the paper.
Heinrich Hertz after proving the existence of radio waves stated that "it's of no use whatsoever" and regarding the applications of the discovery: "Nothing, I guess"
Our department is hiring theoreticians working on ML! If you are on the job market for faculty positions and have a strong track record in theory, this may be your dream job! Why apply? Read on.. 1/x
This sounded like a crazy idea two weeks ago, but here we go!
@RLtheory
is the account to follow! Thanks for the speakers who already accepted our invitations! I hope the community will like this series!
excited to announce a new series of virtual seminars on
~~~REINFORCEMENT LEARNING THEORY~~~
we've set this up with
@CiaraPikeBurke
and
@CsabaSzepesvari
to keep track of all the advances of this fast-paced field. hope others will also find it useful!
I have a duty to spread the truth:
"Don't worry about the overall importance of the problem; work on it if it looks interesting. I think there's a sufficient correlation between interest and importance.
— David Blackwell"
And remember:
Please share: The newly created "Foundations team" of
@DeepMindAI
have openings for research scientists with strong theoretical background, and an unstoppable interest in pushing the boundaries of AI and machine learning. PM me if you are interested.
#ICML2018
Tomorrow we will have Martha White! She will talk about "Policy Gradient Methods as Approximate Policy Iteration: Advantages and Open Questions". Talks open to anyone! Join here:
The
@rlai_lab
Tea Time Talks return! Hosted by Amii’s Chief Scientific Advisor Dr. Richard S. Sutton, the 20-minute talks are delivered by students, faculty and guests, and range from ideas starting to take root to finished projects.
#AI
#ML
#RL
@roydanroy
Of course, can't compete with Dan, but I am also still looking for postdocs -- right down in Edmonton, driving distance to the Rockies. Awesome hikes, climbs, kayaking, .. + I can promise interesting RL theory problems and a fast-paced environment:)
Venting.
Reviewer: The paper is bad because of X, Y and Z.
Rebuttal: You are wrong on X, Y and Z + detailed explanation.
Reviewer: I maintain my score. The paper is bad (no explanation given).
How is this ever acceptable behavior? Why does a reviewer think this is fine?
@peter_richtarik
's recent post gave me this idea: Next year yours truly will be partially responsible for reviewing quality at ICML, so if you just got your first round of reviews back from a named conference, vent to me. I promise to listen.
@jasondeanlee
He skipped this. Vitanyi & Li's book, or the article below, gives you the answer. In one formulation (see attached pic), maximum likelihood for a large class of distributions over one-way infinite sequences is implemented by Kolmogorov compression
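A rough sketch of the formulation alluded to above (my own paraphrase, not the attached pic; see Li & Vitányi for the precise statement). For any computable distribution P, a two-part code — first describe P, then Shannon–Fano-code the sequence x under P — gives:

```latex
% Two-part code bound: Kolmogorov complexity is within K(P) + O(1)
% of the best achievable negative log-likelihood, so compressing x
% behaves like maximum likelihood over the class:
K(x) \le K(P) - \log_2 P(x) + O(1),
\qquad
\arg\max_{P} P(x) = \arg\min_{P} \bigl(-\log_2 P(x)\bigr).
```

So picking the P whose code compresses x best is, up to the K(P) overhead, the same as maximizing likelihood.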
This is a mini water treatment plant that will be used to optimize the water treatment process using reinforcement learning. It's really awesome to see this happening in Alberta!
The third and final workshop in the RL theory program starts tomorrow. The topic is batch RL (sorry
@jacobmbuckman
) and simulation-based optimization. All are welcome! The workshop will stream on Youtube. To join on zoom, you need to register.
Offline RL is cool, but will it ever work? Next Tuesday, Yunzong Xu (MIT) will put the nail into the coffin of offline RL by showing us the proof of the correctness of a 2019 conjecture by Chen and Jiang that predicted bad bad news for offline RL.
While some moments are pretty bleak (CMT mishaps), it warms my heart to see how many people care about
@icmlconf
. Thank you reviewers and other program committee members and I am looking forward to working with you in the coming year.
#NeverendingReviewingSeason
What makes a review good? (1) Objective; (2) helps the decision maker; (3) helps the authors; (4) polite. "Constructive criticism" is the expression. Constructive, not destructive.
Happy to report that it seems chances are really high that we'll record and will post the lectures online. I'll test the tech on Friday to see whether it is able to track me as I zip from board to board.
Hello World! This account will share the latest news and updates about what the Reinforcement Learning and Artificial Intelligence (RLAI) Lab at the University of Alberta is up to. Let’s figure out intelligence!
With some glitches, but we are done with the first of the series. Never knew so many people care about RL theory, yay! Great talk Chi Jin! Awesome audience! Next one can only be smoother:) Sign up here if you have not signed up yet:
@thegautamkamath
I grind for my students. And for the love of science and knowledge:) It's not rational, but I can't help it. I am not sure whether this sounds honest, but I really never cared about anything but my students and the joy I get from learning new things and connecting to others
Unsolicited student email: "This is my second reminder. I believe your research team is one of the best positions for me to continue my studies, I would be thankful if you could respond to my initial email." (The student never carefully checked my homepage.) Go figure!
We often hear about the theory-practice gap. At this workshop we will take a thorough look at it. Is there a gap? What is the nature of the gap? Who made it? Is it good to have the gap? If not, how do we close it? I think this is super important for the health of the field!
.. and we will finish every day with a bonus talk which brings in the perspective of some particular application. For registration (no fees, just to receive the zoom link) and further details, visit the bootcamp website.
Tired of staring at the pages of the free pdf at ? Want to smell it, flip the pages? Visit the
@CambridgeUP
booth at
#NeurIPS2020
or just head directly to for an incredible 30% discount!
#BanditBook
To the attention of grad students.
New Mentor Session scheduled
Who? Csaba Szepesvari
When? Thu, 10 Dec 2020 18:00:00 GMT
Description: PhD advice and virtual cookies
Details about event:
More awesome RL content; Reinforcement Learning, Bit by Bit by Xiuyuan (Lucy) Lu (DeepMind)
Date / Time:
Lecture 1: 9:30 AM - 10:30 AM (PT), April 20th (Tuesday)
Lecture 2: 10:30 AM - 11:30 AM (PT), April 23rd (Friday)
(Stanford RL forum!)
It's here! This weekend, a fully online, pre-ICML, soothing "RL for real life" 2x3 hours virtual conference! Fantastic invited speakers & panel, moderators. Prepare and submit your questions in advance!!! All credit should go to my incredible coorganizers.
Welcome to the RL for Real Life Virtual Conference, June 27-28, co-organized with
@gabepsilon
, Alborz Geramifard, Omer Gottesman,
@LihongLi20
, Anusha Nagabandi, Zhiwei (Tony) Qin,
@CsabaSzepesvari
With two panels on general RL and RL+healthcare topics.
Now that the
#COLT2024
decisions are out, I'd like to announce a workshop that we are organizing, which will happen just before COLT. The workshop theme is RL Theory. All are welcome! Details here: Please spread the word!
Illustration, slightly edited to protect anonymity: "paper feels incremental ..putting together well-known ideas in a straightforward manner." What can I say? Previous work missed even these. And straightforward once done. Reviewer also admitted not reading the proof. Great job?!
ICML review rant: The ML community is screwed if we keep insisting that scientific inquiry about known algorithms isn't "novel" (even if it leads to major new capabilities / SoTA), but that engineering yet another new, incremental algorithm that we know nothing about is great.
1/x Our department has 2 Assistant Professor positions in AI/ML and one in Theoretical Computing Science. Here are the job ads. Our department is a super fun, collegial place. Ads:
New post on the inescapable appeal of Bayesian methods in the context of adversarial bandits. Or how Bayesian methods can help the agnostic. Hint: Minimax theorems open wormhole between distant corners of the universe.
One day before reviews are due for Phase 1 at
#ICML2022
, 50% of the reviewers have submitted zero reviews. The review load for this phase is <=2 papers and there were 19 days for writing these <=2 reviews. What percentage of reviewers will submit all of their reviews in time?
Asking for a friend: A student wants to pick up intuition about Bregman divergences and their use in convex optimization/online learning. There are lots of excellent texts out there, but is there one that is strong on providing intuition? 1/x
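For anyone who lands here first, a minimal sketch of what a Bregman divergence is (my own illustration, not from any of the texts being asked about): pick a convex potential phi, and the divergence is the gap between phi at x and its linearization at y. Squared Euclidean norm recovers squared distance; negative entropy on the simplex recovers KL.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = ||x||^2 gives the squared Euclidean distance.
sq = lambda x: np.dot(x, x)
sq_grad = lambda x: 2 * x

# phi(p) = sum p log p (negative entropy) gives KL on the simplex.
negent = lambda p: np.sum(p * np.log(p))
negent_grad = lambda p: np.log(p) + 1

x = np.array([0.2, 0.8])
y = np.array([0.5, 0.5])
print(bregman(sq, sq_grad, x, y))          # equals ||x - y||^2 = 0.18
print(bregman(negent, negent_grad, x, y))  # equals KL(x || y)
```

The intuition to look for in a text is exactly this picture: the divergence measures how much phi curves away from its tangent plane at y.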
"What information to seek, how to seek that information, and what information to retain?" What else is there to know? A principled approach to this problem will be presented tomorrow by DeepMind's Xiuyuan Lu. Last RL Theory Seminar before the summer break!
Huge congratulations to Tor and Andras! Their paper “Improved Regret for Zeroth-Order Stochastic Convex Bandits” was recently recognised for a best paper runner-up award by the flagship learning theory conference, COLT: 1/
I got many good comments, suggestions and I have significantly expanded the list. I am quite pleased with the result, RL seems to be doing quite well. Very nice applications and more in the works! Thanks everyone!
This semester I'll teach an undergraduate "intro to RL" course at the UofA. For the first lecture, I collected some exciting, recent, impactful applications of RL. Link to the relevant slides:
I thought this may be worthwhile to share.
I am delighted to invite everyone tomorrow for the first RL Theory Seminar talk of 2021 by Andrea Zanette. Andrea will explain to us why and how batch reinforcement learning can be much harder than online RL. For details check out
Wow, I just discovered this treat:
Moritz Hardt and Ben Recht: "Patterns, predictions, and actions". I will surely recommend this for my students or whoever starts with this subject! Very cool. Thank you
@beenwrekt
!
Improper learning? Who would do that? Isn't that bad by definition? Not even proper? Come to our seminar to find out what Max Simchowitz thinks about improper learning for non-stochastic control!
@thegautamkamath
When I was a PhD student, I was quite discouraged a few times by some reviews. SIAM J. Opt told me in 2000 that exploration in finite MDPs is old-fashioned:) Soon enough, though, I learned not to pay attention to failures or rejections and focused on the positives. ==>
Cool universality argument for SGD with FF neural nets: Take any learning algorithm A for learning Boolean functions without noise from a sample of size n. Then there is a NN architecture G(A,n) such that SGD + G(A,n) + any reasonable loss with sequential processing "implements" A.
A tour de force by Abbe & Sandon,
"Any function distribution that can be
learned from samples in poly-time can also be learned by a poly-size neural net trained with
SGD on a poly-time initialization with poly-steps" + "[this] does not hold for GD"
@neu_rips
being featured in
@marcgbellemare
's talk (awesome talk Marc, by the way!! congrats again to all those involved!!). But Twitter does work, eh?
@beenwrekt
You mean no progress? Nah.. Btw, I like the style of some of these old papers that describe some half-baked idea for what it is, not trying to oversell it or make it look bigger than it is (e.g. a heuristic is a heuristic..). Papers of this type wouldn't make it today.
You must see this, new webpage! ..after the service I have previously used to compile my publications-page stopped working (dire times..), put together in a day with the help of and
@yisongyue
Research is done in many small steps. You may think something goes unnoticed, but it may have influenced someone, who gets a new idea, writes another small thing. This leads to the next thing. Wait 20 years, the many little things add up and a much cleaner, deeper ==>
..and next week we take a break to let the "Deep RL meets theory" workshop take the stage! Check out the program at:
Do not forget to put all these events in your calendar! The most convenient way to do this is to go here:
We are glad to announce that we are now officially part of the "Theory of RL" program at the Simons Institute!
See our updated schedule that now includes two new speakers and the RL theory workshops at
@SimonsInstitute
.
Aaditya Ramdas (not on twitter; good for him) is coediting a special issue for MLJ on "Conformal Prediction and Distribution-Free Uncertainty Quantification". Deadline Nov 30. Consider submitting if you have something! I will be looking forward to seeing what comes out of this!
A frequent issue in batch RL is that evaluation methods are biased and the size of the bias is unknown. Come and join us tomorrow to learn from Yi Su about how to build optimizers that do almost as well as if the bias was known!
For details:
@Maggiemakar
@zacharylipton
For those who like books, I also love the Anthony-Bartlett book
While it is quite short, it explains so much about how SLT has evolved over the years!
RL Theory Seminars is pleased to present a talk by Yujia Jin (Stanford) tomorrow on "VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation". For further details, check out
Representation learning and exploration in RL together? Aditya Modi got you covered! Details? Well, you should come to the next talk! For details visit:
A packed house to hear
@BFlanaganUofA
from the
@UAlberta
and
@AmiiThinks
announce that 20 new faculty will be hired in AI across campus in the next 3 years, with 5 of these positions in CS.
Advice for people thinking of registering an email address at CMT or other similar reviewing systems: Register an email that is NOT associated with your school/workplace. School and workplace change. Then you will end up with multiple identities, which is not what you want:)
What do you get when you cross modern Machine Learning with good old-fashioned Search?
An IJCAI distinguished paper award 🙂 for
Levin Tree Search with Context Models:
@pcastr
SOMs are an awesome example of what curiosity-driven research looks like. Neither neuroscience, nor solving any real problem. Yet one can still write books about SOMs, think about them in various ways, etc. Something to remember when judging relevance while reviewing!
I hope everyone enjoyed ICLR. As promised, RL Theory seminars are back and we are super lucky to have Kwang-Sung Jun fixing our bad ideas about how to use Boltzmann exploration via the help of the mysterious "Maillard sampling" idea. Intrigued? Check out
Why do we use softmax to represent policies? Could we use some other "transfer" function? Which one? Pros/cons? Come to see our posters to hear about the gravitational pull of softmax and how physicists are always right! I can't guarantee to be up at the time of the oral though:)
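For concreteness, a minimal sketch of the default choice being questioned above (my own illustration, not from the posters): softmax turns unbounded action preferences into a probability distribution, with a temperature knob trading off greediness against uniformity.

```python
import numpy as np

def softmax_policy(prefs, temperature=1.0):
    """Map real-valued action preferences to action probabilities."""
    z = prefs / temperature
    z = z - np.max(z)        # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

prefs = np.array([1.0, 2.0, 3.0])
pi = softmax_policy(prefs)
print(pi)  # monotone in the preferences; sums to 1
```

The poster's question is whether this particular map (vs. other monotone normalizations) is the right one — e.g. its translation invariance and exponential tails are design choices, not necessities.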
Reinforcement Learning and Artificial Intelligence
Ladies and gentlemen! We are delighted to give you OPPO, optimistic policy optimization (very much related to the previous talk by the way!) to achieve efficient and effective exploration with linear function approximation in finite horizon MDPs as presented by Zhuoran Yang!
Our chance to stay positive during these dire times is to attend Simon's seminar tomorrow where I hope we learn that despite all other signs RL is not much harder than bandits. Long live RL, long live bandits!
Our next talk:
11/24: Simon S. Du (University of Washington)
"Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon"
For details, please see the website:
In a major scientific breakthrough, the latest version of
#AlphaFold
has been recognised as a solution to one of biology's grand challenges - the “protein folding problem”. It was validated today at
#CASP14
, the biennial Critical Assessment of protein Structure Prediction (1/3)
Reminder: this talk is coming up tomorrow!
***Note that the talk starts at 4PM UTC, one hour earlier than our regular time slot***
Public YouTube link:
Sign up for the talk on Google Meet:
Our next talk:
06/16: Niao He (UIUC)
"A Unified Switching System Perspective and O.D.E. Analysis of Q-Learning Algorithms"
For details, please see the website:
It is a great pleasure to have Fei Feng from UCLA speaking at our next seminar. Join us to learn about how to combine RL and unsupervised learning and keep everything provably efficient!
Join us on Tuesday to hear from Mengdi about the latest and greatest lower and upper bounds in off-policy evaluation with linear function approximation!
Our next talk:
08/04: Mengdi Wang (Princeton / DeepMind)
"Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation"
For details, please see the website:
Huge improvements for the sample complexity of RL for representation learning in low-rank (linear) MDPs! How? Why? Really? Come check out the seminar of Masatoshi Uehara tomorrow! For details follow this link:
Our next talk:
06/09: Shie Mannor (Technion)
"Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs"
For details, please see the website:
@ylecun
@nanjiang_cs
Perhaps better to focus on what needs to be done than on who is doing it or whether we call it RL or anything else. But I am glad you recognize that some sort of planning with models (or not?) will be needed! We are on the same page with this one. And Merry Christmas!! 2/2
Pessimism is back on stage! Join the RL Theory Seminars tomorrow to hear from Paria Rashidinejad about *more reasons* why being pessimistic in the batch RL setting is actually good. Fast rates? Adaptive optimality? Pessimism delivers!
To the attention of strong final year PhD students, junior faculty in CS/Theory/..! Excellent opportunity to stay at Berkeley while the 'Theory of RL' and other programs are happening. Please pass it along to relevant candidates.
Episode 10
@CsabaSzepesvari
of DeepMind shares his views on Bandits, Adversaries, PUCT in AlphaGo / AlphaZero / MuZero, AGI and RL, what is timeless, and more!
We are glad to welcome Tadashi! Btw, I still have some openings for postdocs. PM me if you are interested in theoretical foundations of RL, and, more broadly decision making (stay tuned!), or you know someone who could be good!
Yep, good one! We could do more of this: "AI as a field is starving for a few carefully documented failures. [..] I can learn more by just being told why a technique won't work than by being made to read between the lines."
#SundayClassicPaper
📜: McDermott (1976) 'Artificial Intelligence Meets Natural Stupidity'. As we critique our own field, it is useful to see what recurs from the critique of the past. The critique on 'Wishful Mnemonics' seems still relevant.
@MarlosCMachado
Great for them! While international universities are great, we should not forget that local universities can also be great. I did all my studies in Hungary and I don't regret this the tiniest bit! I met wonderful, dedicated, caring, knowledgeable profs there, which meant a lot!
In RL being optimistic is often the "right thing" when learning interactively. But what happens in the batch case? Perhaps pessimism is then the best? Come join us next Tuesday to learn the answer and more from Ying Jin!
For details check out
Exploration! The hunt for the "right" characterization of sample-efficiently learnable RL problem classes is not over yet! Enter the Bellman eluder dimension, which subsumes all that came before, as Qinghua Liu will kindly explain to all of us who care.