"If there is not folly in the world, then the world itself is folly. You must understand that mistakes are not always regrets." - Paul Tobin, Bandette🤠
This semester I'll teach an undergraduate "intro to RL" course at the UofA. For the first lecture, I collected some exciting, recent, impactful applications of RL. Link to the relevant slides:
I thought this may be worthwhile to share.
Yours truly and his coauthor Tor Lattimore happily present the near-final draft of their upcoming bandit book at The pdf will stay free. In this phase we welcome reader comments. The book will be printed by
#CambridgeUniversityPress
. Please share:)
Interested in hearing about the theoretical foundations of RL from a multidisciplinary perspective (CS, control, stats, OR)? If so, join us at the (all virtual) RL Theory Bootcamp at the Simons Institute next week. Lectures in the morning and the afternoon ==>
Glad to announce the "Theory of RL" program at the Simons Institute in the Fall of 2020. DM me if you are interested!
@SebastienBubeck
@EmmaBrunskill
Alan Malek
@SeanMeyn
Ambuj Tewari and Mengdi Wang are my awesome coorganizers.
Is RL used in real applications? If so, how and where? And if not, why not and how can this be fixed? Join our excellent panelists and speakers at the half-day RL2 workshop organized at
@icmlconf
or submit a paper to present your views.
I feel very much honoured to be selected for this role. To make the best of this job, hive mind of ML people on twitter, if you have any ideas about how to improve ICML, drop me a message (or just respond to this tweet).
Just for counterbalancing, hats off to those reviewers who are still doing a great job! I know that you are out there and while your numbers could be diminishing, we need you to keep doing what you do (post inspired by reading actual good reviews doing my editorial job).
Advice for future reviewers: An important question to ask when figuring out whether to recommend accept or reject is "How difficult is it to fix the issues I found?" If very difficult, the paper can't be saved. If not too difficult, there is no reason to reject the paper.
Heinrich Hertz after proving the existence of radio waves stated that "it's of no use whatsoever" and regarding the applications of the discovery: "Nothing, I guess"
Our department is hiring theoreticians working on ML! If you are on the job market for faculty positions and have a strong track record in theory, this may be your dream job! Why apply? Read on.. 1/x
This sounded like a crazy idea two weeks ago, but here we go!
@RLtheory
is the account to follow! Thanks for the speakers who already accepted our invitations! I hope the community will like this series!
excited to announce a new series of virtual seminars on
~~~REINFORCEMENT LEARNING THEORY~~~
we've set this up with
@CiaraPikeBurke
and
@CsabaSzepesvari
to keep track of all the advances of this fast-paced field. hope others will also find it useful!
I have a duty to spread the truth:
"Don't worry about the overall importance of the problem; work on it if it looks interesting. I think there's a sufficient correlation between interest and importance.
— David Blackwell"
And remember:
Please share: The newly created "Foundations team" of
@DeepMindAI
have openings for research scientists with strong theoretical background, and an unstoppable interest in pushing the boundaries of AI and machine learning. PM me if you are interested.
#ICML2018
Tomorrow we will have Martha White! She will talk about "Policy Gradient Methods as Approximate Policy Iteration: Advantages and Open Questions". Talks open to anyone! Join here:
The
@rlai_lab
Tea Time Talks return! Hosted by Amii’s Chief Scientific Advisor Dr. Richard S. Sutton, the 20-minute talks are delivered by students, faculty and guests, and range from ideas starting to take root to finished projects.
#AI
#ML
#RL
@roydanroy
Of course, can't compete with Dan, but I am also still looking for postdocs -- right down in Edmonton, driving distance to the Rockies. Awesome hikes, climbs, kayaking, .. + I can promise interesting RL theory problems and a fast-paced environment:)
Venting.
Reviewer: The paper is bad because of X, Y and Z.
Rebuttal: You are wrong on X, Y and Z + detailed explanation.
Reviewer: I maintain my score. The paper is bad (no explanation given).
How is this ever acceptable behavior? Why does a reviewer think this is fine?
@peter_richtarik
's recent post gave me this idea: Next year yours truly will be partially responsible for reviewing quality at ICML, so if you just got your first round of reviews back from a named conference, vent to me. I promise to listen.
@jasondeanlee
He skipped this. Vitanyi & Li's book, or the article below, gives you the answer. In one formulation (see attached pic), maximum likelihood for a large class of distributions over one-way infinite sequences is implemented by Kolmogorov compression
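A rough sketch of the formulation alluded to above (my own paraphrase, not the attached pic; see Li & Vitányi for the precise statement). For any computable distribution P, a two-part code — first describe P, then Shannon–Fano-code the sequence x under P — gives:

```latex
% Two-part code bound: Kolmogorov complexity is within K(P) + O(1)
% of the best achievable negative log-likelihood, so compressing x
% behaves like maximum likelihood over the class:
K(x) \le K(P) - \log_2 P(x) + O(1),
\qquad
\arg\max_{P} P(x) = \arg\min_{P} \bigl(-\log_2 P(x)\bigr).
```

So picking the P whose code compresses x best is, up to the K(P) overhead, the same as maximizing likelihood.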
This is a mini water treatment plant that will be used to optimize the water treatment process using reinforcement learning. It's really awesome to see this happening in Alberta!
The third and final workshop in the RL theory program starts tomorrow. The topic is batch RL (sorry
@jacobmbuckman
) and simulation-based optimization. All are welcome! The workshop will stream on Youtube. To join on zoom, you need to register.
Offline RL is cool, but will it ever work? Next Tuesday, Yunzong Xu (MIT) will put the nail into the coffin of offline RL by showing us the proof of the correctness of a 2019 conjecture by Chen and Jiang that predicted bad bad news for offline RL.
While some moments are pretty bleak (CMT mishaps), it warms my heart to see how many people care about
@icmlconf
. Thank you reviewers and other program committee members and I am looking forward to working with you in the coming year.
#NeverendingReviewingSeason
What makes a review good? (1) Objective; (2) helps the decision maker; (3) helps the authors; (4) polite. "Constructive criticism" is the expression. Constructive, not destructive.
Happy to report that it seems chances are really high that we'll record and will post the lectures online. I'll test the tech on Friday to see whether it is able to track me as I zip from board to board.
Hello World! This account will share the latest news and updates about what the Reinforcement Learning and Artificial Intelligence (RLAI) Lab at the University of Alberta is up to. Let’s figure out intelligence!
With some glitches, but we are done with the first of the series. Never knew so many people care about RL theory, yay! Great talk Chi Jin! Awesome audience! Next one can only be smoother:) Sign up here if you have not signed up yet:
@thegautamkamath
I grind for my students. And for the love of science and knowledge:) It's not rational, but I can't help it. I am not sure whether this sounds honest, but I really never cared about anything but my students and the joy I get from learning new things and connecting to others
Unsolicited student email: "This is my second reminder. I believe your research team is one of the best positions for me to continue my studies, I would be thankful if you could respond to my initial email." (The student never carefully checked my homepage.) Go figure!
We often hear about the theory-practice gap. At this workshop we will take a thorough look at it. Is there a gap? What is the nature of the gap? Who made it? Is it good to have the gap? If not, how do we close it? I think this is super important for the health of the field!
.. and we will finish every day with a bonus talk which brings in the perspective of some particular application. For registration (no fees, just to receive the zoom link) and further details, visit the bootcamp website.
Tired of staring at the pages of the free pdf at ? Want to smell it, flip the pages? Visit the
@CambridgeUP
booth at
#NeurIPS2020
or just head directly to for an incredible 30% discount!
#BanditBook
To the attention of grad students.
New Mentor Session scheduled
Who? Csaba Szepesvari
When? Thu, 10 Dec 2020 18:00:00 GMT
Description: PhD advice and virtual cookies
Details about event:
More awesome RL content; Reinforcement Learning, Bit by Bit by Xiuyuan (Lucy) Lu (DeepMind)
Date / Time:
Lecture 1: 9:30 AM - 10:30 AM (PT), April 20th (Tuesday)
Lecture 2: 10:30 AM - 11:30 AM (PT), April 23rd (Friday)
(Stanford RL forum!)
It's here! This weekend, a fully online, pre-ICML, soothing "RL for real life" 2x3 hours virtual conference! Fantastic invited speakers & panel, moderators. Prepare and submit your questions in advance!!! All credit should go to my incredible coorganizers.
Welcome to the RL for Real Life Virtual Conference, June 27-28, co-organized with
@gabepsilon
, Alborz Geramifard, Omer Gottesman,
@LihongLi20
, Anusha Nagabandi, Zhiwei (Tony) Qin,
@CsabaSzepesvari
With two panels on general RL and RL+healthcare topics.
Now that the
#COLT2024
decisions are out, I'd like to announce a workshop that we are organizing, which will happen just before COLT. The workshop theme is RL Theory. All are welcome! Details here: Please spread the word!
Illustration, slightly edited to protect anonymity: "paper feels incremental ..putting together well-known ideas in a straightforward manner." What can I say? Previous work missed even these. And straightforward once done. Reviewer also admitted not reading the proof. Great job?!
ICML review rant: The ML community is screwed if we keep insisting that scientific inquiry about known algorithms isn't "novel" (even if it leads to major new capabilities / SoTA), but that engineering yet another new, incremental algorithm that we know nothing about is great.
1/x Our department has 2 Assistant Professor positions in AI/ML and one in Theoretical Computing Science. Here are the job ads. Our department is a super fun, collegial place. Ads:
New post on the inescapable appeal of Bayesian methods in the context of adversarial bandits. Or how Bayesian methods can help the agnostic. Hint: Minimax theorems open wormhole between distant corners of the universe.
One day before reviews are due for Phase 1 at
#ICML2022
, 50% of the reviewers have submitted zero reviews. The review load for this phase is <=2 papers and there were 19 days for writing these <=2 reviews. What percentage of reviewers will submit all of their reviews in time?
Asking for a friend: A student wants to pick up intuition about Bregman divergences and their use in convex optimization/online learning. There are lots of excellent texts out there, but is there one that is strong on providing intuition? 1/x
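For anyone who lands here first, a minimal sketch of what a Bregman divergence is (my own illustration, not from any of the texts being asked about): pick a convex potential phi, and the divergence is the gap between phi at x and its linearization at y. Squared Euclidean norm recovers squared distance; negative entropy on the simplex recovers KL.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = ||x||^2 gives the squared Euclidean distance.
sq = lambda x: np.dot(x, x)
sq_grad = lambda x: 2 * x

# phi(p) = sum p log p (negative entropy) gives KL on the simplex.
negent = lambda p: np.sum(p * np.log(p))
negent_grad = lambda p: np.log(p) + 1

x = np.array([0.2, 0.8])
y = np.array([0.5, 0.5])
print(bregman(sq, sq_grad, x, y))          # equals ||x - y||^2 = 0.18
print(bregman(negent, negent_grad, x, y))  # equals KL(x || y)
```

The intuition to look for in a text is exactly this picture: the divergence measures how much phi curves away from its tangent plane at y.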
"What information to seek, how to seek that information, and what information to retain?" What else is there to know? A principled approach to this problem will be presented tomorrow by DeepMind's Xiuyuan Lu. Last RL Theory Seminar before the summer break!
Huge congratulations to Tor and Andras! Their paper “Improved Regret for Zeroth-Order Stochastic Convex Bandits” was recently recognised for a best paper runner-up award by the flagship learning theory conference, COLT: 1/
I got many good comments, suggestions and I have significantly expanded the list. I am quite pleased with the result, RL seems to be doing quite well. Very nice applications and more in the works! Thanks everyone!
This semester I'll teach an undergraduate "intro to RL" course at the UofA. For the first lecture, I collected some exciting, recent, impactful applications of RL. Link to the relevant slides:
I thought this may be worthwhile to share.
I am delighted to invite everyone tomorrow for the first RL Theory Seminar talk of 2021 by Andrea Zanette. Andrea will explain to us why and how batch reinforcement learning can be much harder than online RL. For details check out
Wow, I just discovered this treat:
Moritz Hardt and Ben Recht: "Patterns, predictions, and actions". I will surely recommend this for my students or whoever starts with this subject! Very cool. Thank you
@beenwrekt
!
Improper learning? Who would do that? Isn't that bad by definition? Not even proper? Come to our seminar to find out what Max Simchowitz thinks about improper learning for non-stochastic control!
@thegautamkamath
When I was a PhD student, I was quite discouraged a few times by some reviews. SIAM J. Opt told me in 2000 that exploration in finite MDPs is old-fashioned:) Soon enough, though, I learned not to pay attention to failures or rejections and focused on the positives. ==>
Cool universality argument for SGD with FF neural nets: Take any learning algorithm A for learning Boolean functions without noise from a sample of size n. Then there is a NN architecture G(A,n) such that SGD + G(A,n) + any reasonable loss with sequential processing "implements" A.
A tour de force by Abbe & Sandon,
"Any function distribution that can be
learned from samples in poly-time can also be learned by a poly-size neural net trained with
SGD on a poly-time initialization with poly-steps" + "[this] does not hold for GD"
@neu_rips
being featured in
@marcgbellemare
's talk (awesome talk Marc, by the way!! congrats again to all those involved!!). But Twitter does work, eh?
@beenwrekt
You mean no progress? Nah.. Btw, I like the style of some of these old papers that describe some half-baked idea for what it is, not trying to oversell it or make it look bigger than it is (e.g. a heuristic is a heuristic..). Papers of this type wouldn't make it today.
You must see this, new webpage! ..after the service I have previously used to compile my publications-page stopped working (dire times..), put together in a day with the help of and
@yisongyue
Research is done in many small steps. You may think something goes unnoticed, but it may have influenced someone, who gets a new idea, writes another small thing. This leads to the next thing. Wait 20 years, the many little things add up and a much cleaner, deeper ==>
..and next week we take a break to let the "Deep RL meets theory" workshop take the stage! Check out the program at:
Do not forget to put all these events in your calendar! The most convenient way to do this is to go here:
We are glad to announce that we are now officially part of the "Theory of RL" program at the Simons Institute!
See our updated schedule that now includes two new speakers and the RL theory workshops at
@SimonsInstitute
.
Aaditya Ramdas (not on twitter; good for him) is coediting a special issue for MLJ on "Conformal Prediction and Distribution-Free Uncertainty Quantification". Deadline Nov 30. Consider submitting if you have something! I will be looking forward to seeing what comes out of this!
A frequent issue in batch RL is that evaluation methods are biased and the size of the bias is unknown. Come and join us tomorrow to learn from Yi Su about how to build optimizers that do almost as well as if the bias was known!
For details:
@Maggiemakar
@zacharylipton
For those who like books, I also love the Anthony-Bartlett book
While it is quite short, it explains so much about how SLT has evolved over the years!
RL Theory Seminars is pleased to present a talk by Yujia Jin (Stanford) tomorrow on "VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation". For further details, check out
Representation learning and exploration in RL together? Aditya Modi got you covered! Details? Well, you should come to the next talk! For details visit:
A packed house to hear
@BFlanaganUofA
from the
@UAlberta
and
@AmiiThinks
announce that 20 new faculty will be hired in AI across campus in the next 3 years, with 5 of these positions in CS.
Advice for people thinking of registering an email address at CMT or other similar reviewing systems: Register an email that is NOT associated with your school/workplace. School and workplace change. Then you will end up with multiple identities, which is not what you want:)
What do you get when you cross modern Machine Learning with good old-fashioned Search?
An IJCAI distinguished paper award 🙂 for
Levin Tree Search with Context Models:
@pcastr
SOMs are an awesome example of what curiosity-driven research looks like. Neither neuroscience, nor solving any real problem. Yet one can still write books about SOMs, think about them in various ways, etc. Something to remember when judging relevance while reviewing!
I hope everyone enjoyed ICLR. As promised, RL Theory seminars are back and we are super lucky to have Kwang-Sung Jun fixing our bad ideas about how to use Boltzmann exploration via the help of the mysterious "Maillard sampling" idea. Intrigued? Check out
Why do we use softmax to represent policies? Could we use some other "transfer" function? Which one? Pros/cons? Come to see our posters to hear about the gravitational pull of softmax and how physicists are always right! I can't guarantee to be up at the time of the oral though:)
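For concreteness, a minimal sketch of the default choice being questioned above (my own illustration, not from the posters): softmax turns unbounded action preferences into a probability distribution, with a temperature knob trading off greediness against uniformity.

```python
import numpy as np

def softmax_policy(prefs, temperature=1.0):
    """Map real-valued action preferences to action probabilities."""
    z = prefs / temperature
    z = z - np.max(z)        # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

prefs = np.array([1.0, 2.0, 3.0])
pi = softmax_policy(prefs)
print(pi)  # monotone in the preferences; sums to 1
```

The poster's question is whether this particular map (vs. other monotone normalizations) is the right one — e.g. its translation invariance and exponential tails are design choices, not necessities.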
Reinforcement Learning and Artificial Intelligence
Ladies and gentlemen! We are delighted to give you OPPO, optimistic policy optimization (very much related to the previous talk by the way!) to achieve efficient and effective exploration with linear function approximation in finite horizon MDPs as presented by Zhuoran Yang!
Our chance to stay positive during these dire times is to attend Simon's seminar tomorrow where I hope we learn that despite all other signs RL is not much harder than bandits. Long live RL, long live bandits!
Our next talk:
11/24: Simon S. Du (University of Washington)
"Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon"
For details, please see the website:
In a major scientific breakthrough, the latest version of
#AlphaFold
has been recognised as a solution to one of biology's grand challenges - the “protein folding problem”. It was validated today at
#CASP14
, the biennial Critical Assessment of protein Structure Prediction (1/3)
Reminder: this talk is coming up tomorrow!
***Note that the talk starts at 4PM UTC, one hour earlier than our regular time slot***
Public YouTube link:
Sign up for the talk on Google Meet:
Our next talk:
06/16: Niao He (UIUC)
"A Unified Switching System Perspective and O.D.E. Analysis of Q-Learning Algorithms"
For details, please see the website:
It is a great pleasure to have Fei Feng from UCLA speaking at our next seminar. Join us to learn about how to combine RL and unsupervised learning and keep everything provably efficient!
Join us on Tuesday to hear from Mengdi about the latest and greatest lower and upper bounds in off-policy evaluation with linear function approximation!
Our next talk:
08/04: Mengdi Wang (Princeton / DeepMind)
"Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation"
For details, please see the website:
Huge improvements for the sample complexity of RL for representation learning in low-rank (linear) MDPs! How? Why? Really? Come check out the seminar of Masatoshi Uehara tomorrow! For details follow this link:
Our next talk:
06/09: Shie Mannor (Technion)
"Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs"
For details, please see the website:
@ylecun
@nanjiang_cs
Perhaps better to focus on what needs to be done than on who is doing it or whether we call it RL or anything else. But I am glad you recognize that some sort of planning with models (or not?) will be needed! We are on the same page with this one. And Merry Christmas!! 2/2
Pessimism is back on stage! Join the RL Theory Seminars tomorrow to hear from Paria Rashidinejad about *more reasons* why being pessimistic in the batch RL setting is actually good. Fast rates? Adaptive optimality? Pessimism delivers!
To the attention of strong final year PhD students, junior faculty in CS/Theory/..! Excellent opportunity to stay at Berkeley while the 'Theory of RL' and other programs are happening. Please pass it along to relevant candidates.
Episode 10
@CsabaSzepesvari
of DeepMind shares his views on Bandits, Adversaries, PUCT in AlphaGo / AlphaZero / MuZero, AGI and RL, what is timeless, and more!
We are glad to welcome Tadashi! Btw, I still have some openings for postdocs. PM me if you are interested in theoretical foundations of RL, and, more broadly decision making (stay tuned!), or you know someone who could be good!
Yep, good one! We could do more of this: "AI as a field is starving for a few carefully documented failures. [..] I can learn more by just being told why a technique won't work than by being made to read between the lines."
#SundayClassicPaper
📜: McDermott (1976) 'Artificial Intelligence Meets Natural Stupidity'. As we critique our own field, it is useful to see what recurs from the critique of the past. The critique on 'Wishful Mnemonics' seems still relevant.
@MarlosCMachado
Great for them! While international universities are great, we should not forget that local universities can also be great. I did all my studies in Hungary and I don't regret this the tiniest bit! I met wonderful, dedicated, caring, knowledgeable profs there, which meant a lot!
In RL being optimistic is often the "right thing" when learning interactively. But what happens in the batch case? Perhaps pessimism is then the best? Come join us next Tuesday to learn the answer and more from Ying Jin!
For details check out
Exploration! The hunt for the "right" characterization of sample-efficiently learnable RL problem classes is not over yet! Enter the Bellman eluder dimension, which subsumes all that came before, as Qinghua Liu will kindly explain to all of us who care.