Aldo Pacchiano

@aldopacchiano

1,144 Followers · 443 Following · 12 Media · 596 Statuses

AI research at Broad Institute and Boston University. Mexican 🇲🇽

Boston, MA, USA
Joined September 2010
Pinned Tweet
@aldopacchiano
Aldo Pacchiano
1 year
An overview of Online Model Selection results for contextual bandits and RL. Presented at the UCL Statistical Science Seminar.
0
4
40
@aldopacchiano
Aldo Pacchiano
9 months
(1/2) In 2024 I will be joining Boston University as an Assistant Professor in Computing and Data Sciences (CDS). Seeking Ph.D. students passionate about sequential decision making, reinforcement learning, and/or algorithmic fairness.
10
33
272
@aldopacchiano
Aldo Pacchiano
8 months
I am looking for postdocs to join my group at Boston University in the summer / fall of 2024 with interests in sequential decision making, RL, and bandits. Candidates from both theory and experimental backgrounds are welcome to apply. Some topics of interest: decision making with FMs, meta learning, RLHF, safe RL.
0
18
93
@aldopacchiano
Aldo Pacchiano
11 months
[1/3] Four papers in Sequential Decision Making accepted to #Neurips2023! See you in New Orleans: 1. "In-Context Decision-Making from Supervised Pretraining" (spotlight) - Sequential Decision Making and Transformers.
2
0
69
@aldopacchiano
Aldo Pacchiano
7 months
Our paper "Improving Offline RL by Blending Heuristics" will be presented at #ICLR2024 as a spotlight contribution! Joint work with Sinong Chen, Andrey Kolobov and Ching-An Cheng (@chinganc_rl). Will post more info soon.
0
5
46
@aldopacchiano
Aldo Pacchiano
7 months
Our work "Data-Driven Online Model Selection With Regret Guarantees" will be at #AISTATS2024! Our algorithms satisfy data-dependent model selection bounds and are very simple and beautiful! Joint work with Christoph Dann (@chrodan) and Claudio Gentile.
0
8
42
@aldopacchiano
Aldo Pacchiano
1 year
(1/2) In this work we introduce the Decision Pretrained Transformer (DPT), which uses supervised pretraining for in-context learning in sequential decision making scenarios such as RL and bandits. Interestingly, the learning procedure DPT produces has connections to posterior sampling!
@ofirnachum
Ofir Nachum
1 year
Say you have a bunch of logged data of agents (eg PPO) learning various RL tasks. How should you distill this data into a single agent that can quickly learn new tasks? Simple autoregressive modeling would give you a learner no better than the agents it trained from....
4
51
294
1
7
41
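A minimal sketch of the supervised pretraining recipe described in this thread, assuming a toy transformer architecture and data layout of my own choosing (the paper's actual model, tokenization, and training details may differ):

```python
# Sketch of Decision Pretrained Transformer (DPT)-style supervised
# pretraining: a causal transformer is trained to predict the optimal
# action of a sampled task given an in-context dataset of interactions.
# Architecture and shapes below are illustrative assumptions.
import torch
import torch.nn as nn

class InContextPolicy(nn.Module):
    def __init__(self, state_dim, n_actions, d_model=64):
        super().__init__()
        self.embed = nn.Linear(state_dim + n_actions + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, context, query_state):
        # context: (B, T, state_dim + n_actions + 1) past (s, a, r) tokens;
        # query_state: (B, state_dim) state we must act in.
        pad = torch.zeros(query_state.shape[0],
                          context.shape[-1] - query_state.shape[-1])
        query_token = torch.cat([query_state, pad], dim=-1).unsqueeze(1)
        h = self.backbone(self.embed(torch.cat([context, query_token], dim=1)))
        return self.head(h[:, -1])  # action logits at the query state

def pretrain_step(model, opt, context, query_state, optimal_action):
    """One supervised step: cross-entropy toward the sampled task's
    optimal action, which the pretraining data is assumed to expose."""
    loss = nn.functional.cross_entropy(model(context, query_state),
                                       optimal_action)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Roughly speaking, the posterior-sampling connection arises because predicting the optimal action of a task drawn from the pretraining distribution, conditioned on an in-context dataset, mimics acting under a sample from the posterior over tasks.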
@aldopacchiano
Aldo Pacchiano
2 years
Information about the optimal value function can be used to reduce the effective state space size during exploration in RL problems. In this Neurips 2022 paper we flesh out this intuition and provide a simple value clipping algorithmic recipe to achieve these improved bounds.
@svlevine
Sergey Levine
2 years
In theory RL is intractable w/o exploration bonuses. In practice, we rarely use them. What's up with that? Critical to practical RL is reward shaping, but there is little theory about it. Our new paper analyzes sample complexity w/ shaped rewards: Thread:
5
39
228
0
3
39
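A minimal sketch of the value-clipping recipe mentioned above, under illustrative assumptions (tabular finite-horizon MDP, generic exploration bonuses, and a known upper bound on the optimal value function); the paper's exact algorithm and bonus schedule differ:

```python
# Optimistic value iteration with value clipping: clip the optimistic
# estimate at a known upper bound V_hi on the optimal value function,
# which shrinks the effective region the learner must explore.
import numpy as np

def clipped_value_iteration(P, R, bonus, V_hi, H):
    """P: (S, A, S) transitions, R: (S, A) rewards, bonus: (S, A)
    exploration bonuses, V_hi: (H + 1, S) known upper bounds on V*."""
    S, A = R.shape
    V = np.zeros((H + 1, S))
    Q = np.zeros((H, S, A))
    for h in range(H - 1, -1, -1):
        Q[h] = R + bonus + P @ V[h + 1]               # optimistic backup
        V[h] = np.minimum(Q[h].max(axis=1), V_hi[h])  # clip at known bound
    return Q, V
```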
@aldopacchiano
Aldo Pacchiano
2 years
Rewrote our online model selection paper from Neurips 2020. It should be easier to read now! If you want to learn simple ways of combining base algorithms and obtaining rates that scale with the best one, take a look.
2
2
38
@aldopacchiano
Aldo Pacchiano
2 years
We rewrote our reinforcement learning with trajectory preference feedback paper. It should be more readable now! We prove sample complexity bounds for an RL model where the feedback comes from comparing full trajectories. With @AadirupaSaha and Jonathan Lee.
0
4
36
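For illustration, trajectory-level preference feedback is often modeled with a logistic (Bradley-Terry style) link on total trajectory reward; a sketch under that assumption (the paper's exact feedback model may differ):

```python
# Simulated trajectory-comparison oracle: the learner submits two full
# trajectories and receives a single noisy binary preference, never the
# per-step rewards themselves.
import numpy as np

def preference_probability(total_reward_1, total_reward_2):
    """P(trajectory 1 preferred) under a logistic comparison model."""
    return 1.0 / (1.0 + np.exp(-(total_reward_1 - total_reward_2)))

def sample_comparison(rng, rewards_1, rewards_2):
    """rewards_i: per-step rewards of trajectory i (hidden from the
    learner). Returns True when trajectory 1 wins the comparison."""
    p = preference_probability(np.sum(rewards_1), np.sum(rewards_2))
    return rng.random() < p
```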
@aldopacchiano
Aldo Pacchiano
7 months
Our Neurips 2023 paper on Experiment Planning with Function Approximation is on arXiv now! Joint work with Jonathan Lee and Emma Brunskill (@EmmaBrunskill).
@aldopacchiano
Aldo Pacchiano
9 months
(1/5) In experiment planning a learner uses a set of unlabeled contexts to build a sequence of policies used to collect reward signals during deployment. An experiment planner cannot react adaptively to the rewards received during data collection.
1
0
5
0
0
31
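A minimal sketch of the non-adaptivity that defines experiment planning: the whole sequence of policies is committed to before deployment, so observed rewards cannot influence it. The uniform Dirichlet planner below is purely illustrative:

```python
# Experiment planning skeleton: build policies from unlabeled contexts
# only; rewards arrive later and cannot change the plan.
import numpy as np

def plan_policies(unlabeled_contexts, n_deployments, n_actions, rng):
    """Returns a fixed list of stochastic policies, one per deployment;
    policies[k][i] is the action distribution for context i."""
    return [rng.dirichlet(np.ones(n_actions), size=len(unlabeled_contexts))
            for _ in range(n_deployments)]
```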
@aldopacchiano
Aldo Pacchiano
1 year
[1/n] Our preprint is finally up! We introduce the dissimilarity dimension, which among other things can be used to derive sharper bounds than the eluder dimension for optimistic least squares algorithms with function approximation.
1
15
31
@aldopacchiano
Aldo Pacchiano
2 years
New guarantees for parallel contextual bandits under the eluder dimension. Our techniques for analyzing parallel learning under the eluder dimension port to RL with function approximation. In silico experiments on semiconductor data and biological sequence design.
2
8
32
@aldopacchiano
Aldo Pacchiano
2 years
(1/2) In this very preliminary work we introduce a model for an important set of transfer RL problems based on the concept of Undo Maps. We propose a distribution matching algorithm to solve transfer RL problems that can be modeled in this formalism.
1
4
30
@aldopacchiano
Aldo Pacchiano
2 years
Great opportunity! Internships @MSFTResearch NYC next summer :)
@MiroDudik
Miro Dudik
2 years
Applications for PhD internships in AI at @MSFTResearch NYC are now out! Please come work with @jordan_t_ash , Dylan Foster, Akshay Krishnamurthy, Alex Lamb, @JohnCLangford , Dipendra Misra, Lekan Molu, @criticalneuro , Rob Schapire, Cyril Zhang.
4
49
222
0
1
24
@aldopacchiano
Aldo Pacchiano
1 year
[1/n] When faced with multiple hyperparameter choices for the same algorithmic template or even different algorithms, determining the best option to maximize a reward function becomes crucial. This scenario is common in reinforcement learning tasks,
1
2
25
@aldopacchiano
Aldo Pacchiano
1 year
If you want to know more about Online Model Selection, I will be giving a talk at the UCL Statistical Science seminar next week :)
@stats_UCL
UCL Statistical Science
1 year
Next week's seminar will take place next Thurs 2nd March 14:00-15:00. The speaker will be Aldo Pacchiano (Microsoft Research, NYC). @aldopacchiano ONLINE ONLY. Link to join online: contact Dr. Emma Simpson (emma.simpson@ucl.ac.uk)
0
0
9
0
1
23
@aldopacchiano
Aldo Pacchiano
2 years
Presenting two posters today at 11 am @NeurIPS! 1) Best of Both Worlds Model Selection. 2) Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity. Come say hi!
1
1
23
@aldopacchiano
Aldo Pacchiano
6 months
I’ll be giving a talk today at 10:30 about the dissimilarity dimension and its relation to optimistic algorithms at the Science Center at Harvard. If anyone is in the area, feel free to stop by!
0
0
23
@aldopacchiano
Aldo Pacchiano
1 year
This is Tomorrow :)
@stats_UCL
UCL Statistical Science
1 year
Next week's seminar will take place next Thurs 2nd March 14:00-15:00. The speaker will be Aldo Pacchiano (Microsoft Research, NYC). @aldopacchiano ONLINE ONLY. Link to join online: contact Dr. Emma Simpson (emma.simpson@ucl.ac.uk)
0
0
9
0
1
20
@aldopacchiano
Aldo Pacchiano
8 months
We are presenting this work today at #NeurIPS!
@ofirnachum
Ofir Nachum
1 year
Say you have a bunch of logged data of agents (eg PPO) learning various RL tasks. How should you distill this data into a single agent that can quickly learn new tasks? Simple autoregressive modeling would give you a learner no better than the agents it trained from....
4
51
294
0
1
21
@aldopacchiano
Aldo Pacchiano
3 years
In this Neurips 2021 paper () we study a class of classification problems exemplified by the bank loan problem, where a lender decides whether or not to issue a loan. The lender only observes whether a customer will repay a loan if the loan is issued. (1/n)
3
3
21
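A minimal sketch of the one-sided feedback structure that makes the bank loan problem hard; `decide` and `update` are hypothetical interfaces, not the paper's algorithm:

```python
# Bank-loan interaction loop: the repayment label is revealed only when
# the loan is issued; rejected applicants generate no feedback at all.
def run_bank_loan_loop(applicants, true_labels, decide, update):
    observed = []
    for x, y in zip(applicants, true_labels):
        if decide(x):          # loan issued
            update(x, y)       # repayment outcome observed
            observed.append((x, y))
        # else: this applicant's label is never observed
    return observed
```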
@aldopacchiano
Aldo Pacchiano
2 years
(1/2) Our work on "Neural Design for Genetic Perturbation Experiments" was accepted to ICLR 2023 as a spotlight presentation. Here we introduce several methods for exploration using optimistic diverse predictions with Neural Networks.
3
2
19
@aldopacchiano
Aldo Pacchiano
2 years
Our Multi-Player Multi-Armed bandit paper will be presented at ALT 2023. We introduce an algorithm for the no-sensing setting that achieves logarithmic rates when the collision reward may be non-zero. With P. Bartlett and M. Jordan.
0
2
19
@aldopacchiano
Aldo Pacchiano
3 years
(1/3) Happy to finally post this paper on arXiv! In this work we propose an algorithm with logarithmic instance-dependent regret guarantees for the Multi-Player Multi-Armed bandit problem.
3
3
19
@aldopacchiano
Aldo Pacchiano
4 years
Selecting the right model class in reinforcement learning and bandits is important for finding a good policy. We provide a new algorithmic approach for model selection based on the principle of regret balancing that guarantees adaptation to the best model:
1
1
18
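A minimal sketch of the regret-balancing principle, with a schematic misspecification test of my own (not the paper's exact condition):

```python
# Regret balancing over base learners: play the learner whose putative
# regret bound is currently smallest, and eliminate learners whose
# empirical reward betrays their claimed bound.
def regret_balancing_step(plays, rewards, claimed_bounds, active):
    """plays[i], rewards[i]: play count and cumulative reward of base
    learner i; claimed_bounds[i](n): its putative regret bound after n
    plays; active: set of learner indices still trusted."""
    chosen = min(active, key=lambda j: claimed_bounds[j](plays[j]))
    best_avg = max(rewards[j] / max(plays[j], 1) for j in active)
    for j in list(active):
        slack = claimed_bounds[j](max(plays[j], 1)) / max(plays[j], 1)
        if rewards[j] / max(plays[j], 1) + slack < best_avg:
            active.discard(j)  # claimed bound looks misspecified
    return chosen  # caller plays this learner for one round, then updates
```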
@aldopacchiano
Aldo Pacchiano
2 years
(1/3) New preprint! In this work we introduce the formal study of the FineTune RL paradigm, where the learner has access to an offline dataset and is also allowed online deployments in order to find an almost optimal policy. Joint with Andrew Wagenmaker
1
0
16
@aldopacchiano
Aldo Pacchiano
1 year
Our work will be presented at ICML 2023. See you in Hawaii! Joint with Andrew Wagenmaker.
@aldopacchiano
Aldo Pacchiano
2 years
(1/3) New preprint! In this work we introduce the formal study of the FineTune RL paradigm, where the learner has access to an offline dataset and is also allowed online deployments in order to find an almost optimal policy. Joint with Andrew Wagenmaker
1
0
16
1
0
17
@aldopacchiano
Aldo Pacchiano
1 year
[1/5] Thanks to everyone who visited our poster on Thursday! *Joint work with Andrew Wagenmaker. Today I'll be presenting a couple of workshop papers. A list below:
1
0
17
@aldopacchiano
Aldo Pacchiano
7 months
(1/4) Finally finished writing our journal-length paper "Contextual Bandits with Stage-wise Constraints". In this work we study the anytime constraint satisfaction scenario in contextual bandits with a reward and a cost function.
1
0
15
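One common way to make the stage-wise (anytime) constraint concrete is to act optimistically on reward only among actions whose pessimistic cost estimate respects the budget; a sketch under that illustrative assumption, not necessarily the paper's algorithm:

```python
# Safe-optimistic action selection for a constrained contextual bandit:
# the cost constraint must hold at every round, not just on average.
import numpy as np

def safe_optimistic_action(reward_est, cost_est, width, budget):
    """reward_est, cost_est, width: per-action arrays for the current
    context. Returns the chosen action index, or None when no action
    passes the pessimistic cost check (caller then falls back to a
    known-safe action)."""
    safe = np.where(cost_est + width <= budget)[0]  # pessimism on cost
    if len(safe) == 0:
        return None
    return safe[np.argmax(reward_est[safe] + width[safe])]  # optimism on reward
```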
@aldopacchiano
Aldo Pacchiano
10 months
Our next speaker will be Stephen McAleer (@McaleerStephen), who will be talking about general virtual agents via LLMs!
@BU_CDS
BU Computing & Data Sciences
10 months
"Big thanks to everyone who joined our Machine Learning Symposium's inaugural talk! Special kudos to @BUQuestrom Prof. Jinglong Zhao for delivering an outstanding session on Adaptive Neyman allocation," CDS @aldopacchiano . Check out the fall'23 ML lineup:
0
2
9
0
2
12
@aldopacchiano
Aldo Pacchiano
2 years
MSR NYC is hiring postdocs to start next Summer. It is a great opportunity and a fabulous lab! :)
@MiroDudik
Miro Dudik
2 years
PhD candidates in ML/AI: @MSFTResearch NYC is hiring several postdocs in general ML, especially in theoretical ML, interactive ML (including RL and active learning), and NLP (including applications of RL). Deadline: 𝐃𝐞𝐜𝐞𝐦𝐛𝐞𝐫 𝟗
5
71
214
0
3
14
@aldopacchiano
Aldo Pacchiano
2 years
Happy to finally post my new model selection work with @chrodan and Claudio Gentile. In this work we ask whether it is possible to achieve best of both worlds and model selection rates simultaneously. (1/4)
1
2
13
@aldopacchiano
Aldo Pacchiano
2 years
Presenting a poster today at 11 am @NeurIPS! "Learning General World Models in a Handful of Reward-Free Deployments". Come say hi!
0
0
13
@aldopacchiano
Aldo Pacchiano
8 months
Manifesting emergent behavior with @BrandoHablando 🇲🇽 #NeurIPS2023
0
1
13
@aldopacchiano
Aldo Pacchiano
2 years
@ilyasut I am not sure this is the necessary explanation. It is similar to saying that the success we've had building planes that fly (since they use wings) explains how birds fly. The explanation you mention may hold, but it isn't knowable without bio research.
1
0
11
@aldopacchiano
Aldo Pacchiano
2 years
Excited to present our paper “Towards an Understanding of Default Policies in Multitask Policy Optimization” at AISTATS 2022. Feel free to stop by our poster tomorrow!
@ted_moskovitz
Ted Moskovitz
2 years
Excited to say that our #AISTATS2022 paper “Towards an Understanding of Default Policies in Multitask Policy Optimization” was given an Honorable Mention for Best Paper! If you’re interested in hearing more (or are very bored), stop by our poster tomorrow at 4:30 BST 1/
2
8
34
0
0
12
@aldopacchiano
Aldo Pacchiano
4 years
@pcastr @iclr_conf @MarlosCMachado @marcgbellemare @agarwl_ Nice! In this ICML 2020 paper () we introduced Behavior Guided RL based on the concept of behavioral embeddings to guide policy optimization. It would be great to explore the connections between our two works.  @kchorolab @jparkerholder @robinphysics
3
0
12
@aldopacchiano
Aldo Pacchiano
2 years
New preprint with fantastic co-authors Jonathan Lee , Weihao Kong , Vidya Muthukumar and Emma Brunskill ( @EmmaBrunskill ) on sub-linear estimation of optimal policy values in Contextual Linear Bandits.
0
1
11
@aldopacchiano
Aldo Pacchiano
8 months
We are presenting our work "A Unified Model and Dimension for Interactive Estimation" today at #Neurips2023. Here we introduce the dissimilarity dimension, which among other things is sharper than the eluder dimension in the analysis of optimistic algorithms. Poster 2008 at 10:45 am.
@aldopacchiano
Aldo Pacchiano
1 year
[1/n] Our preprint is finally up! We introduce the dissimilarity dimension, which among other things can be used to derive sharper bounds than the eluder dimension for optimistic least squares algorithms with function approximation.
1
15
31
0
0
11
@aldopacchiano
Aldo Pacchiano
9 months
Great!!
@RL_Conference
RL_Conference
9 months
Thrilled to announce the first annual Reinforcement Learning Conference @RL_Conference , which will be held at UMass Amherst August 9-12! RLC is the first strongly peer-reviewed RL venue with proceedings, and our call for papers is now available: .
4
95
239
0
0
11
@aldopacchiano
Aldo Pacchiano
1 year
New short story:
0
0
10
@aldopacchiano
Aldo Pacchiano
1 year
[4/5] 3) Undo Maps: A Tool for Adapting Policies to Perceptual Distortions. New Frontiers in Learning, Control, and Dynamical Systems - Joint work with Abhi Gupta, Ted Moskovitz @ted_moskovitz and David Alvarez-Melis. Link:
1
3
9
@aldopacchiano
Aldo Pacchiano
4 years
Great work by @ted_moskovitz , @MichaelArbel , @fhuszar , @ArthurGretton extending Behavior Guided RL (BGRL) with the use of Wasserstein Natural gradients! Soon to be presented at @iclr_conf .
@ted_moskovitz
Ted Moskovitz
4 years
happy to say our paper was accepted @iclr_conf ! we hope anyone interested in RL or optimization will find it interesting. we’ve released our implementation of WNPG (), and should have WNES out soon as well!
3
9
30
0
2
9
@aldopacchiano
Aldo Pacchiano
1 year
Congratulations to the participants from Mexico for placing 14th among countries at the International Mathematical Olympiad! 🇲🇽
3
1
10
@aldopacchiano
Aldo Pacchiano
9 months
(2/2) Here is the link to the CDS PhD program application [] due Dec 15.
0
0
10
@aldopacchiano
Aldo Pacchiano
2 years
This is really good
@AvivTamar1
Aviv Tamar
2 years
Enough games. The RL field needs to mature. New blog post with @shiemannor
16
72
391
1
1
10
@aldopacchiano
Aldo Pacchiano
11 months
[2/3] 2. "A Unified Model and Dimension for Interactive Estimation" - We introduce the dissimilarity dimension, sharper than the eluder dimension for optimistic least squares. 3. "Experiment Planning with Function Approximation" - function approximation algorithms for experiment planning and lower bounds.
1
0
9
@aldopacchiano
Aldo Pacchiano
10 months
This is today! Stephen @McaleerStephen will give a talk at BU CDS!
@BU_CDS
BU Computing & Data Sciences
10 months
Happening Today: "Toward General Virtual Agents" talk with Stephen McAleer, @CarnegieMellon . Learn more about the topic and the Machine Learning Symposium: @aldopacchiano
0
1
3
0
1
8
@aldopacchiano
Aldo Pacchiano
11 months
[3/3] 4. "Anytime Model Selection in Linear Bandits" - A setting where it is possible to obtain a logarithmic dependence on the number of models for model selection. More detailed threads to follow!
1
0
8
@aldopacchiano
Aldo Pacchiano
9 months
(1/2) In 2024 I will start as an Assistant Professor in the faculty of Computing and Data Sciences (CDS) at Boston University. I am looking for doctoral students with interests in sequential decision making, reinforcement learning, and algorithmic fairness.
1
0
7
@aldopacchiano
Aldo Pacchiano
8 months
@McaleerStephen About 4: The problems that we have been trying to model and solve via RL aren’t going away. We should think about whether RL is the right model for all of them and, if not, come up with different ones. Understanding RL should not be the research objective; solving problems should be.
0
0
7
@aldopacchiano
Aldo Pacchiano
2 years
@thanhnguyentang Depends on the nature of the work itself. If it is very technical it probably will get a better audience/reviews at COLT/ALT. If it is about fleshing out a 'simple' idea and connecting it to practical problems it is probably more suitable for Neurips/ICML.
1
0
7
@aldopacchiano
Aldo Pacchiano
2 years
(1/3) Parallel deployment in adaptive environments requires algorithms that not only reduce uncertainty and exploit high reward regions (when a reward signal is present) but also produce diverse exploration policies.
@YingchenX
Yingchen Xu
2 years
Interested in learning general world models at scale? 🌍 Check out our new #NeurIPS2022 paper to find out! Paper: Website: [1/N]
3
42
161
2
1
7
@aldopacchiano
Aldo Pacchiano
2 years
Good advice!
@j_foerst
Jakob Foerster
2 years
I drafted a quick "How to" guide for writing ML papers. I hope this will be useful (if a little late!) for #NeurIPS2022 . Happy paper writing and best of luck!!
24
274
1K
0
0
6
@aldopacchiano
Aldo Pacchiano
1 year
:)
@ilijabogunovic
Ilija Bogunovic
1 year
Take a look at our ReAlML website to access an exciting collection of recorded talks on real-world experiment design, active learning, and RL! Visit to start exploring! @mutny_ml @willieneis
1
6
27
0
0
6
@aldopacchiano
Aldo Pacchiano
2 years
@zhengyaojiang there are some works that are trying to approximate provable forms of exploration in NN domains. It is usually hard to port those ideas. See for example:
2
0
5
@aldopacchiano
Aldo Pacchiano
1 year
@shortstein Depends on how the impossibility result works. If it is of the form “there is a pathological instance where this is impossible” then certainly it is possible that for typical instances things aren’t that hard. In some of these cases I could see an argument for experiments.
0
0
6
@aldopacchiano
Aldo Pacchiano
11 months
Submissions are still open for ISAIM 2024. The conference will take place in Florida, January 2024. There will be a deep RL special session organized by @abhishekunique7 and @zhaoran_wang !
@DDiochnos
Dimitris Diochnos
11 months
I have advertised the International Symposium on Artificial Intelligence and Mathematics ( #ISAIM ) 2024 through mailing lists and appropriate Google Groups but it is probably best if there is a post here as well. Brief description below. 👇 🧵
1
3
6
0
1
4
@aldopacchiano
Aldo Pacchiano
2 years
nice!
@FeryalMP
Feryal
2 years
I’m super excited to share our work on AdA: An Adaptive Agent capable of hypothesis-driven exploration which solves challenging unseen tasks with just a handful of experience, at a similar timescale to humans. See the thread for more details 👇 [1/N]
25
266
1K
0
0
5
@aldopacchiano
Aldo Pacchiano
4 years
Imposing fairness constraints on the output of a model is a way to correct for fairness imbalances. In this work - - we build on the framework of Wasserstein Fair classification that permits distributional constraints on the shape of the model predictions.
1
0
5
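Schematically, the kind of distributional constraint this framework permits can be written as follows; the notation (loss, group attribute A, threshold epsilon) is illustrative rather than the paper's exact formulation:

```latex
% Wasserstein-constrained classification (illustrative notation):
% minimize risk while keeping the two groups' prediction distributions
% within Wasserstein distance epsilon of each other.
\min_{f}\; \mathbb{E}\big[\ell(f(X), Y)\big]
\quad \text{subject to} \quad
W_1\Big(\mathcal{L}\big(f(X) \mid A = 0\big),\,
        \mathcal{L}\big(f(X) \mid A = 1\big)\Big) \le \epsilon
```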
@aldopacchiano
Aldo Pacchiano
9 months
(1/5) In experiment planning a learner uses a set of unlabeled contexts to build a sequence of policies used to collect reward signals during deployment. An experiment planner cannot react adaptively to the rewards received during data collection.
1
0
5
@aldopacchiano
Aldo Pacchiano
2 years
This summer, Chicago, TTIC
@BeyondrlT
BeyondRL TTIC
2 years
** Workshop, TTIC, July 13-15th: Online decision-making and real-world applications ** -) Why is it challenging to deploy online decision-making alg. in real-world problems?🤨 -) Which models describe these challenges?🤔 -) What is the path towards making RL be practical?😲
3
4
26
0
0
5
@aldopacchiano
Aldo Pacchiano
8 months
Manifesting emergent behavior with @BrandoHablando 🇲🇽 #NeurIPS2023
0
0
5
@aldopacchiano
Aldo Pacchiano
1 year
This is really good!
@cong_ml
Cong Lu
1 year
RL agents🤖need a lot of data, which they usually need to gather themselves. But does that data need to be real? Enter *Synthetic Experience Replay*, leveraging recent advances in #GenerativeAI in order to vastly upsample⬆️ an agent’s training data! [1/N]
5
37
184
1
0
5
@aldopacchiano
Aldo Pacchiano
8 months
@chris_hitzel @ElanRosenfeld Another factor I believe is that for many international students (at least in the US) it is really hard to take a riskier route. Failure may mean you have to leave the country. There isn’t a “let’s just do great research and if it fails I’ll do something else”.
1
0
5
@aldopacchiano
Aldo Pacchiano
4 years
@agarwl_ Nice! In this ICML 2020 paper () we introduced Behavior Guided RL based on the concept of behavioral embeddings to guide policy optimization. It would be great to explore the connections between our two works. @kchorolab @jparkerholder @robinphysics
0
1
5
@aldopacchiano
Aldo Pacchiano
8 months
@KylerCora amazing image
0
0
3
@aldopacchiano
Aldo Pacchiano
3 years
@adjiboussodieng The issue is that PhD admissions in the US don't value just coursework. They now require you to have a lot of research experience even before starting.
0
0
5
@aldopacchiano
Aldo Pacchiano
4 years
Our new work on Model Selection for RL and bandits with some amazing collaborators!
@EmmaBrunskill
Emma Brunskill
4 years
Using the right function class in RL is important for learning a high-value policy but learning speed/regret typically worsens as the class complexity grows. We give a new RL alg that takes a set of models & has regret that adapts to the best model size
1
7
88
0
0
5
@aldopacchiano
Aldo Pacchiano
4 years
Have you ever wondered how to find diverse solutions in non-convex landscapes in RL and SL? Here we introduce Ridge Rider, a method that relies on curvature information to ride through the optimization landscape and find multiple qualitatively distinct optima.
@j_foerst
Jakob Foerster
4 years
The gradient is a locally greedy direction. Where do you get if you follow the eigenvectors of the Hessian instead? Our new paper, “Ridge Rider” (), explores how to do this and what happens in a variety of (toy) problems (if you dare to do so)... Thread 1/N
4
71
585
0
0
4
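A minimal sketch of the idea in the quoted thread: follow Hessian eigenvectors instead of the gradient. The finite-difference Hessian and step rule here are illustrative, not the paper's implementation:

```python
# Ridge Rider sketch: repeatedly step along a chosen eigenvector
# ("ridge") of the Hessian to reach qualitatively distinct optima.
import numpy as np

def hessian(f, x, eps=1e-4):
    """Finite-difference Hessian of a scalar function f at x."""
    d = len(x)
    H, I = np.zeros((d, d)), np.eye(d)
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + eps * (I[i] + I[j])) - f(x + eps * (I[i] - I[j]))
                       - f(x + eps * (I[j] - I[i])) + f(x - eps * (I[i] + I[j]))
                       ) / (4 * eps ** 2)
    return H

def ride_ridge(f, x0, ridge_index, n_steps=100, lr=0.01):
    x, prev = x0.astype(float).copy(), None
    for _ in range(n_steps):
        _, vecs = np.linalg.eigh(hessian(f, x))
        v = vecs[:, ridge_index]
        if prev is not None and v @ prev < 0:
            v = -v  # keep the eigenvector sign consistent between steps
        x, prev = x - lr * v, v
    return x
```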
@aldopacchiano
Aldo Pacchiano
1 year
Great opportunity! Richard is great
@XingyouSong
Richard Song
1 year
My team is looking to hire a student researcher (20% capacity, 3 months) to see how far we can take LMs to perform optimization/AutoML. If you're already in team matching, DM me or email xingyousong@google.com if interested! Link:
1
15
85
0
0
4
@aldopacchiano
Aldo Pacchiano
11 months
A few years back I happened to be in the discussions that led to the organization of the first RIIAA. I'm glad to see the event has not only endured but grown. The speaker list looks spectacular!
@pcastr
Pablo Samuel Castro
11 months
📢¡latin americans in AI!📢 pre-register before october 13th to attend a conference/summer-school in quito next february with some incredible speakers (pictured below). don't think you can afford it? we have travel grants, so apply now!
0
22
53
0
0
4
@aldopacchiano
Aldo Pacchiano
1 year
Cool work!
@AkankshaSaran
Akanksha Saran
1 year
In our recent paper accepted at #ICLR2023 , we propose IGL-P, a personalized reward learning algorithm for the Interaction-Grounded Learning (IGL) paradigm. Our approach is well-suited to alleviate hand-defined reward engineering for recommender systems.
1
7
63
0
0
4
@aldopacchiano
Aldo Pacchiano
2 years
What a great final. Congratulations Argentina, and congratulations Messi.
0
1
4
@aldopacchiano
Aldo Pacchiano
2 years
@david_rolnick Variations around criminality would also be interesting.
0
0
4
@aldopacchiano
Aldo Pacchiano
8 months
Amazing!
@EfroniYonathan
Yonathan Efroni
8 months
🤖Call for RL internship🤖 The Applied Reinforcement Learning team at Meta is hiring research interns. If you're curious about exploring different aspects of RL and its applications in large-scale systems, please apply here:
3
14
118
0
0
4
@aldopacchiano
Aldo Pacchiano
3 years
@CsabaSzepesvari @peter_richtarik What about having a system where reviewers are reviewed? It may be good to either have a public reviewer score or, if a reviewer is judged to have done a very bad job, prohibit them from submitting to next year's conference.
0
0
4
@aldopacchiano
Aldo Pacchiano
2 years
@zhengyaojiang there has been recent cool work in this direction by some DeepMind folks and by @misovalko
2
0
4
@aldopacchiano
Aldo Pacchiano
1 year
[8/n] In summary: the eluder dimension does not appropriately capture the informativeness of the set of candidate optima, while the dissimilarity dimension does. Joint work with Nataly Brukhim, Miroslav Dudík and Robert Schapire.
0
0
4
@aldopacchiano
Aldo Pacchiano
2 years
Go listen to Andrés. He's a really good guy.
@algekalipso
Captain Pleasure, Andrés Gómez Emilsson
2 years
I'm in Mexico City. Tomorrow I'm giving an in-person talk at CCH SUR :D
8
4
39
1
1
4
@aldopacchiano
Aldo Pacchiano
4 years
Our work on using determinants to encourage diversity in reinforcement learning :)
@jparkerholder
Jack Parker-Holder
4 years
My attempt to explain why you should use determinants to measure population diversity (TL;DR: it ensures all agents are distinct):
1
0
6
1
0
4
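A minimal sketch of the determinant-based diversity measure described in the quoted thread, assuming an RBF kernel over behavioral embeddings (the kernel choice is illustrative):

```python
# Population diversity as a determinant: build a kernel matrix over the
# agents' behavioral embeddings and score diversity by its determinant.
import numpy as np

def diversity_score(behavior_embeddings, length_scale=1.0):
    """behavior_embeddings: (n_agents, d) array; returns the determinant
    of the RBF kernel matrix over agents."""
    diffs = behavior_embeddings[:, None, :] - behavior_embeddings[None, :, :]
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * length_scale ** 2))
    return np.linalg.det(K)
```

Because the determinant of a kernel matrix vanishes when two rows coincide, the score collapses to zero whenever two agents behave alike, which is exactly the "ensures all agents are distinct" property in the quoted thread.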
@aldopacchiano
Aldo Pacchiano
4 years
;)
@ykilcher
Yannic Kilcher 🇸🇨
4 years
I'm just so amazed at how people continue to come up with new variants of research about bandits.
0
1
28
0
0
4
@aldopacchiano
Aldo Pacchiano
3 years
New paper with @niladrichat , Peter Bartlett and Michael Jordan called "On the Theory of Reinforcement Learning with Once-per-Episode Feedback": . It was very interesting for us to think about non-Markovian reward models for reinforcement learning!!
@niladrichat
Niladri Chatterji
3 years
New paper with @aldopacchiano , Peter Bartlett and Michael Jordan called "On the Theory of Reinforcement Learning with Once-per-Episode Feedback": . It was very interesting for us to think about non-Markovian reward models for reinforcement learning (1/2.)
1
2
30
0
0
4
@aldopacchiano
Aldo Pacchiano
2 years
A mighty workshop indeed!
@jparkerholder
Jack Parker-Holder
2 years
Looking forward to seeing all the creative ideas submitted to this workshop! Submit by September 22nd 😀
0
1
20
0
0
4
@aldopacchiano
Aldo Pacchiano
2 years
Good stuff!
@SOURADIPCHAKR18
Souradip Chakraborty
2 years
Reward Shaping is a common practice for Sparse RL, but lacks theoretical guarantees (most) & needs expertise. At @corl_conf #CoRL2022 , we present HTRON: Heavy-Tailed Adaptive Reinforce Algorithm for Sparse Navigation Robotics tasks @kaweer_ @amritsinghbedi3 @robobzbz @dmanocha
1
2
10
1
0
4
@aldopacchiano
Aldo Pacchiano
2 years
@hardmaru @StableDiffusion I am not so certain. The connection between the prompt and the image is so tenuously related to the inner world of the artist that I am not sure it can be called art. It all feels like saying Julius II was a great artist because he commissioned the Sistine Chapel.
2
0
3
@aldopacchiano
Aldo Pacchiano
1 year
This is the best spam email I have ever received 🤣🤣
2
0
3
@aldopacchiano
Aldo Pacchiano
2 years
(2/2) Finally we test our algorithms in simple simulation environments. Joint work with Abhi Gupta, @ted_moskovitz and @elmelis .
0
0
3
@aldopacchiano
Aldo Pacchiano
1 year
[3/n] To understand why the eluder dimension does not fully capture the behavior of optimistic algorithms consider a function class where the action space equals the interval [0,1]. All the functions in this class have optima at x = 1/4 and x' = 3/4.
1
0
3
@aldopacchiano
Aldo Pacchiano
1 year
[2/n] We do this by studying a framework called interactive estimation where the goal is to estimate a target from its “similarity” to points queried by the learner. Our framework unifies two learning models: statistical-query learning and structured bandits.
1
0
3
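A schematic version of the interaction protocol described in this tweet, with illustrative notation (similarity function s, unknown target f*):

```latex
% Interactive estimation protocol (illustrative notation): the learner
% queries points and observes noisy similarities to the unknown target.
\text{For } t = 1, \dots, T:\quad
\text{the learner queries } x_t \in \mathcal{X}
\text{ and observes } y_t \text{ with } \mathbb{E}[y_t \mid x_t] = s(f^{\ast}, x_t).
```

Different choices of the similarity s recover the two models the tweet mentions, statistical-query learning and structured bandits.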
@aldopacchiano
Aldo Pacchiano
8 months
This will be amazing!
@abhishekunique7
Abhishek Gupta
8 months
In other news, we are organizing a special session at ISAIM 2024 () on Deep RL: Bridging Theory and Practice, on Jan 8th, with @zhaoran_wang and great speakers! Your submissions are welcome! Please see the website for call details
0
3
6
0
0
1
@aldopacchiano
Aldo Pacchiano
2 years
very interesting indeed!
@zhengyaojiang
Zhengyao Jiang
2 years
It's interesting that current RL methods learn exploitation from data but the exploration is still mostly based on hard-coded rules (noise around optimal policy/maximizing state entropy etc.), even though efficient exploration is more difficult.
4
7
63
0
0
3
@aldopacchiano
Aldo Pacchiano
3 years
@OmarRivasplata The mighty Omarian bound!
1
0
3