Aldo Pacchiano

@aldopacchiano

1,144 Followers · 443 Following · 12 Media · 596 Statuses

AI research at Broad Institute and Boston University. Mexican 🇲🇽

Boston, MA, USA
Joined September 2010
Pinned Tweet
@aldopacchiano
Aldo Pacchiano
1 year
An overview of Online Model Selection results for contextual bandits and RL. Presented at the UCL Statistical Science Seminar.
0
4
40
@aldopacchiano
Aldo Pacchiano
9 months
(1/2) In 2024 I will be joining Boston University as an Assistant Professor in Computing and Data Sciences (CDS). Seeking Ph.D. students passionate about sequential decision making, reinforcement learning, and/or algorithmic fairness.
10
33
272
@aldopacchiano
Aldo Pacchiano
8 months
I am looking for postdocs to join my group at Boston University in the summer / fall of 2024 with interests in sequential decision making, RL, and bandits. Candidates from both theory and experimental backgrounds are welcome to apply. Some topics of interest: decision making with FMs, meta learning, RLHF, safe RL.
0
18
93
@aldopacchiano
Aldo Pacchiano
11 months
[1/3] Four papers in Sequential Decision Making accepted to #Neurips2023! See you in New Orleans: 1. "In-Context Decision-Making from Supervised Pretraining" (spotlight) - Sequential Decision Making and Transformers.
2
0
69
@aldopacchiano
Aldo Pacchiano
7 months
Our paper "Improving Offline RL by Blending Heuristics" will be presented at #ICLR2024 as a spotlight contribution! Joint work with Sinong Chen, Andrey Kolobov and Ching-An Cheng (@chinganc_rl). Will post more info soon.
0
5
46
@aldopacchiano
Aldo Pacchiano
7 months
Our work "Data-Driven Online Model Selection With Regret Guarantees" will be at #AISTATS2024! Our algorithms satisfy data-dependent model selection bounds and are very simple and beautiful! Joint work with Christoph Dann (@chrodan) and Claudio Gentile.
0
8
42
@aldopacchiano
Aldo Pacchiano
1 year
(1/2) In this work we introduce the Decision Pretrained Transformer (DPT), which uses supervised pretraining for in-context learning in sequential decision making scenarios such as RL and bandits. Interestingly, the learning procedure DPT produces has connections to posterior sampling!
@ofirnachum
Ofir Nachum
1 year
Say you have a bunch of logged data of agents (eg PPO) learning various RL tasks. How should you distill this data into a single agent that can quickly learn new tasks? Simple autoregressive modeling would give you a learner no better than the agents it trained from....
4
51
294
1
7
41
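A minimal sketch of the supervised pretraining recipe described in this thread, assuming a toy transformer architecture and data layout of my own choosing (the paper's actual model, tokenization, and training details may differ):

```python
# Sketch of Decision Pretrained Transformer (DPT)-style supervised
# pretraining: a causal transformer is trained to predict the optimal
# action of a sampled task given an in-context dataset of interactions.
# Architecture and shapes below are illustrative assumptions.
import torch
import torch.nn as nn

class InContextPolicy(nn.Module):
    def __init__(self, state_dim, n_actions, d_model=64):
        super().__init__()
        self.embed = nn.Linear(state_dim + n_actions + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, context, query_state):
        # context: (B, T, state_dim + n_actions + 1) past (s, a, r) tokens;
        # query_state: (B, state_dim) state we must act in.
        pad = torch.zeros(query_state.shape[0],
                          context.shape[-1] - query_state.shape[-1])
        query_token = torch.cat([query_state, pad], dim=-1).unsqueeze(1)
        h = self.backbone(self.embed(torch.cat([context, query_token], dim=1)))
        return self.head(h[:, -1])  # action logits at the query state

def pretrain_step(model, opt, context, query_state, optimal_action):
    """One supervised step: cross-entropy toward the sampled task's
    optimal action, which the pretraining data is assumed to expose."""
    loss = nn.functional.cross_entropy(model(context, query_state),
                                       optimal_action)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Roughly speaking, the posterior-sampling connection arises because predicting the optimal action of a task drawn from the pretraining distribution, conditioned on an in-context dataset, mimics acting under a sample from the posterior over tasks.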
@aldopacchiano
Aldo Pacchiano
2 years
Information about the optimal value function can be used to reduce the effective state space size during exploration in RL problems. In this Neurips 2022 paper we flesh out this intuition and provide a simple value clipping algorithmic recipe to achieve these improved bounds.
@svlevine
Sergey Levine
2 years
In theory RL is intractable w/o exploration bonuses. In practice, we rarely use them. What's up with that? Critical to practical RL is reward shaping, but there is little theory about it. Our new paper analyzes sample complexity w/ shaped rewards: Thread:
5
39
228
0
3
39
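A minimal sketch of the value-clipping recipe mentioned above, under illustrative assumptions (tabular finite-horizon MDP, generic exploration bonuses, and a known upper bound on the optimal value function); the paper's exact algorithm and bonus schedule differ:

```python
# Optimistic value iteration with value clipping: clip the optimistic
# estimate at a known upper bound V_hi on the optimal value function,
# which shrinks the effective region the learner must explore.
import numpy as np

def clipped_value_iteration(P, R, bonus, V_hi, H):
    """P: (S, A, S) transitions, R: (S, A) rewards, bonus: (S, A)
    exploration bonuses, V_hi: (H + 1, S) known upper bounds on V*."""
    S, A = R.shape
    V = np.zeros((H + 1, S))
    Q = np.zeros((H, S, A))
    for h in range(H - 1, -1, -1):
        Q[h] = R + bonus + P @ V[h + 1]               # optimistic backup
        V[h] = np.minimum(Q[h].max(axis=1), V_hi[h])  # clip at known bound
    return Q, V
```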
@aldopacchiano
Aldo Pacchiano
2 years
Rewrote our online model selection paper from Neurips 2020. It should be easier to read now! If you want to learn simple ways of combining base algorithms and obtaining rates that scale with the best one, take a look.
2
2
38
@aldopacchiano
Aldo Pacchiano
2 years
We rewrote our reinforcement learning with trajectory preference feedback paper. It should be more readable now! We prove sample complexity bounds for an RL model where the feedback comes from comparing full trajectories. With @AadirupaSaha and Jonathan Lee.
0
4
36
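For illustration, trajectory-level preference feedback is often modeled with a logistic (Bradley-Terry style) link on total trajectory reward; a sketch under that assumption (the paper's exact feedback model may differ):

```python
# Simulated trajectory-comparison oracle: the learner submits two full
# trajectories and receives a single noisy binary preference, never the
# per-step rewards themselves.
import numpy as np

def preference_probability(total_reward_1, total_reward_2):
    """P(trajectory 1 preferred) under a logistic comparison model."""
    return 1.0 / (1.0 + np.exp(-(total_reward_1 - total_reward_2)))

def sample_comparison(rng, rewards_1, rewards_2):
    """rewards_i: per-step rewards of trajectory i (hidden from the
    learner). Returns True when trajectory 1 wins the comparison."""
    p = preference_probability(np.sum(rewards_1), np.sum(rewards_2))
    return rng.random() < p
```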
@aldopacchiano
Aldo Pacchiano
7 months
Our Neurips 2023 paper on Experiment Planning with Function Approximation is on arXiv now! Joint work with Jonathan Lee and Emma Brunskill (@EmmaBrunskill).
@aldopacchiano
Aldo Pacchiano
9 months
(1/5) In experiment planning a learner uses a set of unlabeled contexts to build a sequence of policies used to collect reward signals during deployment. An experiment planner cannot react adaptively to the rewards received during data collection.
1
0
5
0
0
31
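A minimal sketch of the non-adaptivity that defines experiment planning: the whole sequence of policies is committed to before deployment, so observed rewards cannot influence it. The uniform Dirichlet planner below is purely illustrative:

```python
# Experiment planning skeleton: build policies from unlabeled contexts
# only; rewards arrive later and cannot change the plan.
import numpy as np

def plan_policies(unlabeled_contexts, n_deployments, n_actions, rng):
    """Returns a fixed list of stochastic policies, one per deployment;
    policies[k][i] is the action distribution for context i."""
    return [rng.dirichlet(np.ones(n_actions), size=len(unlabeled_contexts))
            for _ in range(n_deployments)]
```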
@aldopacchiano
Aldo Pacchiano
1 year
[1/n] Our preprint is finally up! We introduce the dissimilarity dimension, which among other things can be used to derive sharper bounds than the eluder dimension for optimistic least squares algorithms with function approximation.
1
15
31
@aldopacchiano
Aldo Pacchiano
2 years
New guarantees for parallel contextual bandits under the eluder dimension. Our techniques for analyzing parallel learning under the eluder dimension port to RL with function approximation. In silico experiments on semiconductor data and biological sequence design.
2
8
32
@aldopacchiano
Aldo Pacchiano
2 years
(1/2) In this very preliminary work we introduce a model for an important set of transfer RL problems based on the concept of Undo Maps. We propose a distribution matching algorithm to solve transfer RL problems that can be modeled in this formalism.
1
4
30
@aldopacchiano
Aldo Pacchiano
2 years
Great opportunity! Internships @MSFTResearch NYC next summer :)
@MiroDudik
Miro Dudik
2 years
Applications for PhD internships in AI at @MSFTResearch NYC are now out! Please come work with @jordan_t_ash , Dylan Foster, Akshay Krishnamurthy, Alex Lamb, @JohnCLangford , Dipendra Misra, Lekan Molu, @criticalneuro , Rob Schapire, Cyril Zhang.
4
49
222
0
1
24
@aldopacchiano
Aldo Pacchiano
1 year
[1/n] When faced with multiple hyperparameter choices for the same algorithmic template or even different algorithms, determining the best option to maximize a reward function becomes crucial. This scenario is common in reinforcement learning tasks,
1
2
25
@aldopacchiano
Aldo Pacchiano
1 year
If you want to know more about Online Model Selection, I will be giving a talk at the UCL Statistical Science seminar next week :)
@stats_UCL
UCL Statistical Science
1 year
Next week's seminar will take place next Thurs 2nd March 14:00-15:00. The speaker will be Aldo Pacchiano (Microsoft Research, NYC). @aldopacchiano ONLINE ONLY. Link to join online: contact Dr. Emma Simpson (emma.simpson@ucl.ac.uk)
0
0
9
0
1
23
@aldopacchiano
Aldo Pacchiano
2 years
Presenting two posters today at 11 am @NeurIPS! 1) Best of Both Worlds Model Selection. 2) Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity. Come say hi!
1
1
23
@aldopacchiano
Aldo Pacchiano
6 months
I’ll be giving a talk today at 10:30 about the dissimilarity dimension and its relation to optimistic algorithms at the Science Center at Harvard. If anyone is in the area, feel free to stop by!
0
0
23
@aldopacchiano
Aldo Pacchiano
1 year
This is Tomorrow :)
@stats_UCL
UCL Statistical Science
1 year
Next week's seminar will take place next Thurs 2nd March 14:00-15:00. The speaker will be Aldo Pacchiano (Microsoft Research, NYC). @aldopacchiano ONLINE ONLY. Link to join online: contact Dr. Emma Simpson (emma.simpson@ucl.ac.uk)
0
0
9
0
1
20
@aldopacchiano
Aldo Pacchiano
8 months
We are presenting this work today at #NeurIPS!
@ofirnachum
Ofir Nachum
1 year
Say you have a bunch of logged data of agents (eg PPO) learning various RL tasks. How should you distill this data into a single agent that can quickly learn new tasks? Simple autoregressive modeling would give you a learner no better than the agents it trained from....
4
51
294
0
1
21
@aldopacchiano
Aldo Pacchiano
3 years
In this Neurips 2021 paper () we study a class of classification problems exemplified by the bank loan problem, where a lender decides whether or not to issue a loan. The lender only observes whether a customer will repay a loan if the loan is issued. (1/n)
3
3
21
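A minimal sketch of the one-sided feedback structure that makes the bank loan problem hard; `decide` and `update` are hypothetical interfaces, not the paper's algorithm:

```python
# Bank-loan interaction loop: the repayment label is revealed only when
# the loan is issued; rejected applicants generate no feedback at all.
def run_bank_loan_loop(applicants, true_labels, decide, update):
    observed = []
    for x, y in zip(applicants, true_labels):
        if decide(x):          # loan issued
            update(x, y)       # repayment outcome observed
            observed.append((x, y))
        # else: this applicant's label is never observed
    return observed
```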
@aldopacchiano
Aldo Pacchiano
2 years
(1/2) Our work on "Neural Design for Genetic Perturbation Experiments" was accepted to ICLR 2023 as a spotlight presentation. Here we introduce several methods for exploration using optimistic diverse predictions with Neural Networks.
3
2
19
@aldopacchiano
Aldo Pacchiano
2 years
Our Multi-Player Multi-Armed bandit paper will be presented at ALT 2023. We introduce an algorithm for the no-sensing setting that achieves logarithmic rates when the collision reward may be non-zero. With P. Bartlett and M. Jordan.
0
2
19
@aldopacchiano
Aldo Pacchiano
3 years
(1/3) Happy to finally post this paper on arXiv! In this work we propose an algorithm with logarithmic instance-dependent regret guarantees for the Multi-Player Multi-Armed bandit problem.
3
3
19
@aldopacchiano
Aldo Pacchiano
4 years
Selecting the right model class in reinforcement learning and bandits is important for finding a good policy. We provide a new algorithmic approach for model selection based on the principle of regret balancing that guarantees adaptation to the best model:
1
1
18
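A minimal sketch of the regret-balancing principle, with a schematic misspecification test of my own (not the paper's exact condition):

```python
# Regret balancing over base learners: play the learner whose putative
# regret bound is currently smallest, and eliminate learners whose
# empirical reward betrays their claimed bound.
def regret_balancing_step(plays, rewards, claimed_bounds, active):
    """plays[i], rewards[i]: play count and cumulative reward of base
    learner i; claimed_bounds[i](n): its putative regret bound after n
    plays; active: set of learner indices still trusted."""
    chosen = min(active, key=lambda j: claimed_bounds[j](plays[j]))
    best_avg = max(rewards[j] / max(plays[j], 1) for j in active)
    for j in list(active):
        slack = claimed_bounds[j](max(plays[j], 1)) / max(plays[j], 1)
        if rewards[j] / max(plays[j], 1) + slack < best_avg:
            active.discard(j)  # claimed bound looks misspecified
    return chosen  # caller plays this learner for one round, then updates
```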
@aldopacchiano
Aldo Pacchiano
2 years
(1/3) New preprint! In this work we introduce the formal study of the FineTune RL paradigm, where the learner has access to an offline dataset and is also allowed online deployments in order to find an almost optimal policy. Joint with Andrew Wagenmaker
1
0
16
@aldopacchiano
Aldo Pacchiano
1 year
Our work will be presented at ICML 2023. See you in Hawaii! Joint with Andrew Wagenmaker.
@aldopacchiano
Aldo Pacchiano
2 years
(1/3) New preprint! In this work we introduce the formal study of the FineTune RL paradigm, where the learner has access to an offline dataset and is also allowed online deployments in order to find an almost optimal policy. Joint with Andrew Wagenmaker
1
0
16
1
0
17
@aldopacchiano
Aldo Pacchiano
1 year
[1/5] Thanks to everyone who visited our poster on Thursday! *Joint work with Andrew Wagenmaker. Today I'll be presenting a couple of workshop papers. A list below:
1
0
17
@aldopacchiano
Aldo Pacchiano
7 months
(1/4) Finally finished writing our journal-length paper "Contextual Bandits with Stage-wise Constraints". In this work we study the anytime constraint satisfaction scenario in contextual bandits with a reward and a cost function.
1
0
15
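One common way to make the stage-wise (anytime) constraint concrete is to act optimistically on reward only among actions whose pessimistic cost estimate respects the budget; a sketch under that illustrative assumption, not necessarily the paper's algorithm:

```python
# Safe-optimistic action selection for a constrained contextual bandit:
# the cost constraint must hold at every round, not just on average.
import numpy as np

def safe_optimistic_action(reward_est, cost_est, width, budget):
    """reward_est, cost_est, width: per-action arrays for the current
    context. Returns the chosen action index, or None when no action
    passes the pessimistic cost check (caller then falls back to a
    known-safe action)."""
    safe = np.where(cost_est + width <= budget)[0]  # pessimism on cost
    if len(safe) == 0:
        return None
    return safe[np.argmax(reward_est[safe] + width[safe])]  # optimism on reward
```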
@aldopacchiano
Aldo Pacchiano
10 months
Our next speaker will be Stephen McAleer (@McaleerStephen), who will be talking about general virtual agents via LLMs!
@BU_CDS
BU Computing & Data Sciences
10 months
"Big thanks to everyone who joined our Machine Learning Symposium's inaugural talk! Special kudos to @BUQuestrom Prof. Jinglong Zhao for delivering an outstanding session on Adaptive Neyman allocation," CDS @aldopacchiano . Check out the fall'23 ML lineup:
0
2
9
0
2
12
@aldopacchiano
Aldo Pacchiano
2 years
MSR NYC is hiring postdocs to start next Summer. It is a great opportunity and a fabulous lab! :)
@MiroDudik
Miro Dudik
2 years
PhD candidates in ML/AI: @MSFTResearch NYC is hiring several postdocs in general ML, especially in theoretical ML, interactive ML (including RL and active learning), and NLP (including applications of RL). Deadline: 𝐃𝐞𝐜𝐞𝐦𝐛𝐞𝐫 𝟗
5
71
214
0
3
14
@aldopacchiano
Aldo Pacchiano
2 years
Happy to finally post my new model selection work with @chrodan and Claudio Gentile. In this work we ask whether it is possible to achieve best of both worlds and model selection rates simultaneously. (1/4)
1
2
13
@aldopacchiano
Aldo Pacchiano
2 years
Presenting a poster today at 11 am @NeurIPS! "Learning General World Models in a Handful of Reward-Free Deployments". Come say hi!
0
0
13
@aldopacchiano
Aldo Pacchiano
8 months
Manifesting emergent behavior with @BrandoHablando 🇲🇽 #NeurIPS2023
0
1
13
@aldopacchiano
Aldo Pacchiano
2 years
@ilyasut I am not sure this is the necessary explanation. It is similar to saying that the success we've had building planes that fly (since they use wings) explains how birds fly. The explanation you mention may hold, but it isn't knowable without bio research.
1
0
11
@aldopacchiano
Aldo Pacchiano
2 years
Excited to present our paper “Towards an Understanding of Default Policies in Multitask Policy Optimization” at AISTATS 2022. Feel free to stop by our poster tomorrow!
@ted_moskovitz
Ted Moskovitz
2 years
Excited to say that our #AISTATS2022 paper “Towards an Understanding of Default Policies in Multitask Policy Optimization” was given an Honorable Mention for Best Paper! If you’re interested in hearing more (or are very bored), stop by our poster tomorrow at 4:30 BST 1/
2
8
34
0
0
12
@aldopacchiano
Aldo Pacchiano
4 years
@pcastr @iclr_conf @MarlosCMachado @marcgbellemare @agarwl_ Nice! In this ICML 2020 paper () we introduced Behavior Guided RL based on the concept of behavioral embeddings to guide policy optimization. It would be great to explore the connections between our two works.  @kchorolab @jparkerholder @robinphysics
3
0
12
@aldopacchiano
Aldo Pacchiano
2 years
New preprint with fantastic co-authors Jonathan Lee , Weihao Kong , Vidya Muthukumar and Emma Brunskill ( @EmmaBrunskill ) on sub-linear estimation of optimal policy values in Contextual Linear Bandits.
0
1
11
@aldopacchiano
Aldo Pacchiano
8 months
We are presenting our work "A Unified Model and Dimension for Interactive Estimation" today at #Neurips2023. Here we introduce the dissimilarity dimension, which among other things is sharper than the eluder dimension in the analysis of optimistic algorithms. Poster 2008 at 10:45 am.
@aldopacchiano
Aldo Pacchiano
1 year
[1/n] Our preprint is finally up! We introduce the dissimilarity dimension, which among other things can be used to derive sharper bounds than the eluder dimension for optimistic least squares algorithms with function approximation.
1
15
31
0
0
11
@aldopacchiano
Aldo Pacchiano
9 months
Great!!
@RL_Conference
RL_Conference
9 months
Thrilled to announce the first annual Reinforcement Learning Conference @RL_Conference , which will be held at UMass Amherst August 9-12! RLC is the first strongly peer-reviewed RL venue with proceedings, and our call for papers is now available: .
4
95
239
0
0
11
@aldopacchiano
Aldo Pacchiano
1 year
New short story:
0
0
10
@aldopacchiano
Aldo Pacchiano
1 year
[4/5] 3) Undo Maps: A Tool for Adapting Policies to Perceptual Distortions. New Frontiers in Learning, Control, and Dynamical Systems - Joint work with Abhi Gupta, Ted Moskovitz @ted_moskovitz and David Alvarez-Melis. Link:
1
3
9
@aldopacchiano
Aldo Pacchiano
4 years
Great work by @ted_moskovitz , @MichaelArbel , @fhuszar , @ArthurGretton extending Behavior Guided RL (BGRL) with the use of Wasserstein Natural gradients! Soon to be presented at @iclr_conf .
@ted_moskovitz
Ted Moskovitz
4 years
happy to say our paper was accepted @iclr_conf ! we hope anyone interested in RL or optimization will find it interesting. we’ve released our implementation of WNPG (), and should have WNES out soon as well!
3
9
30
0
2
9
@aldopacchiano
Aldo Pacchiano
1 year
Congratulations to the participants from Mexico for placing 14th among countries at the International Mathematical Olympiad! 🇲🇽
3
1
10
@aldopacchiano
Aldo Pacchiano
9 months
(2/2) Here is the link to the CDS PhD program application [] due Dec 15.
0
0
10
@aldopacchiano
Aldo Pacchiano
2 years
This is really good
@AvivTamar1
Aviv Tamar
2 years
Enough games. The RL field needs to mature. New blog post with @shiemannor
16
72
391
1
1
10
@aldopacchiano
Aldo Pacchiano
11 months
[2/3] 2. "A Unified Model and Dimension for Interactive Estimation" - We introduce the dissimilarity dimension, sharper than the eluder dimension for optimistic least squares. 3. "Experiment Planning with Function Approximation" - function approximation algorithms for experiment planning and lower bounds.
1
0
9
@aldopacchiano
Aldo Pacchiano
10 months
This is today! Stephen @McaleerStephen will give a talk at BU CDS!
@BU_CDS
BU Computing & Data Sciences
10 months
Happening Today: "Toward General Virtual Agents" talk with Stephen McAleer, @CarnegieMellon . Learn more about the topic and the Machine Learning Symposium: @aldopacchiano
0
1
3
0
1
8
@aldopacchiano
Aldo Pacchiano
11 months
[3/3] 4. "Anytime Model Selection in Linear Bandits" - A setting where it is possible to obtain a logarithmic dependence on the number of models for model selection. More detailed threads to follow!
1
0
8
@aldopacchiano
Aldo Pacchiano
9 months
(1/2) In 2024 I will start as an Assistant Professor in the faculty of Computing and Data Sciences (CDS) at Boston University. I am looking for doctoral students with interests in sequential decision making, reinforcement learning, and algorithmic fairness.
1
0
7
@aldopacchiano
Aldo Pacchiano
8 months
@McaleerStephen About 4: The problems that we have been trying to model and solve via RL aren’t going away. We should think about whether RL is the right model for all of them and, if not, come up with different ones. Understanding RL should not be the research objective; solving problems should be.
0
0
7
@aldopacchiano
Aldo Pacchiano
2 years
@thanhnguyentang Depends on the nature of the work itself. If it is very technical it probably will get a better audience/reviews at COLT/ALT. If it is about fleshing out a 'simple' idea and connecting it to practical problems it is probably more suitable for Neurips/ICML.
1
0
7
@aldopacchiano
Aldo Pacchiano
2 years
(1/3) Parallel deployment in adaptive environments requires algorithms that not only reduce uncertainty and exploit high reward regions (when a reward signal is present) but also produce diverse exploration policies.
@YingchenX
Yingchen Xu
2 years
Interested in learning general world models at scale? 🌍 Check out our new #NeurIPS2022 paper to find out! Paper: Website: [1/N]
3
42
161
2
1
7
@aldopacchiano
Aldo Pacchiano
2 years
Good advice!
@j_foerst
Jakob Foerster
2 years
I drafted a quick "How to" guide for writing ML papers. I hope this will be useful (if a little late!) for #NeurIPS2022 . Happy paper writing and best of luck!!
24
274
1K
0
0
6
@aldopacchiano
Aldo Pacchiano
1 year
:)
@ilijabogunovic
Ilija Bogunovic
1 year
Take a look at our ReAlML website to access an exciting collection of recorded talks on real-world experiment design, active learning, and RL! Visit to start exploring! @mutny_ml @willieneis
1
6
27
0
0
6
@aldopacchiano
Aldo Pacchiano
2 years
@zhengyaojiang there are some works that are trying to approximate provable forms of exploration in NN domains. It is usually hard to port those ideas. See for example:
2
0
5
@aldopacchiano
Aldo Pacchiano
1 year
@shortstein Depends on how the impossibility result works. If it is of the form “there is a pathological instance where this is impossible” then certainly it is possible that for typical instances things aren’t that hard. In some of these cases I could see an argument for experiments.
0
0
6
@aldopacchiano
Aldo Pacchiano
11 months
Submissions are still open for ISAIM 2024. The conference will take place in Florida, January 2024. There will be a deep RL special session organized by @abhishekunique7 and @zhaoran_wang !
@DDiochnos
Dimitris Diochnos
11 months
I have advertised the International Symposium on Artificial Intelligence and Mathematics ( #ISAIM ) 2024 through mailing lists and appropriate Google Groups but it is probably best if there is a post here as well. Brief description below. 👇 🧵
1
3
6
0
1
4
@aldopacchiano
Aldo Pacchiano
2 years
nice!
@FeryalMP
Feryal
2 years
I’m super excited to share our work on AdA: An Adaptive Agent capable of hypothesis-driven exploration which solves challenging unseen tasks with just a handful of experience, at a similar timescale to humans. See the thread for more details 👇 [1/N]
25
266
1K
0
0
5
@aldopacchiano
Aldo Pacchiano
4 years
Imposing fairness constraints on the output of a model is a way to correct for fairness imbalances. In this work - - we build on the framework of Wasserstein Fair classification that permits distributional constraints on the shape of the model predictions.
1
0
5
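Schematically, the kind of distributional constraint this framework permits can be written as follows; the notation (loss, group attribute A, threshold epsilon) is illustrative rather than the paper's exact formulation:

```latex
% Wasserstein-constrained classification (illustrative notation):
% minimize risk while keeping the two groups' prediction distributions
% within Wasserstein distance epsilon of each other.
\min_{f}\; \mathbb{E}\big[\ell(f(X), Y)\big]
\quad \text{subject to} \quad
W_1\Big(\mathcal{L}\big(f(X) \mid A = 0\big),\,
        \mathcal{L}\big(f(X) \mid A = 1\big)\Big) \le \epsilon
```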
@aldopacchiano
Aldo Pacchiano
9 months
(1/5) In experiment planning a learner uses a set of unlabeled contexts to build a sequence of policies used to collect reward signals during deployment. An experiment planner cannot react adaptively to the rewards received during data collection.
1
0
5
@aldopacchiano
Aldo Pacchiano
2 years
This summer, Chicago, TTIC
@BeyondrlT
BeyondRL TTIC
2 years
** Workshop, TTIC, July 13-15th: Online decision-making and real-world applications ** -) Why is it challenging to deploy online decision-making alg. in real-world problems?🤨 -) Which models describe these challenges?🤔 -) What is the path towards making RL be practical?😲
3
4
26
0
0
5
@aldopacchiano
Aldo Pacchiano
8 months
Manifesting emergent behavior with @BrandoHablando 🇲🇽 #NeurIPS2023
0
0
5
@aldopacchiano
Aldo Pacchiano
1 year
This is really good!
@cong_ml
Cong Lu
1 year
RL agents🤖need a lot of data, which they usually need to gather themselves. But does that data need to be real? Enter *Synthetic Experience Replay*, leveraging recent advances in #GenerativeAI in order to vastly upsample⬆️ an agent’s training data! [1/N]
5
37
184
1
0
5
@aldopacchiano
Aldo Pacchiano
8 months
@chris_hitzel @ElanRosenfeld Another factor I believe is that for many international students (at least in the US) it is really hard to take a riskier route. Failure may mean you have to leave the country. There isn’t a “let’s just do great research and if it fails I’ll do something else”.
1
0
5
@aldopacchiano
Aldo Pacchiano
4 years
@agarwl_ Nice! In this ICML 2020 paper () we introduced Behavior Guided RL based on the concept of behavioral embeddings to guide policy optimization. It would be great to explore the connections between our two works. @kchorolab @jparkerholder @robinphysics
0
1
5
@aldopacchiano
Aldo Pacchiano
8 months
@KylerCora amazing image
0
0
3
@aldopacchiano
Aldo Pacchiano
3 years
@adjiboussodieng The issue is that PhD admissions in the US don't value just coursework. They now require you to have a lot of research experience even before starting.
0
0
5
@aldopacchiano
Aldo Pacchiano
4 years
Our new work on Model Selection for RL and bandits with some amazing collaborators!
@EmmaBrunskill
Emma Brunskill
4 years
Using the right function class in RL is important for learning a high-value policy but learning speed/regret typically worsens as the class complexity grows. We give a new RL alg that takes a set of models & has regret that adapts to the best model size
1
7
88
0
0
5
@aldopacchiano
Aldo Pacchiano
4 years
Have you ever wondered how to find diverse solutions in non-convex landscapes in RL and SL? Here we introduce Ridge Rider, a method that relies on curvature information to ride through the optimization landscape and find multiple qualitatively distinct optima.
@j_foerst
Jakob Foerster
4 years
The gradient is a locally greedy direction. Where do you get if you follow the eigenvectors of the Hessian instead? Our new paper, “Ridge Rider” (), explores how to do this and what happens in a variety of (toy) problems (if you dare to do so)... Thread 1/N
4
71
585
0
0
4
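A minimal sketch of the idea in the quoted thread: follow Hessian eigenvectors instead of the gradient. The finite-difference Hessian and step rule here are illustrative, not the paper's implementation:

```python
# Ridge Rider sketch: repeatedly step along a chosen eigenvector
# ("ridge") of the Hessian to reach qualitatively distinct optima.
import numpy as np

def hessian(f, x, eps=1e-4):
    """Finite-difference Hessian of a scalar function f at x."""
    d = len(x)
    H, I = np.zeros((d, d)), np.eye(d)
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + eps * (I[i] + I[j])) - f(x + eps * (I[i] - I[j]))
                       - f(x + eps * (I[j] - I[i])) + f(x - eps * (I[i] + I[j]))
                       ) / (4 * eps ** 2)
    return H

def ride_ridge(f, x0, ridge_index, n_steps=100, lr=0.01):
    x, prev = x0.astype(float).copy(), None
    for _ in range(n_steps):
        _, vecs = np.linalg.eigh(hessian(f, x))
        v = vecs[:, ridge_index]
        if prev is not None and v @ prev < 0:
            v = -v  # keep the eigenvector sign consistent between steps
        x, prev = x - lr * v, v
    return x
```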
@aldopacchiano
Aldo Pacchiano
1 year
Great opportunity! Richard is great
@XingyouSong
Richard Song
1 year
My team is looking to hire a student researcher (20% capacity, 3 months) to see how far we can take LMs to perform optimization/AutoML. If you're already in team matching, DM me or email xingyousong@google.com if interested! Link:
1
15
85
0
0
4
@aldopacchiano
Aldo Pacchiano
11 months
A few years back I happened to be in the discussions that led to the organization of the first RIIAA. I'm glad to see the event has not only endured but grown. The speaker list looks spectacular!
@pcastr
Pablo Samuel Castro
11 months
📢¡latin americans in AI!📢 pre-register before october 13th to attend a conference/summer-school in quito next february with some incredible speakers (pictured below). don't think you can afford it? we have travel grants, so apply now!
0
22
53
0
0
4
@aldopacchiano
Aldo Pacchiano
1 year
Cool work!
@AkankshaSaran
Akanksha Saran
1 year
In our recent paper accepted at #ICLR2023 , we propose IGL-P, a personalized reward learning algorithm for the Interaction-Grounded Learning (IGL) paradigm. Our approach is well-suited to alleviate hand-defined reward engineering for recommender systems.
1
7
63
0
0
4
@aldopacchiano
Aldo Pacchiano
2 years
What a great final. Congratulations Argentina, and congratulations Messi.
0
1
4
@aldopacchiano
Aldo Pacchiano
2 years
@david_rolnick Variations around criminality would also be interesting.
0
0
4
@aldopacchiano
Aldo Pacchiano
8 months
Amazing!
@EfroniYonathan
Yonathan Efroni
8 months
🤖Call for RL internship🤖 The Applied Reinforcement Learning team at Meta is hiring research interns. If you're curious about exploring different aspects of RL and its applications in large-scale systems, please apply here:
3
14
118
0
0
4
@aldopacchiano
Aldo Pacchiano
3 years
@CsabaSzepesvari @peter_richtarik What about having a system where reviewers are reviewed? It may be good to either have a public reviewer score or, if a reviewer is judged to have done a very bad job, prohibit them from submitting to next year's conference.
0
0
4
@aldopacchiano
Aldo Pacchiano
2 years
@zhengyaojiang there has been recent cool work in this direction by some DeepMind folks and by @misovalko
2
0
4
@aldopacchiano
Aldo Pacchiano
1 year
[8/n] In summary: the eluder dimension does not appropriately capture the informativeness of the set of candidate optima, while the dissimilarity dimension does. Joint work with Nataly Brukhim, Miroslav Dudík and Robert Schapire.
0
0
4
@aldopacchiano
Aldo Pacchiano
2 years
Go listen to Andrés. He's a really good guy.
@algekalipso
Captain Pleasure, Andrés Gómez Emilsson
2 years
I'm in Mexico City. Tomorrow I'm giving an in-person talk at CCH SUR :D
8
4
39
1
1
4
@aldopacchiano
Aldo Pacchiano
4 years
Our work on using determinants to encourage diversity in reinforcement learning :)
@jparkerholder
Jack Parker-Holder
4 years
My attempt to explain why you should use determinants to measure population diversity (TL;DR: it ensures all agents are distinct):
1
0
6
1
0
4
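A minimal sketch of the determinant-based diversity measure described in the quoted thread, assuming an RBF kernel over behavioral embeddings (the kernel choice is illustrative):

```python
# Population diversity as a determinant: build a kernel matrix over the
# agents' behavioral embeddings and score diversity by its determinant.
import numpy as np

def diversity_score(behavior_embeddings, length_scale=1.0):
    """behavior_embeddings: (n_agents, d) array; returns the determinant
    of the RBF kernel matrix over agents."""
    diffs = behavior_embeddings[:, None, :] - behavior_embeddings[None, :, :]
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * length_scale ** 2))
    return np.linalg.det(K)
```

Because the determinant of a kernel matrix vanishes when two rows coincide, the score collapses to zero whenever two agents behave alike, which is exactly the "ensures all agents are distinct" property in the quoted thread.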
@aldopacchiano
Aldo Pacchiano
4 years
;)
@ykilcher
Yannic Kilcher 🇸🇨
4 years
I'm just so amazed at how people continue to come up with new variants of research about bandits.
0
1
28
0
0
4
@aldopacchiano
Aldo Pacchiano
3 years
New paper with @niladrichat , Peter Bartlett and Michael Jordan called "On the Theory of Reinforcement Learning with Once-per-Episode Feedback": . It was very interesting for us to think about non-Markovian reward models for reinforcement learning!!
@niladrichat
Niladri Chatterji
3 years
New paper with @aldopacchiano , Peter Bartlett and Michael Jordan called "On the Theory of Reinforcement Learning with Once-per-Episode Feedback": . It was very interesting for us to think about non-Markovian reward models for reinforcement learning (1/2.)
1
2
30
0
0
4
@aldopacchiano
Aldo Pacchiano
2 years
A mighty workshop indeed!
@jparkerholder
Jack Parker-Holder
2 years
Looking forward to seeing all the creative ideas submitted to this workshop! Submit by September 22nd 😀
0
1
20
0
0
4
@aldopacchiano
Aldo Pacchiano
2 years
Good stuff!
@SOURADIPCHAKR18
Souradip Chakraborty
2 years
Reward Shaping is a common practice for Sparse RL, but lacks theoretical guarantees (most) & needs expertise. At @corl_conf #CoRL2022 , we present HTRON: Heavy-Tailed Adaptive Reinforce Algorithm for Sparse Navigation Robotics tasks @kaweer_ @amritsinghbedi3 @robobzbz @dmanocha
1
2
10
1
0
4
@aldopacchiano
Aldo Pacchiano
2 years
@hardmaru @StableDiffusion I am not so certain. The connection between the prompt and the image is so tenuously related to the inner world of the artist that I am not sure it can be called art. It all feels like saying Julius II was a great artist because he commissioned the Sistine Chapel.
2
0
3
@aldopacchiano
Aldo Pacchiano
1 year
This is the best spam email I have ever received 🤣🤣
2
0
3
@aldopacchiano
Aldo Pacchiano
2 years
(2/2) Finally we test our algorithms in simple simulation environments. Joint work with Abhi Gupta, @ted_moskovitz and @elmelis .
0
0
3
@aldopacchiano
Aldo Pacchiano
1 year
[3/n] To understand why the eluder dimension does not fully capture the behavior of optimistic algorithms consider a function class where the action space equals the interval [0,1]. All the functions in this class have optima at x = 1/4 and x' = 3/4.
1
0
3
@aldopacchiano
Aldo Pacchiano
1 year
[2/n] We do this by studying a framework called interactive estimation where the goal is to estimate a target from its “similarity” to points queried by the learner. Our framework unifies two learning models: statistical-query learning and structured bandits.
1
0
3
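A schematic version of the interaction protocol described in this tweet, with illustrative notation (similarity function s, unknown target f*):

```latex
% Interactive estimation protocol (illustrative notation): the learner
% queries points and observes noisy similarities to the unknown target.
\text{For } t = 1, \dots, T:\quad
\text{the learner queries } x_t \in \mathcal{X}
\text{ and observes } y_t \text{ with } \mathbb{E}[y_t \mid x_t] = s(f^{\ast}, x_t).
```

Different choices of the similarity s recover the two models the tweet mentions, statistical-query learning and structured bandits.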
@aldopacchiano
Aldo Pacchiano
8 months
This will be amazing!
@abhishekunique7
Abhishek Gupta
8 months
In other news, we are organizing a special session at ISAIM 2024 () on Deep RL: Bridging Theory and Practice, on Jan 8th, with @zhaoran_wang and great speakers! Your submissions are welcome! Please see the website for call details
0
3
6
0
0
1
@aldopacchiano
Aldo Pacchiano
2 years
very interesting indeed!
@zhengyaojiang
Zhengyao Jiang
2 years
It's interesting that current RL methods learn exploitation from data but the exploration is still mostly based on hard-coded rules (noise around optimal policy/maximizing state entropy etc.), even though efficient exploration is more difficult.
4
7
63
0
0
3
@aldopacchiano
Aldo Pacchiano
3 years
@OmarRivasplata The mighty Omarian bound!
1
0
3