Arthur Douillard

@Ar_Douillard

3,382 Followers · 1,872 Following · 393 Media · 3,181 Statuses

Modular & Distributed Learning for LLMs @ DeepMind, Continual Learning PhD @ Sorbonne

London, England
Joined January 2016
Pinned Tweet
@Ar_Douillard
Arthur Douillard
4 months
I'm super excited to release DiPaCo, a new kind of mixture of experts, that can scale engineering-wise to data centers across the entire world! A few words about it in this thread 🧵
@_akhaliq
AK
4 months
Google presents DiPaCo: Distributed Path Composition. Progress in machine learning (ML) has been fueled by scaling neural network models. This scaling has been enabled by ever more heroic feats of engineering, necessary for accommodating ML approaches that require high
Tweet media one
2
26
161
9
37
187
@Ar_Douillard
Arthur Douillard
3 years
I've released my course on deep learning for computer vision! It includes slides, google colab, and Anki decks for all 6 topics I'm covering. We code from the basics (backprop from scratch) to the SotA (transformers & MLP-mixer). Feedback appreciated!
Tweet media one
14
166
684
@Ar_Douillard
Arthur Douillard
3 years
Vision transformers are more biased towards shapes (as humans are) than Convolutional Networks:
Tweet media one
8
142
666
@Ar_Douillard
Arthur Douillard
2 years
I am excited to share that, after my PhD 👨‍🎓, I will join @DeepMind this summer as a Research Scientist in the Continual Learning team led by Marc'Aurelio Ranzato! 🎉
28
7
473
@Ar_Douillard
Arthur Douillard
8 months
🚨 We released our work on data parallelism for language models *distributed* across the entire world! 🧵Thread below 👇
@_akhaliq
AK
8 months
DiLoCo: Distributed Low-Communication Training of Language Models paper page: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of
Tweet media one
11
54
316
17
69
380
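The core DiLoCo recipe (many local steps, rare synchronization) can be sketched in a few lines. This toy scalar version is an illustration only: unlike the real method, which applies an outer Nesterov-momentum optimizer to the parameter deltas, it simply averages the replicas at each sync.

```python
def local_sgd(num_workers, grad_fn, lr=0.1, local_steps=10, rounds=3):
    """Toy sketch of DiLoCo-style training: each worker takes many local
    SGD steps; parameters are synchronized only once per round (here by
    plain averaging), drastically reducing communication."""
    theta = 0.0  # the shared scalar "model"
    for _ in range(rounds):
        replicas = []
        for w in range(num_workers):
            local = theta
            for _ in range(local_steps):  # inner steps: no communication
                local -= lr * grad_fn(w, local)
            replicas.append(local)
        theta = sum(replicas) / len(replicas)  # one sync per round
    return theta

# Toy per-worker objective (theta - target_w)^2, gradient 2*(theta - target_w):
targets = [1.0, 3.0]
final = local_sgd(2, lambda w, t: 2 * (t - targets[w]))
assert abs(final - 2.0) < 0.1  # replicas converge near the average target
```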
@Ar_Douillard
Arthur Douillard
3 years
Github + VSCode on your browser = 🤯 Just add "1s" before the ".com", and tada! Here is an example with our Continual Learning library Continuum:
Tweet media one
9
85
382
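The URL trick is just a string substitution; a tiny helper (the repo URL below is illustrative, not the actual Continuum repo address):

```python
def to_github1s(url: str) -> str:
    """Rewrite a github.com URL so the repo opens in the browser-based
    VS Code viewer at github1s.com."""
    return url.replace("github.com", "github1s.com", 1)

print(to_github1s("https://github.com/someuser/continuum"))
# → https://github1s.com/someuser/continuum
```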
@Ar_Douillard
Arthur Douillard
2 years
Main topic in NeurIPS parties is GPT-4. The rumors are wild
11
19
273
@Ar_Douillard
Arthur Douillard
1 year
🚨 My team at @DeepMind is looking for a Research Engineer in Efficient Large-Scale Learning! 👉 ❓ Unprecedented scale + efficient adaptation to new tasks 📚 Distributed large-scale learning and continual learning!
5
38
254
@Ar_Douillard
Arthur Douillard
2 years
Google released Minerva () a few days ago, a language model (based on PaLM) that solves high-school math problems. Funny part: the prompt includes "I hope it is correct"!
Tweet media one
10
34
251
@Ar_Douillard
Arthur Douillard
3 months
Something is cooking 🕵️‍♂️
Tweet media one
9
8
220
@Ar_Douillard
Arthur Douillard
2 years
From being a computer lover, to being a Doctor in computer science
Tweet media one
Tweet media two
9
2
162
@Ar_Douillard
Arthur Douillard
3 years
Ok, I've learned today that there is an 'inference_mode' context manager that does the 'no_grad' job, but with added speed. Seen in Grokking PyTorch:
Tweet media one
6
26
151
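For reference, the two context managers compare as follows (requires PyTorch ≥ 1.9, where `torch.inference_mode` was introduced):

```python
import torch

x = torch.ones(3, requires_grad=True)

# no_grad: disables gradient tracking inside the block
with torch.no_grad():
    y = x * 2

# inference_mode: same effect, but additionally skips autograd's
# version-counter and view-tracking bookkeeping, so it's a bit faster;
# tensors created here can never be used in autograd later
with torch.inference_mode():
    z = x * 2

assert not y.requires_grad and not z.requires_grad
```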
@Ar_Douillard
Arthur Douillard
3 years
🎄It's Christmas time, so we recently added plenty of datasets for continual learning! +50 datasets for classification & segmentation +7 different continual scenarios 👉 And surprise, we now support HuggingFace's NLP datasets! 👇🧵
Tweet media one
1
21
147
@Ar_Douillard
Arthur Douillard
10 days
No code too
@LeopolisDream
Alex Yanko 🇺🇦
12 days
Welcome the new architecture: Terminator. No residuals, no dot-product attention, no normalization...
Tweet media one
16
135
815
5
3
130
@Ar_Douillard
Arthur Douillard
3 years
PixMix: merging images with fractals. Leads to models that are more robust to corruption and adversarial attacks, with better calibration, etc., than the other baselines (MixUp, CutMix, CutOut).
Tweet media one
2
25
102
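The mixing loop can be sketched roughly like this: a simplified toy version on 1-D "pixels" only; the actual PixMix pipeline also mixes with augmented copies of the image and uses specific conic combinations.

```python
import random

def pixmix(image, fractals, rounds=3, beta=0.5):
    """Toy sketch of PixMix: repeatedly blend the image with a random
    fractal, either additively or multiplicatively."""
    mixed = image[:]
    for _ in range(rounds):
        fractal = random.choice(fractals)
        if random.random() < 0.5:  # additive (convex) blend
            mixed = [(1 - beta) * m + beta * f for m, f in zip(mixed, fractal)]
        else:                      # multiplicative (geometric) blend
            mixed = [(m ** (1 - beta)) * (f ** beta) for m, f in zip(mixed, fractal)]
    return mixed

img = [0.2, 0.5, 0.8]    # a tiny "image" of 3 pixels in [0, 1]
frs = [[0.9, 0.1, 0.4]]  # one toy "fractal"
out = pixmix(img, frs)
assert all(0.0 <= p <= 1.0 for p in out)  # both blends stay in [0, 1]
```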
@Ar_Douillard
Arthur Douillard
3 years
PoolFormer: replacing self-attention / spatial MLP / Fourier transform with a simple average pooling. - Fewer operations (each pooling reduces the number of tokens by 2x) - As good as other "meta-formers" Are we going to reinvent convnets?
Tweet media one
1
27
99
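The token mixer being described is essentially a sliding average. A minimal sketch on 1-D tokens (no stride here, so unlike the strided pooling in the tweet it doesn't reduce the token count):

```python
def pool_mix(tokens, window=3):
    """Token-mixer sketch à la PoolFormer: replace each token with the
    average of its neighborhood, instead of computing self-attention."""
    half = window // 2
    mixed = []
    for i in range(len(tokens)):
        neigh = tokens[max(0, i - half): i + half + 1]
        mixed.append(sum(neigh) / len(neigh))
    return mixed

print(pool_mix([1.0, 2.0, 3.0, 4.0]))
# → [1.5, 2.0, 3.0, 3.5]
```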
@Ar_Douillard
Arthur Douillard
3 years
I love when fixing a bug in my neural network degrades its performance...
1
8
82
@Ar_Douillard
Arthur Douillard
3 years
I've got my christmas present early: More than 11k unique visitors on my deep learning for computer vision course! 🤗 I'm so happy! 👉 2022's goal: recording a video for each lesson
Tweet media one
0
14
82
@Ar_Douillard
Arthur Douillard
3 years
I submitted my first paper ever to CVPR2020 and it got rejected; that was hard. But I'm happy to announce that my third paper, PLOP, has been accepted to #CVPR2021 ! Code will be released soon!
@Ar_Douillard
Arthur Douillard
4 years
New work from Y.Chen, A.Dapogny, @quobbe , and myself. We tackle Continual Semantic Segmentation by introducing a novel distillation loss exploiting local & global details, and an uncertainty-based pseudo-labeling handling background shift (We are PLOP)
Tweet media one
1
13
35
4
7
76
@Ar_Douillard
Arthur Douillard
1 month
My team is looking for a research engineer in New York! Our recent efforts include DiLoCo (distributed learning) and DiPaCo (distributed mixture of experts). Those projects, which I've co-led, were the most exciting projects I've contributed to, and I can tell you one thing: there
4
5
72
@Ar_Douillard
Arthur Douillard
2 years
The first transformer designed for Continual Learning in Computer Vision has been accepted to #CVPR2022 ! 🥳 Using a dynamic approach, it forgets less than previous ensembling methods while using fewer parameters. 💻: 📕: 🧵👇
Tweet media one
4
16
72
@Ar_Douillard
Arthur Douillard
6 years
@ncremins GDPR is coming.
0
26
69
@Ar_Douillard
Arthur Douillard
4 years
Tired of implementing the many data settings of Continual Learning? @TLesort & I present Continuum! A Pytorch library that enables you in a few lines to have a Continual dataset: MNIST, PermutedMNIST, CIFAR10/CIFAR100, ImageNet, CORe50, and many more!
Tweet media one
1
27
68
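Conceptually, what such a library automates is splitting one dataset into a stream of tasks. A stdlib-only sketch of a class-incremental split (this is not the actual Continuum API, just the underlying idea):

```python
def class_incremental_tasks(samples, labels, increment):
    """Split a labeled dataset into continual-learning tasks, each
    introducing `increment` new classes (class-incremental scenario)."""
    classes = sorted(set(labels))
    tasks = []
    for start in range(0, len(classes), increment):
        allowed = set(classes[start:start + increment])
        tasks.append([(x, y) for x, y in zip(samples, labels) if y in allowed])
    return tasks

data = ["a", "b", "c", "d"]
labels = [0, 1, 2, 3]
tasks = class_incremental_tasks(data, labels, increment=2)
assert len(tasks) == 2                          # two tasks of 2 classes each
assert [y for _, y in tasks[0]] == [0, 1]       # task 0 sees classes 0 and 1
```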
@Ar_Douillard
Arthur Douillard
2 years
Great! I'm finishing my PhD in June, and CVPR 2022 will be my only opportunity to attend an in-person conference in my whooole PhD
@CVPR
#CVPR2024
2 years
Message from our #CVPR2022 Program Chairs: Unless the epidemiological situation changes drastically, CVPR 2022 will be in person, with an online option for those who cannot travel. Information on visa letters will be sent to authors in the next few days.
0
22
227
3
1
66
@Ar_Douillard
Arthur Douillard
4 years
I'm proud to present my first ever paper, uploaded recently on arXiv: "Small Task Incremental Learning" We design a novel distillation loss that outperforms previous SotA by a large margin, especially on 50 tasks of only 1 class!
Tweet media one
7
23
67
@Ar_Douillard
Arthur Douillard
1 year
My last paper as a phd student 🤩
@mlia_isir
MLIA
1 year
Pending #CVPR2023 in June, we are pleased to share our 4 accepted papers (3/4) "CoMFormer: Continual Learning in Semantic and Panoptic Segmentation" by @fcdl94 , @quobbe , @Ar_Douillard preprint: Collab w/ Politecnico di Torino and @heuritechlab
Tweet media one
1
2
20
0
2
67
@Ar_Douillard
Arthur Douillard
2 months
Something I didn't fully realize during my PhD but now see: the extended bitter lesson is that hyperparameters are sometimes more important than a new architecture. So many research papers proposing new archi/losses/optimizers would get crushed by a well-tuned baseline.
@andrew_n_carr
Andrew Carr (e/🤸)
2 months
The DeepSeek-V2 paper was full of pretty amazing nuggets of wisdom. I spent the afternoon copying lots of their training setup into our model. Orange is previous and Blue is new with DeepSeek hyper parameters. Things that mattered most: 1. Warm up LR ratio 2. Batch ramp
Tweet media one
14
54
585
3
5
65
@Ar_Douillard
Arthur Douillard
2 years
Transformers for Small-Scale Datasets: - Tokenization with overlap between patches - Add pooling to reduce the nb of tokens - Mask the diagonal attention logits with -∞ to avoid tokens attending to themselves - add a learned temperature - improve all archis
Tweet media one
Tweet media two
1
6
63
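The diagonal-masking trick in the list can be sketched as follows: setting a logit to -∞ makes its softmax weight exactly zero, so no token attends to itself.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def masked_attention_weights(logits):
    """Set the diagonal attention logits to -inf so that, after the
    softmax, each token puts zero weight on itself."""
    n = len(logits)
    masked = [[(-math.inf if i == j else logits[i][j]) for j in range(n)]
              for i in range(n)]
    return [softmax(row) for row in masked]

w = masked_attention_weights([[1.0, 2.0], [3.0, 4.0]])
assert w[0][0] == 0.0 and w[1][1] == 0.0  # no self-attention
```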
@Ar_Douillard
Arthur Douillard
4 months
I think people don’t realize the progress in AI that happened in the last 5 years. Being close to the level of a junior dev isn’t impressive anymore?
@gonza_nardini
Gonza Nardini
4 months
@itsandrewgao Sounds a bit disappointing honestly. The requests were a bit hard, but a good AI should be able to solve these, they aren't exactly rocket science. I think most jr devs would be able to solve them One request it couldn't even complete and the other just deployed a buggy solution
5
0
19
7
1
59
@Ar_Douillard
Arthur Douillard
3 months
TPUs are pretty great tbh. One of the best moves Google ever made
@a__tomala
Alex Tomala
3 months
Just use TPUs
14
4
125
3
2
59
@Ar_Douillard
Arthur Douillard
5 years
@AndrewYNg Another reason, even more crucial, is that AI researchers open-source their methods a lot
1
4
54
@Ar_Douillard
Arthur Douillard
6 years
Venn diagram of the various subfields of #AI Source: The #DeepLearning book of Goodfellow, Bengio, & Courville. #DataScience #MachineLearning #ArtificialIntelligence
Tweet media one
0
26
47
@Ar_Douillard
Arthur Douillard
3 years
I’ve finished my 24h course today with my students. The latest chapter, about computer vision’s future, has been updated: transformer, mlp-mixer, pool-former, sam, etc. And there are tutos to code them from scratch + anki cards
@Ar_Douillard
Arthur Douillard
3 years
I've released my course on deep learning for computer vision! It includes slides, google colab, and Anki decks for all 6 topics I'm covering. We code from the basics (backprop from scratch) to the SotA (transformers & MLP-mixer). Feedback appreciated!
Tweet media one
14
166
684
1
2
49
@Ar_Douillard
Arthur Douillard
2 years
After 1.5 years we finally got our paper on object rehearsal for continual semantic segmentation accepted at TPAMI!
Tweet media one
1
9
49
@Ar_Douillard
Arthur Douillard
2 years
I’ve done 2 conferences in person, both in New Orleans, both in 2022. So far NeurIPS is soooo much more interesting than CVPR.
4
2
46
@Ar_Douillard
Arthur Douillard
3 years
#CVPR is in the top-5, per citation, of all venues 🤯 It says a lot about the rapid growth of the field; I can barely keep up with reading papers published at top conferences in my niche domain
Tweet media one
3
13
48
@Ar_Douillard
Arthur Douillard
2 years
Come see our poster presentation of DyTox, the Continual transformer, this afternoon! Poster 131b. #CVPR2022
Tweet media one
0
3
46
@Ar_Douillard
Arthur Douillard
2 years
That’s where Continual Learning could really shine: 1. Keep this model 2. Add continually new tasks 3. … 4. AGI?
@GoogleDeepMind
Google DeepMind
2 years
Gato🐈a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more: Paper: 1/
95
1K
5K
4
7
44
@Ar_Douillard
Arthur Douillard
2 years
1 paper accepted at #CVPR2022 on continual transformer :) With @ramealexandre , Guillaume Couairon, and @quobbe . More details & code in the coming weeks.
2
1
45
@Ar_Douillard
Arthur Douillard
3 years
Amazing tutorial on reinforcement learning by @DeepMind at the @EEMLcommunity :
Tweet media one
1
7
44
@Ar_Douillard
Arthur Douillard
2 years
A concern I have with Continual Learning models is that it's often hard to tell from the pdf alone: - does it use rehearsal, and if yes how much? - how many params does the model use vs baselines? - is the task id given at test-time? - how many tasks? - pretrained? It's getting hard to compare models
9
1
44
@Ar_Douillard
Arthur Douillard
2 years
Eh, I guess my idea from two months ago was right. -->
@arankomatsuzaki
Aran Komatsuzaki
2 years
Corrupted Image Modeling for Self-Supervised Visual Pre-Training ELECTRA-version of BeiT/MAE with CNN/ViT performs competitively with SotA on vision self-supervised learning.
Tweet media one
2
42
186
0
5
44
@Ar_Douillard
Arthur Douillard
7 months
I’m quite impressed by the number of people on this platform making threads to explain what is OpenAI’s Q*. I guess their time working on the blockchain made them prescient about AI.
5
3
42
@Ar_Douillard
Arthur Douillard
4 months
It's a bit of a vanity metric, but I'm super proud to have reached the 1,000-citation mark 😀
Tweet media one
2
0
42
@Ar_Douillard
Arthur Douillard
2 years
👨‍🎓 My PhD thesis on Continual Learning for Computer Vision is now online! 📚 👉 I cover continual learning across img classification w/ metric learning & growing transformers, segmentation w/ distillation & efficient replay, and even zero-shot learning.
Tweet media one
0
3
41
@Ar_Douillard
Arthur Douillard
4 years
We have been accepted at ECCV2020 ! Thanks to my awesome coauthors @quobbe @DrEAVJr @CharlesOllion @ThomasR_Fr @heuritechlab @mlia_lip6
@Ar_Douillard
Arthur Douillard
4 years
I'm proud to present my first ever paper, uploaded recently on arXiv: "Small Task Incremental Learning" We design a novel distillation loss that outperforms previous SotA by a large margin, especially on 50 tasks of only 1 class!
Tweet media one
7
23
67
5
16
40
@Ar_Douillard
Arthur Douillard
3 years
I'm presenting Continuum tomorrow, a light-weight library to do continual learning! Come watch Friday 3 April, 5.30PM CEST :) 📌 Eventbrite event: 📌 Microsoft Teams:
Tweet media one
1
7
38
@Ar_Douillard
Arthur Douillard
2 months
DiLoCo was rejected from ICML 😢 On the one hand, I'm annoyed at one of the reviewers asking for a proof of convergence for our distributed training scheme at LLM scale. On the other hand, the program chair wrote a very well-balanced conclusion, thanks for that!
@Ar_Douillard
Arthur Douillard
8 months
🚨 We released our work on data parallelism for language models *distributed* across the entire world! 🧵Thread below 👇
17
69
380
5
2
39
@Ar_Douillard
Arthur Douillard
2 years
I’m defending my PhD next Monday! It’ll be live-streamed on YouTube: 13 June, 2PM CEST, 8AM New York time
@mlia_isir
MLIA
2 years
📢Thesis defense Happy to announce Arthur Douillard's @Ar_Douillard thesis defense next week! It will take place on Monday, June 13th at 2 p.m. Title: "Continual Learning for Computer Vision" Supervisors: Matthieu Cord @quobbe & Thomas Robert @ThomasR_Fr
Tweet media one
1
4
14
4
1
38
@Ar_Douillard
Arthur Douillard
3 years
It may sound vain, but I reached the 20-citation mark today (not a lot compared to most of my Twitter feed, but a lot for me), and it makes me very happy. I'm glad my work is deemed useful by others; it gives purpose to all my failed experiments I guess 🙃
1
0
38
@Ar_Douillard
Arthur Douillard
5 years
I'm glad to announce that I'll start this July a PhD in #DeepLearning for computer vision at @LIP6_lab / @ScienceSorbonne under the supervision of @quobbe ! And I'll still work part-time with the great french startup @heuritechdata A little boy's dream of AI becomes reality!
2
5
36
@Ar_Douillard
Arthur Douillard
4 years
New work from Y.Chen, A.Dapogny, @quobbe , and myself. We tackle Continual Semantic Segmentation by introducing a novel distillation loss exploiting local & global details, and an uncertainty-based pseudo-labeling handling background shift (We are PLOP)
Tweet media one
1
13
35
@Ar_Douillard
Arthur Douillard
3 years
My cat ε and I are honored to be featured in today's @CVPR daily. Despite it being virtual, I'm enjoying this conf' a lot so far, I've learned a lot! #CVPR2021
Tweet media one
0
6
35
@Ar_Douillard
Arthur Douillard
2 years
@carrigmat And that you used a few hundred TPUs on JFT300M
0
2
34
@Ar_Douillard
Arthur Douillard
4 months
Language models require half as much compute every 8 months to reach the same performance, better than Moore's Law.
Tweet media one
1
4
32
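That cadence compounds quickly; a quick back-of-the-envelope comparison, where the 8-month halving is the figure quoted in the tweet and a 24-month cadence stands in for Moore's law:

```python
def compute_factor(months, halving_months):
    """Compute needed to reach a fixed performance halves every
    `halving_months`; return the total reduction factor after `months`."""
    return 2 ** (months / halving_months)

# Over 4 years (48 months):
algo = compute_factor(48, 8)    # algorithmic efficiency: 2**6 = 64x
moore = compute_factor(48, 24)  # Moore's-law cadence:    2**2 = 4x
assert algo == 64 and moore == 4
```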
@Ar_Douillard
Arthur Douillard
4 months
Everything everywhere all at once. Our long-term goal is to train a network across the entire world, using all the compute. Thus, we need to revisit existing architectures to limit the communication overhead and memory footprint while keeping inference fast. Current methods aren't enough!
Tweet media one
2
5
30
@Ar_Douillard
Arthur Douillard
3 years
Continuum now supports the Continual Learning CTRL benchmark of @TomVeniat @LudovicDenoyer @MarcRanzato @facebookai ! * 5 predefined CTRL datasets * easy to custom your own CTRL Code: Colab: Paper:
Tweet media one
2
11
31
@Ar_Douillard
Arthur Douillard
2 years
Excellent literature review on the loss landscape of neural networks by @dam_nlp : -> 1. Wide Basins and Generalization 2. Intrinsic Dimensionality 3. Mode Connectivity 4. SGD Training Dynamics
Tweet media one
1
5
30
@Ar_Douillard
Arthur Douillard
4 years
I just saw a code base with a hyperparameter of 0.968. 0.968 Who does grid search that fine-grained?
9
2
30
@Ar_Douillard
Arthur Douillard
2 years
🚨 I'm excited about the release of 🏝️ NEVIS'22, a benchmark where we collected 106 datasets from the last 30 years of CV research! 🤖 Can you design a model to efficiently learn them all using forward transfer? 📜 My first paper while at @DeepMind 🥰
@GoogleDeepMind
Google DeepMind
2 years
Introducing NEVIS’22, a new benchmark developed using 30 years of computer vision research. This provides an opportunity to explore how AI models can continually build on their knowledge to learn future tasks more efficiently. ➡️
3
48
214
0
12
30
@Ar_Douillard
Arthur Douillard
2 years
I’m going to NeurIPS! Fellow continual learners, let’s have a chat 😃
2
0
30
@Ar_Douillard
Arthur Douillard
4 months
Oh this is really cool! They train an encoder-decoder transformer to predict the search dynamics of A*, which results in a search with fewer steps than the classical A* algorithm.
Tweet media one
3
3
30
@Ar_Douillard
Arthur Douillard
5 years
@heuritechlab & I have just published a technical introduction to Incremental Learning with #DeepLearning ! Being able to learn continuously is an important feature of any intelligent system; see what the current strategies are. @ContinualAI
1
12
30
@Ar_Douillard
Arthur Douillard
2 years
Poster 2.0 @ NeurIPS
Tweet media one
0
0
30
@Ar_Douillard
Arthur Douillard
2 years
I’m presenting today at the Continual Learning workshop of #CVPR2022 both Dytox (dynamic transformer) and Saporta et al.’s MuHDI (continual adaptation). Come chat with me, poster #207a and #208a
Tweet media one
Tweet media two
1
1
29
@Ar_Douillard
Arthur Douillard
5 years
I've just published in @TDataScience a small #DeepLearning article about a research paper I liked!
@TDataScience
Towards Data Science
5 years
How To Be Confident In Your Neural Network Confidence
Tweet media one
Tweet media two
0
10
21
0
7
28
@Ar_Douillard
Arthur Douillard
2 years
I've been reading this weekend on @OReillyMedia the early draft version of the book from @huggingface on NLP. Super interesting, and I've learned tons about NLP (QA, BigBird...) and made Anki cards about that. I'm eager to see the final book version once it's published!
Tweet media one
Tweet media two
0
3
29
@Ar_Douillard
Arthur Douillard
2 months
I’m honored to see our work on distributed training (DiLoCo) and distributed mixture of experts (DiPaCo) highlighted during ICLR’s keynote by @RaiaHadsell !
Tweet media one
Tweet media two
2
1
28
@Ar_Douillard
Arthur Douillard
3 years
I've been learning Chinese for a year (at a very slow pace) out of boredom during the lockdown. And today it has been useful: I can now understand issues raised on my GitHub repo in Chinese, 我很高兴 (I'm very happy)!
Tweet media one
3
0
28
@Ar_Douillard
Arthur Douillard
2 years
It was super exciting (and tiring) to finally present my work on Continual Transformers at today's #CVPR2022 session!
Tweet media one
1
3
28
@Ar_Douillard
Arthur Douillard
6 months
We release the async extension of DiLoCo shared in November, led by our amazing intern @cranialxix ! 👀 TL;DR: we do distributed data-parallelism of a language model across the world, synchronized every 10-100 steps, AND using heterogeneous devices 🧵 below
@_akhaliq
AK
6 months
Google Deepmind present Asynchronous Local-SGD Training for Language Modeling paper page: Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more
Tweet media one
2
29
160
3
7
28
@Ar_Douillard
Arthur Douillard
6 years
Inspired by #themorningpaper of @adriancolyer : A reading of "A Few Useful Things to Know about Machine Learning" by Prof Domingos: #MachineLearning
2
6
26
@Ar_Douillard
Arthur Douillard
2 years
I haven't even gotten to CVPR yet and I've already met many researchers at the airport! So cool!
1
0
27
@Ar_Douillard
Arthur Douillard
5 months
I’ll be talking about our recent DiLoCo: how to train your network distributed across the world!
@CohereForAI
Cohere For AI
5 months
Join our community-led Geo Regional Asia on Wednesday, February 21st as they welcome @Ar_Douillard , Sr. Researcher at @GoogleDeepMind to discuss "DiLoCo: Distributed Low-Communication Training of Language Models." Learn more:
Tweet media one
1
1
7
0
3
27
@Ar_Douillard
Arthur Douillard
3 years
ICCV 2023 will be in Paris 🇫🇷 !
@CSProfKGD
Kosta Derpanis
3 years
Start making your travel plans. Upcoming #ComputerVision conferences (subject to change)
Tweet media one
Tweet media two
5
7
75
1
0
25
@Ar_Douillard
Arthur Douillard
3 months
Given how hard it is to get a visa in the US, Europe should really become the next AI powerhouse
1
0
25
@Ar_Douillard
Arthur Douillard
3 years
Our paper "Insights from the Future for Continual Learning" has been published at the CLVISION Workshop of #CVPR2021 ! We exploit zero-shot learning to incorporate future concepts into the current embeddings and minimize interference Paper: Code:
Tweet media one
2
5
24
@Ar_Douillard
Arthur Douillard
6 years
I'm very happy to announce that I've won with @RemiMeunier , Antoine Naulet, and @dataiku 2 of the 3 prizes offered for the @NATO Innovation Challenge! We pitched a solution using Dataiku's DSS and #DeepLearning for satellite imagery! #NATOiChall #DataScience #DeepLearning
Tweet media one
0
8
22
@Ar_Douillard
Arthur Douillard
3 months
I have a solution for them:
@corbtt
Kyle Corbitt
3 months
Spoke to a Microsoft engineer on the GPT-6 training cluster project. He kvetched about the pain they're having provisioning infiniband-class links between GPUs in different regions. Me: "why not just colocate the cluster in one region?" Him: "Oh yeah we tried that first. We
227
785
6K
1
3
23
@Ar_Douillard
Arthur Douillard
2 years
I'll be giving a talk about our recent CVPR 2022 work on Continual Transformers on Thursday 7th at 5:30 PM CET. Come join me to hear more about it! Thanks @v_lomonaco @jamessealesmith @ContinualAI for the invite :) Stream link:
1
6
22
@Ar_Douillard
Arthur Douillard
3 months
Cool & hard benchmark: OSWorld. You have to complete tasks on Ubuntu that require multi-step planning, and potentially searching the internet, to solve them.
3
3
22
@Ar_Douillard
Arthur Douillard
2 months
I really like the device-level balancing loss: LLMs are now more about engineering than some abstract architecture. Balancing across experts makes sense w.r.t. the ML performance, but at that scale the communication bottleneck is critical too
Tweet media one
0
3
22
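A balancing loss of this flavor can be sketched as penalizing deviation from uniform load across devices. This is a toy stand-in (squared deviation from the mean load fraction), not DeepSeek-V2's exact formulation:

```python
def balance_loss(tokens_per_device):
    """Toy load-balancing auxiliary loss: penalize uneven routing of
    tokens across devices, so no single device becomes a comm bottleneck."""
    total = sum(tokens_per_device)
    fracs = [t / total for t in tokens_per_device]
    uniform = 1 / len(fracs)
    return sum((f - uniform) ** 2 for f in fracs)

assert balance_loss([100, 100]) == 0.0  # perfectly balanced: no penalty
assert balance_loss([150, 50]) > 0.0    # skewed load: positive penalty
```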
@Ar_Douillard
Arthur Douillard
6 years
I'm glad to have been selected to compete for the #NATOiChall with @dataiku , some serious #DataScience innovation is being prepared to impress @NATO !
Tweet media one
0
3
20
@Ar_Douillard
Arthur Douillard
3 years
I'm presenting our work on using prior from the future in Continual Learning to make networks more selfless at this afternoon @CVPR @ContinualAI Workshop. "Insights from the Future for Continual Learning" Paper: #CVPR2021
Tweet media one
1
11
22
@Ar_Douillard
Arthur Douillard
1 year
I love Continual Learning, but what excites me the most about this conference is the pre-registration process. So many times in Deep Learning, "hypotheses" arrive after the experiments. What if we do it the proper way instead? More details soon 👀
@ContinualAI
ContinualAI
1 year
🚨Are you ready!?🚨 Today we're announcing the Continual AI Unconference (CLAI Unconf) to be held in October 2023! CLAI Unconf is virtual & multi-timezone, covering diverse CL topics with pre-registered papers & contributed talks! ➡️ Please share!
Tweet media one
3
31
67
0
2
22
@Ar_Douillard
Arthur Douillard
2 years
With GELU being the cool kid now, ReLU had to find a new job
Tweet media one
1
0
21
@Ar_Douillard
Arthur Douillard
6 years
Facebook just released DensePose, a #DeepLearning model and an associated dataset to map a 3D representation of humans onto RGB images!
0
13
18
@Ar_Douillard
Arthur Douillard
3 years
I've released the code of our #CVPR2021 's PLOP on Continual Segmentation! Code: Camera-ready pdf:
Tweet media one
2
5
20
@Ar_Douillard
Arthur Douillard
10 days
Tweet media one
0
0
20
@Ar_Douillard
Arthur Douillard
1 year
Research Scientists are just a wrapper over Nvidia GPUs
@ericjang11
Eric Jang
1 year
Venture Capital is just a wrapper over Nvidia GPUs
3
4
81
0
0
20
@Ar_Douillard
Arthur Douillard
4 months
TL;DR: an experimental mixture of experts that can be trained across the world, with no engineering limit on its size, while remaining light-weight and fast at test-time. arXiv link:
3
5
18
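The "light-weight and fast at test-time" property comes from sparse routing: only the selected experts run per input. A toy sketch of that idea (the real DiPaCo routes through composed paths of modules, not single experts):

```python
def moe_forward(x, experts, router):
    """Sparse-routing sketch: the router picks one expert per input,
    so only a fraction of the full model is executed."""
    return experts[router(x)](x)

experts = [lambda x: x + 1, lambda x: x * 10]
router = lambda x: 0 if x < 5 else 1  # toy router: threshold on the input
assert moe_forward(2, experts, router) == 3   # routed to expert 0
assert moe_forward(7, experts, router) == 70  # routed to expert 1
```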
@Ar_Douillard
Arthur Douillard
10 months
So I just saw an in-person talk by Schmidhuber. No slides, just him talking and waving his hands. Well, I have to say he's a really great speaker
0
0
20
@Ar_Douillard
Arthur Douillard
2 years
I’d say the same for Continual Learning, we should stop working on {Split, Permuted, Rotated}-MNIST. Bigger datasets like ImageNet (or mini/tiny/100-subset) and Core50 are more representative.
@aaron_defazio
Aaron Defazio
2 years
Optimization improvements on MNIST and CIFAR-10 rarely transfer to larger problems, the sooner we stop testing on them the better.
10
9
138
1
0
20
@Ar_Douillard
Arthur Douillard
5 months
This please. I'm very surprised how much this paper was retweeted, while clearly there is a strong confounding factor: the paper quality itself.
@DamienTeney
Damien Teney
5 months
⚠️Correlation ≠ causation. The controls in this study only account for topic and publication venue. These "influencers" don't tweet random papers, and any skill they have in picking promising papers can explain why this study finds that they eventually accrue more citations.
5
6
55
2
4
20