Are small models still undertrained?
We are releasing a 2B model that beats GPT-3.5. The crazy part is that it was distilled on only 2T tokens from a small model.
Distillation is the future of LLMs with the growing availability of large and efficient open models!
Rumor has it that an @OpenAI announcement is in the works for Thursday, October 17; unsure of the nature, as it could be a GPT-4o model update or a public rollout of SearchGPT. One thing is for sure: it will NOT be a new frontier model.
We are releasing a series of visual features that are performant across pixel- and image-level tasks. We achieve this by training a 1B-param ViT-g on a large, diverse, and curated dataset with no supervision, then distilling it to smaller models. Everything is open-source.
Announced by Mark Zuckerberg this morning — today we're releasing DINOv2, the first method for training computer vision models that uses self-supervised learning to achieve results matching or exceeding industry standards.
More on this new work ➡️
Gemma 2 27B is now the best open model while being 2.5x smaller than alternatives! This validates the work done by the team and Gemini. This is just the beginning 💙♊️
Super excited to share new open LLMs from FAIR with our research community. In particular, LLaMA-13B is competitive with GPT-3 despite being 10x smaller.
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters.
LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B.
The weights for all models are open and available at
1/n
We have a long history of supporting responsible open source & science, which can drive rapid research progress, so we’re proud to release Gemma: a set of lightweight open models, best-in-class for their size, inspired by the same tech used for Gemini
To support innovation in computer vision, we’re making DINOv2 available under the Apache 2.0 license + releasing a collection of DINOv2-based dense prediction models for semantic image segmentation and monocular depth estimation.
Try our updated demo ➡️
Few understand that the real star of the release is the 9B and not the 27B
Trained on 8 trillion tokens with knowledge distillation (presumably with the 27B as the teacher model)
It blows the competition (<15B) out of the water
At Q6/Q8, it's ~10GB of VRAM, making it a powerful model for
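For a rough sanity check on that footprint, here is a back-of-the-envelope sketch (my own approximation, not from the thread): weights at ~6.5 bits per parameter for a Q6_K-style quant and ~8.5 bits for a Q8_0-style quant, plus a small allowance for the KV cache and runtime buffers.

```python
def quantized_vram_gb(n_params_billion: float, bits_per_weight: float,
                      overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weights at the given bit-width plus a
    fixed allowance for KV cache, activations, and runtime buffers."""
    weights_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# 9B model at ~6.5 bits (Q6_K-ish) vs ~8.5 bits (Q8_0-ish); both land near 10GB.
print(quantized_vram_gb(9, 6.5))   # ~8.8 GB
print(quantized_vram_gb(9, 8.5))   # ~11.1 GB
```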
A 9B open model that surpasses some of the best open models! Would have loved to be the one claiming this win, this is massive! Congrats @yumeng0818, @xiamengzhou, and @danqi_chen!
First, it's a Prefix-LM. Full attention between image and prefix (=user input), auto-regressive only on suffix (=model output).
The intuition is that this way, the image tokens can see the query and do task-dependent "thinking"; if it were full AR, they couldn't.
Results agree:
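For concreteness, here is a minimal sketch of the attention mask this describes (my own illustration, not code from the thread): bidirectional attention over the image + prefix block, causal attention over the suffix.

```python
import torch

def prefix_lm_mask(num_prefix: int, num_suffix: int) -> torch.Tensor:
    """Boolean attention mask: True = attention allowed.

    Prefix tokens (image + user input) attend to each other fully;
    suffix tokens (model output) attend to the whole prefix plus
    previously generated suffix tokens (causal)."""
    n = num_prefix + num_suffix
    mask = torch.zeros(n, n, dtype=torch.bool)
    # Full (bidirectional) attention within the prefix block.
    mask[:num_prefix, :num_prefix] = True
    # Suffix rows see the entire prefix ...
    mask[num_prefix:, :num_prefix] = True
    # ... and a causal (lower-triangular) view of the suffix itself.
    mask[num_prefix:, num_prefix:] = torch.tril(
        torch.ones(num_suffix, num_suffix)
    ).bool()
    return mask

# Example: 4 image/prefix tokens, 3 output tokens.
print(prefix_lm_mask(4, 3).int())
```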
Our work on learning visual features with an LLM approach is finally out. All the scaling observations made on LLMs transfer to images! It was a pleasure to work under @alaaelnouby's leadership on this project, and this concludes my fun (but short) time at Apple! 1/n
Excited to share AIM 🎯 - a set of large-scale vision models pre-trained solely using an autoregressive objective. We share the code & checkpoints of models up to 7B params, pre-trained for 1.2T patches (5B images) achieving 84% on ImageNet with a frozen trunk.
(1/n) 🧵
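As a hedged illustration of what an autoregressive objective over image patches can look like (a sketch under my own assumptions, not the released AIM code; `causal_vit` stands in for a hypothetical causally-masked transformer): patchify the image in raster order and regress the pixels of each next patch.

```python
import torch
import torch.nn as nn

def patchify(images: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """(B, C, H, W) -> (B, num_patches, C*patch*patch) in raster order."""
    b, c, h, w = images.shape
    x = images.unfold(2, patch, patch).unfold(3, patch, patch)   # (B, C, H/p, W/p, p, p)
    x = x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch * patch)
    return x

def autoregressive_patch_loss(causal_vit: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Autoregressive image-modeling objective: with a causal attention mask,
    each patch embedding is used to regress the raw pixels of the *next* patch.
    `causal_vit` is assumed to map (B, N-1, D) patch inputs to (B, N-1, D) predictions."""
    patches = patchify(images)                   # (B, N, D)
    preds = causal_vit(patches[:, :-1])          # predict patches 1..N-1 from 0..N-2
    return nn.functional.mse_loss(preds, patches[:, 1:])
```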
IMHO, Chinchilla is the most impactful paper in the recent development of open LLMs, and its relatively low citation count shows how broken this metric is.
I'm a bit obsessed with the Chinchilla paper. It has the largest ratio of "economic worth/idea complexity" of any paper in AI. If Google had locked it down, it's possible open source would be a year or more behind.
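The headline takeaway is usually summarized as a ~20 tokens-per-parameter rule of thumb for compute-optimal training; a tiny sketch of that approximation (the paper's exact fits differ, and 20 is just the commonly quoted round number):

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Commonly quoted Chinchilla rule of thumb: compute-optimal training
    uses on the order of ~20 tokens per parameter."""
    return n_params * tokens_per_param

# A 70B-parameter model would want roughly 1.4T training tokens.
print(f"{chinchilla_optimal_tokens(70e9) / 1e12:.1f}T tokens")
```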
🎉 Unveiling PaSS: Parallel Speculative Sampling
🚀 Need faster LLM decoding?
🔗 Check out our new 1-model speculative sampling algorithm based on parallel decoding with look-ahead tokens:
🤝 In collaboration with @armandjoulin and @EXGRV
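PaSS generates its drafts with look-ahead tokens inside a single model, which isn't reproduced here; but the verification step it shares with other speculative sampling schemes can be sketched as follows (a greedy-acceptance illustration under my own assumptions; `target_model` is a hypothetical callable returning per-position logits):

```python
import torch

@torch.no_grad()
def greedy_speculative_step(target_model, draft_tokens: torch.Tensor,
                            context: torch.Tensor) -> torch.Tensor:
    """One verification step of (greedy) speculative decoding.

    `draft_tokens` are k candidate tokens proposed cheaply (in PaSS, via
    look-ahead tokens from the same model; here they could come from any
    drafting scheme). The target model scores the whole candidate block in
    one forward pass, and we keep the longest prefix matching its own greedy
    choices, plus one corrected/extra token."""
    seq = torch.cat([context, draft_tokens], dim=-1)            # (1, T + k)
    logits = target_model(seq)                                  # assumed shape: (1, T + k, vocab)
    # Greedy choice of the target model at each drafted position.
    preds = logits[:, context.size(-1) - 1:-1, :].argmax(-1)    # (1, k)
    match = (preds == draft_tokens).long().cumprod(-1)          # 1s until the first mismatch
    n_accept = int(match.sum())
    accepted = draft_tokens[:, :n_accept]
    if n_accept == draft_tokens.size(-1):
        # All drafts accepted: append one more greedy token from the last position.
        correction = logits[:, -1:, :].argmax(-1)
    else:
        # Use the target model's own token at the first mismatch.
        correction = preds[:, n_accept:n_accept + 1]
    return torch.cat([context, accepted, correction], dim=-1)
```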
@srush_nlp
The 9B and 2B are fully trained with distillation during pretraining. Finetuning starts with standard SFT (no distillation), then online distillation, and finally RLHF.
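The exact Gemma recipe isn't spelled out in the thread, but token-level knowledge distillation during pretraining is typically a KL term between teacher and student next-token distributions; a minimal sketch, assuming both models' per-position logits are available:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 1.0):
    """Token-level KD: KL(teacher || student) over the vocabulary.
    Shapes: (batch, seq, vocab); the sum is normalized by batch size."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    kl = F.kl_div(student_logp, teacher_probs, reduction="batchmean")
    # Standard temperature scaling so gradients keep a comparable magnitude.
    return kl * (t ** 2)

# Toy usage with random logits standing in for teacher/student outputs.
student = torch.randn(2, 8, 32000, requires_grad=True)
teacher = torch.randn(2, 8, 32000)
loss = distillation_loss(student, teacher, temperature=2.0)
loss.backward()
```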
@giffmana
FAIR is still home to top-tier computer vision researchers like @imisra_, @lvdmaaten, Christoph Feichtenhofer, Peter Dollar, Yaniv Taigman, @p_bojanowski. As @inkynumbers said, I think a lot of us joined 8-9 years ago and there are cycles in research careers.
@karpathy
Sometimes I like to think of very large models as just a smart way to run a massively parallel gradient descent in the parameter spaces of small models.
@abacaj
We will look to improve our models in future iterations and any feedback will be appreciated (through DMs?). Mistral's models are amazing and if they work for you, all the best!