Armand Joulin Profile
Armand Joulin

@armandjoulin

5,482 Followers · 372 Following · 8 Media · 170 Statuses

principal researcher, @googledeepmind. ex director of EMEA at @metaai. led a few open projects: llama, fasttext, dino, and now gemma.

Joined February 2009
Pinned Tweet
@armandjoulin
Armand Joulin
2 months
Are small models still undertrained? We are releasing a 2B model that beats GPT-3.5. The crazy part is that it was distilled on only 2T tokens from a small model. Distillation is the future of LLMs with the growing availability of large and efficient open models!
Tweet media one
11
41
373
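For readers unfamiliar with the technique mentioned above: distillation trains the small student against the teacher's full next-token distribution instead of one-hot labels. A minimal PyTorch sketch of such a loss (illustrative only, with hypothetical tensor shapes; not the actual Gemma training code):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    # Flatten (batch, seq, vocab) -> (batch * seq, vocab) so "batchmean"
    # averages over every token position, not just the batch dimension.
    vocab = student_logits.size(-1)
    s = student_logits.reshape(-1, vocab) / temperature
    t = teacher_logits.reshape(-1, vocab) / temperature
    # KL(teacher || student): the student matches the teacher's full
    # next-token distribution instead of a one-hot target.
    return F.kl_div(F.log_softmax(s, dim=-1),
                    F.softmax(t, dim=-1),
                    reduction="batchmean") * temperature ** 2
```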
@armandjoulin
Armand Joulin
30 days
Weird way to learn that my company is announcing something on October 18th.
@dicnunz
dic
1 month
Rumor has it that an @OpenAI announcement is in the works for Thursday, October 17; unsure of the nature as it could be a GPT-4o model update or a public rollout of SearchGPT. One thing for sure, it will NOT be a new frontier model.
31
7
282
38
36
1K
@armandjoulin
Armand Joulin
1 year
We are releasing a series of visual features that are performant across pixel- and image-level tasks. We achieve this by training a 1B-param ViT-g on a large, diverse, and curated dataset with no supervision, and distilling it to smaller models. Everything is open-source.
@AIatMeta
AI at Meta
1 year
Announced by Mark Zuckerberg this morning — today we're releasing DINOv2, the first method for training computer vision models that uses self-supervised learning to achieve results matching or exceeding industry standards. More on this new work ➡️
92
895
4K
6
35
322
@armandjoulin
Armand Joulin
3 months
Gemma 2 27B is now the best open model while being 2.5x smaller than alternatives! This validates the work done by the team and Gemini. This is just the beginning 💙♊️
@lmsysorg
lmsys.org
3 months
We have also collected more votes for Gemma-2-27B (now 5K+) over the past few days. Gemma-2 stays robust against Llama-3-70B and is now the new best open model!
Tweet media one
4
25
142
8
43
233
@armandjoulin
Armand Joulin
1 year
Life update: I'm joining GDM Paris. Ping me if you want to chat!
23
2
217
@armandjoulin
Armand Joulin
3 months
Gemma 2 is out! Available on AI Studio, @huggingface, and @ollama. Kudos to them for the amazing logo they made for the release 💙
Tweet media one
7
29
214
@armandjoulin
Armand Joulin
2 years
Super excited to share new open LLMs from FAIR with our research community. In particular, LLaMA-13B is competitive with GPT-3 despite being 10x smaller.
@GuillaumeLample
Guillaume Lample @ ICLR 2024
2 years
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters. LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B. The weights for all models are open and available at 1/n
Tweet media one
Tweet media two
Tweet media three
Tweet media four
173
1K
7K
2
26
207
@armandjoulin
Armand Joulin
7 months
We are releasing a first set of new models for the open community of developers and researchers.
@demishassabis
Demis Hassabis
7 months
We have a long history of supporting responsible open source & science, which can drive rapid research progress, so we’re proud to release Gemma: a set of lightweight open models, best-in-class for their size, inspired by the same tech used for Gemini
Tweet media one
182
351
2K
6
17
124
@armandjoulin
Armand Joulin
1 year
Our DINOv2 models are now under Apache 2.0 license. Thank you @MetaAI for making this change!
@AIatMeta
AI at Meta
1 year
To support innovation in computer vision, we’re making DINOv2 available under the Apache 2.0 license + releasing a collection of DINOv2-based dense prediction models for semantic image segmentation and monocular depth estimation. Try our updated demo ➡️
12
136
883
0
10
121
@armandjoulin
Armand Joulin
3 months
yes but,
Tweet media one
@reach_vb
Vaibhav (VB) Srivastav
3 months
Few understand that the real star of the release is 9B and not 27B. Trained on 8 trillion tokens with knowledge distillation (presumably with 27B as the teacher model), it blows the competition (<15B) out of the water. At Q6/Q8, it's ~10GB VRAM, making it a powerful model for
Tweet media one
7
32
316
6
8
118
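Back-of-the-envelope check on the VRAM figure above (my own arithmetic, not from the thread): at Q8, roughly one byte per weight, a 9B-parameter model needs about 9 GB for weights alone, and at Q6 about 7 GB; adding the KV cache and activation buffers lands in the ~10 GB range quoted, though the exact footprint depends on context length and runtime.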
@armandjoulin
Armand Joulin
6 months
Fixed the fix.
Tweet media one
@jefrankle
Jonathan Frankle
6 months
Fixed it for you, @code_star
Tweet media one
4
8
90
6
9
115
@armandjoulin
Armand Joulin
16 days
A 9B open model that surpasses some of the best open models! Would have loved to be the one claiming this win; this is massive! Congrats @yumeng0818 @xiamengzhou and @danqi_chen!
@GlennCameronjr
Glenn Cameron Jr
16 days
Princeton's Gemma 2 9B SimPO fine-tuned model is doing amazing on @lmsysorg 🤯📈 You can find it on @huggingface: Nice work @yumeng0818 @xiamengzhou & @danqi_chen
Tweet media one
5
31
177
2
32
106
@armandjoulin
Armand Joulin
6 months
When you realize that MLX was developed by only 4 people and you see what they achieved...
@awnihannun
Awni Hannun
6 months
MLX Swift and LLM example are updated. Generating text is faster. Get started: 4-bit Gemma 2B runs nicely on an iPhone 14:
10
36
274
3
7
110
@armandjoulin
Armand Joulin
3 months
So, closed models are becoming cheap and small while open models are becoming big and expensive to run. I didn't see that coming.
13
4
109
@armandjoulin
Armand Joulin
3 months
Prefix-LMs are so natural for autoregressive image models, even when there is no text involved, as @alaaelnouby showed in his SSL paper. Link:
Tweet media one
@giffmana
Lucas Beyer (bl16)
3 months
First, it's a Prefix-LM. Full attention between image and prefix (=user input), auto-regressive only on suffix (=model output). The intuition is that this way, the image tokens can see the query and do task-dependent "thinking"; if it was full AR, they couldn't. Results agree:
Tweet media one
Tweet media two
4
12
102
3
12
65
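To make the masking described in the quoted thread explicit, here is a minimal sketch (PyTorch, illustrative only) of a prefix-LM attention mask: bidirectional over the prefix (image tokens + user input), causal over the generated suffix:

```python
import torch

def prefix_lm_mask(prefix_len: int, total_len: int) -> torch.Tensor:
    # True = "query position (row) may attend to key position (column)".
    # Start from a standard causal (lower-triangular) mask...
    mask = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))
    # ...then let every position attend to the whole prefix, which makes
    # attention bidirectional among the image/prompt tokens while the
    # generated suffix stays strictly autoregressive.
    mask[:, :prefix_len] = True
    return mask

# Example: 4 prefix tokens (image + user input) followed by 3 output tokens.
print(prefix_lm_mask(prefix_len=4, total_len=7).int())
```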
@armandjoulin
Armand Joulin
2 months
can we keep the sota fixed for more than a day, pls?
Tweet media one
4
4
64
@armandjoulin
Armand Joulin
9 months
Our work on learning visual features with an LLM approach is finally out. All the scaling observations made on LLMs transfer to images! It was a pleasure to work under @alaaelnouby's leadership on this project, and this concludes my fun (but short) time at Apple! 1/n
@alaa_nouby
Alaa El-Nouby
9 months
Excited to share AIM 🎯 - a set of large-scale vision models pre-trained solely using an autoregressive objective. We share the code & checkpoints of models up to 7B params, pre-trained for 1.2T patches (5B images) achieving 84% on ImageNet with a frozen trunk. (1/n) 🧵
Tweet media one
8
56
214
1
7
65
@armandjoulin
Armand Joulin
10 months
IMHO, Chinchilla is the most impactful paper in the recent development of open LLMs, and its relatively low citation count shows how broken this metric is.
@srush_nlp
Sasha Rush
10 months
I'm a bit obsessed with the Chinchilla paper. It has the largest ratio of "economic worth/idea complexity" of any paper in AI. If Google has locked it down, it's possible open-source would be a year or more behind.
8
19
327
3
7
56
@armandjoulin
Armand Joulin
10 months
Using parallel decoding to speed up LLM inference:
✅ no need for a second model
✅ no finetuning
✅ negligible memory overhead
@giomonea
Giovanni Monea
10 months
🎉 Unveiling PaSS: Parallel Speculative Sampling 🚀 Need faster LLM decoding? 🔗 Check out our new 1-model speculative sampling algorithm based on parallel decoding with look-ahead tokens: 🤝 In collaboration with @armandjoulin and @EXGRV
7
10
52
0
5
31
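A rough sketch of the verification step such one-model schemes rely on (illustrative only, not the actual PaSS code; `model` is assumed to be a Hugging Face-style causal LM returning `.logits`, and the draft tokens are assumed to come from the same model's look-ahead pass):

```python
import torch

@torch.no_grad()
def verify_draft(model, prompt_ids: torch.Tensor, draft_ids: torch.Tensor) -> torch.Tensor:
    # One greedy verification step: run a single forward pass over
    # prompt + drafted tokens and keep the longest prefix of the draft
    # that the model would have generated itself, plus one corrected token.
    seq = torch.cat([prompt_ids, draft_ids]).unsqueeze(0)           # (1, T)
    logits = model(seq).logits[0]                                    # (T, vocab)
    start = prompt_ids.numel() - 1
    preds = logits[start:-1].argmax(dim=-1)                          # model's own choice for each drafted slot
    n_ok = int((preds == draft_ids).long().cumprod(dim=0).sum())     # length of the accepted prefix
    corrected = logits[start + n_ok].argmax(dim=-1, keepdim=True)    # next token after the accepted prefix
    return torch.cat([draft_ids[:n_ok], corrected])
```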
@armandjoulin
Armand Joulin
4 months
@ylecun @_arohan_ If arXiv counts as a publication reviewed by peers, then I think we all agree.
1
0
27
@armandjoulin
Armand Joulin
3 months
@ryoppippi Apple drops a model + code + dataset to reproduce. It's one of the most open releases around for such a powerful model. Can we celebrate this?
0
3
26
@armandjoulin
Armand Joulin
2 months
💙🤍❤️
Tweet media one
1
0
19
@armandjoulin
Armand Joulin
3 months
@srush_nlp 9B and 2B are fully trained with distillation during pretraining. Finetuning starts with standard SFT (no distillation), then online distillation, and finally RLHF.
3
0
13
@armandjoulin
Armand Joulin
6 months
@SashaMTL It was registered for a Shasha Lucciono.
1
0
11
@armandjoulin
Armand Joulin
10 months
@giffmana FAIR is still home to top-tier computer vision researchers like @imisra_, @lvdmaaten, Christoph Feichtenhofer, Peter Dollar, Yaniv Taigman, @p_bojanowski. As @inkynumbers said, I think a lot of us joined 8-9 years ago, and there are cycles in research careers.
0
0
11
@armandjoulin
Armand Joulin
3 months
@karpathy sometimes I like to think of very large models as just a smart way to run a massively parallel gradient descent in the parameter space of small models.
2
1
11
@armandjoulin
Armand Joulin
7 months
@abacaj We will look to improve our models in future iterations and any feedback will be appreciated (through DMs?). Mistral's models are amazing and if they work for you, all the best!
1
1
10