TimDarcet Profile
TimDarcet

@TimDarcet

2,755
Followers
618
Following
99
Media
621
Statuses

PhD student, building big vision models @ INRIA & FAIR (Meta)

Joined March 2021
Pinned Tweet
@TimDarcet
TimDarcet
1 year
1/ This week we released DINOv2: a series of general vision encoders pretrained without supervision. Good out-of-the-box performance on a variety of domains, matching or surpassing other publicly available encoders.
5
117
696
@TimDarcet
TimDarcet
1 year
Vision transformers need registers! Or at least, it seems they 𝘸𝘢𝘯𝘵 some… ViTs have artifacts in attention maps. It’s due to the model using these patches as “registers”. Just add new tokens (“[reg]”): - no artifacts - interpretable attention maps 🦖 - improved performance!
Tweet media one
43
327
2K
@TimDarcet
TimDarcet
11 months
DINOv2+registers=♥️ We are releasing code and checkpoints for DINOv2 augmented with registers and a slightly better training recipe. No more of those pesky artifacts! Simple one-liner, try it out: dinov2_vitg14_reg = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14_reg')
Tweet media one
12
42
491
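The one-liner above loads the register-augmented checkpoint via torch.hub (requires network access, so it is shown in comments here); the token-count arithmetic below is a sketch for the standard 518×518 input with patch size 14 and 4 registers.

```python
# Hedged sketch: loading DINOv2 ViT-g/14 with registers, as in the tweet.
# import torch
# dinov2_vitg14_reg = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14_reg')
# feats = dinov2_vitg14_reg(torch.randn(1, 3, 518, 518))

# Token-count arithmetic for a 518x518 input with patch size 14:
patch = 14
side = 518 // patch          # 37 patches per side
num_patches = side * side    # 1369 patch tokens
num_registers = 4            # the "_reg" variants add 4 register tokens
seq_len = 1 + num_registers + num_patches  # [CLS] + registers + patches
print(seq_len)  # 1374
```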
@TimDarcet
TimDarcet
2 months
Still not sure why the ML community adopted conda instead of plain old virtualenv
60
2
324
@TimDarcet
TimDarcet
7 months
Mistral's "Le Chat" logo is a design masterclass The two dots make a smol cat
Tweet media one
4
11
241
@TimDarcet
TimDarcet
3 months
Bonus trick: you can remove the gradient reduction of the first backward (which is useless) by wrapping in no_sync() Remember to also include the forward pass in the no_sync context, else it does not work
Tweet media one
@gabriberton
Gabriele Berton
3 months
This simple pytorch trick will cut in half your GPU memory use / double your batch size (for real). Instead of adding losses and then computing backward, it's better to compute the backward on each loss (which frees the computational graph). Results will be exactly identical
Tweet media one
43
266
2K
1
18
240
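A minimal single-process sketch of the backward-per-loss trick from the quoted tweet (in DDP you would additionally wrap the forward and backward of all but the last loss in `model.no_sync()`, as the reply suggests): each per-loss backward frees its graph immediately, and the accumulated gradients match the summed-loss case.

```python
import torch

w = torch.randn(4, requires_grad=True)
xs = [torch.randn(4) for _ in range(3)]

# Variant 1: sum the losses, one backward (keeps every graph alive until the end).
loss = sum((w * x).sum() ** 2 for x in xs)
loss.backward()
g_summed = w.grad.clone()

# Variant 2: backward per loss (each computational graph is freed right away).
w.grad = None
for x in xs:
    ((w * x).sum() ** 2).backward()  # gradients accumulate into w.grad
g_per_loss = w.grad.clone()

print(torch.allclose(g_summed, g_per_loss))  # the results are identical
```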
@TimDarcet
TimDarcet
5 months
If you need a replacement for an example image in a CV paper, you know what to do
Tweet media one
@MikePFrank
Michael P. Frank has joined a startup!
5 months
Just FYI, computer vision papers submitted to IEEE that include this image of Ms. Forsén will no longer be considered for publication
Tweet media one
104
347
2K
7
11
166
@TimDarcet
TimDarcet
1 year
Intriguing new property: on some images, the different registers naturally adopt a “slot attention-like” behavior, each attending to a different object! Needless to say, this was never required of the model (or even encouraged). Cool future research direction!
Tweet media one
2
11
164
@TimDarcet
TimDarcet
5 months
In case some of you were (like me) curious about this stat for AI conferences: here it is for ICLR2024
Tweet media one
@guidosalva
Guido Salvaneschi
5 months
Statistics from @ICSE2024 . Authors submitting, *each*, 33, 27, 24, ... papers. Interactive dashboard:
Tweet media one
18
24
89
13
20
154
@TimDarcet
TimDarcet
4 months
ViT need registers got an outstanding paper award! Many thanks to the committee for the honor
@iclr_conf
ICLR 2025
4 months
Announcing the #ICLR2024 Outstanding Paper Awards: Shoutout to the awards committee: @eunsolc , @katjahofmann , @liu_mingyu , @nanjiang_cs , @guennemann , @optiML , @tkipf , @CevherLIONS
3
53
303
6
10
153
@TimDarcet
TimDarcet
6 months
Hey! If you are using DINOv2, whether in a startup, in research or whatever, could you send me a DM? I want your feedback on the model. Reward for you? Simple: next model is gonna be 𝘦𝘷𝘦𝘯 𝘮𝘰𝘳𝘦 suited to your needs 👌
10
12
134
@TimDarcet
TimDarcet
1 year
Our hypothesis is: the model recognizes useless patches, discards the info in them, and uses them as 𝘢𝘨𝘨𝘳𝘦𝘨𝘢𝘵𝘰𝘳𝘴 𝘰𝘧 𝘨𝘭𝘰𝘣𝘢𝘭 𝘪𝘯𝘧𝘰𝘳𝘮𝘢𝘵𝘪𝘰𝘯.
2
7
132
@TimDarcet
TimDarcet
4 months
Current state of neurips abstract submissions This neurips is gonna be crazy
Tweet media one
@csinva
Chandan Singh
4 months
2024 update
Tweet media one
2
4
32
11
22
121
@TimDarcet
TimDarcet
5 months
With satellite imagery, it’s hard to get labels. Solution? DINOv2! WRI+Meta trained a satellite DINOv2 for tree height estimation. They created an interactive map of tree height of the whole globe (!) at 1-meter res (!): Quiz: Can you recognize this city?
Tweet media one
4
13
121
@TimDarcet
TimDarcet
1 year
What I mean when I say “registers”: additional learnable tokens (like the [CLS]), but these ones are not used at output. No additional info at input, not used at output: these tokens could seem useless!
Tweet media one
2
8
119
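A toy sketch of the mechanism described above (names and sizes are illustrative, not the paper's code): learnable register tokens are concatenated alongside [CLS] and the patch tokens, processed by the transformer blocks, then simply dropped at the output.

```python
import torch
import torch.nn as nn

class TinyViTWithRegisters(nn.Module):
    def __init__(self, dim=64, num_registers=4):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        self.block = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.num_registers = num_registers

    def forward(self, patches):                         # (B, N, dim)
        B = patches.shape[0]
        tokens = torch.cat([self.cls.expand(B, -1, -1),
                            self.registers.expand(B, -1, -1),
                            patches], dim=1)            # (B, 1 + R + N, dim)
        tokens = self.block(tokens)
        cls_out = tokens[:, 0]                          # global representation
        patch_out = tokens[:, 1 + self.num_registers:]  # registers discarded
        return cls_out, patch_out

model = TinyViTWithRegisters()
cls_out, patch_out = model(torch.randn(2, 16, 64))
print(cls_out.shape, patch_out.shape)
```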
@TimDarcet
TimDarcet
4 months
echo "echo 'sleep 0.5' >> ~/.bashrc" >> ~/.bashrc
@y0b1byte
yobibyte
4 months
Every time a colleague of mine does not lock their laptop, I add something to their .bashrc. alias vim='nano' is a good one, but moving file to a random folder is even funnier. rm is too evil, don't do it!
4
1
28
7
6
113
@TimDarcet
TimDarcet
8 months
ICLR results are out so its bragging time: ViT need reg got an oral and very good scores (top-15), so that's cool. Thanks a lot to the reviewers who found it good If you want to try a model with registers, we published some DINOv2 checkpoints earlier:
@TimDarcet
TimDarcet
11 months
DINOv2+registers=♥️ We are releasing code and checkpoints for DINOv2 augmented with registers and a slightly better training recipe. No more of those pesky artifacts! Simple one-liner, try it out: dinov2_vitg14_reg = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14_reg')
Tweet media one
12
42
491
9
7
111
@TimDarcet
TimDarcet
9 months
PSA: when someone asks you a question including words such as "false positive rate", 𝗱𝗼 𝗻𝗼𝘁 𝗮𝗻𝘀𝘄𝗲𝗿 𝗿𝗶𝗴𝗵𝘁 𝗮𝘄𝗮𝘆. Simply state that you know your rights, and go on wikipedia to consult the 𝔐𝔞𝔡 𝕮𝔬𝔫𝔣𝔲𝔰𝔦𝔬𝔫 𝔐𝔞𝔱𝔯𝔦𝔵 𝔬𝔣 𝕳𝔢𝔩𝔩
Tweet media one
@jeremykauffman
Jeremy Kauffman 🦔
9 months
Fewer than 1 in 5 doctors can correctly answer a basic question about statistics
Tweet media one
488
826
8K
2
16
105
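The classic base-rate trap behind the quoted statistic, with illustrative numbers (not from the tweet): with low prevalence, even a decent test yields mostly false positives, which is why "false positive rate" questions deserve a pause.

```python
# Illustrative numbers; the point is the Bayes computation, not the values.
prevalence = 0.01      # 1% of patients have the disease
sensitivity = 0.90     # P(test+ | disease)  = true positive rate
fpr = 0.09             # P(test+ | healthy)  = false positive rate

p_positive = prevalence * sensitivity + (1 - prevalence) * fpr
p_disease_given_positive = prevalence * sensitivity / p_positive
print(round(p_disease_given_positive, 3))  # 0.092: only ~9% of positives are real
```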
@TimDarcet
TimDarcet
1 year
But in fact, the model learns to use them. And they work quite well: a single register entirely fixes the attention maps, and gives a boost on downstream tasks. Adding more further increases the scores a bit. We improve upon DINOv2, which was already quite stronk 💪
Tweet media one
2
2
102
@TimDarcet
TimDarcet
1 year
Do check out the paper! It’s got much more detail than I can give here. Thanks to Maxime Oquab, Julien Mairal and Piotr Bojanowski who were patient enough to work with me, and competent enough to compensate for my mistakes 😅.
Tweet media one
3
2
94
@TimDarcet
TimDarcet
5 months
Actually the acceptance rate decreases monotonically with the number of first-author submissions: the more prolific the first author is, the lower the quality of their papers.
Tweet media one
@jon_barron
Jon Barron
5 months
The acceptance rate among aspiring ICLR2024 first authors who submitted >= 4 papers was 15%! Contrast that with the base acceptance rate that year: 30.5%. Unsettling.
4
5
43
2
13
96
@TimDarcet
TimDarcet
3 months
fuck your fancy personal page template im rawdoggin the html and you wont even make me use css
Tweet media one
8
3
95
@TimDarcet
TimDarcet
1 year
“Fine with me if you need global aggregators, but please don’t do this in my feature maps. I need those for downstream tasks! Here, have a few registers instead” - historical reconstruction of how it happened
Tweet media one
1
3
91
@TimDarcet
TimDarcet
6 months
Hey guys quick update vision transformers don't need registers after all brb gotta test some stuff
@liuzhuang1234
Zhuang Liu
6 months
LLMs are great, but their internals are less explored. I'm excited to share very interesting findings in paper “Massive Activations in Large Language Models” LLMs have very few internal activations with drastically outsized magnitudes, e.g., 100,000x larger than others. (1/n)
Tweet media one
32
169
1K
3
2
81
@TimDarcet
TimDarcet
1 year
This starts with a very simple observation: ~all ViTs have attention maps focused on a few seemingly random patches. DINO has clean attention maps, sure, but then why did the artifacts reappear in DINOv2? What 𝘢𝘳𝘦 these artifacts?
Tweet media one
1
4
81
@TimDarcet
TimDarcet
2 months
Okay this uiua thing is actually pretty fun
Tweet media one
@ludwigABAP
ludwig
3 months
uiua goes unbelievably hard wtf array-orientated, stack based, glyph programming language and now I wanna make the game of life in it this weekend
Tweet media one
14
7
222
2
5
81
@TimDarcet
TimDarcet
3 months
You may not like it, but this is what peak personal page looks like
@TimDarcet
TimDarcet
3 months
fuck your fancy personal page template im rawdoggin the html and you wont even make me use css
Tweet media one
8
3
95
12
3
78
@TimDarcet
TimDarcet
1 year
Thanks @_akhaliq and @arankomatsuzaki for featuring our paper! It's great to see it 1st on the trending list on HF papers 😁
@_akhaliq
AK
1 year
Vision Transformers Need Registers paper page: Transformers have recently emerged as a powerful tool for learning visual representations. In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT
Tweet media one
5
142
836
3
6
66
@TimDarcet
TimDarcet
1 year
We find a few properties of these artifacts. 1. They appear on patches with useless information (redundant to their neighbors). 2. They contain little information about the original patch. It “forgot” its original value!
Tweet media one
1
1
66
@TimDarcet
TimDarcet
7 months
Happy to share that DINOv2 was accepted at TMLR! A special thanks to the reviewers and action editor. I found the review process to be actually pleasant and constructive. I believe that right now TMLR is possibly the best place to publish in ML
@TmlrPub
Accepted papers at TMLR
8 months
DINOv2: Learning Robust Visual Features without Supervision Maxime Oquab, Timothée Darcet, Théo Moutakanni et al.. Action editor: Abhishek Kumar. #supervised #visual #features
1
18
118
1
6
63
@TimDarcet
TimDarcet
1 year
On the other hand, the output tokens seem to contain 𝗹𝗼𝘁𝘀 of global information. We probe on a few different classification datasets. We find that these tokens contain much more class information than other patch tokens, and almost as much as the [CLS]!
Tweet media one
1
0
63
@TimDarcet
TimDarcet
1 year
Do try out the new depth estimation parallax view, it's trippy
2
5
59
@TimDarcet
TimDarcet
4 months
Thanks to DINO's nice attention maps, the model's behavior is quite interpretable! That's really cool
Tweet media one
Tweet media two
@TimDarcet
TimDarcet
4 months
Another banger by @TheoMoutakanni : RayDINO, a DINO for chest X-ray. Excellent results on a ton of benchmarks with the frozen model, with great generalization and low bias. Check it out!
Tweet media one
1
5
43
2
11
57
@TimDarcet
TimDarcet
1 year
6/ With these capabilities emerge new interesting properties. A very nice one is the ability to perform semantic keypoint matching between images simply by matching the closest features. This works across very different domains!
Tweet media one
Tweet media two
Tweet media three
2
12
56
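The matching described above can be sketched in a few lines (random tensors stand in for actual DINOv2 patch features; shapes are illustrative): L2-normalize the patch features of both images and match each patch in A to its nearest patch in B by cosine similarity.

```python
import torch
import torch.nn.functional as F

# Stand-ins for per-patch features of two images (196 patches, dim 384).
feats_a = F.normalize(torch.randn(196, 384), dim=-1)
feats_b = F.normalize(torch.randn(196, 384), dim=-1)

sim = feats_a @ feats_b.T      # cosine similarities, shape (196, 196)
matches = sim.argmax(dim=1)    # for each patch in A, index of closest patch in B
print(matches.shape)
```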
@TimDarcet
TimDarcet
10 months
Published my first paper, and my second one. I like them. I used to feel anxious about not being able to publish anything. It's getting better.
@ATMwithJacy
Jacy, LPC
10 months
BRAG ABOUT SOMETHING YOU’RE PROUD OF ACCOMPLISHING IN 2023 ✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨
1K
474
6K
2
0
53
@TimDarcet
TimDarcet
1 year
2/ As opposed to other recent SSL works, the goal is to provide vision encoders that work off-the-shelf, without any fine-tuning. In this setup, we improve significantly over previous SSL works, and even match or surpass CLIP-type models on a variety of tasks
Tweet media one
1
4
53
@TimDarcet
TimDarcet
2 months
Lmao they waited for the 405B release just to be able to 1-up it
@MistralAI
Mistral AI
2 months
126
362
2K
3
0
51
@TimDarcet
TimDarcet
4 months
Another banger by @TheoMoutakanni : RayDINO, a DINO for chest X-ray. Excellent results on a ton of benchmarks with the frozen model, with great generalization and low bias. Check it out!
Tweet media one
1
5
43
@TimDarcet
TimDarcet
11 months
Quite a few people have been asking me "can registers work with LLMs?" Here is a paper that says yes!
@arankomatsuzaki
Aran Komatsuzaki
11 months
Think before you speak: Training Language Models With Pause Tokens - Performing training and inference on LMs with a learnable pause token appended to the input prefix - Gains on 8 tasks, e,g, +18% on SQuAD
Tweet media one
16
177
925
3
3
40
@TimDarcet
TimDarcet
6 months
Tweet media one
1
0
38
@TimDarcet
TimDarcet
1 year
Big news on the DINOv2 side! - Apache2 license (commercial use) - Releasing the segmentation and depth heads - significantly updated demo, with keypoint matching! - New fairness evaluations on FACET
2
5
35
@TimDarcet
TimDarcet
4 months
The viennese street artists are a different breed
Tweet media one
0
2
36
@TimDarcet
TimDarcet
5 months
The biggest step change in the DINOv2 project was a skillful yolo run by Maxime. Yoloing is a dangerous but powerful weapon
@_jasonwei
Jason Wei
5 months
In AI research there is tremendous value in intuitions on what makes things work. In fact, this skill is what makes “yolo runs” successful, and can accelerate your team tremendously. However, there’s no track record on how good someone’s intuition is. A fun way to do this is
19
36
465
2
0
35
@TimDarcet
TimDarcet
5 months
In case you haven't got it yet: google scholar pdf reader extension for chrome is a must
Tweet media one
6
1
33
@TimDarcet
TimDarcet
7 months
Next week I'll be talking about registers, what they are and why we need them, at Cohere for AI! More info:
@CohereForAI
Cohere For AI
7 months
Next week on Wednesday, February 7th, our Geo-Regional Asia Group is excited to welcome Timothée Darcet, PhD student, building large vision models at @Meta AI (FAIR) & @Inria to present "Vision Transformers need Registers." Learn more:
Tweet media one
2
4
15
0
5
32
@TimDarcet
TimDarcet
1 year
Very clear and simple tutorial on how to use DINOv2 as an image featurizer. Check it out !
@NielsRogge
Niels Rogge
1 year
DINOv2, a SOTA ViT trained by @Meta on 142 million images, is now part of 🤗 Transformers! It's one of the strongest vision backbones at the moment, so I created a tutorial on training a linear classifier on top of it for semantic segmentation, using DINOv2's frozen features 1/2
Tweet media one
4
73
406
0
7
31
@TimDarcet
TimDarcet
10 months
Wait till they hear about selective checkpointing
@capetorch
Thomas Capelle
10 months
Gradient Checkpointing is the single most effective way of reducing GPU memory footprint. This thing is fantastic! Am I missing something, or is it that good?
10
3
101
2
0
30
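A minimal sketch of the gradient checkpointing being discussed (recent PyTorch; `use_reentrant=False` is the non-reentrant variant): activations inside the checkpointed function are not stored but recomputed during backward, trading compute for memory. "Selective" checkpointing applies this only to chosen blocks rather than the whole network.

```python
import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(32, 32)

def block(x):
    # Intermediate activations here are recomputed at backward time.
    return torch.relu(layer(x))

x = torch.randn(8, 32, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(x.grad is not None)  # gradients still flow normally
```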
@TimDarcet
TimDarcet
5 months
Okay caveat of my last post: maybe those are all middle authorship? Let's look at the same plot but only for _first_ and _last_ authors. First authors: (1/2)
Tweet media one
@TimDarcet
TimDarcet
5 months
In case some of you were (like me) curious about this stat for AI conferences: here it is for ICLR2024
Tweet media one
13
20
154
5
3
31
@TimDarcet
TimDarcet
4 months
@Ethan_smith_20 Contrastive losses in general push the model to use the whole space In DINOv2 we used the specific KoLeo loss, which pushes the embedding distribution towards higher entropy Higher entropy --> uniform distribution (on the hypersphere) --> full usage of the space
2
0
31
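A hedged sketch of the KoLeo regularizer mentioned above, following the form described in the DINOv2 paper (a differential-entropy estimator: each embedding is pushed away from its nearest neighbor in the batch, spreading points over the hypersphere). Details here are illustrative, not the released implementation.

```python
import torch
import torch.nn.functional as F

def koleo_loss(x, eps=1e-8):
    x = F.normalize(x, dim=-1)              # embeddings on the hypersphere
    dots = x @ x.T
    dots.fill_diagonal_(-2.0)               # exclude self-matches
    # Largest cosine similarity <-> smallest L2 distance on the sphere.
    nn_dist = (2 - 2 * dots.max(dim=1).values).clamp_min(eps).sqrt()
    return -nn_dist.log().mean()            # more spread -> lower loss

# A clumped batch is penalized much more than a spread-out one:
clumped = torch.randn(64, 16) * 0.01 + 1.0
spread = torch.randn(64, 16)
print(koleo_loss(clumped) > koleo_loss(spread))  # True
```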
@TimDarcet
TimDarcet
3 months
Always check the image normalization! It can completely change results. E.g. CLIP uses its own specific norm, and openclip uses either the CLIP values or the inception values depending on the model. When in doubt, you can often check in timm
@gabriberton
Gabriele Berton
3 months
Notable models that use non-imagenet norm are Dust3r, OpenIBL, many image matching models, and some (many?) remote sensing models. This is an issue when you create a fair codebase to benchmark multiple models (where ideally you can simply swap the model to compute the results).
4
0
8
1
1
29
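The three normalizations in play, with the constants as commonly used in torchvision/timm (worth double-checking per model, which is exactly the tweet's point): the same pixel value lands in noticeably different places under each.

```python
IMAGENET_MEAN, IMAGENET_STD = (0.485, 0.456, 0.406), (0.229, 0.224, 0.225)
INCEPTION_MEAN, INCEPTION_STD = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

# Where a red-channel value of 0.5 ends up under each normalization:
for mean, std in [(IMAGENET_MEAN, IMAGENET_STD),
                  (INCEPTION_MEAN, INCEPTION_STD),
                  (CLIP_MEAN, CLIP_STD)]:
    print(round((0.5 - mean[0]) / std[0], 3))
```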
@TimDarcet
TimDarcet
2 months
Okay first DFN then that Apple is now the king of open-source datasets, both vision and NLP
@casper_hansen_
Casper Hansen
2 months
Apple released a 7B model that beats Mistral 7B - but the kicker is that they fully open sourced everything, also the pretraining dataset 🤯
29
502
3K
0
1
28
@TimDarcet
TimDarcet
3 months
Wait, are they doing patch size 40?? 170 is 1 [CLS] plus 13x13 patch tokens. Using padding, the smallest patch size you would need for that is 40. That's huge! Bigger than the old patch 32, which nobody uses any more.
@y0b1byte
yobibyte
3 months
Very nice and interesting forensics
Tweet media one
1
2
35
2
1
27
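The arithmetic behind the guess, spelled out (the 512-px input side is my assumption for illustration; the tweet doesn't state the image size): 170 tokens = 1 [CLS] + a 13×13 grid, and fitting 13 patches per side with padding forces the patch size up to 40.

```python
import math

num_tokens = 170
grid = int((num_tokens - 1) ** 0.5)   # 13x13 patch grid
assert grid * grid == num_tokens - 1

# Assumed 512-px input: smallest p with ceil(512 / p) <= 13 is
# p >= 512 / 13 ~= 39.4, i.e. p = 40.
p = math.ceil(512 / grid)
print(grid, p)  # 13 40
```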
@TimDarcet
TimDarcet
8 months
QRT-ing this for good measure. The most important thing I've learned about SSL is probably: experiments, experiments, experiments. If it doesn't work, experiment harder. It's why big labs have such an unfair advantage
@TimDarcet
TimDarcet
8 months
@nickdaleburns @samsja19 With all that said I think it's not important whether something is contrastive or not. I understand the intuition "there may be similar samples in the batch so I don't want to push them away". But in SSL, intuitions are trash IMO. You just have to try the stuff
5
0
11
0
1
26
@TimDarcet
TimDarcet
1 year
> Dumps a 60 MMLU 7B as a magnet link > Refuses to elaborate > Leaves Incomprehensibly based
Tweet media one
@MistralAI
Mistral AI
1 year
magnet:?xt=urn:btih:208b101a0f51514ecf285885a8b0f6fb1a1e4d7d&dn=mistral-7B-v0.1&tr=udp%3A%2F%%3A1337%2Fannounce&tr=https%3A%2F%%3A443%2Fannounce RELEASE ab979f50d7d406ab8d0b07d09806c72c
209
450
4K
2
0
26
@TimDarcet
TimDarcet
5 months
It's not exactly a Zipf law, but this distribution is still interesting
Tweet media one
2
0
25
@TimDarcet
TimDarcet
5 months
This begs the question: are these people contributing to reviews as much as they contribute to submissions? The review system is saturated. If people send lots of papers to it, they should contribute more to make it work.
@TimDarcet
TimDarcet
5 months
In case some of you were (like me) curious about this stat for AI conferences: here it is for ICLR2024
Tweet media one
13
20
154
4
4
24
@TimDarcet
TimDarcet
5 months
I usually say dɪno (dee-no) in french and daɪno (die-no) in english
@abursuc
Andrei Bursuc
5 months
Computer Vision folks, let's settle this. How do you pronounce DINO?
3
2
13
1
0
24
@TimDarcet
TimDarcet
9 months
I really don't like the pressure there is on "number of papers published". In France it's "defend after 3 years, as long as you published 1 paper". PhD students still publish excellent research. We should have incentives to publish fewer, better papers
@agihippo
yi 🦛
9 months
If publishing 3 papers is the bar for a PhD everyone will graduate in two quarters lol
3
0
16
1
0
24
@TimDarcet
TimDarcet
1 year
7/ What's the secret ingredient then? Well, the simplest answers are often the best. Most improvements come from scaling up, tuning carefully, stabilizing the training, efficient implementations... Might seem scientifically boring, but it’s absolutely crucial.
1
1
22
@TimDarcet
TimDarcet
3 months
MLPs are really cool, you can just look at the functions they define, and understand them Really recommend these discussions and the neural redshift paper
@francoisfleuret
François Fleuret
3 months
With x->x.clamp(min=-0.5, max=0.5) as non-linearity. So as expected it's not a question of piecewise linear vs. non-polynomial, but a question of creating sharp but local changes? s=1, s=10 @DamienTeney
Tweet media one
Tweet media two
2
0
12
0
3
21
@TimDarcet
TimDarcet
9 months
Tweet media one
@jeremyphoward
Jeremy Howard
9 months
he says he goes into "cuda mode" to write kernels. No music, lights off, no distractions. He wrote the 4bit kernel in one night.
21
23
724
0
0
20
@TimDarcet
TimDarcet
7 months
My drug is benchmark curves going up and I'm absolutely addicted this is not a meme send help
5
1
19
@TimDarcet
TimDarcet
1 year
@giffmana @MistralAI We kinda tried to contribute one brick to this with the DINOv2 release this year But in general I agree there's a big diff with NLP where there's new good foundation models every month rn
1
0
18
@TimDarcet
TimDarcet
8 months
Cool paper! Their encoders look quite strong. I'm happy to see ideas such as multicrop or iBOT being used. In my experience, it's free money
@_akhaliq
AK
8 months
Learning Vision from Models Rivals Learning Vision from Data paper page: introduce SynCLR, a novel approach for learning visual representations exclusively from synthetic images and synthetic captions, without any real data. We synthesize a large dataset
Tweet media one
5
91
444
0
2
18
@TimDarcet
TimDarcet
1 year
The keypoints demo is absolutely great, I'm happy we were able to finally release that
Tweet media one
1
1
17
@TimDarcet
TimDarcet
8 months
FAISS is absolutely standard for fast knn. Many approximate indices available, gpu acceleration made easy. It was crucial to our dataset creation pipeline in DINOv2. Many thanks to @DouzeMatthijs , @hjegou and team!
@ducha_aiki
Dmytro Mishkin 🇺🇦
8 months
The Faiss library @DouzeMatthijs , Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, @hjegou tl;dr: the faiss and approximate kNN search overview
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
19
73
0
0
17
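The basic FAISS pattern being praised, sketched below (the faiss calls are shown in comments in case the library isn't installed; `IndexFlatL2` is the exact index, so a brute-force NumPy search reproduces it):

```python
import numpy as np

# FAISS equivalent of the search below:
#   import faiss
#   index = faiss.IndexFlatL2(32)   # exact L2 index over dim-32 vectors
#   index.add(xb)                   # database vectors, float32 (n, d)
#   D, I = index.search(xq, k)      # k nearest neighbors of each query

rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 32)).astype("float32")  # database
xq = rng.standard_normal((5, 32)).astype("float32")     # queries
k = 4

d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(-1)   # squared L2 distances
I = np.argsort(d2, axis=1)[:, :k]                       # indices of k nearest
print(I.shape)  # one row of k neighbor ids per query
```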
@TimDarcet
TimDarcet
5 months
Do not worry DINO fans: I am here to tell you that both pronunciations are officially valid
@abursuc
Andrei Bursuc
5 months
Sorry for DINO fans with French-Italian roots or preference for DI-stillation :)
1
0
5
1
0
17
@TimDarcet
TimDarcet
11 months
Tweet media one
@xhluca
Xing Han Lu
11 months
@ylecun @ClementDelangue Scikit-Learn --> Inria (Paris) Torch --> Idiap/EPFL (Switzerland) Theano --> Lisa/UdeM (Montreal) Keras --> François Chollet FAISS, DINO, DETR, LLAMA --> FAIR Paris Tokenizers, Optimum, Accelerate --> Huggingface Wonder if there's something different about speaking French..
0
1
32
0
0
17
@TimDarcet
TimDarcet
7 months
"Vision transformer need registers" Tomorrow, 4pm CET! Moreinf -->
@CohereForAI
Cohere For AI
7 months
Next week on Wednesday, February 7th, our Geo-Regional Asia Group is excited to welcome Timothée Darcet, PhD student, building large vision models at @Meta AI (FAIR) & @Inria to present "Vision Transformers need Registers." Learn more:
Tweet media one
2
4
15
1
0
16
@TimDarcet
TimDarcet
9 months
@XueFz Be careful of the citation tracker you use. Google scholar counts 791 with the right aggregation. I did expect more. But 791 is conceivable
Tweet media one
3
0
16
@TimDarcet
TimDarcet
5 months
4:22:37:12 Any% NMG 1.2.2.1 new WR
@_akhaliq
AK
5 months
How Good Are Low-bit Quantized LLaMA3 Models? Meta's LLaMA family has become one of the most powerful open-source Large Language Model (LLM) series. Notably, LLaMA3 models have recently been released and achieve impressive performance across various with super-large scale
Tweet media one
4
68
314
1
0
15
@TimDarcet
TimDarcet
1 year
11/ But giant models are impractical. To create smaller, portable models, we distilled the ViT-g into ViT-S, B and L (50x, 14x and 3x smaller). Distilling improves significantly over training from scratch! At these sizes, our distilled models beat all other models we tested.
Tweet media one
1
2
15
@TimDarcet
TimDarcet
5 months
I deleted the last plot, as I had made a stupid mistake. Here is the corrected version. Thanks @mihaidusmanu for pointing it out!
Tweet media one
@mihaidusmanu
Mihai Dusmanu
5 months
@TimDarcet Are you sure about the Y axis on this one? It feels off. And the X caption? Do I read it correctly that authors with >= 5 submissions as 1st author have an average of 4.6 papers accepted as 1st? Also according to the previous post there are only three samples at >= 5 submissions
1
0
1
1
1
15
@TimDarcet
TimDarcet
1 year
3/ TL;DR of results (on the benchmarks we tested) is Classification: DINOv2 ≥ CLIP Dense tasks (segmentation, depth): DINOv2 > CLIP Retrieval: DINOv2 > CLIP
1
3
15
@TimDarcet
TimDarcet
4 months
@p_bojanowski Congrats to you too!
Tweet media one
0
1
15
@TimDarcet
TimDarcet
7 months
Wow this new model is so much more interesting than GPT4
Tweet media one
@arthurmensch
Arthur Mensch
7 months
As a small surprise, we’re also releasing le Chat Mistral, a front-end demonstration of what Mistral models can do. Learn more on
22
55
465
0
0
15
@TimDarcet
TimDarcet
11 months
Just read the GAIA-1 paper, and I should've done it sooner. Feels like a big deal, I'm not aware of any previous work achieving such prediction capabilities / temporal consistency on videos. Actually bummed I didn't take the time to visit the @wayve_ai booth at ICCV
@alexgkendall
Alex Kendall
11 months
What’s exciting about #GAIA1 is that it's not just a simulation, but a full world model that understands how the world works Here's two different futures where a reversing car (1) pulls off the street or (2) pulls out suddenly requiring us to brake More of my fav examples in 🧵
18
60
409
0
0
14
@TimDarcet
TimDarcet
3 months
This is why I work ❤️
@alew3
Alessandro
4 months
🚀 Just launched to help pet owners reunite with their lost pets after Brazil's worst floods 🌊 Upload a picture of your pet, and our AI matches it with rescue shelter photos. Huge thanks to @aziontech and @LightningAI for their support! 🐾❤️ #AIForGood
2
5
12
0
0
15
@TimDarcet
TimDarcet
1 year
4/ For dense tasks, it makes sense: captions only capture image information up to a certain point, they miss local information. The masked image modeling loss in DINOv2 (from iBOT) gives it local understanding.
Tweet media one
2
1
14
@TimDarcet
TimDarcet
9 months
This result got attention, but I feel it deserves more. Similar approach to GAIA-1, and potentially very significant results. The big gap in this paper is the evaluations. Qualitative results are impressive, but we need some quantitative.
@YutongBAI1002
Yutong Bai
9 months
How far can we go with vision alone? Excited to reveal our Large Vision Model! Trained with 420B tokens, effective scalability, and enabling new avenues in vision tasks! (1/N) Kudos to @younggeng @Karttikeya_m @_amirbar , @YuilleAlan Trevor Darrell @JitendraMalikCV Alyosha Efros!
18
160
1K
0
0
14
@TimDarcet
TimDarcet
9 months
> 8 x 7B First open-source SOTA MoE?
Tweet media one
0
0
14
@TimDarcet
TimDarcet
1 year
5/ The good knn properties of DINO and the new KoLeo regularization combine to produce strong retrieval results. We were surprised by how good the metrics are! Even our ViT-S reaches better scores than any previously released model.
Tweet media one
1
1
14
@TimDarcet
TimDarcet
1 month
Wait wait guys is this... ???
Tweet media one
3
0
14
@TimDarcet
TimDarcet
2 months
Tweet media one
0
1
13
@TimDarcet
TimDarcet
4 months
Hey, quick update and apology: this graph is misleading. 22k is the number of abstracts submitted, while last year's 12k is the number of papers submitted. Afaik last year there were 15.6k abstracts. If the proportion stays the same, there should be ~17k papers submitted this year
@TimDarcet
TimDarcet
4 months
Current state of neurips abstract submissions This neurips is gonna be crazy
Tweet media one
11
22
121
1
0
13
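The proportion arithmetic from the correction above, made explicit:

```python
# If papers/abstracts stays at last year's ratio (12k / 15.6k),
# 22k abstracts project to roughly 17k full submissions.
abstracts_this_year = 22_000
abstracts_last_year = 15_600
papers_last_year = 12_000

projected = abstracts_this_year * papers_last_year / abstracts_last_year
print(round(projected))  # ~16923, i.e. roughly 17k
```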
@TimDarcet
TimDarcet
2 months
@untitled01ipynb All researchers and interns have access to quite a few V100s, as a base allocation To have a big allocation you need to justify it, usually as part of a big project
2
0
13
@TimDarcet
TimDarcet
9 months
In medical / bio images, annotations are very expensive (they need very skilled annotators, e.g. doctors), so labeled data is scarce. This is where SSL shines the brightest
@gshaikovski
George Shaikovski
9 months
We trained a self-supervised transformer on 1.5 million megapixel-scale whole slide images and DINOv2 framework, achieving state of the art on all public benchmarks. Now partnering with @MSFTResearch to go further.
0
7
14
1
0
13
@TimDarcet
TimDarcet
1 year
If DINOv2's activations are well correlated to the human brain's visual representations, it must mean we are doing something right. Thanks @adeli_hossein for this work!
@adeli_hossein
Hossein Adeli
1 year
Excited to share our submission to the Algonatus 2023 challenge with @Minni1031 and @KriegeskorteLab : “Predicting brain activity using Transformers” report: code:
1
6
33
1
2
13
@TimDarcet
TimDarcet
5 months
@RylanSchaeffer I think direct public shaming is not the most productive thing to do here, so I prefer not to. But it's pretty easy to reproduce this analysis yourself if you're interested
3
0
13
@TimDarcet
TimDarcet
10 months
DINOv2 being used in prod in @PlantNetProject is the good news of the day I did not expect. If you have never used it, try this app! It's simply amazing.
@HugoGresse
Hugo Gresse
11 months
In 2023, after more than a year of work, we added the floras of every country using the work of @tdwg . And we also moved away from convolutional neural networks (CNNs) to transformers, first BEiT, then DINOv2 after the summer.
1
1
8
0
2
13
@TimDarcet
TimDarcet
3 months
@y0b1byte In a neighbourhood of 0, the loss is O(eps), while the Lp regularization is O(eps^p) If p>1, the loss always wins (L2 reg) If p<1, the regularizer always wins (0 is a local minimum) If p=1, the two can balance out, creating a local minimum if the loss's gradient is low enough
Tweet media one
0
0
13
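The comparison can be written out explicitly (assuming the loss is smooth near 0 with gradient g there):

```latex
% Near w = 0, total objective with an L_p penalty of weight \lambda:
%   f(\epsilon) \approx f(0) + g\,\epsilon + \lambda |\epsilon|^p
%
% p > 1: \lambda|\epsilon|^p = o(|\epsilon|), the linear loss term dominates,
%        so 0 is not special (L2 regularization: shrinkage, no sparsity).
% p < 1: \lambda|\epsilon|^p \gg |g\,\epsilon|, the penalty dominates,
%        so 0 is always a local minimum.
% p = 1: both terms are O(|\epsilon|); 0 is a local minimum
%        iff |g| \le \lambda (the soft-thresholding condition of L1).
```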
@TimDarcet
TimDarcet
1 year
9/ There's a bit to be said about the dataset too: we automatically curate 1B image to 140M with a retrieval+deduplication pipeline. More diverse than Imagenet22k, while still having better image quality and balance than uncurated datasets (YFCC, LAION, IG2B...)
Tweet media one
1
3
13
@TimDarcet
TimDarcet
3 months
@y0b1byte @Wikipedia In particular for the confusion matrix
@TimDarcet
TimDarcet
9 months
PSA: when someone asks you a question including words such as "false positive rate", 𝗱𝗼 𝗻𝗼𝘁 𝗮𝗻𝘀𝘄𝗲𝗿 𝗿𝗶𝗴𝗵𝘁 𝗮𝘄𝗮𝘆. Simply state that you know your rights, and go on wikipedia to consult the 𝔐𝔞𝔡 𝕮𝔬𝔫𝔣𝔲𝔰𝔦𝔬𝔫 𝔐𝔞𝔱𝔯𝔦𝔵 𝔬𝔣 𝕳𝔢𝔩𝔩
Tweet media one
2
16
105
1
0
12
@TimDarcet
TimDarcet
5 months
Last authors: You can guess my opinion on those 2 plots. I don't think this way of submitting is the best contribution to the community
Tweet media one
2
4
12
@TimDarcet
TimDarcet
10 months
When did the muffin/chihuahua thing transition from a harmless meme to "famous computer vision problem"? I might be wrong, but I've never seen people showing experiments where this is a failure case. Here is what CLIP (from feb 21) predicts
Tweet media one
@xwang_lk
Xin Eric Wang
10 months
The famous "Chihuahua or Muffin" problem in computer vision is considered solved by GPT-4V on social media. But really? The answer is NO. GPT-4V cannot reason well about the same images in the original "Chihuahua or Muffin" grid when they are in a different layout. I
Tweet media one
42
145
896
2
2
12
@TimDarcet
TimDarcet
10 months
DINOv2 depth estimation looks quite robust to this kind of painted optical illusion, that's cool
Tweet media one
@ducha_aiki
Dmytro Mishkin 🇺🇦
10 months
FLORIDA: Fake-looking Real Images Dataset Ali Borji tl;dr: 510 real images looking like a fake.
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
14
96
0
0
12
@TimDarcet
TimDarcet
5 months
The influence of eval formatting on results is always wild to me. Shouldn't we select ~100 standard "formats" and report the average on those? It should be much more robust imo
@clefourrier
Clémentine Fourrier 🍊 - is ooo!
5 months
Follow up "eval is fun" tweet: how much do scores change depending on prompt format choice? The score range for a given model is of 10 points! :D Prompt format on the x axis, all these evals look at the logprob of either "choice A/choice B..." or "A/B...".
Tweet media one
3
10
55
1
0
12