Nicolas Carion Profile
Nicolas Carion

@alcinos26

1,858 Followers · 228 Following · 11 Media · 74 Statuses

Research Scientist at Meta. Previously post-doc @nyuniversity, PhD student @FacebookAI Paris. RL, Transformers, Computer Vision.

New York
Joined October 2019
Pinned Tweet
@alcinos26
Nicolas Carion
4 years
I'm proud to share our latest work on object detection using transformers. Check it out! Our GitHub repo features a Colab with a minimal implementation to play with. (1/4) Blog post: Code: Paper:
@AIatMeta
AI at Meta
4 years
We are releasing Detection Transformers (DETR), an important new approach to object detection and panoptic segmentation. It’s the first object detection framework to successfully integrate Transformers as a central building block in the detection pipeline.
@alcinos26
Nicolas Carion
5 years
Happy to present our work on generalization in Multi-Agent RL at @NeurIPSConf next week. Spotlight at 4:35pm in track 3 and Poster #194 on Tuesday. Paper: Blog post: Source code:
@syhw
Gabriel Synnaeve
5 years
"A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning" by @alcinos26 et al. is going to be presented at NeurIPS next week, come talk to us!
@alcinos26
Nicolas Carion
4 years
Wow, someone made a web demo of DETR! Happy to see people playing around with it already 🙂 You can see for yourself that adding NMS doesn't improve DETR's detections. @plotlygraphs
@plotlygraphs
Plotly
4 years
We were super impressed by the new Detection Transformers (DETR) library, so we made this Dash GUI from start 2 finish in ~200 lines of Python Code: Demo: cc: @PyTorch @facebookai @szagoruyko5 @fvsmassa
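For readers unfamiliar with what DETR drops: non-maximum suppression is the post-processing step the tweet says doesn't help DETR. This is a minimal greedy NMS sketch in plain Python (toy axis-aligned boxes; not DETR's or Plotly's actual code):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box that overlaps it above iou_thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two near-duplicate boxes and one distant box: NMS keeps indices [0, 2].
kept = nms([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)], [0.9, 0.8, 0.7])
```

Because DETR's set-based training already discourages duplicate predictions, running a step like this on its outputs leaves them essentially unchanged, which is the point being made above.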
@alcinos26
Nicolas Carion
4 years
Awesome work by @HugoTouvron @quobbe @alexsablay @fvsmassa @hjegou , M. Douze to ablate and democratize transformers for image classification. No need for extra data, augmentation+distillation suffices! Comes with code built on @wightmanr 's excellent Timm library. Check it out!
@fvsmassa
Francisco Massa
4 years
I'm happy to share our latest work on training competitive Visual Transformers using only ImageNet. Code and pre-trained models: Paper: Check it out! (1/2)
@alcinos26
Nicolas Carion
3 years
Check out our latest work on multimodal understanding through modulated detection. We extend DETR to accept text as input, and show that this allows us to tackle downstream vision+language tasks. 1/4
@ashkamath20
Aishwarya Kamath
3 years
Excited to share our work on Modulated Detection for End-to-End Multi-Modal Understanding. MDETR detects any objects that are mentioned in free-form text. 1/5 Website: Code: Paper: Colab:
@alcinos26
Nicolas Carion
4 years
In panoptic segmentation, our solution treats "things" (countable foreground objects) and "stuff" (background classes) exactly the same way. This novel unified approach allows the transformer to globally reason about both kinds of entities. (4/4)
@alcinos26
Nicolas Carion
4 years
We provide an end-to-end pipeline competitive with Faster-RCNN, without needing any custom layers or prior knowledge (NMS, anchors, ROI-align, …), in true deep-learning spirit. We obtain 42 AP with the same number of parameters, the same inference time, and half the compute (GFLOPS). (2/4)
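The set-prediction training behind the claims above pairs each predicted object with at most one ground-truth object via an optimal one-to-one matching (DETR solves it with the Hungarian algorithm; unmatched predictions are trained toward a "no object" class). A hedged, brute-force toy sketch of that matching step — the cost function here is invented for illustration, not DETR's actual loss:

```python
from itertools import permutations

def match_cost(pred, gt):
    """Toy pairwise cost: L1 distance between box centers plus a flat
    class-mismatch penalty (DETR combines class probability and box losses)."""
    (px, py, pcls), (gx, gy, gcls) = pred, gt
    return abs(px - gx) + abs(py - gy) + (0.0 if pcls == gcls else 1.0)

def best_matching(preds, gts):
    """Brute-force the cheapest one-to-one assignment of predictions to
    ground-truth objects (assumes len(preds) >= len(gts); DETR instead
    pads the ground truth with 'no object' and uses Hungarian matching)."""
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost

perm, cost = best_matching(
    [(0.9, 0.9, "cat"), (0.1, 0.1, "dog")],  # predictions: (cx, cy, class)
    [(0.0, 0.0, "dog"), (1.0, 1.0, "cat")],  # ground truth
)
# perm == (1, 0): each GT gets the nearby same-class prediction.
```

Because the loss is computed on this matched set, duplicates are penalized during training itself, which is why no NMS is needed at inference.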
@alcinos26
Nicolas Carion
2 years
If you work on fine-grained vision-language understanding or open-world detection, check out our new benchmark TRICD! As the name implies, we expect it to fool the best current models 🙂
@ashkamath20
Aishwarya Kamath
2 years
Check out our new fine-grained vision and language understanding task (CPD) and associated benchmark - TRICD! 📢 Contextual Phrase Detection (CPD) is a single task that subsumes object detection and phrase grounding. Paper: Website:
@alcinos26
Nicolas Carion
4 years
We show that the architecture and formulation can be easily extended to related tasks, such as panoptic segmentation, where we achieve SOTA performance. We hope that a fully differentiable, easy-to-use detection pipeline will also benefit other downstream tasks (e.g. VQA). (3/4)
@alcinos26
Nicolas Carion
1 year
Amazing work from my colleagues at FAIR! A new open source (Apache 2.0) segmentation model that can segment literally *anything* (try the demo!) as well as a dataset of 1B (!!) segments on 10M images. By @kirillov_a_n @nikhilaravi and many others
@AIatMeta
AI at Meta
1 year
Today we're releasing the Segment Anything Model (SAM) — a step toward the first foundation model for image segmentation. SAM is capable of one-click segmentation of any object from any photo or video + zero-shot transfer to other segmentation tasks ➡️
@alcinos26
Nicolas Carion
3 years
Built using awesome libraries from @huggingface and @wightmanr on top of @PyTorch , huge props to the open-source ecosystem! It truly accelerates research and enables bolder ideas to be tried out.
@alcinos26
Nicolas Carion
1 year
@phillip_isola I find this graph misleading. Stable diffusion itself is trained on billions of paired data. Shouldn't the red curve be shifted by that much to the right?
@alcinos26
Nicolas Carion
4 years
I just gave this new tool a try, and my first impressions are very good! Could save a lot of time while doing literature review, and potentially uncover unforeseen links between research sub-communities. And the UI is pretty slick. Give it a go and see how it works for you :)
@ConnectedPapers
Connected Papers
4 years
After a long beta, we are launching! Connected Papers is a unique, visual tool to help researchers and applied scientists find and explore papers relevant to their field of work.
@alcinos26
Nicolas Carion
3 years
@giffmana Approved!
@alcinos26
Nicolas Carion
1 year
@phillip_isola Training on SD output is a form of distillation. Similar to your argument, CLIP models trained on LAION also already exist. Is there any evidence that distillation from a generative model, as you did, is superior to distilling from a contrastive one, e.g. the aforementioned CLIP?
@alcinos26
Nicolas Carion
3 years
@giffmana We wanted to make sure the model can navigate your home country ;)
@alcinos26
Nicolas Carion
4 years
@tdietterich @gileshooker What about dropping the paper format and having contributions look like GitHub pull requests against said wiki? It would force authors to insert their work into existing knowledge without wasting effort re-exposing facts (e.g. background sections) that are already better explained somewhere else.
@alcinos26
Nicolas Carion
4 years
Thank you all for your questions during the Q&A; unfortunately we didn't have time to cover all of them. Join us in the Zoom poster session on Wednesday, 6-8 am or 2-4 pm (UTC+1), if you want to chat.
@alcinos26
Nicolas Carion
3 years
Last but not least, we show that vision+language pre-training offers a way to tackle the long tail of object detection, with good performance on the rare categories of the LVIS dataset, even with a fraction of the data. You can play around with your own images in our Colab! 4/4
@alcinos26
Nicolas Carion
3 years
We show significant improvement over state-of-the-art on various tasks, such as Referring Expression Comprehension/Segmentation and grounded detection on Flickr30k Entities (+12% accuracy over previous SOTA) 2/4
@alcinos26
Nicolas Carion
3 years
We also fine-tune it on GQA, and obtain competitive performance. Notably, the detection of the objects mentioned in the question offers some insights on the model's reasoning process. In the illustration, the question was "What color is the train?" 3/4
@alcinos26
Nicolas Carion
3 years
@wightmanr @NielsRogge @facebookai ViT and variants struggle with the kind of resolutions we need in detection. SWIN, on the other hand, really delivers both with DETR variants and otherwise. See for eg
@alcinos26
Nicolas Carion
9 months
Dream team! Gab is an amazing advisor (speaking from experience); this is a great internship opportunity if you are interested in LLMs and/or code generation.
@syhw
Gabriel Synnaeve
9 months
I'm hiring Master interns at FAIR Paris to work on code generation, to work with me and the awesome CodeGen team ( @b_roziere , @jadecopet , @jnsgehring , @adiyossLC et al.). We do Code Llama and research. Candidate at and send me an email or message.
@alcinos26
Nicolas Carion
2 years
@giffmana RL for object detection has been around for a while. You might want to cite "Learning Globally Optimized Object Detector via Policy Gradient", Rao et al., CVPR 2018, but IIRC there are others.
@alcinos26
Nicolas Carion
3 years
@francoisfleuret @chriswolfvision You still need to aggregate information from all nodes, hence infinite depth to handle arbitrarily long sequences. See for a set of tasks where vanilla transformers do poorly. Ironically, the proposed fix makes transformer look like RNNs again.
@alcinos26
Nicolas Carion
4 years
@neozero497 @Thom_Wolf Check out our DETR paper! We showed that for object detection, attention is indeed all you need.
@alcinos26
Nicolas Carion
3 years
@drew_jaegle @gberta227 @icmlconf Nice work! I'd like to point out that the arch is quite similar to our decoder in DETR, with the same high level goal (dim reduction), your latent array being ~equivalent to our object queries. Maybe you could consider mentioning it in the related work if you publish a revision?
@alcinos26
Nicolas Carion
3 years
@drew_jaegle @gberta227 @icmlconf Woops, my phone had cached the previous revision of the paper 😅 But I feel you, I wish related work and acknowledgements didn't count towards the page limit, I don't see any good reason to limit content in those.
@alcinos26
Nicolas Carion
4 years
@quocleix @hieupham789 @lmthang @ZihangDai @QizheXie Impressive results! I'm curious about the use of hard pseudo labels in this work, as it intuitively makes backprop more challenging. The NoisyStudent paper claimed hard and soft perform equally well, do you have any intuition why it's not the case here?
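The hard-vs-soft distinction the question turns on: a hard pseudo-label keeps only the teacher's argmax class as a one-hot target, while a soft label keeps the teacher's full distribution. A minimal sketch with made-up probabilities (toy numbers, not from either paper):

```python
import math

def cross_entropy(target, student_probs):
    """H(target, student) = -sum_c target_c * log(student_c)."""
    return -sum(t * math.log(s) for t, s in zip(target, student_probs) if t > 0)

teacher = [0.7, 0.2, 0.1]                                    # teacher distribution
hard = [1.0 if p == max(teacher) else 0.0 for p in teacher]  # one-hot argmax
soft = teacher                                               # keep full distribution

student = [0.6, 0.3, 0.1]                    # student's current prediction
loss_hard = cross_entropy(hard, student)     # only the argmax class matters
loss_soft = cross_entropy(soft, student)     # also supervises the other classes
```

With hard labels the gradient ignores the teacher's relative confidence over non-argmax classes (the "dark knowledge"), which is one intuition for why the two choices can train differently.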
@alcinos26
Nicolas Carion
3 years
@donglixp @giffmana @NielsRogge @OpenAI That's interesting, thanks for sharing! I'm curious, have you also tried non-parametric methods (maybe Kmeans?) to tokenize the patches?
@alcinos26
Nicolas Carion
2 years
@giffmana It would be interesting to investigate whether the RL step qualitatively changes the predictions (does it meaningfully move the boxes? does it improve calibration?) or whether it's just hacking the AP metric (e.g. spamming boxes always improves AP without improving "quality" or downstream usefulness).
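The "spamming boxes" point can be seen in a toy every-point AP computation (a simplified AP without COCO's interpolation; numbers are illustrative): extra low-confidence false positives appended below all real detections never lower AP, and a single lucky extra hit raises it.

```python
def average_precision(hits, num_gt):
    """Every-point AP: `hits` is True/False per detection, already sorted by
    descending confidence; AP averages precision at each true positive over
    the total number of ground-truth objects."""
    tp, precisions = 0, []
    for rank, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / num_gt

base = average_precision([True, True], num_gt=3)                     # 2 confident TPs
spam_fp = average_precision([True, True] + [False] * 10, 3)          # + 10 junk FPs
spam_tp = average_precision([True, True] + [False] * 9 + [True], 3)  # one lucky hit
# spam_fp == base: trailing false positives cost nothing in AP;
# spam_tp > base: a spammed box that happens to match a GT raises AP.
```

This asymmetry is why AP alone can reward prediction spam, independent of downstream usefulness.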
@alcinos26
Nicolas Carion
2 years
@giffmana Hum so spamming it is?
@alcinos26
Nicolas Carion
4 years
@srush_nlp @henripal It doesn't help that the "official" PyTorch named tensors are a barely usable implementation that requires verbose ugliness to work around the rough edges and glaring holes... Sadly, the project seems mostly abandoned.