Nicolas Carion Profile
Nicolas Carion

@alcinos26

1,858 Followers · 228 Following · 11 Media · 74 Statuses

Research Scientist at Meta. Previously post-doc @nyuniversity, PhD student @FacebookAI Paris. RL, Transformers, Computer Vision.

New York
Joined October 2019
Pinned Tweet
@alcinos26
Nicolas Carion
4 years
I'm proud to share our latest work on object detection using transformers. Check it out! Our GitHub repo features a Colab with a minimal implementation to play with. (1/4) Blog post: Code: Paper:
@AIatMeta
AI at Meta
4 years
We are releasing Detection Transformers (DETR), an important new approach to object detection and panoptic segmentation. It’s the first object detection framework to successfully integrate Transformers as a central building block in the detection pipeline.
@alcinos26
Nicolas Carion
5 years
Happy to present our work on generalization in Multi-Agent RL at @NeurIPSConf next week. Spotlight at 4:35pm in track 3 and Poster #194 on Tuesday. Paper: Blog post: Source code:
@syhw
Gabriel Synnaeve
5 years
"A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning" by @alcinos26 et al. is going to be presented at NeurIPS next week, come talk to us!
@alcinos26
Nicolas Carion
4 years
Wow, someone made a web demo of DETR! Happy to see people playing around with it already 🙂 You can see for yourself that adding NMS doesn't improve DETR's detections. @plotlygraphs
@plotlygraphs
Plotly
4 years
We were super impressed by the new Detection Transformers (DETR) library, so we made this Dash GUI from start 2 finish in ~200 lines of Python Code: Demo: cc: @PyTorch @facebookai @szagoruyko5 @fvsmassa
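For readers unfamiliar with what DETR drops: non-maximum suppression is the post-processing step the tweet says doesn't help DETR. This is a minimal greedy NMS sketch in plain Python (toy axis-aligned boxes; not DETR's or Plotly's actual code):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box that overlaps it above iou_thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two near-duplicate boxes and one distant box: NMS keeps indices [0, 2].
kept = nms([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)], [0.9, 0.8, 0.7])
```

Because DETR's set-based training already discourages duplicate predictions, running a step like this on its outputs leaves them essentially unchanged, which is the point being made above.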
@alcinos26
Nicolas Carion
4 years
Awesome work by @HugoTouvron @quobbe @alexsablay @fvsmassa @hjegou , M. Douze to ablate and democratize transformers for image classification. No need for extra data, augmentation+distillation suffices! Comes with code built on @wightmanr 's excellent Timm library. Check it out!
@fvsmassa
Francisco Massa
4 years
I'm happy to share our latest work on training competitive Visual Transformers using only ImageNet. Code and pre-trained models: Paper: Check it out! (1/2)
@alcinos26
Nicolas Carion
3 years
Check out our latest work on multimodal understanding through modulated detection. We extend DETR to accept text as input, and show that this allows us to tackle downstream vision+language tasks. 1/4
@ashkamath20
Aishwarya Kamath
3 years
Excited to share our work on Modulated Detection for End-to-End Multi-Modal Understanding. MDETR detects any objects that are mentioned in free-form text. 1/5 Website: Code: Paper: Colab:
@alcinos26
Nicolas Carion
4 years
In panoptic segmentation, our solution treats "things" (countable foreground objects) and "stuff" (background classes) exactly the same way. This novel unified approach allows the transformer to globally reason about both kinds of entities. (4/4)
@alcinos26
Nicolas Carion
4 years
We provide an end-to-end pipeline competitive with Faster-RCNN, without needing any custom layers or prior knowledge (NMS, anchors, ROI-align, …), in true deep-learning spirit. We obtain 42 AP with the same number of parameters, the same inference time, and half the compute (GFLOPS). (2/4)
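The set-prediction training behind the claims above pairs each predicted object with at most one ground-truth object via an optimal one-to-one matching (DETR solves it with the Hungarian algorithm; unmatched predictions are trained toward a "no object" class). A hedged, brute-force toy sketch of that matching step — the cost function here is invented for illustration, not DETR's actual loss:

```python
from itertools import permutations

def match_cost(pred, gt):
    """Toy pairwise cost: L1 distance between box centers plus a flat
    class-mismatch penalty (DETR combines class probability and box losses)."""
    (px, py, pcls), (gx, gy, gcls) = pred, gt
    return abs(px - gx) + abs(py - gy) + (0.0 if pcls == gcls else 1.0)

def best_matching(preds, gts):
    """Brute-force the cheapest one-to-one assignment of predictions to
    ground-truth objects (assumes len(preds) >= len(gts); DETR instead
    pads the ground truth with 'no object' and uses Hungarian matching)."""
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost

perm, cost = best_matching(
    [(0.9, 0.9, "cat"), (0.1, 0.1, "dog")],  # predictions: (cx, cy, class)
    [(0.0, 0.0, "dog"), (1.0, 1.0, "cat")],  # ground truth
)
# perm == (1, 0): each GT gets the nearby same-class prediction.
```

Because the loss is computed on this matched set, duplicates are penalized during training itself, which is why no NMS is needed at inference.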
@alcinos26
Nicolas Carion
2 years
If you work on fine-grained vision-language understanding or open-world detection, check out our new benchmark TRICD! As the name implies, we expect it to fool the best current models 🙂
@ashkamath20
Aishwarya Kamath
2 years
Check out our new fine-grained vision and language understanding task (CPD) and associated benchmark - TRICD! 📢 Contextual Phrase Detection (CPD) is a single task that subsumes object detection and phrase grounding. Paper: Website:
@alcinos26
Nicolas Carion
4 years
We show that the architecture and formulation can be easily extended to related tasks, such as panoptic segmentation, where we achieve SOTA performance. We hope that a fully differentiable, easy-to-use detection pipeline will also benefit other downstream tasks (e.g. VQA). (3/4)
@alcinos26
Nicolas Carion
1 year
Amazing work from my colleagues at FAIR! A new open source (Apache 2.0) segmentation model that can segment literally *anything* (try the demo!) as well as a dataset of 1B (!!) segments on 10M images. By @kirillov_a_n @nikhilaravi and many others
@AIatMeta
AI at Meta
1 year
Today we're releasing the Segment Anything Model (SAM) — a step toward the first foundation model for image segmentation. SAM is capable of one-click segmentation of any object from any photo or video + zero-shot transfer to other segmentation tasks ➡️
@alcinos26
Nicolas Carion
3 years
Built using awesome libraries from @huggingface and @wightmanr on top of @PyTorch , huge props to the open-source ecosystem! It truly accelerates research and enables bolder ideas to be tried out.
@alcinos26
Nicolas Carion
1 year
@phillip_isola I find this graph misleading. Stable diffusion itself is trained on billions of paired data. Shouldn't the red curve be shifted by that much to the right?
@alcinos26
Nicolas Carion
4 years
I just gave this new tool a try, and my first impressions are very good! Could save a lot of time while doing literature review, and potentially uncover unforeseen links between research sub-communities. And the UI is pretty slick. Give it a go and see how it works for you :)
@ConnectedPapers
Connected Papers
4 years
After a long beta, we are launching! Connected Papers is a unique, visual tool to help researchers and applied scientists find and explore papers relevant to their field of work.
@alcinos26
Nicolas Carion
3 years
@giffmana Approved!
@alcinos26
Nicolas Carion
1 year
@phillip_isola Training on SD output is a form of distillation. Similar to your argument, CLIP models trained on LAION also already exist. Is there any evidence that distillation from a generative model, as you did, is superior to distilling from a contrastive one, e.g. the aforementioned CLIP?
@alcinos26
Nicolas Carion
3 years
@giffmana We wanted to make sure the model can navigate your home country ;)
@alcinos26
Nicolas Carion
4 years
@tdietterich @gileshooker What about dropping the paper format and having contributions look like GitHub pull requests against said wiki? It would force authors to insert their work into existing knowledge without wasting effort re-exposing facts (e.g. background sections) that are already better explained somewhere else.
@alcinos26
Nicolas Carion
4 years
Thank you all for your questions during the Q&A; unfortunately we didn't have time to cover all of them. Join us in the Zoom poster session on Wednesday, 6-8 am or 2-4 pm (UTC+1), if you want to chat.
@alcinos26
Nicolas Carion
3 years
Last but not least, we show that vision+language pre-training offers a way to tackle the long tail of object detection, with good performance on the rare categories of the LVIS dataset, even with a fraction of the data. You can play around with your own images in our Colab! 4/4
@alcinos26
Nicolas Carion
3 years
We show significant improvement over state-of-the-art on various tasks, such as Referring Expression Comprehension/Segmentation and grounded detection on Flickr30k Entities (+12% accuracy over previous SOTA) 2/4
@alcinos26
Nicolas Carion
3 years
We also fine-tune it on GQA, and obtain competitive performance. Notably, the detection of the objects mentioned in the question offers some insights on the model's reasoning process. In the illustration, the question was "What color is the train?" 3/4
@alcinos26
Nicolas Carion
3 years
@wightmanr @NielsRogge @facebookai ViT and variants struggle with the kind of resolutions we need in detection. SWIN, on the other hand, really delivers both with DETR variants and otherwise. See for eg
@alcinos26
Nicolas Carion
9 months
Dream team! Gab is an amazing advisor (speaking from experience); this is a great internship opportunity if you are interested in LLMs and/or code generation.
@syhw
Gabriel Synnaeve
9 months
I'm hiring Master interns at FAIR Paris to work on code generation, to work with me and the awesome CodeGen team ( @b_roziere , @jadecopet , @jnsgehring , @adiyossLC et al.). We do Code Llama and research. Candidate at and send me an email or message.
@alcinos26
Nicolas Carion
2 years
@giffmana RL for object detection has been around for a while. You might want to cite "Learning Globally Optimized Object Detector via Policy Gradient", Rao et al., CVPR 2018, but IIRC there are others.
@alcinos26
Nicolas Carion
3 years
@francoisfleuret @chriswolfvision You still need to aggregate information from all nodes, hence infinite depth to handle arbitrarily long sequences. See for a set of tasks where vanilla transformers do poorly. Ironically, the proposed fix makes transformer look like RNNs again.
@alcinos26
Nicolas Carion
4 years
@neozero497 @Thom_Wolf Check out our DETR paper! We showed that for object detection, attention is indeed all you need.
@alcinos26
Nicolas Carion
3 years
@drew_jaegle @gberta227 @icmlconf Nice work! I'd like to point out that the arch is quite similar to our decoder in DETR, with the same high level goal (dim reduction), your latent array being ~equivalent to our object queries. Maybe you could consider mentioning it in the related work if you publish a revision?
@alcinos26
Nicolas Carion
3 years
@drew_jaegle @gberta227 @icmlconf Woops, my phone had cached the previous revision of the paper 😅 But I feel you, I wish related work and acknowledgements didn't count towards the page limit, I don't see any good reason to limit content in those.
@alcinos26
Nicolas Carion
4 years
@quocleix @hieupham789 @lmthang @ZihangDai @QizheXie Impressive results! I'm curious about the use of hard pseudo labels in this work, as it intuitively makes backprop more challenging. The NoisyStudent paper claimed hard and soft perform equally well, do you have any intuition why it's not the case here?
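The hard-vs-soft distinction the question turns on: a hard pseudo-label keeps only the teacher's argmax class as a one-hot target, while a soft label keeps the teacher's full distribution. A minimal sketch with made-up probabilities (toy numbers, not from either paper):

```python
import math

def cross_entropy(target, student_probs):
    """H(target, student) = -sum_c target_c * log(student_c)."""
    return -sum(t * math.log(s) for t, s in zip(target, student_probs) if t > 0)

teacher = [0.7, 0.2, 0.1]                                    # teacher distribution
hard = [1.0 if p == max(teacher) else 0.0 for p in teacher]  # one-hot argmax
soft = teacher                                               # keep full distribution

student = [0.6, 0.3, 0.1]                    # student's current prediction
loss_hard = cross_entropy(hard, student)     # only the argmax class matters
loss_soft = cross_entropy(soft, student)     # also supervises the other classes
```

With hard labels the gradient ignores the teacher's relative confidence over non-argmax classes (the "dark knowledge"), which is one intuition for why the two choices can train differently.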
@alcinos26
Nicolas Carion
3 years
@donglixp @giffmana @NielsRogge @OpenAI That's interesting, thanks for sharing! I'm curious, have you also tried non-parametric methods (maybe Kmeans?) to tokenize the patches?
@alcinos26
Nicolas Carion
2 years
@giffmana It would be interesting to investigate whether the RL step qualitatively changes the predictions (does it meaningfully move the boxes? does it improve calibration?) or whether it's just hacking the AP metric (e.g. spamming boxes always improves AP without improving "quality" or downstream usefulness).
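The "spamming boxes" point can be seen in a toy every-point AP computation (a simplified AP without COCO's interpolation; numbers are illustrative): extra low-confidence false positives appended below all real detections never lower AP, and a single lucky extra hit raises it.

```python
def average_precision(hits, num_gt):
    """Every-point AP: `hits` is True/False per detection, already sorted by
    descending confidence; AP averages precision at each true positive over
    the total number of ground-truth objects."""
    tp, precisions = 0, []
    for rank, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / num_gt

base = average_precision([True, True], num_gt=3)                     # 2 confident TPs
spam_fp = average_precision([True, True] + [False] * 10, 3)          # + 10 junk FPs
spam_tp = average_precision([True, True] + [False] * 9 + [True], 3)  # one lucky hit
# spam_fp == base: trailing false positives cost nothing in AP;
# spam_tp > base: a spammed box that happens to match a GT raises AP.
```

This asymmetry is why AP alone can reward prediction spam, independent of downstream usefulness.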
@alcinos26
Nicolas Carion
2 years
@giffmana Hum so spamming it is?
@alcinos26
Nicolas Carion
4 years
@srush_nlp @henripal It doesn't help that the "official" PyTorch named tensors are a barely usable implementation that requires verbose ugliness to work around the rough edges and glaring holes... Sadly, the project seems mostly abandoned.