Khue Le Profile
Khue Le

@netw0rkf10w

331 Followers · 128 Following · 16 Media · 522 Statuses

Head of R&D at . Building conversational AI by day, doing optimization research by night.

Paris, France
Joined September 2010
@netw0rkf10w
Khue Le
5 months
Hi @aaron_defazio . Here's the result of my optimizer, compared to yours (still running). Can you beat my blue curve with hyper-parameter tuning? ;) Please give it a try using this code:
[image]
@aaron_defazio
Aaron Defazio
5 months
Schedule-Free Learning We have now open sourced the algorithm behind my series of mysterious plots. Each plot was either Schedule-free SGD or Adam, no other tricks!
[image]
41
238
1K
4
10
125
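For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of running the open-sourced Schedule-Free optimizer. It assumes the `schedulefree` package and its documented `AdamWScheduleFree` class with paired `train()`/`eval()` calls; the model and data below are toy placeholders, not the MAE setup from this thread.

```python
# Minimal sketch: trying the open-sourced Schedule-Free optimizer.
# Assumes `pip install schedulefree`; model and data are toy placeholders.
import torch
import schedulefree

model = torch.nn.Linear(10, 2)
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3)

# Schedule-Free optimizers maintain two sequences of iterates, so they must
# be switched between train and eval modes together with the model.
model.train()
optimizer.train()
for _ in range(100):
    x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()

model.eval()
optimizer.eval()  # evaluation happens at the averaged iterate
```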
@netw0rkf10w
Khue Le
5 months
While waiting for @aaron_defazio 's tuning result, here's my full run of his method (green curve). Interestingly, some modifications inspired by my optimizer seem to boost its performance. Note: MAE's default hyper-params are used for all experiments.
[image]
@aaron_defazio
Aaron Defazio
5 months
@netw0rkf10w Ok, I will try and run early next week.
0
0
21
8
7
61
@netw0rkf10w
Khue Le
5 years
This reminds me of this FlauBERT paper: that gave proper credit to ULMFiT (by @jeremyphoward and @seb_ruder ).
[image]
2
18
43
@netw0rkf10w
Khue Le
2 years
@mblondel_ml This is actually not new. We (with @inthebrownbag ) published this result at NeurIPS 2021 in our paper on Frank-Wolfe for MAP inference (), in which we provide convergence results for the general non-convex and non-smooth settings. 1/2
[image]
1
5
31
@netw0rkf10w
Khue Le
5 months
@MrCatid @aaron_defazio Yes. The question is when. It really depends on the reaction of the community. If a lot of people are eager to try (as with Aaron's method) then I'll try to release it as soon as possible. Otherwise I'm afraid it'll take time.
2
1
10
@netw0rkf10w
Khue Le
5 months
Summary of the rookie mistakes made by the Keras team:
[image]
0
2
8
@netw0rkf10w
Khue Le
4 years
@A_K_Nain Call it kerax!
0
0
9
@netw0rkf10w
Khue Le
5 months
@MrCatid @aaron_defazio Why should I release it early without a proper accompanying paper if nobody cares?
1
0
9
@netw0rkf10w
Khue Le
2 years
@sp_monte_carlo 'CCCP is Frank-Wolfe in disguise' is actually a known result. Please see here:
[quoted tweet: the NeurIPS 2021 reply to @mblondel_ml, shown above]
0
3
8
@netw0rkf10w
Khue Le
7 years
Our paper "Tight Continuous Relaxation of MAP Inference: A Nonconvex Perspective" has been accepted for publication at CVPR 2018! With Prof. Nikos Paragios @AgoniGrammi . PDF available soon. Code available prior to the conference. #CVPR #CVPR2018
0
2
8
@netw0rkf10w
Khue Le
7 years
CVPR 2018 received 3303 valid submissions (all-time record of #CVPR ). Acceptance rate: 979/3303 ~= 29.6% #CVPR2018 Last year's acceptance rate: 782/2680 ~= 29.2% #CVPR2017
0
2
7
@netw0rkf10w
Khue Le
5 years
Hmm... I cannot agree with @jeremyphoward on some points here: 1. LSTM was not created by Schmidhuber, but by "Hochreiter and Schmidhuber". And "perhaps most important of all": 2. Back-propagation was invented by Seppo Linnainmaa. See
1
0
5
@netw0rkf10w
Khue Le
5 years
@ptrblck_de @PyTorch @soumithchintala This is amazing! The PyTorch community is lucky to have you. Thanks for your dedication!
0
0
3
@netw0rkf10w
Khue Le
2 years
@denny_zhou To be able to say that "infinity is not the largest number", one is required to define "number", "infinity", and "larger". When working this out one will see that it's not that simple. Both GPT3 and the kid can be right or wrong, depending on the definitions.
0
0
5
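Whether the claim holds indeed depends on the structure one fixes. Two standard examples (mine, not the tweet's), sketched in LaTeX:

```latex
% In the extended reals, +\infty is adjoined as a maximum, so there it
% *is* the largest element:
\[
\overline{\mathbb{R}} = \mathbb{R}\cup\{-\infty,+\infty\},
\qquad x \le +\infty \quad \forall\, x \in \overline{\mathbb{R}}.
\]
% In the ordinals, the first infinite ordinal \omega has successors,
% so no infinite "number" is largest there:
\[
\omega < \omega+1 < \omega+2 < \cdots
\]
```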
@netw0rkf10w
Khue Le
3 years
@gentaiscool @seb_ruder @emnlpmeeting Congratulations! I would like to point out very relevant work on a French benchmark called FLUE that appeared on arXiv on 11 December 2019: . (An evaluation server is also under construction: .)
0
1
4
@netw0rkf10w
Khue Le
4 years
@gpapamak @mpd37 @kchonyc @ChengSoonOng It seems you are describing the *derivative* instead of the *gradient*. I agree they are conceptually different. (After all it's just a convention so I have no problem accepting both, but I wouldn't say one is more correct than the other.)
[image]
1
0
4
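The convention being discussed can be stated compactly (a standard formulation, not a quote from the thread): the derivative is a linear functional, while the gradient is the vector representing it through an inner product.

```latex
% Derivative of f : \mathbb{R}^n \to \mathbb{R} at x: a linear map (covector),
\[
Df(x)[v] = \lim_{t\to 0}\frac{f(x+tv)-f(x)}{t}, \qquad Df(x)\in(\mathbb{R}^n)^{*}.
\]
% Gradient: the vector representing Df(x) via the chosen inner product,
\[
\langle \nabla f(x),\, v\rangle = Df(x)[v] \quad \forall\, v\in\mathbb{R}^n,
\]
% so the two agree numerically under the standard dot product but are
% conceptually different objects (and transform differently).
```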
@netw0rkf10w
Khue Le
5 months
@aaron_defazio Looks very promising, congratulations! Would you mind sharing some results for ViT on ImageNet?
1
0
4
@netw0rkf10w
Khue Le
5 months
@omead_p Yes there will be a paper, in a month or so, I hope. The code may be released sooner.
1
0
4
@netw0rkf10w
Khue Le
2 years
[image]
0
0
4
@netw0rkf10w
Khue Le
2 years
In our NeurIPS 2021 paper (with @inthebrownbag ) we showed that CCCP is Frank-Wolfe in disguise. Happy to see other people recently rediscovering this fact and presenting it as a striking result. Want to know another equally striking fact? Mean Field is also Frank-Wolfe! 👇
[quoted tweet: the NeurIPS 2021 reply to @mblondel_ml, shown above]
0
1
4
@netw0rkf10w
Khue Le
2 years
@mblondel_ml @inthebrownbag As a byproduct we also provide a justification for the unit step size used in CCCP (cf. the screenshot above). Basically, I would say that only Section 4 of the above paper is new compared to our results. It's worth mentioning that Mean Field inference is also Frank-Wolfe! 2/2
1
0
4
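For readers outside this subfield, here is the equivalence being discussed, in notation of my own choosing rather than the papers': CCCP for a convex-plus-concave objective coincides with generalized Frank-Wolfe run with unit step size.

```latex
% CCCP for minimizing u(x) + w(x), with u convex and w concave:
\[
x_{k+1} = \operatorname*{argmin}_{x}\; u(x) + \langle \nabla w(x_k),\, x\rangle.
\]
% Generalized Frank-Wolfe on the same objective (w as the smooth part,
% u as the convex part kept exact in the oracle):
\[
s_k = \operatorname*{argmin}_{s}\; \langle \nabla w(x_k),\, s\rangle + u(s),
\qquad x_{k+1} = (1-\gamma_k)\, x_k + \gamma_k\, s_k.
\]
% With \gamma_k = 1 we get x_{k+1} = s_k, exactly the CCCP update. Concavity
% of w makes its linearization a global upper bound, which is one way to see
% why the unit step size is justified.
```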
@netw0rkf10w
Khue Le
3 years
@inthebrownbag and I are happy to present our recent work on regularized Frank-Wolfe for MAP inference at #NeurIPS2021 . You are welcome at our poster spot A0 () between 9:30am and 11am CET tomorrow! Paper: Code:
[image]
1
1
4
@netw0rkf10w
Khue Le
3 years
@ccanonne_ IMO any non-standard result should be accompanied by a proof, at least as a footnote or an appendix. The above inequality can be proven in just one line (Cauchy-Schwarz followed by sum >= summands), so a footnote would be appropriate in this case.
0
0
4
@netw0rkf10w
Khue Le
5 months
@aaron_defazio That's awesome, thanks! Looking forward to it.
0
0
2
@netw0rkf10w
Khue Le
1 year
Interesting idea of using Optimal Transport for learning to align two sequences of features.
@formiel
Hang Le
1 year
Happy to share our recent work on speech translation (ST): , which will be presented at ICML 2023. We make two contributions: 1) showing that CTC can reduce the modality gap in ST pre-training... 1/2
3
5
29
0
0
3
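To make the OT idea concrete, below is a minimal entropic-OT (Sinkhorn) sketch for softly aligning two feature sequences. The squared-Euclidean cost, uniform marginals, and hyper-parameters are illustrative assumptions, not the method of the paper above.

```python
# Sinkhorn sketch: soft alignment between two feature sequences via
# entropic optimal transport (illustrative only).
import torch

def sinkhorn_alignment(x, y, eps=0.1, n_iters=50):
    """x: (m, d) and y: (n, d) feature sequences; returns an (m, n)
    soft-alignment matrix (entropic transport plan)."""
    cost = torch.cdist(x, y) ** 2            # pairwise squared distances
    cost = cost / cost.max()                 # rescale to avoid exp underflow
    K = torch.exp(-cost / eps)               # Gibbs kernel
    a = torch.full((x.size(0),), 1.0 / x.size(0))  # uniform source marginal
    b = torch.full((y.size(0),), 1.0 / y.size(0))  # uniform target marginal
    u = torch.ones_like(a)
    for _ in range(n_iters):                 # Sinkhorn fixed-point iterations
        v = b / (K.t() @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

P = sinkhorn_alignment(torch.randn(20, 64), torch.randn(7, 64))
print(P.sum())  # ~1: total mass of the transport plan
```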
@netw0rkf10w
Khue Le
2 years
@binhtngn @gabrielpeyre The connection to FW and the O(1/k) rate of convergence of CCCP had been shown previously in . See also this thread:
[quoted tweet: the NeurIPS 2021 reply to @mblondel_ml, shown above]
0
1
2
@netw0rkf10w
Khue Le
6 years
My work is featured in @RSIPvision , but not like the other ones. Not sure whether I should be happy or sad 😂😂😂
0
1
3
@netw0rkf10w
Khue Le
7 years
Neural Machine Translation (seq2seq) Tutorial
0
0
3
@netw0rkf10w
Khue Le
5 months
@aaron_defazio It's worth pointing out that PyTorch's default pre-trained ResNet50 achieved 80.858% top-1 accuracy (), so it seems your baseline is a bit weak. It could be interesting to see the results on a stronger baseline such as PyTorch's reference implementation.
1
0
3
@netw0rkf10w
Khue Le
6 years
What a bright future for AI in France: - Google is opening a new AI research lab in Paris. - DeepMind is opening a new AI research lab in Paris. - Facebook AI Research Paris is doubling their size. - Samsung is opening a new AI research lab in Paris. I am really excited!
0
1
3
@netw0rkf10w
Khue Le
8 years
⚡️ "Tensorflow and deep learning - without a PhD" by @martin_gorner
0
3
3
@netw0rkf10w
Khue Le
6 years
New MSc in Artificial Intelligence at @centralesupelec with excellent curriculum:
@AgoniGrammi
Paragios, Nikos
6 years
at the edge of academic excellence @centralesupelec introduces new #graduate #MSc on #ArtificialIntelligence Boost your career with introductory, excellence, domain-specific self-contained #AI curriculum Symbolic, subsymbolic, decisional #DataDriven #AI
[image]
0
12
22
0
0
3
@netw0rkf10w
Khue Le
5 months
@aaron_defazio Awesome. Thanks for sharing, Aaron. The good news for me is that it's not the same algorithm that I have been developing for a while (there's some similarity though). Will compare and share the results soon.
1
0
3
@netw0rkf10w
Khue Le
4 years
Want to learn about scalable machine learning? The #RaySummit @raydistributed has a great program with a great speaker lineup. It's about to start, but still not too late to register:
0
0
3
@netw0rkf10w
Khue Le
4 years
@gpapamak @mpd37 @kchonyc @ChengSoonOng Screenshot taken from . (This is not to show that Wikipedia is correct, just that there exists the mentioned convention.)
1
0
2
@netw0rkf10w
Khue Le
4 years
I always have a lot of respect for authors who make their books freely available. My hat's off.
@AxlerLinear
Sheldon Axler
4 years
A new electronic version of Measure, Integration & Real Analysis is now available at . This new version dated 30 July 2020 fixes one more minor typo. The website above also has a list of all known typos in the printed version of the book. #measuretheory
[image]
0
2
27
0
0
2
@netw0rkf10w
Khue Le
2 years
An ICLR 2023 submission has been accused of being a rehash of previous work, a claim supported by detailed technical arguments. If true, then there must be consequences. Intentionally misleading contributions should not be tolerated in academic research.
0
0
2
@netw0rkf10w
Khue Le
4 years
@ducha_aiki @amy_tabb @tdietterich "Why not before?" You may think that double-blind reviewing is not necessary, but many people would disagree with that, including myself. Imagine publishing and marketing your submission during the rebuttal period as FAIR did with their DETR paper? Not a good thing to do. 2/2
1
0
2
@netw0rkf10w
Khue Le
5 months
@giffmana "multi-crop eval": Are you referring to "test-time augmentation"?
1
0
2
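For context, multi-crop evaluation / test-time augmentation simply averages the model's predictions over several crops of each test image. A minimal sketch follows; the crop strategy and crop count are placeholders, not any particular paper's recipe.

```python
# Test-time augmentation sketch: average predictions over random crops.
import torch
import torchvision.transforms as T

def tta_predict(model, image, n_crops=5, size=224):
    """image: a (C, H, W) tensor; returns class probabilities averaged
    over `n_crops` random resized crops."""
    crops = torch.stack([T.RandomResizedCrop(size)(image) for _ in range(n_crops)])
    with torch.no_grad():
        return model(crops).softmax(dim=-1).mean(dim=0)

# Hypothetical usage:
# model = torchvision.models.resnet50(weights="DEFAULT").eval()
# probs = tta_predict(model, torch.rand(3, 256, 256))
```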
@netw0rkf10w
Khue Le
3 years
We would like to thank @Genci_fr for their support!
0
0
2
@netw0rkf10w
Khue Le
5 months
@danyallah_ @aaron_defazio Yes. I'll share the results gradually. Let me know if there's a specific problem that you want me to try on.
0
0
2
@netw0rkf10w
Khue Le
2 years
@mblondel_ml @inthebrownbag "vanilla FW on an epigraph reformulation" is **precisely** generalized FW. They are exactly the same. If you write Prob. (3.1) under form (3.2) with variable t, then iteration (3.3) can also be write back to the form without t, and that's generalized FW. Please let me know 1/2
1
0
2
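The identity being asserted, sketched in generic notation (the (3.1)-(3.3) numbering refers to the paper under discussion, which is not reproduced here):

```latex
% Generalized FW for min_x f(x) + g(x), with f smooth and g convex, uses
\[
s_k = \operatorname*{argmin}_{s}\; \langle \nabla f(x_k),\, s\rangle + g(s).
\]
% Vanilla FW on the epigraph reformulation min { f(x) + t : g(x) <= t }
% linearizes only the smooth part (gradient (\nabla f(x_k), 1)), giving
\[
(s_k, \tau_k) = \operatorname*{argmin}_{(s,\tau):\, g(s)\le \tau}\;
\langle \nabla f(x_k),\, s\rangle + \tau,
\]
% whose optimum has \tau_k = g(s_k): the two oracles, and hence the two
% methods, coincide.
```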
@netw0rkf10w
Khue Le
4 years
@ducha_aiki @amy_tabb @tdietterich Obviously until the authors want to reveal their names, depending on their next submission. If a paper is on arXiv, people know it exists and can cite it, so it cannot be scooped. Thus, I'm not sure your arguments about rejects are valid. 1/2
1
0
2
@netw0rkf10w
Khue Le
7 years
#CVPR 2018 reviews are out! Several things have changed this year: 4 reviews, different preliminary rating levels. What else?
0
0
2
@netw0rkf10w
Khue Le
4 years
TIL @github has a great new feature: Discussions. Now any repo is basically an online forum. This is really awesome! Separating Discussions from Issues makes it much easier to manage the content, especially for popular repos.
[image]
0
0
2
@netw0rkf10w
Khue Le
5 years
@s_requena @InriaParisNLP @CNRS This FlauBERT used your Jean Zay cluster :D
2
0
2
@netw0rkf10w
Khue Le
7 years
CVPR 2018 accepted papers: #cvpr #cvpr2018
0
0
2
@netw0rkf10w
Khue Le
5 years
After having been phenomenal in NLP, now it's time for self-supervised learning to conquer Computer Vision.
@OriolVinyalsML
Oriol Vinyals
5 years
Rapid unsupervised learning progress thanks to contrastive losses, approaching supervised learning!
- 40% Multitask SSL (2017)
- 50% CPC (2018)
- 70% AMDIM/MOCO/CPCv2/etc (2019)
- 76.5% SimCLR (2020, so far)
[image]
6
206
639
0
0
2
@netw0rkf10w
Khue Le
4 years
@fbk_mt @laurent_besacie Vietnamese 😍😍😍
0
0
2
@netw0rkf10w
Khue Le
7 years
A good blog post for understanding Faster R-CNN in detail. Also good for understanding key components of modern Object Detection systems. Faster R-CNN: Down the rabbit hole of modern object detection - Tryolabs Blog
0
0
2
@netw0rkf10w
Khue Le
4 years
TIL @IJCAIconf 's first conference was held in 1969! Probably the conference with the longest history in the field (all of AI, vision, NLP included). The entire proceedings FROM 1969 are available for download: . IMPRESSIVE!
1
1
2
@netw0rkf10w
Khue Le
6 years
Last admission session of @centralesupelec 's MSc in Artificial Intelligence is on June 25th:
@centralesupelec
CentraleSupรฉlec
6 years
Deadline to our last admission session of our MSc in #ArtificialIntelligence is approaching (June 25th)! ◀️◀️▶️▶️ #machinelearning #MachineIntelligence #deeplearning, #reinforcementlearning #decisionalAI #graphicalmodels #informationtheory
[image]
0
7
10
0
0
2
@netw0rkf10w
Khue Le
5 months
@Jacoed @aaron_defazio That's already more than enough for this experiment. You can try it yourself if you want longer training; the code is here:
0
0
2
@netw0rkf10w
Khue Le
5 months
@tienhaophung @aaron_defazio Started looking at this today but it's getting crazy. Training time is just too high (~40 days on 8 A100 GPUs??). And too many red flags in the code: no AMP, even no training resume (wtf?). Please suggest a more reasonable task with a better quality codebase.
[two images]
1
0
2
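For reference, the standard PyTorch mixed-precision pattern whose absence is being criticized looks like this (a minimal sketch with placeholder model and data; CUDA assumed):

```python
# Standard PyTorch AMP training pattern (minimal sketch).
import torch

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

for step in range(100):
    x = torch.randn(8, 10, device="cuda")
    y = torch.randint(0, 2, (8,), device="cuda")
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then steps
    scaler.update()
```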
@netw0rkf10w
Khue Le
5 years
@s_requena @InriaParisNLP @CNRS It is stated in the paper that the model was trained on (only) 32 GPUs for 410 hours. The authors will probably be able to provide more details on this. @formiel @laurent_besacie @didier_schwab
1
0
2
@netw0rkf10w
Khue Le
4 years
A nice summary of the (many) models available in the transformers library.
@GuggerSylvain
Sylvain Gugger
4 years
There are a lot of models in the transformers 🤗 repo. Feeling lost? I know I was, so I made a little high-level summary of the differences between each model.
24
273
1K
0
0
2
@netw0rkf10w
Khue Le
7 years
@hbou @ylecun The models have also been released:
1
2
2
@netw0rkf10w
Khue Le
4 years
@dennybritz Few-shot learning is hard. Good luck!
0
0
2
@netw0rkf10w
Khue Le
7 years
Semantic Segmentation using Fully Convolutional Networks over the years + PyTorch code
0
0
2
@netw0rkf10w
Khue Le
5 years
CVPR'20 decisions are available.
@greg_mori
Greg Mori
5 years
CVPR 2020 paper decisions are now available, 1470 papers accepted: A huge thank you to everyone involved -- 198 Area Chairs, 3664 reviewers, and the authors of the 6656 papers. #cvpr #ai #ML
7
93
426
0
0
1
@netw0rkf10w
Khue Le
6 years
@fhuszar @roydanroy @jeremyphoward @tdietterich @fchollet "he blocked the last time I disagreed with him" What?? 😮
1
0
1
@netw0rkf10w
Khue Le
7 years
Nice blog post about mixup, a recent data augmentation technique for neural networks.
@fhuszar
Ferenc Huszรกr
7 years
By popular demand: my thoughts on the mixup data-augmentation technique
13
73
274
0
0
1
@netw0rkf10w
Khue Le
7 years
@hardmaru It is currently faster than MathJax, and that's all. @MathJax supports a larger set of commands/environments. And the upcoming MathJax 3 will have a significant boost in performance.
0
0
1
@netw0rkf10w
Khue Le
2 years
@mblondel_ml @inthebrownbag actually NOT that vanilla, sorry for the typo. 2/2
1
0
1
@netw0rkf10w
Khue Le
4 years
@ducha_aiki @amy_tabb @tdietterich Maybe I misunderstood it but you proposed "hands off arXiv", which is a "solution" (that is, for me, better than completely banning arXiv, so I think we agree on this). I'm glad that you also agree with my proposal ;)
1
0
1
@netw0rkf10w
Khue Le
1 year
@konstmish I forgot to include the reference: Apologies if I've overlooked something. @prof_grimmer
1
0
1
@netw0rkf10w
Khue Le
4 years
For comparison, CVPR started in 1983, but only digitized proceedings from 1988 onward are available:
0
0
1
@netw0rkf10w
Khue Le
4 years
@ParcolletT @UnivAvignon I was essentially referring to the US system where there's a clear distinction between assistant/associate. I think the conversion is fair enough (in the sense that the typical order should be: PhD -> Assistant -> Associate --> Full). That's only a personal perspective though ;)
0
0
0
@netw0rkf10w
Khue Le
1 year
@LabrakYanis @ParcolletT Are you aware of FlauBERT? It's concurrent work to CamemBERT and is highly relevant to your work.
1
0
1
@netw0rkf10w
Khue Le
6 years
2017 impact factor of computer vision journals! IJCV has surpassed PAMI!
0
0
1
@netw0rkf10w
Khue Le
2 years
@theshawwn Could you share what your inspiration was? I have been working on this for more than a year (cc @inthebrownbag ), maybe we had the same inspiration ;) Actually, what you described will not work well: it lacks an important step.
1
0
1
@netw0rkf10w
Khue Le
7 years
Elsevier in a nutshell: "I review for them for free, then I pay them to publish my paper, then my university pays them so I can read my own paper." (quote from some professor)
0
0
1
@netw0rkf10w
Khue Le
4 years
@ducha_aiki @amy_tabb @tdietterich There's no contradiction. NLP conferences have an anonymity period of 1 month for a good reason. While strict double-blind is important, completely forbidding authors from publishing their work on arXiv before submission has downsides (some of your post's arguments apply here) 1/2
1
0
1
@netw0rkf10w
Khue Le
4 years
Linear Algebra Done Right has been made freely available by Springer. This is a beautiful book in every sense.
0
1
1
@netw0rkf10w
Khue Le
5 years
@nikiparmar09 Could you please update the arXiv paper with the camera-ready version? The hyperlinks in the NeurIPS version are not clickable. Thanks.
0
0
1
@netw0rkf10w
Khue Le
6 years
@AgoniGrammi @essec @centralesupelec @MIT @UCBerkeley @Stanford @gchevil Excellent program! Last admission session deadline was on April 12th ;) Fingers crossed for my wife...
0
0
1
@netw0rkf10w
Khue Le
6 years
YOLOv3 is out (fast object detection system). The technical report is very informal, but easy and fun to read: Code:
0
0
1
@netw0rkf10w
Khue Le
6 years
@zacharylipton I saw a lot of errors like this on Google Scholar. I reported to them several times months ago but apparently those errors are still there. On their support page it is said that they cannot correct the errors "manually" :)
0
0
1
@netw0rkf10w
Khue Le
5 months
@jeankaddour @SunnySanyal9 @aaron_defazio One can study any existing technique, but it should be made clear that it is existing. Averaging the last checkpoints, in particular, has been used extensively by the community (especially in NLP & speech, even before the Transformer). Giving it a new name seems misleading IMO.
2
0
0
@netw0rkf10w
Khue Le
6 years
Semantic Image Segmentation with Tensorflow: Google DeepLab-v3+ official code and models
1
0
1
@netw0rkf10w
Khue Le
4 years
0
0
1
@netw0rkf10w
Khue Le
5 years
@tscholak @stanfordnlp @chrmanning You mean Clark ( @clark_kev ) et al.? There's a difference ;)
1
0
1
@netw0rkf10w
Khue Le
7 years
#CVPR 2018 received 3303 valid submissions, breaking the record of CVPR 2017's 2680 submissions.
0
1
1
@netw0rkf10w
Khue Le
5 years
@bermanmaxim @cvpr2020 Congratulations Maxim! It's time for me to catch up with my (former) labmates. @ebelilov @puneetdokania
0
0
1
@netw0rkf10w
Khue Le
2 years
@kjoshanssen Maybe just simply remove the second sentence? "By Theorem 1, there exists an x with property P(x)."
0
0
1
@netw0rkf10w
Khue Le
5 months
@zacharynado @aaron_defazio Thanks. What's the deadline for it to be added to the leaderboard?
1
0
1
@netw0rkf10w
Khue Le
7 years
Thrilled to see a boost in performance of @PyTorch in the upcoming months!
@soumithchintala
Soumith Chintala
7 years
It's GREAT to learn that we are slower, gives us easy room to improve. Thanks a lot to @gneubig and dynet. Details:
1
14
99
0
0
1
@netw0rkf10w
Khue Le
4 years
Turns out this "professor" doesn't know how to do research.
@steve_hanke
Steve Hanke
4 years
Contrary to an image I posted last week, #Vietnam turns out to have a "perfect" record in its fight against coronavirus. Official Vietnamese data indicate that Vietnam has not suffered any coronavirus fatalities--zero (0) deaths.
119
16
75
0
0
1
@netw0rkf10w
Khue Le
5 months
@SunnySanyal9 @aaron_defazio @jeankaddour I understand that you cited a 2022 paper for "latest weight averaging" because that's your co-author's paper, but it should be pointed out that this technique has been standard practice since many years. You can find it in, e.g., the Transformer paper (Vaswani et al. 2017).
1
0
1
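The technique under discussion, averaging the parameters of the last few checkpoints, fits in a few lines. A minimal sketch, assuming each checkpoint file stores a plain floating-point `state_dict` (the paths and the count are hypothetical):

```python
# Average the last k checkpoints, as in Vaswani et al. (2017).
# Assumes each file stores a plain state_dict of floating-point tensors.
import torch

def average_checkpoints(paths):
    avg = None
    for p in paths:
        state = torch.load(p, map_location="cpu")
        if avg is None:
            avg = {k: v.float().clone() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}

# Hypothetical usage (average the last 5 of 100 checkpoints):
# model.load_state_dict(average_checkpoints([f"ckpt_{i}.pt" for i in range(95, 100)]))
```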
@netw0rkf10w
Khue Le
6 years
CVPR 2018 papers are available for free:
0
0
1
@netw0rkf10w
Khue Le
7 years
@drfeifei I love how you care about your students :)
0
0
1
@netw0rkf10w
Khue Le
4 years
@ducha_aiki @amy_tabb @tdietterich Therefore, they proposed a compromise while figuring out the best solution. A perfect solution may not exist, but some solutions are better than others. I proposed one, which I think is better than many others (including yours).
1
0
1
@netw0rkf10w
Khue Le
4 years
Woah, such an interesting idea! I feel that this could work, if implemented properly...
@barbara_plank
Barbara Plank
6 years
@ZeerakW @haldaume3 @rktamplayo @kchonyc @emilymbender @zehavoc @thedansimonson @lintool @xtimv @redpony @aria42 yes, you remember right! Thx, @ZeerakW - abandon all deadlines and instead have a rolling one each month -- conferences are then there to present papers as they accumulate. All reviewers in one pot
0
2
9
0
0
1
@netw0rkf10w
Khue Le
7 years
Videos of CVPR orals and spotlights are available:
0
0
1
@netw0rkf10w
Khue Le
5 months
@tienhaophung @aaron_defazio Great, thanks. I'll try that in a few days.
0
0
1
@netw0rkf10w
Khue Le
1 year
Fantastic work! Congrats @TimDarcet and colleagues!
@TimDarcet
TimDarcet
1 year
1/ This week we released DINOv2: a series of general vision encoders pretrained without supervision. Good out-of-the-box performance on a variety of domains, matching or surpassing other publicly available encoders.
5
117
696
0
0
1
@netw0rkf10w
Khue Le
7 years
Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto Complete draft of the 2nd edition:
0
0
1