Oleksii Kuchaiev

@kuchaev

1,095
Followers
678
Following
33
Media
550
Statuses

Director, AI model alignment @NVIDIA

in the cloud
Joined February 2010
@kuchaev
Oleksii Kuchaiev
2 months
Today we are happy to release the best open models for synthetic data generation: 340B parameters, including base, instruct, and reward models, as well as a new human preference dataset, HelpSteer2. The 340B reward model is #1 on the RewardBench leaderboard.
18
84
460
@kuchaev
Oleksii Kuchaiev
5 months
In LLM pre-training, curating and preparing data is perhaps the most impactful step. NeMo Data Curator is now open source, with lots of features you will need. We used it to curate trillions of tokens for training our own models.
3
43
233
@kuchaev
Oleksii Kuchaiev
2 months
For those wondering, "june-chatbot" on @lmsysorg is exactly this model posted on @huggingface
4
27
142
@kuchaev
Oleksii Kuchaiev
2 years
NeMo speech recognition models published on the @huggingface hub are now at the top of the HF Speech Bench for all languages where we have published models so far: English, German, French, and Chinese. They often beat other models without external LMs and with fewer parameters. #DeepLearning
2
25
127
@kuchaev
Oleksii Kuchaiev
9 months
Our team just released a new dataset for LLM alignment, called HelpSteer, on the @huggingface Hub under a CC-BY-4.0 license! It should be used for reward model training, especially with the SteerLM method.
3
28
96
@kuchaev
Oleksii Kuchaiev
11 months
Our team is happy to share SteerLM, a simpler alternative to RLHF that allows dynamic model control during inference (humor, verbosity, etc.). To appear in Findings of EMNLP 2023. It is implemented in NeMo (open source), and an example model is on HF
Tweet media one
2
21
92
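A minimal sketch of the attribute-conditioning idea behind SteerLM described in the tweet above: the model learns to condition on desired attribute values supplied in the prompt, so changing the values at inference steers the output. The attribute names and label format below are illustrative guesses, not the actual NeMo/SteerLM prompt template.

```python
# Hypothetical sketch of SteerLM-style steering (label format is a guess,
# not the real NeMo template): desired attribute targets are serialized
# into the prompt, and a model trained this way follows them at inference.
def build_steering_prefix(attributes: dict[str, int]) -> str:
    """Serialize targets, e.g. {'humor': 0, 'verbosity': 2} -> 'humor:0,verbosity:2'."""
    return ",".join(f"{name}:{value}" for name, value in attributes.items())

# Dialing verbosity up or humor down changes the conditioning, not the weights.
print(build_steering_prefix({"helpfulness": 4, "humor": 0, "verbosity": 2}))
```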
@kuchaev
Oleksii Kuchaiev
4 years
Very happy to share our latest NeMo release. We re-designed NeMo to work with the @PyTorchLightnin and @Hydra_Framework projects from the @PyTorch ecosystem. Train your own #ASR, #NLP, and #TTS models, or re-use one of the many pre-trained models we have.
@PyTorch
PyTorch
4 years
NeMo, @NVIDIA's open-source toolkit based on #PyTorch, allows you to quickly build, train, and fine-tune conversational AI models. See how speech recognition, natural language processing, and speech synthesis can be improved in this tutorial:
0
177
598
1
20
56
@kuchaev
Oleksii Kuchaiev
2 months
@agihippo I guess you missed the part where we used 1000x less human data (10K vs 10M) for alignment than Llama 3. This is about synthetic data generation; it literally says so in the blog post. Also, we released the reward model and its training data, all under a commercially friendly license.
1
1
52
@kuchaev
Oleksii Kuchaiev
2 months
0
8
51
@kuchaev
Oleksii Kuchaiev
4 months
@srush_nlp Maybe it is because the de-noising objective is "wasting" tokens compared to autoregressive models. E.g., when you mask 15% of tokens, then after 1 epoch you've backpropagated loss from only 15% of your tokens, compared to 100% with a next-token prediction loss.
2
4
49
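To make the 15%-vs-100% arithmetic in the tweet above concrete, a tiny simulation (illustrative, not from the thread) counting how many token positions contribute a loss signal after one epoch under each objective:

```python
import random

def supervised_fraction_mlm(num_tokens: int, mask_prob: float = 0.15) -> float:
    """Fraction of positions that received a loss under BERT-style masking."""
    masked = sum(random.random() < mask_prob for _ in range(num_tokens))
    return masked / num_tokens

def supervised_fraction_causal(num_tokens: int) -> float:
    """With next-token prediction, every position but the last is supervised."""
    return (num_tokens - 1) / num_tokens

n = 1_000_000
print(f"de-noising (15% masking): ~{supervised_fraction_mlm(n):.1%} of tokens supervised")
print(f"next-token prediction:     {supervised_fraction_causal(n):.1%} of tokens supervised")
```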
@kuchaev
Oleksii Kuchaiev
2 months
Nemotron-4-340B-*Reward* model is now available via API on :) Give it a try.
0
12
37
@kuchaev
Oleksii Kuchaiev
2 months
Models on @huggingface under the NVIDIA Open Model License
0
4
36
@kuchaev
Oleksii Kuchaiev
9 months
Check out the llama2-70B-SteerLM model, which gets 7.54 on MT-bench. This model is NOT using outputs of stronger (ChatGPT) models during alignment, which allowed us to keep the Llama 2 license. Try it now on NGC Catalog. Also on the @huggingface hub
1
5
26
@kuchaev
Oleksii Kuchaiev
2 years
NeMo now has a Ukrainian speech recognition model on the @huggingface hub. This is a CitriNet model tuned by our intern working from Kyiv. As of today, I think this is the best freely available Ukrainian ASR model.
Tweet media one
4
5
25
@kuchaev
Oleksii Kuchaiev
2 years
Oh wow! Look what model is #1 on @huggingface speech bench for English speech recognition
@HaseoX94
Somshubra Majumdar
2 years
The largest NeMo ASR model is finally public on @huggingface! This is a 600M-parameter Conformer Transducer X-Large, probably the largest public checkpoint trained with multiple datasets.
2
19
88
0
3
22
@kuchaev
Oleksii Kuchaiev
23 days
@paulg I grew up near Chernobyl and have always believed that its most devastating impact is the subsequent pushback against nuclear power. Interesting fact: the Chernobyl station kept working after the disaster (1986) until 2000, when it was fully shut down under Western pressure.
1
2
22
@kuchaev
Oleksii Kuchaiev
5 years
A paper about our latest speech recognition model, QuartzNet, has been accepted to #ICASSP 2020. Head over to [link] for the implementation and pretrained models. #DeepLearning #asr
0
5
21
@kuchaev
Oleksii Kuchaiev
2 months
@fleetingbits @lmsysorg @NVIDIAAI Another way to put it: we used 1000x less (10K vs 10M) human data for alignment than Llama 3 by using synthetic data. This release is about synthetic data generation, which is why our license explicitly allows it.
0
0
21
@kuchaev
Oleksii Kuchaiev
2 months
@lmsysorg @NVIDIAAI With a license that permits synthetic data generation and commercial use.
1
1
20
@kuchaev
Oleksii Kuchaiev
2 months
@fchollet People claiming that some LLM is at "high schooler" or even "kindergartner" level should spend more time around kids. No AI system today is near the level of even a 3-year-old when it comes to general intelligence. Moreover, progress towards that level is unclear.
0
0
19
@kuchaev
Oleksii Kuchaiev
3 years
Just finished listening to the "Viral" audiobook by @Ayjchan and @mattwridley. This is an excellent account of the most likely origins of the COVID-19 pandemic. A must-read for anyone (which really should be everyone) interested in the origins of #COVID19
0
3
16
@kuchaev
Oleksii Kuchaiev
2 years
Two very real ways anyone in the world can help: 1) Consider donating to humanitarian relief efforts, such as @razomforukraine; there are many others. If your company has a donation match, please make sure you make use of it. #UkraineRussiaWar
1
7
16
@kuchaev
Oleksii Kuchaiev
1 year
@mattrickard You missed the NeMo models (1, 5, and 20B GPTs with a commercially friendly license)
1
2
16
@kuchaev
Oleksii Kuchaiev
2 months
@geoff_l Generating synthetic data for the alignment of smaller models is a key use case we have in mind.
1
0
15
@kuchaev
Oleksii Kuchaiev
2 years
#ChatGPT is very impressive! In the dialogue below, it comes up with a suboptimal solution and argues a little without admitting a mistake. Then it takes a hint, admits the mistake, and fixes its solution!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
4
14
@kuchaev
Oleksii Kuchaiev
9 months
@natolambert @srush_nlp Yes, PPO is much more expensive than DPO in terms of infra, but in all our experiments so far on the same data (and no online setting), PPO > DPO on MT-bench.
3
0
13
@kuchaev
Oleksii Kuchaiev
2 years
This holiday season, please consider supporting Ukraine, which is being attacked by the terrorist federation. Today 76 rockets were fired at critical civilian infrastructure with the typical terrorist intent of generating fear and humanitarian disaster.
3
3
11
@kuchaev
Oleksii Kuchaiev
4 years
Thank you, Mozilla Common Voice! (New ASR models in NeMo are coming ;) ) —— 2020 End-of-Year Common Voice Dataset Release - Common Voice - Mozilla Discourse
1
2
9
@kuchaev
Oleksii Kuchaiev
1 year
@_willfalcon You missed some of the open-source, commercially friendly (CC-BY-4) models built using Lightning :)
1
0
8
@kuchaev
Oleksii Kuchaiev
9 months
Our team has studied the tradeoffs between performance and the number of trainable params in LoRA. This work would be especially useful to those building and scaling AI customization services. Great work by @rendu_a and Tugrul Konuk
0
2
8
@kuchaev
Oleksii Kuchaiev
3 years
We have an exciting new job opportunity for an NLP researcher on our team. Please check the job description and apply here if interested. #NLProc #NLP #OpenSource
0
5
8
@kuchaev
Oleksii Kuchaiev
7 months
Our team is hiring Sr. Applied Scientists to work on AI model alignment and customization (text and multimodal). If you have a strong track record and experience with LLMs, RL, or multi-modal models, please apply. Can be in-person or remote. #NLP #hiring
1
4
8
@kuchaev
Oleksii Kuchaiev
4 months
The latest release of NeMo-Aligner adds TRT-LLM integration, which speeds up RLHF rollouts by up to 7x compared to the pure PyTorch implementation
0
0
7
@kuchaev
Oleksii Kuchaiev
8 months
I was taught that trees aren't supposed to have any cycles!
Tweet media one
0
0
7
@kuchaev
Oleksii Kuchaiev
19 days
@karpathy It seems like the recipe for achieving superhuman performance in domain X is now well known: a perfect reward model in X (e.g., game rules, physics) + a good transformer-based heuristic + graph reasoning (MCTS/A*, etc.). A perfect reward model for LLMs would indeed be a game changer
0
0
7
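A hypothetical sketch of the recipe in the tweet above, with simple best-first search standing in for MCTS/A*: exact domain rules play the role of the "perfect reward model", and a learned model supplies the heuristic that ranks candidates. All names here are illustrative.

```python
import heapq
from typing import Callable, Iterable, Optional, TypeVar

State = TypeVar("State")

def best_first_search(
    start: State,
    expand: Callable[[State], Iterable[State]],  # domain rules: legal successor states
    is_goal: Callable[[State], bool],            # "perfect reward model": exact success check
    heuristic: Callable[[State], float],         # learned (e.g. transformer) value estimate
    max_expansions: int = 10_000,
) -> Optional[State]:
    counter = 0  # tie-breaker so heapq never compares State objects directly
    frontier = [(-heuristic(start), counter, start)]
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):  # success is decided only by the exact check, never the heuristic
            return state
        for child in expand(state):
            counter += 1
            heapq.heappush(frontier, (-heuristic(child), counter, child))
    return None
```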
@kuchaev
Oleksii Kuchaiev
2 years
DALLE-mini by @borisdayma and team is pretty good!
Tweet media one
0
0
6
@kuchaev
Oleksii Kuchaiev
3 years
@sergeykarayev We do (much more to come in the next months)
0
1
7
@kuchaev
Oleksii Kuchaiev
9 months
Santa Barbara weather never disappoints
Tweet media one
Tweet media two
0
0
7
@kuchaev
Oleksii Kuchaiev
9 months
@alexgraveley Yes, we are currently very limited on human preference data. You might want to add the new dataset we published this week, which can be used with DPO. Btw, in most of our experiments SFT < DPO < SteerLM <= PPO. So while simple, DPO lags behind PPO and SteerLM.
2
0
7
@kuchaev
Oleksii Kuchaiev
13 days
@natolambert or we can just drop “R” from RLHF for “Learning from Human Feedback”
1
0
6
@kuchaev
Oleksii Kuchaiev
9 months
@srush_nlp @natolambert Some experiments are summarized in Table 3 from this paper (though it isn't the main point of the paper).
Tweet media one
0
0
6
@kuchaev
Oleksii Kuchaiev
7 months
Congrats Som and the entire NeMo Speech team! The top 4 models on the @huggingface ASR leaderboard are now NeMo models, pushing Whisper-large to #5.
@HaseoX94
Somshubra Majumdar
7 months
To read up about Token and Duration Transducers, visit the [link]. TDT is research from our team that enabled ASR inference at speeds approaching non-autoregressive inference, but with an autoregressive model!
0
2
8
1
0
6
@kuchaev
Oleksii Kuchaiev
2 years
@sama detailed paper about it
0
0
6
@kuchaev
Oleksii Kuchaiev
5 years
On Friday (12/13), I will be presenting NeMo at the MLSys Workshop at #NeurIPS2019. NeMo is a toolkit for building conversational AI (ASR, NLP, and TTS) applications. If you are at #NeurIPS and are interested in this work, ping me to have a chat. #ai
1
0
6
@kuchaev
Oleksii Kuchaiev
9 months
I'm heading to Singapore for #EMNLP2023. DM me if you want to chat about model alignment and customization
0
0
6
@kuchaev
Oleksii Kuchaiev
4 years
Watch @poonamchitale and @_willfalcon do a NeMo walk-through and discuss our collaboration
@LightningAI
Lightning AI ⚡️
4 years
You can build, train and fine-tune speech recognition, TTS, and NLP SOTA models at scale with just a few lines of code with @nvidia NeMo and Lightning.
0
25
83
0
4
6
@kuchaev
Oleksii Kuchaiev
4 years
@lexfridman @navalny You should moderate an interview with @elonmusk and @navalny instead of Putin, because Putin will not age well in history books.
0
0
6
@kuchaev
Oleksii Kuchaiev
2 years
NeMo Megatron GPT models: 1.3B, 5B, and 20B are now available on @huggingface hub! #GTC22 #GPT
1
5
6
@kuchaev
Oleksii Kuchaiev
1 month
@bindureddy At least Nemotron-4 gets it
Tweet media one
0
1
6
@kuchaev
Oleksii Kuchaiev
4 months
@paulg @VABVOX A quote I've heard (attributed to L. Landau, though I'm not sure he truly said it) could be roughly translated as: "What makes a university great is not the quality of its professors or courses but the topics students discuss while drinking vodka in their dorms"
0
0
5
@kuchaev
Oleksii Kuchaiev
3 years
New NeMo update! Adds new models for ASR (CitriNet, Conformer-CTC), NLP (machine translation), and TTS (HiFi-GAN, MelGAN, GlowTTS, UniGlow, SqueezeWave). 60 new pre-trained models: ASR in En, Zh, Ru, Es, Pl, Ca, It, Fr, and De; NMT in 5 language pairs.
1
2
5
@kuchaev
Oleksii Kuchaiev
2 years
Always double-check #ChatGPT before using it to schedule your meetings. (The correct answer is 8am, according to Google.)
Tweet media one
0
0
5
@kuchaev
Oleksii Kuchaiev
1 year
@Yampeleg No. If you increase your batch size by x, you should first try increasing the LR by sqrt(x)
0
0
5
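A minimal sketch of the rule of thumb in the tweet above (the function name is mine; the square-root rule is a common heuristic for adaptive optimizers, a starting point to tune rather than a law):

```python
import math

def scale_learning_rate(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Square-root scaling: multiply the LR by sqrt(new_batch / base_batch)."""
    return base_lr * math.sqrt(new_batch / base_batch)

# Example: 4x the batch size (256 -> 1024) suggests trying ~2x the LR first.
print(scale_learning_rate(3e-4, 256, 1024))  # ~6e-4
```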
@kuchaev
Oleksii Kuchaiev
2 years
Contact your political representatives asking them to provide more support to Ukraine. Ukraine needs air defense, long-range rockets, fighter jets, and tanks. To contact your US Senators check: #UkraineRussiaWar
0
3
5
@kuchaev
Oleksii Kuchaiev
1 month
@ylecun Awesome work by the Meta team! Can you please also release reward models, like we did with Nemotron?
1
0
3
@kuchaev
Oleksii Kuchaiev
2 years
@mrm8488 Some suggestions on the 20B to start with :)
0
0
4
@kuchaev
Oleksii Kuchaiev
4 years
Tweet media one
0
0
4
@kuchaev
Oleksii Kuchaiev
6 years
0
1
4
@kuchaev
Oleksii Kuchaiev
21 days
@natolambert @Gradio "Too many post-training researchers don't talk to their models" - what do you think we do all day? 🤣
1
0
4
@kuchaev
Oleksii Kuchaiev
7 months
@natolambert Reward hacking is an issue, and this is assuming your reward model actually learned what you hope it did...
0
0
3
@kuchaev
Oleksii Kuchaiev
6 years
I'll be presenting OpenSeq2Seq during the #NLPOSS workshop at #ACL2018. It is a toolkit for distributed and mixed-precision training of seq2seq models for machine translation, speech recognition, and speech synthesis. Check it out here: #nlproc #DemocratizeNLP
Tweet media one
0
0
4
@kuchaev
Oleksii Kuchaiev
2 months
@Scott_Wiener SB 1047 is a bad piece of legislation because it regulates model development instead of model deployment in applications.
1
0
4
@kuchaev
Oleksii Kuchaiev
2 years
#StableDiffusion is a huge hit among kids :)
Tweet media one
0
0
4
@kuchaev
Oleksii Kuchaiev
5 years
@ID_AA_Carmack If you are putting together an ASR pipeline, check out QuartzNet; it is a simple, lightweight convolutional model. It does use mel spectrograms, but you can switch to MFCC and tweak this
0
0
4
@kuchaev
Oleksii Kuchaiev
4 years
For the first time since 2011, a crewed launch from the USA. And the first crewed launch for @SpaceX!!! #LaunchAmerica
0
0
3
@kuchaev
Oleksii Kuchaiev
2 years
A fantastic opportunity to work in one of the top speech processing groups!
@mirco_ravanelli
Mirco Ravanelli
2 years
I would like to remark that applications from #Ukrainian #students are particularly welcome!
24
2
20
0
0
3
@kuchaev
Oleksii Kuchaiev
5 years
Thank you! We love this dataset and have already pre-trained and released some models with it!
@_josh_meyer_
Josh Meyer
5 years
It's finally out! (submitted to LREC) Common Voice: A Massively-Multilingual Speech Corpus
11
135
467
1
1
3
@kuchaev
Oleksii Kuchaiev
2 years
@karpathy Shouldn't there be a 3) scalability gap too?
0
0
3
@kuchaev
Oleksii Kuchaiev
3 years
Are you excited about #OpenSource #DeepLearning tools for conversational #AI? We are #Hiring engineers to work on projects like NVIDIA NeMo. If this sounds exciting to you, apply here: #NLP #NLProc
0
2
3
@kuchaev
Oleksii Kuchaiev
2 months
@SelfishPriority Helping other models become better is the goal here.
0
0
3
@kuchaev
Oleksii Kuchaiev
6 years
Our #ICLR2018 poster and paper on mixed-precision training with Volta GPUs
Tweet media one
0
0
3
@kuchaev
Oleksii Kuchaiev
2 years
@danielgross The Chinchilla paper showed that more compute during training is still better; it just changes what kind of tradeoffs one would make between model size and training data size.
0
0
3
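A back-of-the-envelope sketch of the tradeoff in the reply above (my illustration; the C ~= 6*N*D FLOPs rule and the ~20 tokens-per-parameter ratio are rough Chinchilla-style figures, not exact constants):

```python
def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training budget C ~= 6*N*D into parameters N and tokens D."""
    n_params = (compute_flops / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# More compute never hurts; it just moves the optimal model-size/data split.
for c in (1e21, 1e22, 1e23):
    n, d = compute_optimal_split(c)
    print(f"C={c:.0e} FLOPs -> N~{n:.1e} params, D~{d:.1e} tokens")
```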
@kuchaev
Oleksii Kuchaiev
5 years
GTC is always fun #GTC19
0
0
3
@kuchaev
Oleksii Kuchaiev
2 months
@alexgraveley The Nemotron-4 reward model is available both as a checkpoint and via API
1
0
3
@kuchaev
Oleksii Kuchaiev
3 years
Very excited to do more with Common Voice! ASR models trained with it are already starting to appear in NeMo (Fr, It, Ru, De, Es, Pl, Ca). #asr #GTC21
@janescowcroft
Jane Scowcroft
3 years
I'm so pleased to share that Common Voice is heading into its next stage of growth as high-impact partners, like NVIDIA, recognize the value of large-scale open voice datasets. It also marks a career transition for me, as I join t…
0
5
18
0
1
3
@kuchaev
Oleksii Kuchaiev
8 months
@natolambert The problem is that releasing test sets turns them into dev sets rather fast ...
1
0
2
@kuchaev
Oleksii Kuchaiev
5 years
My hometown, even with a picture of the house I grew up in: "Inside Slavutych, the city created by the Chernobyl explosion" via @CNNTravel
0
0
3
@kuchaev
Oleksii Kuchaiev
2 years
NeMo Megatron enables training gigantic language models (up to 1T params) with all types of parallelism (data, tensor, pipeline, sequence), tuning them for your use cases, and deploying them. #gpt #T5 #NLP
0
3
2
@kuchaev
Oleksii Kuchaiev
6 months
@natolambert This is awesome! But please, please, don't just say "Mistral-7b" or "Llama-2-7/13/70" without the corresponding "-instruct/chat" suffix. So many people are confused about base vs aligned models. All models on the board are aligned.
1
0
2
@kuchaev
Oleksii Kuchaiev
5 years
Hmmm ...
Tweet media one
0
0
2
@kuchaev
Oleksii Kuchaiev
5 years
Yes! Code, docs and pre-trained checkpoint
@_josh_meyer_
Josh Meyer
5 years
@kuchaev Are the models open:)?
0
0
0
0
1
2
@kuchaev
Oleksii Kuchaiev
3 months
@alexandr_wang @scale_AI have you seen CA's SB 1047 though ...
0
0
2
@kuchaev
Oleksii Kuchaiev
4 months
@srush_nlp This is not from a paper (at least I'm not aware of any published study). But just from the loss definitions: if you mask x% of tokens in a de-noising loss, you'll backpropagate a loss signal from x% of your tokens after 1 epoch, whereas it is 100% in the case of an autoregressive loss.
3
0
2
@kuchaev
Oleksii Kuchaiev
4 years
NeMo is being developed in the open on GitHub. Try it out and give us your feedback!
1
1
2
@kuchaev
Oleksii Kuchaiev
4 years
NVIDIA wins all 8 benchmarks for publicly available systems, and all 8 on a per-chip basis.
0
0
2
@kuchaev
Oleksii Kuchaiev
1 month
@ylecun Thank you! This is a huge contribution towards positive AI progress
0
0
2
@kuchaev
Oleksii Kuchaiev
2 months
@rpanda89 Yes, you can use everything now in NeMo-Aligner. The methodology is published in the paper. We are working on making everything easier to use.
0
0
2
@kuchaev
Oleksii Kuchaiev
1 year
This is an excellent list worth looking at
@george__mack
George Mack
1 year
What is ignored or neglected by the media -- but will be studied by historians? Here's the full list of 25 examples:
Tweet media one
1K
22K
105K
0
0
1
@kuchaev
Oleksii Kuchaiev
3 years
NVIDIA Academic Hardware Grant Program
0
0
2
@kuchaev
Oleksii Kuchaiev
9 months
@RepAnnaEshoo Thank you for all your work!
0
0
0
@kuchaev
Oleksii Kuchaiev
3 years
This is so horrible and sad.
@alschaben
Allen J. Schaben
3 years
Breaking: First view of major oil Spill off of Huntington Beach and Newport Beach. Booms deployed. Beaches closed @Pacific_Airshow cancelled #oilspill #orangecounty @latimes @latimesphotos
576
3K
4K
0
0
2
@kuchaev
Oleksii Kuchaiev
2 months
@thegartsy SB 1047 helps OpenAI and hurts open source, not the other way around. The bill makes a fundamental mistake by regulating tech development instead of its application. It is not fair to expect randomly sampled Californians to understand the distinction.
1
0
2
@kuchaev
Oleksii Kuchaiev
4 years
Tweet media one
0
0
2