✨Simple masked diffusion language models (MDLM) match autoregressive transformer performance within 15% at GPT2 scale for the first time!
📘Paper:
💻Code:
🤖Model:
🌎Blog:
[1/n]👇
This is wild. Take MNIST, feed it pixel by pixel to an LLM, followed by the label (“x1=5, x2=9, …, y=3”). Fine-tune on this dataset. This reaches 99% accuracy. Also works on other small datasets.
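The recipe above can be sketched in a few lines. This is my own illustration of the serialization idea (the helper name and toy data are hypothetical, not the authors' code):

```python
# Hypothetical sketch: turn an image into an LLM fine-tuning string.
def serialize_example(pixels, label):
    """Flattened pixel values -> 'x1=.., x2=.., ..., y=label'."""
    parts = [f"x{i + 1}={p}" for i, p in enumerate(pixels)]
    return ", ".join(parts) + f", y={label}"

# A toy 2x2 "image" with label 3:
print(serialize_example([5, 9, 0, 1], 3))
# x1=5, x2=9, x3=0, x4=1, y=3
```

Fine-tuning on a dataset of such strings then turns classification into next-token prediction.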
Did you ever want to learn more about machine learning in 2021? I'm excited to share the lecture videos and materials from my Applied Machine Learning course at
@Cornell_Tech
! We have 20+ lectures on ML algorithms and how to use them in practice. [1/5]
It's crazy how many modern generative models are 15-year-old Aapo Hyvarinen papers.
Noise contrastive estimation => GANs
Score matching => diffusion
Ratio matching => discrete diffusion
If I were a student today, I'd carefully read Aapo's papers; they're a gold mine of ideas.
Do you know what's cooler than running LLMs on consumer GPUs? Finetuning large 65B+ LLMs on consumer GPUs! 🤖
Check out my new side project: LLMTune. It can finetune 30B/65B LLAMA models on 24GB/48GB GPUs.
Ok, I'm sorry, but this is just brilliant. Folks argue that AI can't make art, but look: (1) DALLE2 distills the essence of NY in a stunning & abstract way, (2) each pic has a unique visual language (bridge-loops in #1!?), (3) it *builds* (not copies) something new on top of Picasso!
Excited to announce the newest update to the Cornell Open Applied ML course!
We are releasing 16 chapters of open online lecture notes covering topics across ML: neural networks, SVMs, gradient boosting, generative models, and much more.
Here is an experiment: using ChatGPT to emulate a Jupyter notebook. You can even get it to run GPT inside ChatGPT.
And you can also train neural networks from scratch inside ChatGPT.🤯
Here's a walkthrough of how it works.
ICLR decisions are now public, and it's confirmed that the recent (pretty high-profile) Mamba paper didn't get in. It's useful to read OpenReview to see how subjective the peer review process can be.
The moral is: don't stress if your paper doesn't get in from the first try!
Two-bit and three-bit LLMs are almost here!
QuIP yields the first usable two-bit LLMs and further reduces the cost of running LLMs on just one GPU. [1/4]
paper:
code:
I love this chart from the AI index report. You can clearly see the peak of '80s AI hype and its slow drop-off. We seem to have just matched the conference attendance numbers from back then.
Imagine you build an ML model with 80% accuracy. There are many things you can try next: collect data, create new features, increase dropout, tune the optimizer. How do you decide what to try next in a principled way?
New paper with
@ermonste
on accurate uncertainties for Bayesian deep learning. Addresses model overconfidence which arises from misspecification and computational approximations. Will be presented at
@icmlconf
next week!
My weekend side project: MiniLLM, a minimal system for running modern LLMs on consumer GPUs✨
🐦 Supports multiple LLMs (LLAMA, BLOOM, OPT)
⚙️ Supports NVIDIA GPUs, not just Apple Silicon
🧚‍♀️ Tiny, easy-to-use codebase in Python (<500 LOC)
Excited to finally release our open course on deep generative models! This material has been taught at Stanford/Cornell/UCLA since 2019. It includes
🎥 20 hours of video lectures
✨ 17 sets of slides
📖 Lecture notes
Youtube:
Site:
What are the benefits of using deep learning in causal inference? Thoughts based on Monday morning's ICML tutorial on causality + my own opinions. 👇
Slides are from the tutorial (link is below).
As promised, here is a summary of
@cornell_tech
Applied ML 2021 Lecture 1: "What is ML?"
The main idea is that machine learning is a form of programming, where you create software by specifying data and a learning algorithm instead of writing traditional code.
Loved this nice and simple idea for better data selection in LMs.
First, use high level features to describe high-value data (eg textbook chunks). Then use importance sampling to prioritize similar data in a large dataset.
@sangmichaelxie
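Roughly how such a selection pipeline could look, with crude unigram counts standing in for the paper's high-level features (a minimal sketch; all names, smoothing choices, and toy data here are my own assumptions, not the paper's code):

```python
import math
import re
from collections import Counter

def unigram_model(texts):
    """Stand-in for 'high-level features': word counts over a corpus."""
    counts = Counter(w for t in texts for w in re.findall(r"\w+", t.lower()))
    return counts, sum(counts.values())

def log_importance_weight(text, target, raw, alpha=1.0, vocab=15):
    """log p_target(x) - log p_raw(x) under add-alpha smoothed unigram models."""
    (tc, tn), (rc, rn) = target, raw
    score = 0.0
    for w in re.findall(r"\w+", text.lower()):
        score += math.log((tc[w] + alpha) / (tn + alpha * vocab))
        score -= math.log((rc[w] + alpha) / (rn + alpha * vocab))
    return score

# Toy corpora: 'target' plays the role of the high-value textbook chunks.
target = unigram_model(["gradient descent minimizes a loss function"])
raw = unigram_model(["cat pictures and celebrity gossip",
                     "minimize loss via gradient steps"])
pool = ["stochastic gradient descent and loss surfaces",
        "top ten celebrity cat pictures"]
ranked = sorted(pool, key=lambda t: -log_importance_weight(t, target, raw))
print(ranked[0])  # the textbook-like chunk gets the highest weight
```

Data with high importance weight (similar to the target distribution) is then sampled preferentially for training.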
📢 Announcing the newest edition of the Cornell Tech open machine learning course!
- 📺 30+ hours of lecture videos
- 📕 Lecture notes, slides, and code for 20+ lectures
- 🌎 A brand new website
Check it out here:
Another SOTA multi-task NLP result from MSFT. What is it about the Transformer architecture that is making these advances possible? Why have we not seen the same results with LSTM-based methods like Dai and Le (2015)?
The slides and lectures notes that accompany my videos from the Applied Machine Learning course at Cornell are now available on Github!
I'm sharing 20+ Jupyter notebooks that you can compile into HTML, PDF, or execute directly.
Last April, we released libraries for 3-bit and 4-bit LLM inferencing & finetuning. We've had a lot of interest in our code (including 1k+ stars on Github), and we're now officially releasing the underlying algorithm: ModuLoRA
ModuLoRA is the first method to finetune 3-bit LLMs!
It's surprising how little core ML innovation was needed to create Sora. As with GPT, the OAI team took a proven architecture (latent diffusion), and scaled it to massive data, with incredible results.
Still, it's interesting to look at the details that OAI chose to reveal 1/👇
New paper on black-box learning of undirected models using neural variational inference. Also speeds up sampling and helps estimate the partition function. Our
#nips2017
paper is online here:
Update on my fall 2021
@cornell_tech
applied ML course.
Each week, I will be releasing all the slides, lecture notes, and course materials on Github, and I also plan to post summaries of each lecture on Twitter. Everybody is welcome to follow along!
New update from the world of LLM quantization: QuIP will appear at NeurIPS 2023, and our updated paper is now on arXiv.
QuIP is the first method that gets useful results out of LLMs quantized using as little as 2 bits/weight.
Camera-ready:
✨Introducing diffusion with learned adaptive noise, a new state-of-the-art model for density estimation✨
Our key idea is to learn the diffusion process from data (instead of it being fixed). This yields a tighter ELBO, faster training, and more!
Paper:
This is absolutely mind-blowing. Finding these kinds of disentangled representations has been a goal of generative modeling research for years. The fact that style-based GANs do it so well in a purely unsupervised way and without a strong inductive bias is just crazy.
An exciting property of style-based generators is that they have learned to do 3D viewpoint rotations around objects like cars. These kinds of meaningful latent interpolations show that the model has learned about the structure of the world.
2-bit LLaMAs are here! 🦙✨
The new QuIP# ("quip-sharp") algorithm enables running the largest 70B models on consumer-level 24GB GPUs with only a minimal drop in accuracy.
Amazing work led by Cornell students
@tsengalb99
@CheeJerry
+ colleagues
@qingyao_sun
@chrismdesa
[1/n]
Wow, this year's Burning Man theme is Artificial Intelligence! The description talks about automation, loss of jobs, safety, etc. A mix of valid concerns and hype. In any case, it should be a lot of fun.
Model weights can be quantized into a small # of bits during inference. If we're a bit clever, we can also directly train quantized weights! Experiments should be *much* faster. Or fit many quantized models, then fine-tune the best one at full precision.
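One well-known way to train quantized weights directly is the straight-through estimator (STE): quantize in the forward pass, but let gradients flow as if quantization were the identity. A minimal NumPy sketch on a toy regression problem (my own illustration, not tied to any specific paper):

```python
import numpy as np

def quantize(w, bits=4):
    """Uniform quantization of w onto 2**bits levels spanning its range."""
    levels = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((w - lo) / scale) * scale + lo

# Toy regression: fit y = w @ x while the forward pass uses quantized weights.
w = np.array([0.0, 1.0])   # full-precision "shadow" weights
x = np.array([1.0, 0.5])
y = 2.0

for _ in range(100):
    wq = quantize(w)         # forward pass: quantized weights
    err = wq @ x - y
    w -= 0.1 * err * x       # backward pass: gradient applied "straight through" to w

print(abs(quantize(w) @ x - y))  # small residual despite 4-bit weights
```

The full-precision copy of the weights accumulates the gradient updates, while only the quantized version is ever used for computation.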
At
#icml2018
in Stockholm this week! Come check our poster on improved uncertainty estimation for Bayesian models and deep neural networks on Thursday.
Love to see my machine learning lectures being shared online. Also, please stay tuned—I will be announcing a major new update to my online class in the next few days.
Applied Machine Learning - Cornell CS5785
"Starting from the very basics, covering all of the most important ML algorithms and how to apply them in practice. Executable Jupyter notebooks (and as slides)". 80 videos.
Videos:
Code:
After a summer break, I’m getting back to tweeting again.
This Fall, I’m teaching Applied ML at
@cornell_tech
in this huge room (and this time in person). Stay tuned for updates as I’ll be sharing a lot of the lecture videos and materials over the next few months!
If you didn't get an invite to the Elon Musk / Tesla party tonight, come check out our poster on Neural Variational Inference in Undirected Graphical Models at board
#108
:)
One weird trick to scale your diffusion models to high resolution images: tune the noise schedule (and sprinkle in a bit of Google-level compute). High res diffusion without latent diffusion.
#Neurips2022
is now over. Here is what I found exciting this year. Interesting trends include creative ML, diffusion models, language models, LLMs + RL, and some interesting theoretical work on conformal prediction, optimization, and more.
How can deep learning be useful in causal inference?
In our
#NeurIPS2022
paper, we argue that causal effect estimation can benefit from large amounts of unstructured "dark" data (images, sensor data) that can be leveraged via deep generative models to account for confounders.
@NandoDF
@ilyasut
@icmlconf
@iclr2019
Don't the benefits of increased reproducibility and rigor on the part of the authors greatly outweigh any potential misuses of their work, at least for the vast majority of ICML/ICLR papers? I think the current shift towards empirical work puts a greater need on releasing code.
Cool reddit thread on beautiful ML papers... but why are they all DL papers?? :) How about Wainwright and Jordan? Or the online learning work by Shalev-Shwartz? DL is awesome but, come on, that's not all there is to ML :) My vote goes to this paper:
I thought the NIPS tutorial on deep learning on graphs was quite interesting. Uses spectral representations of graphs, which are really fascinating in themselves. Lots of potential scientific applications. I'd like to learn more about generative techniques.
Question to the panel—what are the open problems in diffusion generative models? Top answer—generalizing to discrete sequences and coming up with good corruption processes for that domain.
The ICLR paper decisions haven't even been made yet, but text-to-3D models have already been deployed in a commercial app by Luma. What a crazy year! Shout-out to
@poolio
for the original work on DreamFusion
✨ Introducing Imagine 3D: a new way to create 3D with text!
Our mission is to build the next generation of 3D, and Imagine will be a big part of it. Today Imagine is in early access, and as we improve it, we will bring it to everyone.
As far as I can tell (and I might be wrong), Google Research was just renamed Google AI... What about all the work in systems, crypto, econ..? It just seems wrong to call all of Google's CS research "AI".
Excited to share my newest work
#MusicVAE
for interpolating and sampling melodies, beats, and three-part song segments from a VAE! Listen to samples and create your own in the
#colab
notebook (link in YT description)
w/
@jesseengel
@deck
#magenta
#nips2017
Here are 9 predictions for AI in 2024 🎉🎊
1️⃣ Planning will take a greater role in generative modeling. Models will increasingly “think” at inference time, trading off compute for output quality. In many applications (“generate a good molecule”), this will make a ton of sense.
Earlier this month at
#icml2019
, we presented new work which examines the question of what uncertainties are needed in model-based RL. Taking inspiration from early work on scoring rules in statistics, we argue that uncertainties in RL must be *calibrated*. [1/10]
📢 One of my students, Phil Si, is applying for PhD programs in this cycle. Phil's ICLR paper on quantile flows is an exciting improvement over neural autoregressive flows, making them applicable not just to density estimation, but also to generation.
A short thread 🧵
War in Ukraine, Day 2 (Feb 25)
I will be summarizing key events of the day based on what I hear from friends on the ground and based on reports from western and Ukrainian media.
After Google and Baidu, Facebook is also publishing a neural text-to-speech system. Like Deep Speech 3 (and Lyrebird's unpublished demo), it quickly generalizes to new speakers. This research area is really interesting.
Outstanding paper talk by
@poolio
on DreamFusion. Use existing text to image generative models to train text to NERF models. Get a 3d render of a squirrel on a motorbike.
#iclr2023
Check out our new blog post on finetuning LLMs quantized in 2 bits using ModuLoRA.
Unlike QLoRA, ModuLoRA works with any modern quantizer, like QuIP# or OPTQ, and can outperform QLoRA on downstream tasks with 2x smaller models.
How far can you push LLM quantization without hurting performance on downstream tasks?
We could go pretty far by combining a state-of-the-art quantizer with ModuLoRA finetuning. On some tasks, our carefully finetuned 2-bit LLMs outperform existing 8-bit LLMs.
A short thread 👇
Some ways of combining information in two branches of a net A & B: 1) A+B, 2) A*B, 3) concat [A,B], 4) LSTM-style tanh(A) * sigmoid(B), 5) hypernetwork-style convolve A with weights w = f(B), 6) hypernetwork-style batch-norm A with \gamma, \beta = f(B), 7) A soft attend to B, ... ?
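A few of these fusion options, sketched in NumPy on toy vectors (my own illustration; in a real network A and B would be activations and gamma/beta would come from a learned f(B)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

A = np.array([0.5, -1.0, 2.0])   # activations from branch A
B = np.array([1.0, 0.0, -0.5])   # activations from branch B

add = A + B                        # (1) additive fusion
mul = A * B                        # (2) multiplicative fusion
cat = np.concatenate([A, B])       # (3) concatenation
gate = np.tanh(A) * sigmoid(B)     # (4) LSTM-style gating

# (6) FiLM-style conditioning: B modulates A via a scale and shift;
# gamma, beta would normally be produced by a learned network f(B).
gamma, beta = 1.0 + 0.1 * B, 0.1 * B
film = gamma * A + beta

# (7) soft attention: weights derived from A, applied over B's entries
weights = np.exp(A) / np.exp(A).sum()
attn = weights @ B
```

Options (1), (2), and (4) preserve dimensionality, (3) doubles it, and (6)/(7) let one branch condition the other asymmetrically.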
If you’re at ICLR, you should try to catch Oscar (JHU undergrad), who is presenting his really cool TMLR paper on ModuLoRA, 2-bit finetuning of LLMs.
Paper:
Diffusion models produce great samples, but they lack a semantically meaningful latent space like in a VAE or a GAN. We augment diffusion with low-dimensional latents that can be used for image manipulation, interpolation, controlled generation.
#ICML2023
Thrilled to share our latest paper - Infomax
#Diffusion
! We're pushing the boundaries of standard diffusion models by unsupervised learning of a concise, interpretable latent space. Enjoy fun latent space editing techniques just like GANs/VAEs! Details:
Excited to share that
@afreshai
raised another $12M to advance our mission of using AI to reduce waste across the food supply chain. We're always looking for talented folks who are passionate about AI, food, and the environment to join our growing team.
Did you know that word2vec was rejected at the first ICLR (when it was still a workshop)? Don’t get discouraged by the peer review process: the best ideas ultimately get the recognition they deserve.
As the generative AI hackathon and post-hackathon events come to an end, I want to again thank everyone who attended!
Incredibly grateful to
@davederiso
@agihouse_org
for organizing the event!
🙏 to
@LererHippeau
& the NYC VC community for sponsoring it
Cool talk at
#iclr2023
—can you find a semantic latent space within a pre-trained generative model (like stable diffusion) and use it to enable controllable generation?
We shouldn't be using the same name for the model and its inference algo. GANs use the same model as VAEs (hierarchical factor analysis), but also combine it with an approximation of a two-sample test for training (the main innovation). Cool insight from Z. Ghahramani @ NIPS panel.
Interesting idea: language models without any positional embeddings. They seem to get surprisingly good perplexities just by looking at word co-occurrences. Fun workshop paper from the Stanford AI group.
#icml2023
Short summary: scientists found phosphine in the atmosphere of Venus, especially in areas thought to be hospitable. On Earth, it is only produced by microbes in low-oxygen environments and by rare chemical reactions that are not expected to occur on Venus. Will be following this closely.
I'm really stoked to be teaching applied machine learning with
@brandondamos
this semester!
AML is a masters-level course taken by 150-200 students each fall. Having Brandon as a co-instructor is an incredible opportunity to expose students to cutting-edge AI research.
📢 Today's my first day at Cornell Tech :)
I will continue my full-time position as a scientist at Meta, and am teaching ML here on the side for the semester with
@volokuleshov
. The course is open source and you can follow everything in this repo: