✨Simple masked diffusion language models (MDLM) match autoregressive transformer performance within 15% at GPT2 scale for the first time!
📘Paper:
💻Code:
🤖Model:
🌎Blog:
[1/n]👇
This is wild. Take MNIST, feed it pixel by pixel to an LLM, followed by the label (“x1=5, x2=9, …, y=3”). Fine-tune on this dataset. This reaches 99% accuracy. Also works on other small datasets.
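The recipe above can be sketched in a few lines. This is my own illustration of the serialization idea (the helper name and toy data are hypothetical, not the authors' code):

```python
# Hypothetical sketch: turn an image into an LLM fine-tuning string.
def serialize_example(pixels, label):
    """Flattened pixel values -> 'x1=.., x2=.., ..., y=label'."""
    parts = [f"x{i + 1}={p}" for i, p in enumerate(pixels)]
    return ", ".join(parts) + f", y={label}"

# A toy 2x2 "image" with label 3:
print(serialize_example([5, 9, 0, 1], 3))
# x1=5, x2=9, x3=0, x4=1, y=3
```

Fine-tuning on a dataset of such strings then turns classification into next-token prediction.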
Did you ever want to learn more about machine learning in 2021? I'm excited to share the lecture videos and materials from my Applied Machine Learning course at
@Cornell_Tech
! We have 20+ lectures on ML algorithms and how to use them in practice. [1/5]
It's crazy how many modern generative models are 15-year-old Aapo Hyvarinen papers.
Noise contrastive estimation => GANs
Score matching => diffusion
Ratio matching => discrete diffusion
If I were a student today, I'd carefully read Aapo's papers; they're a gold mine of ideas.
Do you know what's cooler than running LLMs on consumer GPUs? Finetuning large 65B+ LLMs on consumer GPUs! 🤖
Check out my new side project: LLMTune. It can finetune 30B/65B LLAMA models on 24GB/48GB GPUs.
Ok, I'm sorry, but this is just brilliant. Folks argue that AI can't make art, but look: (1) DALLE2 distills the essence of NY in a stunning & abstract way, (2) each pic has a unique visual language (bridge-loops in #1!?), (3) it *builds* (not copies) something new on top of Picasso!
Excited to announce the newest update to the Cornell Open Applied ML course!
We are releasing 16 chapters of open online lecture notes covering topics across ML: neural networks, SVMs, gradient boosting, generative models, and much more.
Here is an experiment: using ChatGPT to emulate a Jupyter notebook. You can even get it to run GPT inside ChatGPT.
And you can also train neural networks from scratch inside ChatGPT.🤯
Here's a walkthrough of how it works.
ICLR decisions are now public, and it's confirmed that the recent (pretty high-profile) Mamba paper didn't get in. It's useful to read OpenReview to see how subjective the peer review process can be.
The moral is: don't stress if your paper doesn't get in from the first try!
Two-bit and three-bit LLMs are almost here!
QuIP yields the first usable two-bit LLMs and further reduces the cost of running LLMs on just one GPU. [1/4]
paper:
code:
I love this chart from the AI index report. You can clearly see the peak of '80s AI hype and its slow drop-off. We seem to have just matched the conference attendance numbers from back then.
Imagine you build an ML model with 80% accuracy. There are many things you can try next: collect data, create new features, increase dropout, tune the optimizer. How do you decide what to try next in a principled way?
New paper with
@ermonste
on accurate uncertainties for Bayesian deep learning. Addresses model overconfidence which arises from misspecification and computational approximations. Will be presented at
@icmlconf
next week!
My weekend side project: MiniLLM, a minimal system for running modern LLMs on consumer GPUs✨
🐦 Supports multiple LLMs (LLAMA, BLOOM, OPT)
⚙️ Supports NVIDIA GPUs, not just Apple Silicon
🧚‍♀️ Tiny, easy-to-use codebase in Python (<500 LOC)
Excited to finally release our open course on deep generative models! This material has been taught at Stanford/Cornell/UCLA since 2019. It includes
🎥 20 hours of video lectures
✨ 17 sets of slides
📖 Lecture notes
Youtube:
Site:
What are the benefits of using deep learning in causal inference? Thoughts based on Monday morning's ICML tutorial on causality + my own opinions. 👇
Slides are from the tutorial (link is below).
As promised, here is a summary of
@cornell_tech
Applied ML 2021 Lecture 1: "What is ML?"
The main idea is that machine learning is a form of programming, where you create software by specifying data and a learning algorithm instead of writing traditional code.
Loved this nice and simple idea for better data selection in LMs.
First, use high level features to describe high-value data (eg textbook chunks). Then use importance sampling to prioritize similar data in a large dataset.
@sangmichaelxie
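Roughly how such a selection pipeline could look, with crude unigram counts standing in for the paper's high-level features (a minimal sketch; all names, smoothing choices, and toy data here are my own assumptions, not the paper's code):

```python
import math
import re
from collections import Counter

def unigram_model(texts):
    """Stand-in for 'high-level features': word counts over a corpus."""
    counts = Counter(w for t in texts for w in re.findall(r"\w+", t.lower()))
    return counts, sum(counts.values())

def log_importance_weight(text, target, raw, alpha=1.0, vocab=15):
    """log p_target(x) - log p_raw(x) under add-alpha smoothed unigram models."""
    (tc, tn), (rc, rn) = target, raw
    score = 0.0
    for w in re.findall(r"\w+", text.lower()):
        score += math.log((tc[w] + alpha) / (tn + alpha * vocab))
        score -= math.log((rc[w] + alpha) / (rn + alpha * vocab))
    return score

# Toy corpora: 'target' plays the role of the high-value textbook chunks.
target = unigram_model(["gradient descent minimizes a loss function"])
raw = unigram_model(["cat pictures and celebrity gossip",
                     "minimize loss via gradient steps"])
pool = ["stochastic gradient descent and loss surfaces",
        "top ten celebrity cat pictures"]
ranked = sorted(pool, key=lambda t: -log_importance_weight(t, target, raw))
print(ranked[0])  # the textbook-like chunk gets the highest weight
```

Data with high importance weight (similar to the target distribution) is then sampled preferentially for training.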
📢 Announcing the newest edition of the Cornell Tech open machine learning course!
- 📺 30+ hours of lecture videos
- 📕 Lecture notes, slides, and code for 20+ lectures
- 🌎 A brand new website
Check it out here:
Another SOTA multi-task NLP result from MSFT. What is it about the Transformer architecture that is making these advances possible? Why have we not seen the same results with LSTM-based methods like Dai and Le (2015)?
The slides and lectures notes that accompany my videos from the Applied Machine Learning course at Cornell are now available on Github!
I'm sharing 20+ Jupyter notebooks that you can compile into HTML, PDF, or execute directly.
Last April, we released libraries for 3-bit and 4-bit LLM inferencing & finetuning. We've had a lot of interest in our code (including 1k+ stars on Github), and we're now officially releasing the underlying algorithm: ModuLoRA
ModuLoRA is the first method to finetune 3-bit LLMs!
It's surprising how little core ML innovation was needed to create Sora. As with GPT, the OAI team took a proven architecture (latent diffusion), and scaled it to massive data, with incredible results.
Still, it's interesting to look at the details that OAI chose to reveal 1/👇
New paper on black-box learning of undirected models using neural variational inference. Also speeds up sampling and helps estimate the partition function. Our
#nips2017
paper is online here:
Update on my fall 2021
@cornell_tech
applied ML course.
Each week, I will be releasing all the slides, lecture notes, and course materials on Github, and I also plan to post summaries of each lecture on Twitter. Everybody is welcome to follow along!
New update from the world of LLM quantization: QuIP will appear at NeurIPS 2023, and our updated paper is now on arXiv.
QuIP is the first method that gets useful results out of LLMs quantized using as little as 2 bits/weight.
Camera-ready:
✨Introducing diffusion with learned adaptive noise, a new state-of-the-art model for density estimation✨
Our key idea is to learn the diffusion process from data (instead of it being fixed). This yields a tighter ELBO, faster training, and more!
Paper:
This is absolutely mind-blowing. Finding these kinds of disentangled representations has been a goal of generative modeling research for years. The fact that style-based GANs do it so well in a purely unsupervised way and without a strong inductive bias is just crazy.
An exciting property of style-based generators is that they have learned to do 3D viewpoint rotations around objects like cars. These kinds of meaningful latent interpolations show that the model has learned about the structure of the world.
2-bit LLaMAs are here! 🦙✨
The new QuIP# ("quip-sharp") algorithm enables running the largest 70B models on consumer-level 24GB GPUs with only a minimal drop in accuracy.
Amazing work led by Cornell students
@tsengalb99
@CheeJerry
+ colleagues
@qingyao_sun
@chrismdesa
[1/n]
Wow, this year's Burning Man theme is Artificial Intelligence! The description talks about automation, loss of jobs, safety, etc. A mix of valid concerns and hype. In any case, it should be a lot of fun.
Model weights can be quantized into a small # of bits during inference. If we're a bit clever, we can also directly train quantized weights! Experiments should be *much* faster. Or fit many quantized models, then fine-tune the best one at full precision.
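One well-known way to train quantized weights directly is the straight-through estimator (STE): quantize in the forward pass, but let gradients flow as if quantization were the identity. A minimal NumPy sketch on a toy regression problem (my own illustration, not tied to any specific paper):

```python
import numpy as np

def quantize(w, bits=4):
    """Uniform quantization of w onto 2**bits levels spanning its range."""
    levels = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((w - lo) / scale) * scale + lo

# Toy regression: fit y = w @ x while the forward pass uses quantized weights.
w = np.array([0.0, 1.0])   # full-precision "shadow" weights
x = np.array([1.0, 0.5])
y = 2.0

for _ in range(100):
    wq = quantize(w)         # forward pass: quantized weights
    err = wq @ x - y
    w -= 0.1 * err * x       # backward pass: gradient applied "straight through" to w

print(abs(quantize(w) @ x - y))  # small residual despite 4-bit weights
```

The full-precision copy of the weights accumulates the gradient updates, while only the quantized version is ever used for computation.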
At
#icml2018
in Stockholm this week! Come check our poster on improved uncertainty estimation for Bayesian models and deep neural networks on Thursday.
Love to see my machine learning lectures being shared online. Also, please stay tuned—I will be announcing a major new update to my online class in the next few days.
Applied Machine Learning - Cornell CS5785
"Starting from the very basics, covering all of the most important ML algorithms and how to apply them in practice. Executable Jupyter notebooks (and as slides)". 80 videos.
Videos:
Code:
After a summer break, I’m getting back to tweeting again.
This Fall, I’m teaching Applied ML at
@cornell_tech
in this huge room (and this time in person). Stay tuned for updates as I’ll be sharing a lot of the lecture videos and materials over the next few months!
If you didn't get an invite to the Elon Musk / Tesla party tonight, come check out our poster on Neural Variational Inference in Undirected Graphical Models at board
#108
:)
One weird trick to scale your diffusion models to high resolution images: tune the noise schedule (and sprinkle in a bit of Google-level compute). High res diffusion without latent diffusion.
#Neurips2022
is now over. Here is what I found exciting this year. Interesting trends include creative ML, diffusion models, language models, LLMs + RL, and some interesting theoretical work on conformal prediction, optimization, and more.
How can deep learning be useful in causal inference?
In our
#NeurIPS2022
paper, we argue that causal effect estimation can benefit from large amounts of unstructured "dark" data (images, sensor data) that can be leveraged via deep generative models to account for confounders.
@NandoDF
@ilyasut
@icmlconf
@iclr2019
Don't the benefits of increased reproducibility and rigor on the part of the authors greatly outweigh any potential misuses of their work, at least for the vast majority of ICML/ICLR papers? I think the current shift towards empirical work puts a greater need on releasing code.
Cool reddit thread on beautiful ML papers... but why are they all DL papers?? :) How about Wainwright and Jordan? Or the online learning work by Shalev-Shwartz? DL is awesome but, come on, that's not all there is to ML :) My vote goes to this paper:
I thought the NIPS tutorial on deep learning on graphs was quite interesting. Uses spectral representations of graphs, which are really fascinating in themselves. Lots of potential scientific applications. I'd like to learn more about generative techniques.
Question to the panel—what are the open problems in diffusion generative models? Top answer—generalizing to discrete sequences and coming up with good corruption processes for that domain.
The ICLR paper decisions haven't even been made yet, but text-to-3D models have already been deployed in a commercial app by Luma. What a crazy year! Shout-out to
@poolio
for the original work on DreamFusion
✨ Introducing Imagine 3D: a new way to create 3D with text!
Our mission is to build the next generation of 3D, and Imagine will be a big part of it. Today Imagine is in early access, and as we improve it, we will bring it to everyone.
As far as I can tell (and I might be wrong), Google Research was just renamed Google AI... What about all the work in systems, crypto, econ..? It just seems wrong to call all of Google's CS research "AI".
Excited to share my newest work
#MusicVAE
for interpolating and sampling melodies, beats, and three-part song segments from a VAE! Listen to samples and create your own in the
#colab
notebook (link in YT description)
w/
@jesseengel
@deck
#magenta
#nips2017
Here are 9 predictions for AI in 2024 🎉🎊
1️⃣ Planning will take a greater role in generative modeling. Models will increasingly “think” at inference time, trading off compute for output quality. In many applications (“generate a good molecule”), this will make a ton of sense.
Earlier this month at
#icml2019
, we presented new work which examines the question of what uncertainties are needed in model-based RL. Taking inspiration from early work on scoring rules in statistics, we argue that uncertainties in RL must be *calibrated*. [1/10]
📢 One of my students, Phil Si, is applying for PhD programs in this cycle. Phil's ICLR paper on quantile flows is an exciting improvement over neural autoregressive flows, making them applicable not just to density estimation, but also to generation.
A short thread 🧵
War in Ukraine, Day 2 (Feb 25)
I will be summarizing key events of the day based on what I hear from friends on the ground and based on reports from western and Ukrainian media.
After Google and Baidu, Facebook is also publishing a neural text-to-speech system. Like Deep Speech 3 (and Lyrebird's unpublished demo), it quickly generalizes to new speakers. This research area is really interesting.
Outstanding paper talk by
@poolio
on DreamFusion. Use existing text to image generative models to train text to NERF models. Get a 3d render of a squirrel on a motorbike.
#iclr2023
Check out our new blog post on finetuning LLMs quantized in 2 bits using ModuLoRA.
Unlike QLoRA, ModuLoRA works with any modern quantizer, like QuIP# or OPTQ, and can outperform QLoRA on downstream tasks with 2x smaller models.
How far can you push LLM quantization without hurting performance on downstream tasks?
We could go pretty far by combining a state-of-the-art quantizer with ModuLoRA finetuning. On some tasks, our carefully finetuned 2-bit LLMs outperform existing 8-bit LLMs.
A short thread 👇
Some ways of combining information in two branches of a net A & B: 1) A+B, 2) A*B, 3) concat [A,B], 4) LSTM-style tanh(A) * sigmoid(B), 5) hypernetwork-style convolve A with weights w = f(B), 6) hypernetwork-style batch-norm A with \gamma, \beta = f(B), 7) A soft attend to B, ... ?
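A few of these fusion options, sketched in NumPy on toy vectors (my own illustration; in a real network A and B would be activations and gamma/beta would come from a learned f(B)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

A = np.array([0.5, -1.0, 2.0])   # activations from branch A
B = np.array([1.0, 0.0, -0.5])   # activations from branch B

add = A + B                        # (1) additive fusion
mul = A * B                        # (2) multiplicative fusion
cat = np.concatenate([A, B])       # (3) concatenation
gate = np.tanh(A) * sigmoid(B)     # (4) LSTM-style gating

# (6) FiLM-style conditioning: B modulates A via a scale and shift;
# gamma, beta would normally be produced by a learned network f(B).
gamma, beta = 1.0 + 0.1 * B, 0.1 * B
film = gamma * A + beta

# (7) soft attention: weights derived from A, applied over B's entries
weights = np.exp(A) / np.exp(A).sum()
attn = weights @ B
```

Options (1), (2), and (4) preserve dimensionality, (3) doubles it, and (6)/(7) let one branch condition the other asymmetrically.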
If you’re at ICLR, you should try to catch Oscar (JHU undergrad), who is presenting his really cool TMLR paper on ModuLoRA, 2-bit finetuning of LLMs.
Paper:
Diffusion models produce great samples, but they lack a semantically meaningful latent space like in a VAE or a GAN. We augment diffusion with low-dimensional latents that can be used for image manipulation, interpolation, controlled generation.
#ICML2023
Thrilled to share our latest paper - Infomax
#Diffusion
! We're pushing the boundaries of standard diffusion models by unsupervised learning of a concise, interpretable latent space. Enjoy fun latent space editing techniques just like GANs/VAEs! Details:
Excited to share that
@afreshai
raised another $12M to advance our mission of using AI to reduce waste across the food supply chain. We're always looking for talented folks who are passionate about AI, food, and the environment to join our growing team.
Did you know that word2vec was rejected at the first ICLR (when it was still a workshop)? Don’t get discouraged by the peer review process: the best ideas ultimately get the recognition they deserve.
As the generative AI hackathon and post-hackathon events come to an end, I want to again thank everyone who attended!
Incredibly grateful to
@davederiso
@agihouse_org
for organizing the event!
🙏 to
@LererHippeau
& the NYC VC community for sponsoring it
Cool talk at
#iclr2023
—can you find a semantic latent space within a pre-trained generative model (like stable diffusion) and use it to enable controllable generation?
We shouldn't be using the same name for the model and its inference algo. GANs use the same model as VAEs (hierarchical factor analysis), but also combine it with an approximation of a two-sample test for training (the main innovation). Cool insight from Z. Ghahramani @ NIPS panel.
Interesting idea: language models without any positional embeddings. They seem to get surprisingly good perplexities just by looking at word co-occurrences. Fun workshop paper from the Stanford AI group.
#icml2023
Short summary: scientists found phosphine in the atmosphere of Venus, especially in areas thought to be hospitable. On Earth, it is only produced by microbes in low-oxygen environments and by rare chemical reactions that are not expected to occur on Venus. Will be following this closely.
I'm really stoked to be teaching applied machine learning with
@brandondamos
this semester!
AML is a masters-level course taken by 150-200 students each fall. Having Brandon as a co-instructor is an incredible opportunity to expose students to cutting-edge AI research.
📢 Today's my first day at Cornell Tech :)
I will continue my full-time position as a scientist at Meta, and am teaching ML here on the side for the semester with
@volokuleshov
. The course is open source and you can follow everything in this repo: