One of my professors told me that you can measure your growth as a researcher by how recently somebody else had your "brilliant idea."
I'm proud to say that nowadays, my best new ideas were thought of only in the last decade.
🥹🥹🥹
I got really really lucky to be surrounded by amazing people. I'm beyond honored that the legendary
@madsjw
@SharonYixuanLi
and Jerry Zhu were on my committee.
And somehow I managed to get the best advisor in the world -
@DimitrisPapail
#PhDone
Congratulations to Dr.
@KartikSreeni
for successfully defending his PhD!
What an incredible journey these past 4 years have been. I learned so much from you, and it was incredibly rewarding to serve as your advisor. You're amazing! I can't wait to see your next steps at
@MosaicML
🚀🚀🚀
"Give me a context length long enough and enough GPUs to run it and I shall move the world"
- Archimedes
We may not have fixed hallucination yet at
@MosaicML
, but we sure just released an amazing model with an 8k context length! Go play with MPT-30B now!
Hey folks,
I'm here at
#ICML2023
to present our exciting new work at the TEACH workshop on Saturday (thread below).
Meanwhile, DM me or wave me down if you want to talk about LLMs, pruning,
@MosaicML
,
@databricks
or just wanna get coffee.
Sometimes, I go on hikes to get coffee
1/ Our paper is out!
Teaching Arithmetic to Small Transformers
We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT).
paper:
Work led by:
@nayoung_nylee
&
@KartikSreeni
Thread below.
Our paper is finally out! We wanted to use basic arithmetic as a lens to better understand the phenomenon of emergence in transformers.
I thoroughly enjoyed working on this project with my amazing collaborators
@nayoung_nylee
,
@jasondeanlee
@Kangwook_Lee
@DimitrisPapail
I got to catch up with my Integer Programming TA after so many years! So nice to reconnect with
@silviadigre
who put up with all my stupid questions, and taught me many of the elegant proof techniques in discrete optimization :)
Thank you
#ICML2023
Hey folks,
I got on a podcast with my friend
@schematical
! It was sort of impromptu, casual and really fun! We discussed a bunch of things primarily about LLMs, what they do, and how to train them. Also, there were puppies in the room.
This is really foolish and problematic advice. A good advisor will be overjoyed and proud if you come up with a great idea. A good advisor is also not petty enough to want "credit" for your ideas. You're a team.
Stop playing political games and focus on doing good research.
Never appear smarter than your supervisor.
If you come up with a great idea – and you will come up with plenty – always attribute it to your supervisor's mentorship even if they had nothing to do with it.
This will make your supervisor feel smarter than they actually are.
I remember sitting beside
@aditi_jh
as she relentlessly evaluated a massive number of models in pursuit of the ultimate fine-tuning dataset.
This work shows that a small IFT dataset, if carefully chosen, can get you to the promised land of instruct models - Exquisite Vibes.
Excited to share work from my internship with the amazing people at
@MosaicML
! 🎉
How should you finetune a Large Language Model for general purpose instruction following?
Check out LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms!
Llama2-70B-Chat is now available on our Inference service! I know our team has been working really hard to make sure this works seamlessly, so I'm excited to see that it's live!
If you're a University, hire this guy. He's awesome :)
I doubt you're gonna find another person as passionate, knowledgeable, and hard-working as Hongyi!
1/ I am currently on the academic job market, applying for Assistant Professor positions in any field related to CS!
My research focuses on ML & Systems, specifically on computation- and communication-efficient distributed ML, efficient computing in LLMs, and federated learning.
The one and only
@giannis_daras
is giving a talk at UW-Madison in our MLOpt idea seminar series! I can't wait to pick his brain about all the cool stuff he's been working on
📢: Tomorrow, at 12:30 Central Time, I am giving a talk at UW-Madison.
I will present two accepted papers at NeurIPS 2023 🥳: Consistent Diffusion Models (not to be confused with Consistency Models 🤷‍♂️) and Ambient Diffusion.
Feel free to join us remotely or in person 👇
@nayoung_nylee
and I will be at the TEACH workshop at
#ICML2023
tomorrow presenting our new paper on teaching transformers arithmetic. We'll be at both the 10-10:30am and 3-3:30pm poster sessions. We're very excited about this work and would love to hear what you think!
Please read and share. I desperately want to believe that this story will end well and end soon, but I don't know how. I was extremely worried when covid first hit but somehow, magically, India seemed okay. Now, the nightmare has begun.
#CovidIndia
I’m seeing a massive disconnect (more so than usual) between the two Twitter worlds I usually inhabit — the US and India. It is important to bridge the two now for a very important reason:
#CovidIndia
Please read the whole thread and amplify.
THREAD
1/17
Thank you
@DimitrisPapail
for being such an amazing advisor!!! Your help and guidance have been invaluable throughout my PhD! I am also extremely lucky to have had the opportunity to collaborate with some really brilliant researchers from various universities and organizations!
@DimitrisPapail
One of the many lessons you taught me in my PhD - "Life is too short to be drinking bad coffee".
I think about this almost every day while I brew.
I'm at
#NeurIPS2022
in New Orleans. Come by our poster presentation tomorrow at 4pm in Hall J to talk about pruning at initialization!
Or hmu to chat about anything ML, DL or Beignets!
As always, excellent summary by
@davisblalock
!
Going forward, I want to think about how to nudge models to learn "addition in general" and I don't have a good answer yet.
I think of this as trying to choose data such that bad shortcuts no longer lead to 0 loss.
4 points stood out to me:
1) Models only learn to add the number of digits they saw during training, not how to do addition in general.
2) Models are way better at adding if you let them output lower digits first instead of higher ones; this lets them compute the answer one digit at a time.
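A toy sketch of what point 2 is getting at (my own illustration, not code from the paper; the function names are hypothetical): in the plain format, a next-token model must emit the most significant digit of the sum before it has resolved the carries, whereas reversing the answer lets each emitted digit depend only on digit positions already processed.

```python
# Hypothetical sketch of the two training-data formats for addition.
# "Plain" forces the model to predict the highest digit of the sum first;
# "reversed" emits the lowest digit first, so carries propagate in the
# same direction the model generates tokens.

def plain_example(a: int, b: int) -> str:
    return f"{a}+{b}={a + b}"

def reversed_example(a: int, b: int) -> str:
    # Reverse only the answer's digits: 57+68=125 -> "57+68=521"
    return f"{a}+{b}={str(a + b)[::-1]}"

print(plain_example(57, 68))     # 57+68=125
print(reversed_example(57, 68))  # 57+68=521
```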
If you've used PyTorch, you know
@ptrblck_de
is truly a superhero, and you've surely been saved by him when you thought something was impossible to do.
Thanks a ton for awarding me the
@PyTorch
superhero award at the conference! It really made my day, and it was great seeing so many community members in real life. You all are amazing!
Excited to share that I successfully defended my PhD earlier this week.
Immensely thankful to my advisors
@JeffLinderoth
&
@jim_luedtke
for their exceptional guidance and making research fun.
Looking forward to the exciting opportunities that lie ahead.
I really don't understand why analysis gets such a bad rap. I think it helped for me that I had a fantastic instructor. But it's also such a beautiful subject. And it really taught me how to think formally. It is still probably my favorite course.
Was just feeling like the world was moving a little too quickly, so I sat down and listened to
@corywong
's Ellie:
I imagined myself to be like that little kid, calmly enjoying herself in the stroller while the chaos of NYC rages by.
It worked!
Is it possible that Cosmo Kramer from Seinfeld was just high all the time? Rewatching Seinfeld for the 17th time, I think I see it. Giddyup!
#Seinfeld
#Netflix
@code_star
@aditi_jh
@maxisawesome538
Definitely. And I'm probably gonna take flak for this, but I think it's incredibly overpriced for fairly mediocre coffee. Nowhere near the Pareto frontier. So I try to avoid it. I don't like the vibes either.
@TaliaRinger
I agree completely with this! The day I realized math was "just" a very unambiguous formal language used to represent knowledge was a very happy day for me. It didn't change my ability to learn it, but it did make me love it so much more.
@ZelenskyyUa
's tv address to the Russian (!) people might be the most moving speech that I've ever seen in my entire life. The whole world needs to see, understand and share this crucial Ukrainian message.
#StandWithUkraine
#Ukraine
#Україна
#Russia
#Россия
@thegautamkamath
@DimitrisPapail
I agree that reviewing shouldn't be independent of previous rounds. But I'm worried that an "unfair" review in the past will bias future reviewers. Because occasionally, critique is subjective
“Truth in mathematics means you convince the experts that your proof is correct. Then it becomes true,”
- Peter Teichner
I often think about the difference between "truth" and a "proof". Is it how easy it is to convince others?
P.S: While academia isn't perfect, there are plenty of extremely good advisors who fit the above criterion. I had the pleasure of working with several such folk. If you feel the need to resort to such strategies, I'm sorry, academia has failed you :(
[📜New preprint] Interested in statistical inference under communication constraints 📡?
Let's look at the basic problem, (simple) binary hypothesis testing: Given n iid samples known to be sampled from either distribution p or distribution q, identify the true distribution
1/x
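For concreteness, here is a minimal sketch of the basic problem being described (my own illustration, not taken from the preprint): with n iid samples known to come from either p or q, the classical decision rule is the likelihood ratio test — sum the log-ratios and compare against a threshold.

```python
import math
import random

# Minimal likelihood-ratio-test sketch for simple binary hypothesis testing.
# p and q are known discrete distributions (dicts of outcome -> probability).

def log_likelihood_ratio(samples, p, q):
    # Sum of log(p(x)/q(x)) over the observed samples.
    return sum(math.log(p[x] / q[x]) for x in samples)

def decide(samples, p, q, threshold=0.0):
    # Declare "p" if the data favor p over q, else "q".
    return "p" if log_likelihood_ratio(samples, p, q) > threshold else "q"

# Example: two biased coins.
p = {"H": 0.7, "T": 0.3}
q = {"H": 0.3, "T": 0.7}
random.seed(0)
samples = ["H" if random.random() < 0.7 else "T" for _ in range(100)]
print(decide(samples, p, q))  # with 100 samples drawn from p, almost surely "p"
```

The communication-constrained variant studied in such work asks how well you can do when each sample must be compressed to a few bits before reaching the decision maker, rather than observed in full.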
I used to always struggle with grasping the concept of higher dimensions until I heard Gilbert Strang say "Well.. It's going to be hard to visualize that. I don't pretend to do it. But somehow, pretend you do."
@_BrettLarsen
This is really interesting! I've always been curious what the optimization landscape looks like in the mask domain. Do you think this analysis is specific to IMP or something more general? I'm also curious how much this analysis has to do with the "warm-up" period
I agree completely with
@ylecun
here and I'm so excited that a titan of my field is also standing up for a music form that I love :)
If I could add a far less eloquent comment to an already excellent response: "Haters gonna hate."
It's not a controversial opinion.
It's just wrong.
People who say a particular kind of music or painting is "formless and meaningless" simply do not understand its structure.
They simply cannot fathom how other people can understand and appreciate the structure they can't grok.
@akhileshsoni_
Thanks Akhilesh :)
That is for damn sure! We should plan to meet up on the west coast. I'll try to sneak a stash of Spotted Cows in my luggage.
@thegautamkamath
Intellectual curiosity has been my main motivation as well! Although, I find that it's not always up to the task of fending off the challenges of grad school. Some say it's better if the motivation is more tangible - like a career.
Random thought. Is the backbone of Fullmetal Alchemist: Brotherhood (the whole truth/gate thing) and its ending based on Franz Kafka's Before the Law?