Cody Blakeney Profile
Cody Blakeney

@code_star

3,719 Followers
923 Following
818 Media
12,402 Statuses

Head of Data Research @MosaicML / @databricks | Formerly Visiting Researcher @ Facebook | Ph.D | #TXSTFOOTBALL fan |

Brooklyn, NY
Joined August 2011
Pinned Tweet
@code_star
Cody Blakeney
2 months
Pretraining data experiments are expensive, as measuring the impact of data on emergent tasks requires large FLOP scales. How do you determine what subsets of your data are important for the mixture of tasks you care about? We present Domain upsampling: a strategy to better
Tweet media one
8
36
179
@code_star
Cody Blakeney
2 years
I decided to turn my error into an image with stable diffusion. That seems about right.
Tweet media one
23
446
4K
@code_star
Cody Blakeney
2 years
I'm taking a graduate-level stats class for the first time. I now understand why all the stats people are mad at all the deep learning people.
61
166
3K
@code_star
Cody Blakeney
2 years
The next wave of startups seems to be PhD Students dropping out to build MLOps companies because they got good at training models and that turned out to be more valuable than their actual research
27
155
2K
@code_star
Cody Blakeney
7 months
People I work with have called me a “boomer” because I used tensorflow at the beginning of my PhD
47
25
878
@code_star
Cody Blakeney
4 months
It’s finally here 🎉🥳 In case you missed us, MosaicML/Databricks is back at it with a new best-in-class open weight LLM named DBRX: an MoE with 132B total parameters (32B active), a 32k context length, and trained on 12T tokens 🤯
Tweet media one
28
130
828
@code_star
Cody Blakeney
2 years
Every day an AI researcher runs a hyperparameter sweep, and half or more of the runs essentially equate to lighting a pile of cash on fire.
14
27
456
@code_star
Cody Blakeney
4 months
Me at work for the past 2 weeks
Tweet media one
@code_star
Cody Blakeney
4 months
It’s finally here 🎉🥳 In case you missed us, MosaicML/Databricks is back at it with a new best-in-class open weight LLM named DBRX: an MoE with 132B total parameters (32B active), a 32k context length, and trained on 12T tokens 🤯
Tweet media one
28
130
828
2
13
381
@code_star
Cody Blakeney
2 years
@gdequeiroz The best way I can articulate it is they care deeply about (or have worked hard at) proofs of things DL people just throw away. After several pages proving if you have an unbiased estimator of a parameter it's pretty annoying to see someone just doing a hyperparameter sweep.
6
5
349
@code_star
Cody Blakeney
4 months
High alpha in looking at the data
Tweet media one
8
16
260
@code_star
Cody Blakeney
11 months
Ok, but hear me out. a 7B model with the same performance as a 67B model is worth 7837x as much.
@sherjilozair
Sherjil Ozair
11 months
On small overtrained models 💪 To reach the loss of a 67B model, - A 33B model needs 2.3x compute 🚀 - A 13B model needs 25x compute 🤔 - A 7B model needs 7837x compute 🤡 - A 3B model can't match the 67B. Ever. 🪦 With love, from Chinchilla scaling laws.
Tweet media one
34
52
488
15
14
224
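The quoted multipliers fall out of a Chinchilla-style parametric loss fit. Below is a rough Python sketch of that reasoning, using the Hoffmann et al. (2022) fit constants as assumptions; the exact multipliers are very sensitive to those constants and to how many tokens the reference 67B model is assumed to have seen, so this will not reproduce the 2.3x / 25x / 7837x figures, but it does show why a small enough model can never match the target loss.

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta,
# with N = parameters, D = training tokens, and compute roughly C ~ 6 * N * D.
# The constants are the Hoffmann et al. (2022) fit and are an assumption here.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    return E + A / n_params**alpha + B / n_tokens**beta

def compute_multiplier(small_params, big_params, big_tokens):
    """Extra compute a smaller model needs to match the bigger model's predicted loss."""
    target = loss(big_params, big_tokens)
    data_term = target - E - A / small_params**alpha
    if data_term <= 0:
        # Even with infinite data, loss floors out at E + A / N**alpha,
        # so a small enough model can never reach the target.
        return float("inf")
    small_tokens = (B / data_term) ** (1 / beta)
    return (small_params * small_tokens) / (big_params * big_tokens)

# Illustrative only: a 67B reference model assumed to have seen 2T tokens.
for small in (7e9, 3e9):
    mult = compute_multiplier(small, 67e9, 2.0e12)
    label = "impossible" if mult == float("inf") else f"{mult:.1f}x"
    print(f"{small / 1e9:.0f}B: {label}")
```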
@code_star
Cody Blakeney
1 year
I'm pretty sure the reason LLMs are not "funny" is that it specifically goes against their programming. Good jokes typically subvert our expectations. Which is the opposite of what autoregressively maximizing the highest likelihood next token is designed for.
36
14
214
@code_star
Cody Blakeney
4 months
Day in the life of an LLM researcher
Tweet media one
2
11
213
@code_star
Cody Blakeney
1 year
TBF it's pretty scarring
Tweet media one
3
22
205
@code_star
Cody Blakeney
1 year
Tweet media one
3
28
204
@code_star
Cody Blakeney
1 year
@maxisawesome538 @TylerAlterman Alpha meme -> zoomer news (duet) -> boomer news (actual news) -> millennial news (twitter)
2
5
192
@code_star
Cody Blakeney
3 years
I strongly believe that understanding how pruning/distillation works is the key to understanding how all neural networks work in general. I'm far less interested in "how many weights can we remove?" and more interested in "why the heck can we remove them in the first place?!"
8
22
189
@code_star
Cody Blakeney
1 year
You asked for it and we listened! Today we are proud to announce the release of open-source MPT-30B. Same great architecture, 1T tokens, and now with 8k (and beyond) context! Try it now on our Hugging Face space.
8
32
179
@code_star
Cody Blakeney
6 months
SF is probably the only place on the planet you can be at a bar talking about tokenizers, and hear further down the bar someone else also talking about tokenizers. Some people love this, some people loathe this.
15
5
174
@code_star
Cody Blakeney
9 months
That’s not entirely true. We released an open source 30B model, described in great detail the data used to train it, and the framework to train it. Just add GPUs. Of course if you pay us, we make dealing with the infra much easier 😉
@borisdayma
Boris Dayma 🖍️
9 months
I think people underestimate how hard it is to train a large model like GPT-3 and up. Lots of challenges arise when reaching billions of parameters, let alone 10B+ params (data management, training stability, parallelism...). Only a few have succeeded so far and the recipe is not
7
31
307
7
10
166
@code_star
Cody Blakeney
1 year
MPT-7B is back and better than ever! 8K context length: 😍 500B additional tokens: 🔥 Open Source: ✅
2
22
158
@code_star
Cody Blakeney
1 year
👀
Tweet media one
6
25
152
@code_star
Cody Blakeney
3 months
You like nlp, huh? Name every word.
8
14
148
@code_star
Cody Blakeney
2 years
@rasbt I’ll try 🥲
Tweet media one
11
2
144
@code_star
Cody Blakeney
3 years
Google researchers have better twitters than Facebook researchers. You can't convince me otherwise. Do they get more free time? 🧐
3
5
138
@code_star
Cody Blakeney
1 year
Do you guys ever think about tokenizers?
Tweet media one
9
16
137
@code_star
Cody Blakeney
2 years
One of my favorite genres of tweets is public radio hosts clapping back at people who asked them to do/not do exactly what they already did/didn’t
@NPRinskeep
Steve Inskeep
2 years
Thanks. This is such a great suggestion! In fact, the story DID read excerpts of the Declaration and then DID hear from “a diverse set” of Americans who relied on it through history. Too bad you didn’t listen! “Missed opportunity.” But it’s not too late:
9
12
289
5
7
123
@code_star
Cody Blakeney
4 months
@AlbalakAlon Yes! We trained a *new* MPT7B. Exact same arch and code. We were able to hit the same quality with half the number of tokens / training. It's not quite a 2x reduction in training (larger tokenizer), but pretty dang close. We evaluated it on our newest version of the Gauntlet.
6
11
118
@code_star
Cody Blakeney
1 year
You know machine learning isn’t even my job And it is NOT LLMs which is a common misconception Actually my job … is just … GPU
Tweet media one
2
16
115
@code_star
Cody Blakeney
4 years
@CornisDlaiLama @RexChapman The lady standing completely still? You must mean the police.
0
1
104
@code_star
Cody Blakeney
2 months
Sometimes I wonder how @jeremyphoward manages to get research done
@AMAZlNGNATURE
Nature is Amazing ☘️
2 months
Welcome to Australia
2K
5K
58K
5
3
99
@code_star
Cody Blakeney
2 months
kinda weak lol. I didn't think that was mean.
Tweet media one
@code_star
Cody Blakeney
2 months
Just say you don't know loop-daddy. Not everyone has to be cool.
3
0
31
15
0
96
@code_star
Cody Blakeney
4 months
Not only is it a great general purpose LLM, beating Llama2 70B and Mixtral, but it’s an outstanding code model, rivaling or beating the best open weight code models!
Tweet media one
1
9
95
@code_star
Cody Blakeney
4 months
You’re telling me a data laid these bricks?
6
2
93
@code_star
Cody Blakeney
2 months
Worse than that, I would argue that Marc was actually the PERFECT guest for the set of demos and live demonstrations Google needed. They had lost credibility for staging demonstrations with the original Gemini presentations. They needed to show things working live and in real
@granawkins
Grant♟️
2 months
Heartbreaking that tons of people's introduction to Rebillet is "weird Google AI guy". In case you somehow missed it,
354
527
7K
4
3
93
@code_star
Cody Blakeney
4 months
“What do you mean you didn’t sweep the learning rate????!!”
Tweet media one
8
3
90
@code_star
Cody Blakeney
1 year
I still think the best use of ChatGPT is just generating a template you can correct. Personally, I find editing requires a lot less mental strain than staring at a blank page.
5
5
88
@code_star
Cody Blakeney
1 year
@thechosenberg This is in fact the correct use of this meme.
1
0
76
@code_star
Cody Blakeney
8 months
Tweet media one
1
12
75
@code_star
Cody Blakeney
3 years
@bartbing71 This isn’t the right takeaway, but I hate the hassle the most when I catch cheating. Like … can you cheat better so I can enjoy my evening?
0
0
73
@code_star
Cody Blakeney
1 year
I'm absolutely floored by all the community-driven projects around MPT-7B 🤯. Are you using it for something? Tell us ( @MosaicML ), we would love to hear it!
Tweet media one
3
5
69
@code_star
Cody Blakeney
2 years
I don’t have a SoundCloud, but if you want to check out the MLOps company I work for, my boss (who hasn’t officially quit his PhD) would be very grateful. We are trying to change the math on efficient training. Want to train ImageNet in 27 min? Find out how
1
2
70
@code_star
Cody Blakeney
3 years
@kairyssdal how much do I need to donate to APM or Marketplace to have you start the show off on a Wednesday saying "In Los Angeles, I am Kai Ryssdal, it is Wednesday, my dudes!"
3
0
68
@code_star
Cody Blakeney
3 months
In light of recent releases, how do we feel about 8Bs with the same performance as 70Bs?
@code_star
Cody Blakeney
11 months
Ok, but hear me out. a 7B model with the same performance as a 67B model is worth 7837x as much.
15
14
224
8
4
64
@code_star
Cody Blakeney
1 year
MosaicML ends, MosaicML Shippuden begins. Rest assured that the power creep is just getting started.
Tweet media one
4
2
64
@code_star
Cody Blakeney
8 months
If it turns out Mistral’s new MoE is just 8 copies of its 7B trained “Branch, Train, Merge” style and compiled into an MoE, I suggest we call it “Mixture of Bastards”: MoB.
6
1
65
@code_star
Cody Blakeney
1 year
This price to train a 13B feels off. It only cost us ~$200k to train MPT-7B 🤔
@stsDrex
🍉Daniel "Drex" Drexler🏳️‍🌈
1 year
I feel like a lot of the ideas about how to use AI are not in sync with the current cost realities. 1 MB per token! (from: )
Tweet media one
0
5
23
8
8
62
@code_star
Cody Blakeney
4 months
Give the model a try yourself in our Hugging Face space 🤗
3
6
62
@code_star
Cody Blakeney
2 years
Is anyone out here still using step/exponential decay instead of cosine annealing or linear decay for learning rate schedulers?
3
5
60
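For readers skimming past the question above: the schedules it mentions differ only in the shape of the learning-rate decay curve. Here is a dependency-free sketch of those shapes; the decay factors and step sizes are arbitrary illustrations, not recommendations.

```python
import math

def step_decay(t, lr0=1.0, gamma=0.1, every=30):
    return lr0 * gamma ** (t // every)      # drop by 10x every 30 steps

def exponential_decay(t, lr0=1.0, gamma=0.97):
    return lr0 * gamma ** t                 # smooth multiplicative decay

def cosine_anneal(t, total, lr0=1.0, lr_min=0.0):
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * t / total))

def linear_decay(t, total, lr0=1.0, lr_min=0.0):
    return lr0 + (lr_min - lr0) * (t / total)

total = 100
print("step  step_decay  exp_decay  cosine  linear")
for t in (0, 25, 50, 75, 100):
    print(f"{t:4d}  {step_decay(t):10.3f}  {exponential_decay(t):9.3f}  "
          f"{cosine_anneal(t, total):6.3f}  {linear_decay(t, total):6.3f}")
```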
@code_star
Cody Blakeney
6 months
Fun deep learning tip: make your global batch size divisible by lots of numbers. 960 is way better than 1024. Then you can train on far more combinations of GPUs if you want to soak up more capacity. 64, 80, 96, 120, 240, 480, so many options.
6
0
60
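Just to sanity-check the arithmetic in the tip above: enumerate the GPU counts that split a global batch size into equal per-GPU batches (the function name below is made up for illustration).

```python
def valid_gpu_counts(global_batch_size, max_gpus=1024):
    """GPU counts that divide the global batch size evenly."""
    return [n for n in range(1, max_gpus + 1) if global_batch_size % n == 0]

# 960 = 2**6 * 3 * 5 has 28 divisors, while 1024 = 2**10 has only 11,
# so 960 fits far more cluster sizes (60, 64, 80, 96, 120, 160, 240, 480, ...).
print(len(valid_gpu_counts(960)), valid_gpu_counts(960))
print(len(valid_gpu_counts(1024)), valid_gpu_counts(1024))
```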
@code_star
Cody Blakeney
4 years
@chrisalbon Docker is the solution. Fuck it, just send the whole OS and all the packages. I don't trust anyone.
5
2
60
@code_star
Cody Blakeney
11 months
@yacineMTB Many men, tried to take my job away.
0
0
58
@code_star
Cody Blakeney
1 year
Oh no .. sudden drops in loss. This ResNet needs to be shut down ... just in case.
2
2
61
@code_star
Cody Blakeney
1 year
Tweet media one
5
3
58
@code_star
Cody Blakeney
2 years
People writing DL papers with "Towards" in the title. Like bro where you headed?
7
7
57
@code_star
Cody Blakeney
4 months
I have to thank my amazing team (the @DbrxMosaicAI Data team @mansiege @_BrettLarsen @ZackAnkner Sean Owen and Tessa Barton) for their outstanding work. We have truly made a generational improvement in our data. Token for token, our data is twice as good as MPT7B's was.
Tweet media one
2
4
58
@code_star
Cody Blakeney
3 months
Feels like there is a model missing from this triangle. 🤔
@rajko_rad
Rajko Radovanović ✈️ ICML 2024
3 months
Incredible performance and efficiency, all Apache 2.0 open, from the amazing @MistralAI team!!! I’m most excited for the SOTA OSS function calling, code and math reasoning capabilities!! Cc @GuillaumeLample @tlacroix6 @dchaplot @mjmj1oo @sophiamyang
Tweet media one
3
4
71
4
0
55
@code_star
Cody Blakeney
2 years
@Alan_Au @fchollet @rcalo @chr1sa So they are actively throwing out the resumes of creative proactive candidates… sounds like recruiters to me
0
0
50
@code_star
Cody Blakeney
2 years
Me asking my advisor what he thinks of my writing
Tweet media one
1
2
53
@code_star
Cody Blakeney
1 year
People have been talking on Twitter about how few people can train XX-billion-param LLMs, but I wonder how many people know the dark arts of building great tokenizers.
9
0
54
@code_star
Cody Blakeney
2 years
@MishraAmogh Sorry it’s not. I can share the course book though.
Tweet media one
1
2
52
@code_star
Cody Blakeney
4 months
At me next time 😅
@DwaraknathG
Dwaraknath Gnaneshwar
4 months
Me the past few weeks
Tweet media one
5
5
126
4
1
50
@code_star
Cody Blakeney
4 months
wow! you got into that *fast*. Yup that all looks right!
@danielhanchen
Daniel Han
4 months
Took a look at @databricks 's new open source 132 billion model called DBRX! 1) Merged attention QKV clamped betw (-8, 8) 2) Not RMS Layernorm - now has mean removal unlike Llama 3) 4 active experts / 16. Mixtral 2/8 experts. 4) @OpenAI 's TikToken tokenizer 100K. Llama splits
Tweet media one
24
173
1K
2
1
50
@code_star
Cody Blakeney
4 months
It’s so over
Tweet media one
5
1
49
@code_star
Cody Blakeney
4 months
IYKYK 😉
@PelosiTracker_
Nancy Pelosi Stock Tracker ♟
4 months
BREAKING 🚨: Nancy Pelosi just bought $5M of the AI company Databricks Unfortunately, Databricks is a privately held company and not available to be bought by the public Sorry people, you don’t have access to this one.
Tweet media one
290
2K
15K
3
1
48
@code_star
Cody Blakeney
1 year
Today is the first day of my big boy job. I'm excited to finally be full-time at @MosaicML ! 🥳 (now excuse me while I go flood our cluster with new experiments)
7
2
48
@code_star
Cody Blakeney
3 months
I've made it y'all
Tweet media one
@code_star
Cody Blakeney
3 months
Felt cute. Did some petabyte scale preprocessing. Might delete later.
4
1
41
4
0
47
@code_star
Cody Blakeney
1 year
Interesting things happening at MosaicML today. @jefrankle decided “he is the captain now”.
Tweet media one
3
1
48
@code_star
Cody Blakeney
3 months
Tweet media one
@Yampeleg
Yam Peleg
3 months
Trained a model for a full week But the results were total shit Machine Learning is so much fun Fuck my life, what have I done?
17
4
178
2
1
47
@code_star
Cody Blakeney
1 year
Python should be more like Zuck and give me threads.
@gdb
Greg Brockman
1 year
Much of modern ML engineering is making Python not be your bottleneck.
97
137
2K
4
3
45
@code_star
Cody Blakeney
2 months
We show that by upsampling high-quality data at the end of training you can significantly improve the performance of your model on downstream tasks without spending additional FLOPs, allowing us to measure the impact on emergent tasks with cheaper experiments.
Tweet media one
1
1
46
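A minimal sketch of what that can look like mechanically, assuming a data loader that accepts per-domain sampling weights; the domain names, weights, and the final-20% switch point below are illustrative assumptions, not the actual recipe.

```python
def mixture_weights(step, total_steps, base_mix, upsampled_mix, final_frac=0.2):
    """Per-domain sampling weights: the base mix for most of training,
    then an upsampled, higher-quality-heavy mix for the last final_frac of steps."""
    if step < (1 - final_frac) * total_steps:
        return base_mix
    return upsampled_mix

# Hypothetical domains and weights, purely illustrative.
base = {"web": 0.70, "code": 0.15, "books": 0.10, "math": 0.05}
upsampled = {"web": 0.40, "code": 0.25, "books": 0.20, "math": 0.15}

for step in (0, 50_000, 90_000):
    print(step, mixture_weights(step, total_steps=100_000,
                                base_mix=base, upsampled_mix=upsampled))
```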
@code_star
Cody Blakeney
1 month
The new research @jefrankle ™️ sweater swag is kinda fire
Tweet media one
Tweet media two
4
1
45
@code_star
Cody Blakeney
1 year
you love to see it
Tweet media one
2
0
44
@code_star
Cody Blakeney
1 year
When people ask me how to train a good LLM.
Tweet media one
2
4
43
@code_star
Cody Blakeney
2 years
@ItsMePCandi @caradaze Just a guess. Spots are a zero sum game. Once too many tourists find out it’s harder for locals to go. 🤷‍♂️
2
0
42
@code_star
Cody Blakeney
3 months
I’m becoming a true believer of the dead internet theory.
8
2
42
@code_star
Cody Blakeney
4 years
@johnwil80428495 @UniversityStar Well a lot of us love people that are old or have compromised immune systems. If we do the right things we can save lives.
0
0
42
@code_star
Cody Blakeney
4 months
*correction, not open weights. It’s a commercially friendly licensed model. You’ll have to forgive me, I was up late 😅 feel free to download it and try it yourself.
3
3
43
@code_star
Cody Blakeney
1 year
@Tim_Dettmers Truly the shame should go further up the author list. That being said I think like 30-50% of deep learning papers of the last decade wouldn’t have been published if they had properly tuned baselines.
0
0
42
@code_star
Cody Blakeney
4 months
DBRX is the best open model on AI2 WildBench! 😀
@billyuchenlin
Bill Yuchen Lin 🤖
4 months
🆕 Check out the recent update of 𝕎𝕚𝕝𝕕𝔹𝕖𝕟𝕔𝕙! We have included a few more models including DBRX-Instruct @databricks and StarlingLM-beta (7B) @NexusflowX which are both super powerful! DBRX-Instruct is indeed the best open LLM; Starling-LM 7B outperforms a lot of even
Tweet media one
3
32
127
3
5
42
@code_star
Cody Blakeney
3 months
👀
Tweet media one
2
0
40
@code_star
Cody Blakeney
3 months
Felt cute. Did some petabyte scale preprocessing. Might delete later.
4
1
41
@code_star
Cody Blakeney
1 year
Nah bro, just send it
Tweet media one
2
2
40
@code_star
Cody Blakeney
2 months
@KAVARI_ looks like a death grips album cover
0
1
39
@code_star
Cody Blakeney
4 months
It’s coming back! The @jefrankle lost a bet with the unbelievably talented @mansiege and has been subjected to being rad. What an unfortunate turn of events.
Tweet media one
@code_star
Cody Blakeney
4 months
Which head of research at an AI company has the craziest hair?
4
0
31
5
4
39
@code_star
Cody Blakeney
3 months
I think some people (not necessarily Jesse) misunderstood why there is a lack of transparency. Meta isn’t afraid of transparency, or giving up secret sauce. Big players will not disclose their data until case law over copyright/fair use is better defined. That doesn’t mean
@JesseDodge
Jesse Dodge
3 months
This follows the trend of large organizations releasing models and promoting their capabilities, while not providing the information necessary to understand their behavior: the training data. To be clear, this is expected, but also highlights the need for more transparency.
2
2
24
5
3
40
@code_star
Cody Blakeney
4 months
Words cannot express how excited I am about this. @lilac_ai is *the* best user experience I have found for exploring, cleaning, and understanding data for LLMs. I can’t wait to work with them to build the future of data!
@nsthorat
Nikhil Thorat
4 months
Incredibly excited to announce that @lilac_ai is joining @databricks ! With Lilac in Databricks, data curation for LLMs will be elevated to the next level: Enterprise AI 🚀🚀 A huge huge thank you to everyone who’s supported us on this journey ❤️
44
14
221
1
1
40
@code_star
Cody Blakeney
1 year
I've become numb to my programming mistakes
Tweet media one
0
4
39
@code_star
Cody Blakeney
5 years
@Aaron_Torres @RSherman_25 pretty sure it was Stanford that gave him the degree, not the NCAA
1
0
37
@code_star
Cody Blakeney
1 year
Tweet media one
@code_star
Cody Blakeney
1 year
I can't believe it's finally happening. Tomorrow I don my wizard robes and become a Dr. Blakeney (again ... I'm still trying to figure out how that works). I'm gonna try and jump in the river if it isn't flooding. If y'all don't hear from me ... check the news.
2
0
20
9
1
38
@code_star
Cody Blakeney
7 months
@DimitrisPapail I’m really worried about telling them that before tensorflow I used R 😬
1
0
36
@code_star
Cody Blakeney
3 years
Ok I just got around to taking the time to learn how to use @weights_biases . Wow what a game changer. I can't believe I put it off this long.
1
4
38
@code_star
Cody Blakeney
2 months
Microsoft calling everything they do copilot now is kinda goofy
7
1
37
@code_star
Cody Blakeney
2 years
@NaveenGRao I feel like a python -> c/c++ translator would get you most of the way to what you want.
2
0
37
@code_star
Cody Blakeney
4 months
Me making memes all day to support the launch
Tweet media one
@code_star
Cody Blakeney
4 months
It’s finally here 🎉🥳 In case you missed us, MosaicML/Databricks is back at it with a new best-in-class open weight LLM named DBRX: an MoE with 132B total parameters (32B active), a 32k context length, and trained on 12T tokens 🤯
Tweet media one
28
130
828
0
5
37
@code_star
Cody Blakeney
4 months
@_philschmid @OpenAI I knew it was April 1st and I still clicked it. I deserve this.
2
0
37
@code_star
Cody Blakeney
10 months
Me every time I use s3: Do you know who IAM?!!
2
2
36
@code_star
Cody Blakeney
1 year
If you are hiring for anything ML/NN related, reach out to my boy. We were in the same PhD cohort. Half of the good ideas in my dissertation he helped me brainstorm. One of the best Python programmers I know. Immigration laws in this country are BS and have him scrambling.
@iamkaysb
Keshav Bhandari
1 year
Well, bad news . I had to leave Tesla. I have a tight deadline of August 14th to get a new employer and save my immigration status😬. However, I refuse to let this setback define my journey. I am more determined than ever to continue my work in the world of #AI and #DNN !
2
11
28
1
20
35