Alex Gajewski

@apagajewski

2,156 Followers · 749 Following · 3 Media · 48 Statuses

making AI markets efficient @sfcompute, prev founder @metaphorsystems

San Francisco
Joined June 2014
Pinned Tweet
@apagajewski
Alex Gajewski
9 months
We just released TinyNarrations, a big audio dataset of very simple stories that a toddler could understand 🎵🎶
@evanjconrad
evan conrad
9 months
today, @sfcompute is releasing 3.5 years of audio narrations based on TinyStories, as a publicly available dataset, which we hope will help folks explore multimodal models
@apagajewski
Alex Gajewski
1 year
are you a startup / academic lab that needs H100s? @evanjconrad and I are building a 512 H100 cluster for startups @ , <$2/hr
@apagajewski
Alex Gajewski
2 years
Delighted to be co-organizing AI Grant with @evanjconrad! I can’t wait to see what you all come up with. Let the games begin!
@natfriedman
Nat Friedman
2 years
Thrilled to be investing $10M with @danielgross in AI Grant to support the new wave of founders building AI-first products. Apply now at !
@apagajewski
Alex Gajewski
5 years
Announcing our EvoGrad library for grad-based evolution + our Evolvability ES meta-learning algorithm: can scale to deep nets, compete w/ MAML in RL. With @jeffclune, @kenneth0stanley, and @joelbot3000. Blog: , paper:
@apagajewski
Alex Gajewski
2 years
Super excited to be releasing Metaphor to everyone today!
@ExaAILabs
Exa
2 years
is now publicly available! Metaphor is a search engine based on generative AI, the same sorts of techniques behind DALL-E 2 and GPT-3 1/
@apagajewski
Alex Gajewski
9 months
We think our next SF compute cluster is the largest h100 cluster in the world that can support bursts right now

It's 1k h100s coming in March, and it can do bursts as short as 3 months
@apagajewski
Alex Gajewski
3 years
We’re releasing Wanderer 2 today! It learned a much fuzzier search function than Google, so you can search in very abstract terms. We put lots of examples on the website, and we’re serving the model live so you can experiment with your own searches:
@ExaAILabs
Exa
3 years
Today we’re releasing Wanderer 2, a large language model trained to search over the 2.5 million pages that have been posted to Hacker News! You can play with it here:
@apagajewski
Alex Gajewski
10 months
we just launched self-serve! you can now book clusters of thousands of H100s from the comfort of your web browser
@evanjconrad
evan conrad
10 months
As of today, @sfcompute now just shows you the lead times, calendar, and price without ever needing to talk to a salesperson
@apagajewski
Alex Gajewski
9 months
the san francisco compute mercantile exchange 👀
@apagajewski
Alex Gajewski
1 year
it's just that none of the cloud providers will give you a bunch of compute for a short time, so you have to buy the compute / rent for 3 years

buying 128 A100s is closer to $2m
@apagajewski
Alex Gajewski
1 year
there's this sense that pretraining is super expensive and out of reach unless you raise $40m, but that's not really true

you ought to be able to train stable diffusion on probably 128 A100s in a month, about $100k
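The ~$100k figure checks out with a back-of-envelope calculation. The hourly rate below is an assumption (roughly $1.10/A100-hour), not a quoted price:

```python
# Rough cost check for a ~1-month run on 128 A100s.
# The per-GPU-hour rate is an assumed market rate, not a quote.
gpus = 128
hours = 30 * 24              # roughly one month
rate_per_gpu_hour = 1.10     # assumed $/A100-hour
cost = gpus * hours * rate_per_gpu_hour
print(f"${cost:,.0f}")       # ≈ $101,376
```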
@apagajewski
Alex Gajewski
1 year
the lower the economic barrier to doing large scale pretraining, the more companies and labs will do it

I'm hoping that over the next few years, there will be super diverse and interesting scaled up models that people try
@apagajewski
Alex Gajewski
1 year
we'd love to get to the point where we can give people bursts at gpt-4 scale, something like ~20m worth of h100-hours over 2-3 months
@apagajewski
Alex Gajewski
1 year
we're setting it up so that you can get a slice of the cluster for just one training run, say 128 H100s for a week
@apagajewski
Alex Gajewski
2 years
LLM evaluation as a service:
@mohak__sharma
Mohak Sharma
2 years
"@HoneyHiveAI is an example of companies working to help developers iterate on underlying model architectures, modify prompts, filter & add novel training sets, & even distill models." Great write up @palak_go & @jturow! Thread 👇
@apagajewski
Alex Gajewski
1 year
these days YC gives you $500k, in theory enough to have two or three shots at training a model at stable diffusion scale
@apagajewski
Alex Gajewski
9 months
In 3 months with 1k h100s, you could train a model approaching gpt-4 quality, for $6m

(Buying 1k h100s outright is about $38m, excluding power)

So this is about a factor of 6 improvement in the cost of training a big model
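The "factor of 6" follows from the two quoted figures. The H100 hourly rate below is an assumption, chosen to reproduce the ~$6m burst figure:

```python
# Sanity check of the factor-of-6 claim in the tweet above.
gpus = 1_000
hours = 90 * 24                    # ~3 months
rate = 2.78                        # assumed $/H100-hour
burst_cost = gpus * hours * rate   # ≈ $6.0m for the burst
buy_cost = 38_000_000              # quoted purchase price, excluding power
print(f"{buy_cost / burst_cost:.1f}x")   # ≈ 6.3x
```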
@apagajewski
Alex Gajewski
1 year
the same is true at larger scale, say if you want to do something on the order of a llama 2 (70b):
- reserving a cluster for 3 years is ~50m
- 3 shots at a one month training run is ~5m
@apagajewski
Alex Gajewski
2 years
Go join the commons!
@thesfcommons
The Commons
2 years
Today we're officially announcing and opening applications to Campus 🏛️ 1/ A “school” for the soul, intellect, and inner-child: 🔭community-created curriculum exploring our curiosities 🌱 50+ circles, salons, juntos and extracurriculars 🤓 "nerd prom" at end of the qtr
@apagajewski
Alex Gajewski
2 years
@officialKrishD @evanjconrad You need to have a Delaware C Corp for us to invest in, but you should be able to do that from anywhere in the world with
@apagajewski
Alex Gajewski
9 months
We think we can line up financing for bursts on a 25k h100 cluster—if we can pull it off, that will probably be enough to compete with gpt-5, again at a fraction of the cost
@apagajewski
Alex Gajewski
9 months
This is a cool idea—iirc in the alphazero paper, they did an average instead of minimax because it was more stable for neural networks to learn, but possible that something in between an average and a max would work
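One candidate for "something in between an average and a max" is a power (generalized) mean of the child values, which interpolates between the plain average and the max as the exponent grows. A hypothetical sketch, not from the AlphaZero paper, assuming nonnegative values such as win probabilities:

```python
def power_mean(values, p):
    """Generalized mean of nonnegative values: p=1 is the plain
    average, and as p grows the result approaches max(values)."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

# e.g. estimated win probabilities of a node's children
child_values = [0.1, 0.5, 0.9]
avg = power_mean(child_values, 1)    # the average backup: 0.5
mid = power_mean(child_values, 4)    # in between average and max
```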
@ptrschmdtnlsn
Peter Schmidt-Nielsen
9 months
I found this message in my DMs ages ago. Does anyone know if this has been tried for AlphaZero-style game engines?
@apagajewski
Alex Gajewski
10 months
@EmilWallner @evanjconrad @sfcompute Slightly different on our smallest cluster and on the bigger ones—smallest has 10G network, bigger have 100G, both have between 1-2TB ram, at least 7TB disk, 8 H100s with NVLink

If you want like a longer spec sheet you can send us an email at team@sfcompute.com
@apagajewski
Alex Gajewski
9 months
Excited to see what people come up with! If you do end up making anything with TinyNarrations, shoot me a DM or send an email to team@sfcompute.com
@apagajewski
Alex Gajewski
9 months
And thanks to @G413N for all his work building this dataset! I think he has a good shot at solving generative audio 🪇
@apagajewski
Alex Gajewski
9 months
And even with mamba, you probably don't want to run it directly on 24kHz waves (although @_albertgu has tried)
@apagajewski
Alex Gajewski
1 year
@SkyLi0n Ah yep sorry it’s not super clear, but yes
@apagajewski
Alex Gajewski
9 years
@hdgigante @ApricityOS Logout and click your username; there should be a cog next to the login button. If you click it, what items show up?
@apagajewski
Alex Gajewski
4 years
@rabois I'll move my group house to Miami if you give a short talk/Q&A at Project School (). How does that sound?
@apagajewski
Alex Gajewski
9 months
The goal is to try to disentangle how much compute it takes to be able to generate the semantic content of these stories from how much compute it takes to generate the audio of someone narrating them
@apagajewski
Alex Gajewski
9 months
Audio is an interesting modality because it has so many bits—a wav file might have 24,000 floating point numbers per second. Compared to text, which is maybe 3 integers (tokens) per second
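The gap is a few orders of magnitude. A rough data-rate comparison, assuming float32 samples and ~17 bits per token for a ~100k-entry vocabulary (both assumptions, not from the tweet):

```python
# Back-of-envelope bits-per-second comparison: raw audio vs text.
audio_floats_per_sec = 24_000                     # 24 kHz waveform samples
audio_bits_per_sec = audio_floats_per_sec * 32    # assuming float32 samples
text_tokens_per_sec = 3
text_bits_per_sec = text_tokens_per_sec * 17      # assumed ~17 bits/token
ratio = audio_bits_per_sec / text_bits_per_sec
print(f"~{ratio:,.0f}x more bits in audio")       # on the order of 10,000x
```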
@apagajewski
Alex Gajewski
9 months
Either you've got to downsample your sequences down to a few tokens per second so that a transformer can look at more than a few seconds at a time, or you've got to use something very different like
@apagajewski
Alex Gajewski
10 months
@EmilWallner @evanjconrad @sfcompute We don’t have a button for postponing instances, but people often ask us to move around their reservations and we do our best to push the calendar around
@apagajewski
Alex Gajewski
9 months
We're hoping that folks use this dataset to try to isolate what's uniquely hard about audio from what's hard in general about sequence modeling
@apagajewski
Alex Gajewski
9 months
I think it ought to be possible to get an audio model trained on TinyNarrations that's semantically as good as a text model on TinyStories with ~5x more compute. Maybe 1.5x.
@apagajewski
Alex Gajewski
9 months
To train a good voice model, you need to somehow get the model to spend most of its effort thinking about the words in the data, and quickly compress all the other stuff going on in the audio