today, @sfcompute is releasing 3.5 years of audio narrations based on TinyStories as a publicly available dataset, which we hope will help folks explore multimodal models
Announcing our EvoGrad library for grad-based evolution + our Evolvability ES meta-learning algorithm: can scale to deep nets, compete w/ MAML in RL. With @jeffclune, @kenneth0stanley, and @joelbot3000. Blog: , paper:
We think our next SF compute cluster is the largest h100 cluster in the world that can support bursts right now
It's 1k h100s coming in March, and it can do bursts as short as 3 months
We’re releasing Wanderer 2 today! It learned a much fuzzier search function than Google, so you can search in very abstract terms.
We put lots of examples on the website, and we’re serving the model live so you can experiment with your own searches:
Today we’re releasing Wanderer 2, a large language model trained to search over the 2.5 million pages that have been posted to Hacker News! You can play with it here:
it's just that none of the cloud providers will give you a bunch of compute for a short time, so you have to buy the compute outright or rent it for 3 years
buying 128 A100s is closer to $2m
there's this sense that pretraining is super expensive and out of reach unless you raise $40m, but that's not really true
you ought to be able to train stable diffusion on probably 128 A100s in a month, about $100k
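quick back-of-envelope on that $100k figure; the hourly rate here is my assumption about rough short-term A100 market prices, not a quoted number:

```python
# back-of-envelope for the "128 A100s for a month ~ $100k" claim;
# the rental rate is an assumed rough market price, not a quote
A100_RATE = 1.10        # assumed $/GPU-hour for short-term A100 rentals
HOURS_PER_MONTH = 730   # 24 * 365 / 12

gpus = 128
print(f"${gpus * HOURS_PER_MONTH * A100_RATE:,.0f}")  # ~$103,000, i.e. about $100k
```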
the lower the economic barrier to doing large scale pretraining, the more companies and labs will do it
I'm hoping that over the next few years, there will be super diverse and interesting scaled up models that people try
"
@HoneyHiveAI
is an example of companies working to help developers iterate on underlying model architectures, modify prompts, filter & add novel training sets, & even distill models."
Great write-up @palak_go & @jturow! Thread 👇
In 3 months with 1k h100s, you could train a model approaching gpt-4 quality, for $6m
(Buying 1k h100s outright is about $38m, excluding power)
So this is about a factor of 6 improvement in the cost of training a big model
the same is true at larger scale, say if you want to do something on the order of a llama 2 (70b):
- reserving a cluster for 3 years is ~50m
- 3 shots at a one month training run is ~5m
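sanity check on those ratios, using only the dollar figures from these tweets (the implied hourly rate is just the $6m divided back out, not a price list):

```python
# the burst-vs-buy ratios above, checked from the thread's own numbers
HOURS_PER_MONTH = 730

# gpt-4-scale: 1k h100s for 3 months vs buying outright
print(38_000_000 / 6_000_000)                     # ~6.3x, the "factor of 6"
print(6_000_000 / (1_000 * 3 * HOURS_PER_MONTH))  # implied burst rate, ~$2.7/GPU-hr

# llama-2-70b-scale: 3-year reservation vs three 1-month shots
print(50_000_000 / 5_000_000)                     # ~10x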
Today we're officially announcing and opening applications to Campus 🏛️
1/ A “school” for the soul, intellect, and inner-child:
🔭community-created curriculum exploring our curiosities
🌱 50+ circles, salons, juntos and extracurriculars
🤓 "nerd prom" at end of the qtr
We think we can line up financing for bursts on a 25k h100 cluster—if we can pull it off, that will probably be enough to compete with gpt-5, again at a fraction of the cost
This is a cool idea—iirc in the alphazero paper, they did an average instead of minimax because it was more stable for neural networks to learn, but possible that something in between an average and a max would work
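for concreteness, one way to get "between an average and a max" is a temperature-scaled log-mean-exp over the child values; a minimal sketch (the beta knob and the toy values are mine, not from the alphazero paper):

```python
import numpy as np

def soft_backup(values, beta):
    """Interpolate between an average (beta -> 0) and a max (beta -> inf)
    over child values via a temperature-scaled log-mean-exp."""
    v = np.asarray(values, dtype=float)
    if beta == 0.0:
        return v.mean()
    m = v.max()  # subtract the max for numerical stability
    return m + np.log(np.exp(beta * (v - m)).mean()) / beta

vals = [0.1, 0.2, 0.9]
for beta in (0.0, 1.0, 10.0, 100.0):
    print(beta, round(soft_backup(vals, beta), 3))
# beta=0.0 gives 0.4 (the average); beta=100.0 gives ~0.889 (nearly the max)
```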
@EmilWallner @evanjconrad @sfcompute Slightly different on our smallest cluster and on the bigger ones: the smallest has a 10G network, the bigger ones have 100G, and both have 1-2TB of RAM, at least 7TB of disk, and 8 H100s with NVLink
If you want a longer spec sheet, you can send us an email at team@sfcompute.com
The goal is to disentangle how much compute it takes to generate the semantic content of these stories from how much it takes to generate the audio of someone narrating them
Audio is an interesting modality because it has so many bits: a wav file might have 24,000 floating point numbers per second, compared to text, which is maybe 3 integers (tokens) per second
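to make that gap concrete, here's what a fixed context window buys you at each rate (the 8192-token context is an arbitrary example, not a specific model):

```python
# how far a fixed context window reaches at each token rate
context = 8192          # arbitrary example context length

raw_audio_hz = 24_000   # samples/sec for a 24kHz wav, per the tweet above
text_tok_per_sec = 3    # rough narration rate from the tweet above

print(context / raw_audio_hz, "sec of raw audio")          # ~0.34 sec
print(context / text_tok_per_sec / 60, "min of text")      # ~45 min
```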
Either you've got to downsample your sequences to a few tokens per second so that a transformer can look at more than a few seconds at a time, or you've got to use something very different like
@EmilWallner @evanjconrad @sfcompute We don’t have a button for postponing instances, but people often ask us to move around their reservations and we do our best to push the calendar around
I think it ought to be possible to get an audio model trained on TinyNarrations that's semantically as good as a text model trained on TinyStories with ~5x more compute. Maybe as little as 1.5x.
To train a good voice model, you need to somehow get the model to spend most of its effort thinking about the words in the data, and quickly compress all the other stuff going on in the audio
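one plausible shape for that, following the downsampling idea from the earlier tweet: stack strided convs so the sequence model only sees a few latents per second, and let the conv stack soak up the raw-audio detail. a rough sketch, with all the sizes made up:

```python
import torch
import torch.nn as nn

# Sketch of "compress everything but the words": strided 1-D convs
# squeeze 24kHz audio down to a few latent vectors per second, which a
# transformer can then model at roughly text rate. Sizes are illustrative.
class Downsampler(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # five conv layers with stride 6 each: 6**5 = 7776x downsampling,
        # so 24,000 samples/sec -> ~3 latents/sec, roughly text rate
        layers, in_ch = [], 1
        for _ in range(5):
            layers += [nn.Conv1d(in_ch, dim, kernel_size=12, stride=6, padding=3),
                       nn.GELU()]
            in_ch = dim
        self.net = nn.Sequential(*layers)

    def forward(self, wav):           # wav: (batch, 1, samples)
        return self.net(wav)          # -> (batch, dim, ~samples / 7776)

x = torch.randn(1, 1, 24_000)         # one second of fake audio
print(Downsampler()(x).shape)         # torch.Size([1, 256, 3]): ~3 latents/sec
```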