Organizing the world's information and making it universally accessible and useful using JAX
@Google
@Deepmind
. Co-designer/implementor of JET/WIZ & RIZZ.
I grew up with Google. When life dropped me in difficult situations, Search gave me a doorway into a world that otherwise seemed so out of reach. I have my entire life to be grateful for.
Working tirelessly with this team to get to this milestone has been a privilege I don't…
I'm very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks,
I used Gemini 1.5 Pro to write code so that Gemini 1.5 Pro could see all my tabs so that Gemini 1.5 Pro could assist me in making Gemini 1.5 Pro faster so that I could use Gemini 1.5 Pro to write code…
I remember in the past when prototyping this I had to spend a painful amount
@swyx
@polynoamial
@_sholtodouglas
@GoogleDeepMind
I'll do it for him. He has a deep understanding of compute constraints and recent research that he carries in a very Richard Hamming TADSE like way. I've benefited tremendously from mirroring this from him and
@jekbradbury
. With this, at least a couple of times he's burnt the…
TPUs and the abstractions built on top of them are so good that they are almost ahead of their time. It's hard to realize this without scaling to hundreds and thousands of devices, but once you do it's clear parallelism and fusions being declarative and compiler driven are a
"I know I should be using JAX. A friend just shamed me for using PyTorch, but this is a startup and we decided to move fast. I'm sorry, we would do better to be cooler next time. I'm not proud of this fact."
@_sholtodouglas
@0interestrates
@nearcyan
My first real/long interaction with him when he started coming in to help on Gemini was staying at the office until 1am staring at a Colab, fixing a gnarly numerics bug. He grasped my code in minutes. Top tier engineer, scientist, and a pleasure to work with.
When
@sharadvikram
started Pallas, his dream was to never have to write a kernel in C++ again.
Together with the gembros, we finally achieved that (internally).
Pallas now lets you define manual TPU pipelines in Python, and compose them. For example, we've been able to
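To give a flavor of what "a kernel in Python" looks like, here is a minimal Pallas sketch. The kernel and shapes are invented for illustration, and `interpret=True` runs it with the Pallas interpreter so no TPU is needed:

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Refs behave like mutable array views: read both inputs, write the sum.
    o_ref[...] = x_ref[...] + y_ref[...]

def add(x, y):
    # interpret=True runs the kernel with the Pallas interpreter,
    # so this sketch works even without TPU hardware attached.
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        interpret=True,
    )(x, y)

out = add(jnp.arange(8.0), jnp.ones(8))
print(out)  # [1. 2. 3. 4. 5. 6. 7. 8.]
```

Real TPU pipelines add grids and block specs on top of this, but the kernel body stays plain Python.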
JAX + Pathways + XLA + TPUs let us optimize new models for serving in days or weeks instead of having to spend months writing kernels every time a researcher changes something.
With the v5e being such a scalable, low-cost chip, landed in the millions, Google can undercut anyone else's pricing for GPT-4-level models and still make a profit.
The large-context-window attention mechanism is also ideal for v5e-256 pods vs. GPUs.
Composable. Functional. Transformable. Multi-backend.
Just works at scale. Can drop to manual per-device code seamlessly. Juice everything out of the hardware with Pallas.
So much to love and so much to be excited for in the future of JAX.
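For anyone who hasn't seen what "composable, transformable" means in practice, here is a minimal sketch (the tiny model and data are made up): transformations like `jax.grad` and `jax.jit` are just functions on functions, so they stack freely.

```python
import jax
import jax.numpy as jnp

# A plain Python function: mean-squared error of a tiny linear model.
def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

# Transformations compose: differentiate, then compile the result.
grad_loss = jax.jit(jax.grad(loss))

w = jnp.zeros(3)
x = jnp.ones((4, 3))
y = jnp.ones(4)
g = grad_loss(w, x, y)
print(g)  # [-2. -2. -2.]
```

The same composed function runs unchanged on CPU, GPU, or TPU, which is the multi-backend part.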
It's been quiet at the lab, with 2x more CE than the next, since
@_sholtodouglas
got deported for codingcodingcoding too much instead of working on his visa.
But he'll be back soon, and we've been cooking.
📸
@CharlieXYChen
chief photo officer, 3d printer in residence, and all
If people knew and understood the details of all the things people make fun of Google for, they would get a 5-second haha and we would move on. Things are not always that deep. Most of the time they're not. Google has great leadership that loves people and wants the best for
Can I say something without people getting mad?
I'm bullish on Google. They helped lay down the foundations of the AI revolution. Long history of open source and research contributions. They have amazing people.
Lots to be grateful for, and optimistic they'll iron out the kinks.
I still don't know how to make a website at Google, but hey, with only a few hundred lines of Beam I was able to wield thousands of resting TPUs across North America to run some JAX in a multi-step pipeline. Set-and-forget style. Pretty good tech-tree tradeoff for the times if
@swyx
@polynoamial
@_sholtodouglas
@GoogleDeepMind
@jekbradbury
midnight oil days on end to save the team from making a big mistake or overlooking a big opportunity on compute efficiency. He was also a key part of the design and development of the inference engine and infrastructure. People who can both think like this and change things…
Sholto, you are doing wonders for my imposter syndrome
I can't emphasize enough how important this kind of inclusivity is for everyone and how thankful I am for all the countless hours of mentorship I've received from experts in various fields. I hope one day AI can open…
He's 25, and didn't go to college! That same colleague speculated, "Maybe he just doesn't know to be scared of things".
Both of us have benefited immensely from the kindness and time of everyone at
@GoogleDeepMind
. Brilliant experts will spend hours patiently explaining almost
Adulthood brings the realization that the natural state of things is decay. By being productive, we're not just contributing to society, we're locally clawing back entropy for the good of all. In a way, training a model reflects this, bringing order through entropy minimization,
have the words to describe. This is the next era of computing. What Search did for knowledge we can now do for expertise. Something still stuck in the manuscript era for the majority of the world.
Now back to work! We will continue to organize the world's information and make…
We're sharing Project Astra: our new project focused on building a future AI assistant that can be truly helpful in everyday life.
Watch it in action, with two parts - each was captured in a single take, in real time. ↓
#GoogleIO
Jax on TPU is such a lovely contrast to everyone's complaints about Torch on GPU.
Feel like I'm running a Linux webserver in 2004 - this is so much less jank than the market-leading madness, but people haven't yet switched en masse due to some combination of not knowing that
Had so much fun chatting with my friends
@TrentonBricken
and
@_sholtodouglas
.
No way to summarize it, except:
This is the best context dump out there on how LLMs are trained, what capabilities they're likely to soon have, and what exactly is going on inside them.
You would be
I'd much rather code around fixed shapes than have to manually deal with scheduling and memory-space assignment.
Still, I think partial support of ragged shapes could be the optimal point for JAX.
What are some cases where you've written off JAX for fixed-shape reasons? I bet
tree_map! pjit! shmap! pallas_call!
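For readers who don't know these names: `jax.tree_util.tree_map` applies a function over every leaf of a nested parameter structure, while `pjit`, `shard_map`, and `pallas_call` handle partitioning and kernels. A minimal `tree_map` sketch, with an invented parameter pytree:

```python
import jax
import jax.numpy as jnp

# A pytree of parameters: any nesting of dicts/lists/tuples of arrays.
params = {"w": jnp.ones((2, 2)), "b": jnp.zeros(2)}

# tree_map applies the function to every leaf and preserves the structure,
# so one line scales a whole model's parameters.
scaled = jax.tree_util.tree_map(lambda p: 0.1 * p, params)
print(scaled["b"].shape, scaled["w"].shape)
```

The same pattern drives optimizers and checkpointing: write the per-leaf rule once, map it over everything.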
You look away for 1 day and the team shaves off another millisecond. They are going to get the model to start predicting the future.
The part about being rusty is humbleness. He fixed a bug in Pallas last week as fast as a Principal Engineer. Also, when he first joined the team he spent most of his time listening and asking good questions. Now he's right ahead of us so often that he's basically a pre-training
Heard this week:
"Somehow in the past week the MK turned into grad school. I feel like a TA again. What's going on here? Is this a psyop to make me and JAX Matt nostalgic?" - JAX Roy
"Stack more layers," they said.
"Forget about your reproducing kernel Hilbert spaces," they said.
@rbhar90
@deep_chem
@deepforestsci
JAX is different. If Google didn't exist, the core team would simply be working on it elsewhere. It grew organically with full founder authority and runs out of GDM instead of the more traditional way of running things from the infrastructure organization.
Analyzing and reasoning about videos
Analyzing videos is another great capability brought by the fact that Gemini models are naturally multimodal, and this becomes even more compelling with long contexts. Consider Gemini 1.5 Pro's ability to analyze movies, like Buster Keaton's
Bard is becoming Gemini, and weโre launching two new experiences:
1️⃣ Gemini Advanced, which gives you access to Ultra 1.0, our most capable AI model
2️⃣ A new mobile app for easier collaboration on the go
Learn more ↓
Introducing MatX: we design hardware tailored for LLMs, to deliver an order of magnitude more computing power so AI labs can make their models an order of magnitude smarter.
Our hardware would make it possible to train GPT-4 and run ChatGPT, but on the budget of a small startup.
T-9 "Intensity", taken in the kitchen one year before Atelier Crenn is awarded 3 Michelin Stars for the first time.
Becoming the first woman to three stars took an exceptional team who believed that it matters.
@duttakapil
@_sholtodouglas
@GoogleDeepMind
I can think of 2, 3 month periods that were very low points where I made it out thanks to old friends and family. Some sort of community is always necessary imo. One CEO I worked for was particularly helpful in helping me deal with stress and social anxiety in particular.
I'm not done with MegaBlocks
@apaszke
@epiqueras1
@sharadvikram
and I just dropped something we've been working on for a bit yesterday.
MegaBlocks + JAX + TPU = MegaBlox
New research from
@GoogleDeepMind
brings together soccer and robotics. Using reinforcement learning, robots display agile and reactive movements similar to a soccer player, no shin guards needed :)
A big enabler of this is being highly sequential and relying on software pipelining for hiding memory latency instead of the hierarchical many threaded model of GPUs.
One of the best things that ever happened to me was having
@aranibatta
drag me out to the bay to double down on building things with this team. I am always in awe of their consistency, care, product sense, and timeless aesthetics. I can't wait to use this. On my birthday too!
I hope one day soon I can achieve your focus and calm when things go wrong. It's an incredible asset for any team and for life in general. (And your Pickleball serves.)
@_sholtodouglas
and
@epiqueras1
definitely belong on this list. They're the driving force behind the scenes, playing a pivotal role in the success of projects like Gemini and serving as a source of inspiration for all of us. They're highly productive and a delight to work with.
Many of you are excited about H100 attention, so it's a good time to show you Mosaic GPU: a Python DSL for H100s.
The attention example matches FA3 performance, while being only ~200 lines of Python:
It's easy to install too! Latest JAX packages have it.
@duttakapil
@_sholtodouglas
@GoogleDeepMind
Mostly reading deeply, trying to understand one complex thing fully, and then digesting 1:1 with a mentor until things clicked. I've had OCD and bad anxiety since I was 8ish. I went to a bit of therapy and read about CBT to apply it myself, to the point where it rarely impedes me.