Jacob Austin

@jacobaustin132

3,336 Followers · 841 Following · 25 Media · 422 Statuses

@GoogleDeepMind researcher. Currently making LLMs go fast. AI for math and science. Coding. Gemini. I also play piano. NYC. Opinions my own

New York, NY
Joined March 2017
@jacobaustin132
Jacob Austin
1 year
Super super happy to be able to talk about DIDACT, the first code LLM trained to model real software developers editing code, fixing builds, and doing code review end-to-end. Developers don't write code in one go and neither should our models! 1/n
20
213
1K
@jacobaustin132
Jacob Austin
6 months
We've finally put out a detailed IEEE/ACM paper on @Google 's multi-year effort to ease the burden of code review with ML. Google engineers now resolve 7.5% of all code review comments with an ML-suggested edit. But the path to that number has been a fun ML and UX journey!
Tweet media one
14
144
770
@jacobaustin132
Jacob Austin
1 year
Excited to see a blog post on one of the coolest projects I've worked on at Google: using LLMs to automatically resolve code-review comments for Google engineers! 1/n
Tweet media one
9
74
527
@jacobaustin132
Jacob Austin
3 months
This is something I've worked on for a while! You can save the activations of one LLM call and reuse them for a follow-up that overlaps with the first. This means asking a question about a big codebase can take 30 seconds the first time and 1s after that!
@rakyll
Jaana Dogan ヤナ ドガン
3 months
Gemini’s context caching is one of the most exciting releases that came out of Google I/O.
Tweet media one
7
31
272
14
48
455
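The mechanism above can be sketched as a prefix cache: states computed for a shared prompt prefix are saved and reused, so a follow-up query only pays for its new tokens. A toy illustration, not Gemini's actual implementation (the per-token "state" stands in for real key/value activations):

```python
# Toy prefix cache: per-token "states" stand in for the key/value
# activations a real serving stack would save and reuse.
compute_calls = 0

def token_state(prev, tok):
    """Stand-in for the attention work done for one new token."""
    global compute_calls
    compute_calls += 1
    return prev + (hash(tok) % 1000)

cache = {}  # prompt tuple -> list of per-token states

def encode(tokens):
    tokens = tuple(tokens)
    # Reuse the longest previously-encoded prefix of this prompt.
    best = max((p for p in cache if tokens[:len(p)] == p), key=len, default=())
    states = list(cache.get(best, []))
    prev = states[-1] if states else 0
    for tok in tokens[len(best):]:  # only the new suffix is computed
        prev = token_state(prev, tok)
        states.append(prev)
    cache[tokens] = states
    return states

codebase = [f"file{i}" for i in range(1000)]
encode(codebase + ["Q1"])              # cold call: encodes everything
cold = compute_calls
encode(codebase + ["Q1", "A1", "Q2"])  # warm call: only the 2 new tokens
```

The warm call touches two tokens instead of a thousand, which is the 30s-to-1s effect described above.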
@jacobaustin132
Jacob Austin
1 year
the hardest thing about being an AI researcher is having to smell homeless people every morning while munching a tartine croissant outside your $4k house on the way to work
29
3
247
@jacobaustin132
Jacob Austin
3 years
Our new paper! We study how well large language models (244M-137B parameters) can write code, collaborate with humans via dialog (exciting!) and understand/execute the code they write (they don't/can't). TLDR: exciting tech with lots of limitations and room for future work.
5
37
208
@jacobaustin132
Jacob Austin
2 years
@jacobandreas @_jasonwei We found that code models get better when you prompt them with "I'm an expert Python programmer". The new Anthropic paper did something similar, prefixing the model's response with "I’ve tested this function myself so I know that it’s correct:"
5
30
207
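The trick above can be sketched in a few lines. `build_prompt` is a hypothetical helper, not from either paper; only the two quality-signal strings come from the tweet:

```python
# Hedged sketch: steer a code model with quality-signal text.
EXPERT_PREFIX = "I'm an expert Python programmer.\n"
RESPONSE_SEED = "I've tested this function myself so I know that it's correct:\n"

def build_prompt(task: str) -> str:
    # Prefix the *prompt* with an expertise claim (our finding) and seed
    # the *response* with a confidence claim (the Anthropic variant).
    return EXPERT_PREFIX + task + "\n" + RESPONSE_SEED

prompt = build_prompt("Write a function that reverses a linked list.")
```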
@jacobaustin132
Jacob Austin
5 months
@jxmnop Every Google model in recent memory has had a 256k vocab size
2
2
180
@jacobaustin132
Jacob Austin
3 years
Happy to share our work on discrete denoising diffusion models (D3PMs) @NeurIPSConf 2021: . D3PMs are diffusion models for discrete data like text or (quantized) images, and they’re flexible! A thread (with code!) 1/n
Tweet media one
Tweet media two
3
29
177
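For intuition, the discrete forward process can be sketched with the simplest corruption kernel, a uniform transition matrix. This toy version is illustrative only, not the paper's parameterization:

```python
import random

VOCAB = list("abcd")

def corrupt_step(tokens, beta, rng):
    # With probability beta, resample a token uniformly; else keep it.
    return [rng.choice(VOCAB) if rng.random() < beta else t for t in tokens]

def forward_process(tokens, steps, beta, rng):
    # Repeated corruption drives any sequence toward uniform noise:
    # the discrete analogue of a Gaussian diffusion's forward process.
    for _ in range(steps):
        tokens = corrupt_step(tokens, beta, rng)
    return tokens

rng = random.Random(0)
noised = forward_process(list("abca"), steps=5, beta=0.5, rng=rng)
```

A learned reverse model would then denoise step by step; D3PMs also support structured kernels (e.g. absorbing-state or Gaussian-like transitions over ordinal data).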
@jacobaustin132
Jacob Austin
2 months
This may be the most magical new developer tool we've made at Google. Nothing since code completion has felt so seamless to use: devs paste code constantly, and Smart Paste instantly fixes all the little issues: syntax errors, misnamed variables, indentation, and more 1/2
@GoogleAI
Google AI
2 months
Code development often involves frequent copy & pasting of code that must be adjusted for the surrounding context. Here we describe Smart Paste, an internal tool that streamlines the code authoring workflow by automating adjustments to pasted code. More at
20
98
519
6
17
154
@jacobaustin132
Jacob Austin
2 years
Read about our recent work on ML-powered code completion models trained on the @Google codebase. A small but specialized LM trained on extremely high-quality data and backed by static analysis beats much larger models in production.
@DynamicWebPaige
👩‍💻 Paige Bailey
2 years
Learn more about how code completion is transforming the developer experience of internal @Google engineers! 👩‍💻 We measured an acceptance rate of 25-34% on >3% of production code, while reducing the coding iteration time by 6% (equating to hundreds of years of SWE hours saved).
4
35
226
2
14
138
@jacobaustin132
Jacob Austin
1 year
GPT-4 makes big gains on coding (e.g. 48% -> 67% on HumanEval) but it's still a long way from 100% pass@1, not to mention writing a 1000-line program from scratch. GPT-4 shows that scale won't solve everything. Models need to write and debug code iteratively, like humans do
14
10
107
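For reference, pass@1 here is the execution-based metric popularized by the Codex paper (Chen et al., 2021), computed with their unbiased estimator:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k given n samples per problem, c of them correct.

    pass@k = 1 - C(n-c, k) / C(n, k): the probability that at least one
    of k randomly drawn samples passes the unit tests.
    """
    if n - c < k:
        return 1.0  # fewer failures than draws: some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 10 samples of which 5 pass, pass@1 is 0.5 and pass@10 is 1.0.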
@jacobaustin132
Jacob Austin
4 months
Gemini 1.5 Pro is widely available now. Long context is great but it's also just a great model, better than GPT-4 on most of our metrics. And it's free!
@JeffDean
Jeff Dean (@🏡)
4 months
We're starting to roll out API support for Gemini 1.5 Pro for developers. We're excited to see what you build with the 1M token context window! We'll be onboarding people to the API slowly at first, and then we'll ramp it up. In the meantime, developers can try out Gemini 1.5
96
396
2K
8
3
106
@jacobaustin132
Jacob Austin
1 year
Full details are in our blog post here: . This was the culmination of years of work from @dtarlow2 , Petros Maniatis, and a bunch of colleagues across Google. Please take a look!
3
8
105
@jacobaustin132
Jacob Austin
3 months
I won’t be at ICLR this year, but it’s the 200th anniversary of the premiere of Beethoven’s 9th in Vienna and you should go! The Vienna Philharmonic and many other orchestras have concerts!
7
7
93
@jacobaustin132
Jacob Austin
2 months
The Blueshift team has done awesome work pushing Hendrycks' MATH above 90%. MATH isn't the hardest dataset in the world but it's surprisingly tricky: some problems take me 5-10 minutes to solve. Getting an LLM to solve more than 90% feels meaningful. Try one yourself!
@bneyshabur
Behnam Neyshabur
2 months
I'm excited about this! Our team has been working really hard to improve Gemini 1.5 capabilities significantly on multiple fronts and in particular MATH/STEM! Please see the report here:
9
18
168
1
7
74
@jacobaustin132
Jacob Austin
1 year
Very proud to launch coding for Bard! The model is actually pretty good, try it out!
@ThomasOrTK
Thomas Kurian
1 year
New capabilities in Bard will help programmers and software developers with code generation, debugging and code explanation. It’s an exciting next step in how generative AI can accelerate innovation across industries.
24
94
535
3
10
73
@jacobaustin132
Jacob Austin
2 years
@RichardMCNgo I find many of these questions exhausting. I don't want to psychoanalyze what about me surprises people to a stranger at 3AM after a few beers. Ask me 1:1 when it's appropriate.
1
1
72
@jacobaustin132
Jacob Austin
3 months
One thing I'm proud of is how Google's gen media team has prioritized building tools for artists rather than text-to-X tools. GenAI can either replace or augment people, let's do the latter!
@GoogleDeepMind
Google DeepMind
3 months
We put our cutting-edge video generation model Veo in the hands of filmmaker @DonaldGlover and his creative studio, Gilga. Let’s take a look. ↓ #GoogleIO
34
133
667
2
6
67
@jacobaustin132
Jacob Austin
3 months
FWIW I think this is how you make long-context economical. Long queries aren't all unique, they typically share the same source documents. Low latency, low cost full repo completion can reuse the same KV caches
3
1
61
@jacobaustin132
Jacob Austin
1 year
Please note that the doctors’ responses come from…Reddit
@mdredze
Mark Dredze
1 year
New study! We compared ChatGPT responses to people's medical questions with those of doctors. Healthcare professionals preferred ChatGPT's responses 79% of the time, rating them as more empathetic and higher quality. I'm excited to figure out how to use LLMs to help doctors!
23
135
573
4
1
52
@jacobaustin132
Jacob Austin
5 months
Most LLM evals are leaked. A decent heuristic is to ignore reported numbers on evals over a year old
1
4
44
@jacobaustin132
Jacob Austin
1 year
PaLM 2 is really good. Like surprisingly good. And it’s exciting to see it rolling out across a wide array of Google products
@DynamicWebPaige
👩‍💻 Paige Bailey
1 year
*cracks knuckles* and thus, we begin the "🌴PaLM v2" drinking game (but with coffee, tea, or your favorite caffeinated beverage of choice, as it's early! 😉) #GoogleIO2023 #GoogleIO
7
30
195
0
2
46
@jacobaustin132
Jacob Austin
1 year
Codex-style LLMs are trained on static code snapshots (GitHub files at HEAD) without history or context from the developer's environment (like their IDE or build system). We're throwing away all the data of how the software was built, and why! 2/n
1
2
45
@jacobaustin132
Jacob Austin
2 years
UL2 is a new training objective with big implications for LLM training. UL2 combines the span corruption objective that gives T5 its exceptional finetuning ability with causal and prefix-LM objectives which let UL2-trained LLMs outperform purely-causal LMs on few-shot tasks
@GoogleAI
Google AI
2 years
Introducing UL2, a novel language pre-training paradigm that improves performance of language models across datasets and setups by using a mixture of training objectives, each with different configurations. Read more and grab model checkpoints at
18
170
716
1
10
44
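The mixture can be sketched as three corruption modes applied to the same token stream. This is a toy rendition; the real UL2 recipe mixes several span lengths and corruption rates:

```python
import random

def span_corruption(tokens, span=2):
    # R-denoiser (T5-style): hide a short span, predict it.
    i = random.randrange(len(tokens) - span)
    inp = tokens[:i] + ["<X>"] + tokens[i + span:]
    tgt = ["<X>"] + tokens[i:i + span]
    return inp, tgt

def prefix_lm(tokens):
    # S-denoiser: condition on a visible prefix, predict the suffix.
    cut = len(tokens) // 2
    return tokens[:cut], tokens[cut:]

def causal_lm(tokens):
    # Plain left-to-right LM: predict everything from scratch.
    return [], list(tokens)

MIXTURE = [span_corruption, prefix_lm, causal_lm]

def make_example(tokens):
    # Each training example gets one randomly chosen denoising mode.
    return random.choice(MIXTURE)(tokens)
```

Training on all three modes is what lets one model both finetune well (span corruption) and do strong few-shot generation (causal/prefix-LM).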
@jacobaustin132
Jacob Austin
1 year
Google developers work in a monorepo and build errors, test failures, code review comments, and resulting edits are all tracked. DIDACT models are trained on this data to build software iteratively *based on the history of a dev's work so far!* 3/n
1
2
42
@jacobaustin132
Jacob Austin
1 year
There's so much hype around "LLMs as agents" and when building LLMs for software, I think that's exactly the right approach. Our LLMs can build software like humans, iteratively and using developer tools, and be immediately useful for real developers! 5/n
1
1
43
@jacobaustin132
Jacob Austin
1 year
DIDACT powers a ton of cool dev tools, like our recently announced ML-powered code review tool and a bunch of others, like a tool to fix build errors, predict code review comments, and do GitHub Copilot-style completion conditioned on _your_ development history! 4/n
Tweet media one
1
1
42
@jacobaustin132
Jacob Austin
1 year
@EigenGender This is absolutely not true. They could test the explosive design, the subcritical assembly, the gun design. They could detonate the explosives and watch fast X-ray data. And then they had the Trinity test
1
1
40
@jacobaustin132
Jacob Austin
3 months
Penzai is one of the coolest ML libraries out there. Not only can you inspect every weight matrix and attention head in a Colab, you can trivially knock out heads, skip or repeat layers, or extract intermediates with a one line change. A beautiful tool for interpretability.
@_ddjohnson
Daniel Johnson @ ICML
3 months
Excited to share Penzai, a JAX research toolkit from @GoogleDeepMind for building, editing, and visualizing neural networks! Penzai makes it easy to see model internals and lets you inject custom logic anywhere. Check it out on GitHub:
42
424
2K
0
7
39
@jacobaustin132
Jacob Austin
1 year
@andrew_n_carr CUDA and the collective decades spent installing drivers
0
1
38
@jacobaustin132
Jacob Austin
2 years
@denny_zhou If true, this highlights one of the complexities of the half-open OpenAI/GPT-3 ecosystem. I'm a fan of the API, but it's v hard to know what DaVinci-002 is, whether it had a given eval set in its training data, etc.
2
2
39
@jacobaustin132
Jacob Austin
1 year
Code LLMs are everywhere, but making them useful to real developers is hard. We trained an LLM on data from _real_ Google developers: fixing builds, performing code review, and editing files, then deployed it within the code-review UI! 2/n
Tweet media one
1
5
37
@jacobaustin132
Jacob Austin
3 months
More work from Google on AI for SWE, here automatically fixing build errors! The cool thing about fixing builds is you can check if the build succeeds before showing the user the fix. Results in a measurable shortening of code submission time too!
@xennygrimmato_
Vaibhav Tulsyan
3 months
Excited to share a new blog on ML-based repair for build errors at Google! We found that automatically repairing build errors in the IDE increases productivity as measured by overall task completion with no detectable negative impact on code safety!
6
27
132
0
7
36
@jacobaustin132
Jacob Austin
2 years
Hiking in the shadow of the eastern Sierras, it feels like another world. What a high.
Tweet media one
Tweet media two
Tweet media three
3
0
34
@jacobaustin132
Jacob Austin
1 year
Google is in the game! A lot of hard work is going into building an exciting, helpful, and responsible new generation of LLM-based tools at Google
@sundarpichai
Sundar Pichai
1 year
1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applications (LaMDA). Coming soon: Bard, a new experimental conversational #GoogleAI service powered by LaMDA.
741
3K
15K
0
0
31
@jacobaustin132
Jacob Austin
1 year
Happy to share our work on multilingual evals for code LLMs, led by @GOrlanski . We open-source BabelCode, a framework for running execution-based coding evals across >10 languages (including Rust and Julia) and study the effect of language balancing on low-resource languages 1/2
@GOrlanski
Gabe Orlanski
1 year
📢Measuring The Impact Of Programming Language Distribution We present the BabelCode framework for multi-lingual code evaluation and an investigation into the impact of PL distributions in training data. Paper: Code: 🧵
Tweet media one
2
11
63
1
6
30
@jacobaustin132
Jacob Austin
6 months
A couple lessons from this: * IDE wars are coming. Collecting data in the same dev environment you deploy in is a huge advantage. * LLMs make great demos but it's hard to trust them at complex tasks. Reviewing code is harder than writing it. High-precision, low-recall is OK!
1
2
30
@jacobaustin132
Jacob Austin
1 year
A huge amount of credit goes to the UX team for helping us make model edits understandable, so developers can audit the code that's being changed. Model calibration also becomes surprisingly important – building developer trust by only showing highly confident predictions
Tweet media one
1
1
27
@jacobaustin132
Jacob Austin
1 year
I found Oppenheimer, like most of Christopher Nolan’s movies, lacking in emotional resonance. Nolan seems to make films about concepts that interest him (time, space, a biography he just read), without worrying about their relevance to the present moment
6
0
26
@jacobaustin132
Jacob Austin
1 year
@_jasonwei Cost is an important drawback: generalist models will always be outperformed by smaller task-specific models when cost and latency are factored in, except for tasks only the largest models can do. With that said, distillation is likely to play a role
1
1
25
@jacobaustin132
Jacob Austin
3 months
2290 tons of CO2 is a lot, but it's also roughly...38 flights from NYC to London on a 737. More CO2 was probably emitted by Meta employees flying back and forth during model development
@SashaMTL
Sasha Luccioni, PhD 🦋🌎✨🤗
3 months
So LLaMa 3's carbon footprint is... huge? 🤯 They estimate it to be 2,290 tons of CO2eq, compared to 550t for training GPT-3 and 66t for training *all* of the BLOOM models (1B-176B) 🌬️
Tweet media one
Tweet media two
48
61
277
1
0
23
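The comparison above roughly checks out under an assumed ~60 t of CO2 per one-way NYC–London narrow-body flight (about 19 t of jet fuel at ~3.16 kg CO2 per kg burned; both figures are ballpark assumptions, not from the tweet):

```python
# Back-of-envelope check of the "38 flights" comparison.
llama3_tco2 = 2290          # tonnes CO2eq, from the quoted estimate
fuel_per_flight_t = 19      # assumed jet fuel burn, NYC-London 737
co2_per_t_fuel = 3.16       # tonnes CO2 per tonne of jet fuel burned
per_flight_tco2 = fuel_per_flight_t * co2_per_t_fuel  # ~60 t per flight
flights = llama3_tco2 / per_flight_tco2               # ~38 flights
```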
@jacobaustin132
Jacob Austin
2 years
Please consider joining the Blueshift Team! They're wonderful people doing amazing work on reasoning, AI for science, and more
@bneyshabur
Behnam Neyshabur
2 years
Interested in Reasoning with Large Language Models? We are hiring! Internship: Full-Time Research Scientist: Full-Time Research Engineer: Learn more about Blueshift Team:
6
18
170
0
1
23
@jacobaustin132
Jacob Austin
2 years
@amasad The next generation of code LLMs will exhaust the code available at GitHub HEAD. The amount of diff data is several orders of magnitude larger
0
0
20
@jacobaustin132
Jacob Austin
2 years
Returning from NeurIPS, I flew an hour the wrong way to Fort Worth, and then missed my flight to NYC. Now I get to experience the cozy embrace of this hard airport floor
5
0
22
@jacobaustin132
Jacob Austin
1 year
the people I trust most are loudly and persistently expressing doubt about their beliefs and actions
1
1
22
@jacobaustin132
Jacob Austin
8 months
Gemini is here and it’s actually pretty decent!
@demishassabis
Demis Hassabis
8 months
The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses
Tweet media one
409
2K
11K
0
0
20
@jacobaustin132
Jacob Austin
6 months
You can find the paper here: . I think it's an awesome case study in applied LLM deployment. Huge shoutout to Peter Choy, Alex Frömmgen, @lerakharatyan , @gssurita , Kevin Villela, @dtarlow2 , Maxim Tabachnyk, really too many people to list!
2
2
20
@jacobaustin132
Jacob Austin
1 year
Bard is alive. Try it out!
@JeffDean
Jeff Dean (@🏡)
1 year
Bard is now available in the US and UK, w/more countries to come. It’s great to see early @GoogleAI work reflected in it—advances in sequence learning, large neural nets, Transformers, responsible AI techniques, dialog systems & more. You can try it at
32
121
745
0
0
18
@jacobaustin132
Jacob Austin
1 year
@TheXeophon Yes, we have a DSL that decomposes the process of writing a PR into actions like "<run build [target]>" or "<make edit [location] [diff]>". The goal is to represent any action a developer could take as a small, local change, instead of making the LLM somehow output a big file
1
0
18
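The idea can be sketched as actions rendered into small token strings. The real format is internal to Google; these dataclasses are illustrative, with action names taken from the tweet:

```python
from dataclasses import dataclass

@dataclass
class RunBuild:
    target: str

@dataclass
class MakeEdit:
    location: str
    diff: str

def render(action) -> str:
    # Serialize each developer action as one small, local token string.
    if isinstance(action, RunBuild):
        return f"<run build {action.target}>"
    if isinstance(action, MakeEdit):
        return f"<make edit {action.location} {action.diff}>"
    raise ValueError(f"unknown action: {action}")

# A PR becomes a sequence of actions, not one giant file dump.
history = [RunBuild("//server:all"), MakeEdit("server.py:42", "-foo+bar")]
tokens = [render(a) for a in history]
```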
@jacobaustin132
Jacob Austin
1 year
To be clear, I don't mean the "scale won't solve everything" line as a criticism of scaling. I just find it implausible that LLMs can solve arbitrary problems without decomposing them or adapting to feedback from an environment
1
0
18
@jacobaustin132
Jacob Austin
1 year
Speaking from personal experience, the code completion feature in Colab is magical!
@GoogleColab
Colaboratory
1 year
Your new coding assistant is almost here! Check out these new Colab features: natural language to code generation, code completion, and an integrated chatbot. Read all about it at authored by @thechrisperry and @shresbm
11
97
453
0
2
18
@jacobaustin132
Jacob Austin
2 years
@DrJimFan Big +1 here. The model is implicitly trained on a mixture of p(answer | evidence) and p(answer), so it interpolates between memorizing and looking for answers in-context (see )
1
1
15
@jacobaustin132
Jacob Austin
1 month
@nearcyan @arpitingle this isn’t really true, Noam and Daniel intended from the beginning to “solve loneliness”
3
0
16
@jacobaustin132
Jacob Austin
2 years
@lauralondon_ @moultano Desalination plants can't prevent flooding when sea-levels rise several meters due to Antarctic ice sheets melting. Burying power lines will reduce wildfire frequency at massive cost, but it won't stop them when rising temperatures lead to ever more arid conditions.
8
0
14
@jacobaustin132
Jacob Austin
1 year
it’s frightening walking around Williamsburg hearing tech grifters talk about their “AI for media” startups. it feels better to work upstream of that, on core tech, but it’s not obvious if my hands are cleaner
2
0
15
@jacobaustin132
Jacob Austin
1 year
@natfriedman Is this toolformer? Toolformer seems specifically about using prompting + log-likelihood based filtering to enable tool use. The idea of tool use in this form has been around for years
0
0
15
@jacobaustin132
Jacob Austin
1 year
Another aspect of this work to note: it (partly) solves the "specification" problem of program synthesis: how do we tell the computer what code we want it to write? TLDR: rather than tell a model what to do, let it learn from context what you'll want to do next. A thread 1/n
@dtarlow2
Danny Tarlow
1 year
Very happy to share our work on activating Google's software dev process as an engine for ML-powered dev tools. A multi-year effort from many across Alphabet. Special shout-out to @jacobaustin132 @blip42 @PManzagol @dancherp & Petros Maniatis. See Jacob's🧵& the blog for more.
0
6
39
1
1
15
@jacobaustin132
Jacob Austin
2 months
Smart Paste highlights the core UX challenge of AI for SWEs. The more context switching is required to verify a suggestion, the less useful it is. Tools like code completion and Smart Paste that make suggestions at the cursor and are instantly verifiable are the easiest to adopt
0
0
14
@jacobaustin132
Jacob Austin
11 months
@_jasonwei Character can make money without "getting something right". As you point out, exploiting loneliness/insecurity is lucrative. The fact that it shamelessly monetizes a desire for connection (where OAI/Anthro refused) speaks badly, ironically, of their character
2
0
14
@jacobaustin132
Jacob Austin
6 months
We first talked about this project in mid-2022 in a @GoogleAI blog post (here's a thread at the time: ), but this paper talks in much more detail about the model and the design process we went through.
@jacobaustin132
Jacob Austin
1 year
Excited to see a blog post on one of the coolest projects I've worked on at Google: using LLMs to automatically resolve code-review comments for Google engineers! 1/n
Tweet media one
9
74
527
1
0
14
@jacobaustin132
Jacob Austin
1 year
I loved people like Anthony Bourdain for this reason. You can see him grappling with both the beauty and horror of his life and his art. I wish the AI world had more of this. We cannot know if what we make is good, no matter how well-intentioned we are
0
0
13
@jacobaustin132
Jacob Austin
8 months
To grad school applicants: the single best advice I got was that you’re generally admitted by a single faculty member who’ll bet on you, not by the department. Pick a few people and target your application to them
1
0
13
@jacobaustin132
Jacob Austin
11 months
@docmilanfar @jaschasd Strongly agree, I still find this one of the clearest explanations of dynamical systems and stochastic processes, it's quite a joy to read
1
1
12
@jacobaustin132
Jacob Austin
1 year
Please stop. Naive techno-optimism and American chauvinism really aren’t a good combo
@alexandr_wang
Alexandr Wang
1 year
Today, @scale_AI is launching our 2 major platforms to bolster government and enterprise: 🎖 Scale Donovan, the AI copilot for defense 🏙 Scale EGP, full-stack generative AI for global enterprise 👇 See Donovan in action below 🧵 on our platforms and why they are so critical
37
61
391
0
0
12
@jacobaustin132
Jacob Austin
2 years
@urialon1 @_jasonwei Reminds me of the Python-GSM8K results from the PaLM paper or MathQA-Python. Cool to see that intermediate natural language instructions are helpful!
Tweet media one
0
0
12
@jacobaustin132
Jacob Austin
2 months
I think rather soon, these models will be helpful for scientists and mathematicians. An LLM doesn't have to do super advanced math to be useful, there's value (to me at least) in instantly proving little lemmas that help keep you in a flow state. More to come!
2
1
11
@jacobaustin132
Jacob Austin
3 months
I see a lot of people calling this "goodharting" but it's sort of not goodharting, it's just leaking the test set. Especially as existing evals are translated into more languages, removing them becomes increasingly hard
@summeryue0
Summer Yue
3 months
How much do LLMs overfit public benchmarks? Our team at @scale_ai SEAL lab studied this by creating a GSM8k-equivalent eval from scratch. The resulting performance gap reveals data contamination in some model families, while GPT, Claude, and Gemini show no signs of overfitting.
Tweet media one
8
17
124
2
0
12
@jacobaustin132
Jacob Austin
1 year
Imagine having gone to college and thinking the highest earning graduates were the best
@cremieuxrecueil
Crémieux
1 year
Which university has the best graduates? A new paper using an earnings-based measure of graduate quality (qⱼ) provided the answer: the top of the list is dominated by Indian universities. What about Harvard? Rank #26 .
Tweet media one
68
112
768
1
0
11
@jacobaustin132
Jacob Austin
3 years
Lots more in the paper: . And a huge shoutout to my collaborators: @gstsdn , @Maxwell_Nye , @quocleix , @RandomlyWalking , @EllenJiang2 , @dmdohan , @Carryveggies , Michael Terry, @hmichalewski , and @MaartenBosma !
0
2
11
@jacobaustin132
Jacob Austin
5 months
Two weeks in London and I managed to make it to Wigmore Hall twice, for @jeremydenk playing the Bach Partitas and tonight for the Handel Players. Wigmore Hall is special, like the 92Y in NYC: small, with fantastic acoustics, intimate in the best sense.
@Geoff_Andrew
Geoff Andrew
5 months
The culmination of this week's musical mini-binge – 2nd concert today @wigmore_hall – felt somehow fitting after so much marvellous stuff both old and very new: @jeremydenk performing (the entire session from memory!) all Bach's Partitas. Magic.
Tweet media one
0
3
11
0
0
8
@jacobaustin132
Jacob Austin
2 years
@gallowspost @francoisfleuret @OpenAI To be clear, they've intentionally never confirmed the 175B number publicly
1
0
10
@jacobaustin132
Jacob Austin
6 months
Our first model had a bunch of bad habits: it made low-confidence suggestions, addressed unrelated issues, and wasn't very visible to the change author. To fix this, we improved data quality, filtered for single-comment reviews, filtered by confidence, and added synthetic data.
Tweet media one
1
0
10
@jacobaustin132
Jacob Austin
2 years
@amanrsanger Training smaller models on a single language alone (e.g. Python-only) can match the performance of Codex at smaller sizes on single language evals. The open source world can't match Codex without huge investment, but there are shortcuts!
0
0
10
@jacobaustin132
Jacob Austin
6 months
Our first version was a lightly-finetuned version of Google's software engineering foundation model DIDACT, and made very plausible suggestions. But people didn't trust it: there's a big difference between a plausible edit and what the developer really wants
Tweet media one
1
0
9
@jacobaustin132
Jacob Austin
1 year
Super excited to see a new company from the incredible @reinerpope !
@reinerpope
Reiner Pope
1 year
I’m excited to announce our new company, MatX, started with @MikeGunter_ . We want to make AI better, faster, and cheaper by building more powerful hardware. Read on for a short introduction, or see our full announcement here: .
24
35
388
0
0
9
@jacobaustin132
Jacob Austin
1 year
@Ted_Underwood Over-training + instruction-tuning. As @moultano says, OpenAI can e.g. train a 12B model for 10x the "Chinchilla-optimal" compute budget and end up with the same loss as a 10x larger model trained for less time 1/2
1
1
8
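The tradeoff can be sketched with the fitted loss curve from the Chinchilla paper (Hoffmann et al., 2022), L(N, D) = E + A/N^α + B/D^β with their published constants. A hypothetical 12B model overtrained on 10x its "optimal" token count lands close to a roughly 3x larger compute-optimal model (the 20-tokens-per-parameter rule is itself an approximation):

```python
# Chinchilla fitted loss (Hoffmann et al., 2022): N = params, D = tokens.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    return E + A / n_params**ALPHA + B / n_tokens**BETA

n = 12e9
optimal_tokens = 20 * n                       # rough "Chinchilla-optimal" rule
l_optimal = loss(n, optimal_tokens)           # 12B at its optimal budget
l_overtrained = loss(n, 10 * optimal_tokens)  # same 12B, 10x the tokens
```

With these constants, the overtrained 12B's predicted loss falls within ~0.01 of a compute-optimal ~40B model, which is the over-training effect described above.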
@jacobaustin132
Jacob Austin
11 months
I’m back in my hometown of Portland, Maine for the month. Hit me up if you’re around and want to hike, climb, or grab a beer!
0
0
9
@jacobaustin132
Jacob Austin
6 months
All in all, we ended up improving user trust in the model and addressing around 7.5% of all code review comments at Google with an ML edit. All while keeping precision high (usually around 50%) to avoid wasting engineering time!
Tweet media one
1
1
9
@jacobaustin132
Jacob Austin
2 years
The lesson of language models for me is that noise generation with language is painfully easy. You have to look at what you write and say “does this say anything new? Could GPT-3 have written this?”
0
0
9
@jacobaustin132
Jacob Austin
1 year
@DynamicWebPaige @DavidSacks Fulfilling a request (in this case, to write a slogan) isn’t necessarily at odds with political neutrality in its own answers?
0
0
8
@jacobaustin132
Jacob Austin
2 years
As scaling LLMs becomes harder, performance gains come more and more from clever prompting, bootstrapping, and chaining multiple LLMs together. Cascades is a PPL that makes inference & optimization on chained language models easy!
@dmdohan
David Dohan
2 years
Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming. paper:
Tweet media one
3
97
668
1
0
9
@jacobaustin132
Jacob Austin
1 year
@hcky479 @NYCMayor Let's get rid of cars in the city!
0
0
9
@jacobaustin132
Jacob Austin
1 year
idea: add a carbon offset option to every gas station and airline checkout webpage. even if only 1% of people pay an extra 20% on gas, it creates a market for carbon offsets and puts the idea front and center in people’s minds
4
0
8
@jacobaustin132
Jacob Austin
3 months
@jxmnop @polynoamial I think phd students have a pretty great opportunity to publish general-purpose ideas that industry can't publish right now: write a great paper on data selection, length generalization, self-improvement, RL, etc. and include clear scaling laws up to 1B and everyone'll love it
1
0
8
@jacobaustin132
Jacob Austin
1 year
@ben11kehoe @forrestbrazeal For now, it's up to the author to approve the change, and yes, then the reviewer needs to re-approve (which they'd normally have to do anyway after the author addressed a comment). We're working on the "pre-approve" UX now, so they can flag that the ML edit is right
1
0
8
@jacobaustin132
Jacob Austin
2 years
@_jasonwei Why is emergence a useful thing to think about? Is there reason to think "emergence" is anything more than "log likelihood dropping below some critical threshold" (i.e. a function of model quality, not of size or compute)?
2
0
8
@jacobaustin132
Jacob Austin
2 years
UI design is feature engineering for humans
1
0
8
@jacobaustin132
Jacob Austin
3 months
@jxmnop To be clear, this isn't "compute optimal" in any sense, but it might be hella useful
4
0
8
@jacobaustin132
Jacob Austin
2 years
@RichardMCNgo A counterargument (which you've made yourself) is that optimal strategies in primitive or partially-observed environments may not be optimal today, e.g. avoiding pork because of disease or stoning women for adultery in a society that functions without monogamy.
1
0
8
@jacobaustin132
Jacob Austin
2 years
@nearcyan The alignment crowd has tried to push the term as broadly as possible. Now they reap the rewards. But LLMs are far more likely to harm society by undermining our notions of truth and creativity than by killing us all
0
0
8
@jacobaustin132
Jacob Austin
6 months
Here's what the UI looks like for the reviewer. The ML suggested edit auto-updates in the code review UI as the reviewer is typing, and they can try to more clearly specify their intent in the comment to guide the ML model!
Tweet media one
1
1
8
@jacobaustin132
Jacob Austin
2 years
@andy_matuschak Having taken piano lessons for 15 years, I think it's just because it's hard to fit 15 pianos in a room and impossible for them to play at the same time. We do group music lessons for elementary students, but it's mostly chaos. At least you can do math silently
1
0
8
@jacobaustin132
Jacob Austin
2 months
@denny_zhou @lmthang That’s not quite true, it’s finetuned to do better at math and coding, not for this eval specifically
0
0
8
@jacobaustin132
Jacob Austin
2 months
1/n on classical music: yesterday I heard the Pavel Haas Quartet playing the Brahms A Major piano quartet at Wigmore Hall with Boris Giltburg. It's the second of Brahms' 3 piano quartets and my favorite. It's tragic and warm, rich, very full, like a Mahler symphony
1
0
7
@jacobaustin132
Jacob Austin
1 year
@Miles_Brundage @kipperrii @typedfemale not a dig at OAI, fwiw. just at the drumbeat of self-righteous twitter posts about how hard poverty makes it to enjoy wealth in SF
0
0
6
@jacobaustin132
Jacob Austin
2 months
@thekathanpatel Not via API but if you ask a follow-up question it should answer much faster
1
1
7
@jacobaustin132
Jacob Austin
4 years
@AnimaAnandkumar @OpenAI @Microsoft @Twitter Not to mention “exclusive licensing” is hardly “open”...
0
0
7