Super super happy to be able to talk about DIDACT, the first code LLM trained to model real software developers editing code, fixing builds, and doing code review end-to-end.
Developers don't write code in one go and neither should our models! 1/n
We've finally put out a detailed IEEE/ACM paper on
@Google
's multi-year effort to ease the burden of code review with ML. Google engineers now resolve 7.5% of all code review comments with an ML-suggested edit. But the path to that number has been a fun ML and UX journey!
Excited to see a blog post on one of the coolest projects I've worked on at Google: using LLMs to automatically resolve code-review comments for Google engineers! 1/n
This is something I've worked on for a while! You can save the activations of one LLM call and reuse them for a follow-up that overlaps with the first.
This means asking a question about a big codebase can take 30 seconds the first time and 1s after that!
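The idea in toy form (all names hypothetical, nothing here is a production API): pay the full forward pass once for the big shared prefix, and let follow-up queries that overlap reuse the cached activations, only processing their new tokens.

```python
# Toy sketch of prefix/KV caching. The "activations" here are a stand-in;
# a real system would store attention key/value tensors keyed by the prefix.
class PrefixCachingLM:
    def __init__(self):
        self._kv_cache = {}  # prefix text -> cached "activations"

    def _encode(self, text):
        # Stand-in for the expensive forward pass over `text`.
        return {"n_tokens_processed": len(text.split())}

    def generate(self, prefix, suffix):
        if prefix in self._kv_cache:
            # Cache hit: only the new suffix needs a forward pass.
            cost = len(suffix.split())
        else:
            self._kv_cache[prefix] = self._encode(prefix)
            cost = len(prefix.split()) + len(suffix.split())
        return cost  # tokens actually processed, a proxy for latency

lm = PrefixCachingLM()
codebase = " ".join(["token"] * 10_000)  # big shared context, e.g. a repo
first = lm.generate(codebase, "Where is the auth logic?")
second = lm.generate(codebase, "And where are its tests?")
```

The second call costs a tiny fraction of the first, which is where the 30s-to-1s speedup comes from.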
the hardest thing about being an AI researcher is having to smell homeless people every morning while munching a tartine croissant outside your $4k house on the way to work
Our new paper! We study how well large language models (244M-137B parameters) can write code, collaborate with humans via dialog (exciting!) and understand/execute the code they write (they don't/can't).
TLDR: exciting tech with lots of limitations and room for future work.
@jacobandreas
@_jasonwei
We found that code models get better when you prompt them with "I'm an expert Python programmer". The new Anthropic paper did something similar, prefixing the model's response with "I’ve tested this function myself so I know that it’s correct:"
Happy to share our work on discrete denoising diffusion models (D3PMs)
@NeurIPSConf
2021: . D3PMs are diffusion models for discrete data like text or (quantized) images, and they’re flexible! A thread (with code!) 1/n
This may be the most magical new developer tool we've made at Google. Nothing since code completion has felt so seamless to use: devs paste code constantly, and Smart Paste instantly fixes all the little issues: syntax errors, misnamed variables, indentation, and more 1/2
Code development often involves frequent copy & pasting of code that must be adjusted for the surrounding context. Here we describe Smart Paste, an internal tool that streamlines the code authoring workflow by automating adjustments to pasted code. More at
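One tiny slice of the problem, as a sketch (indentation only; the real tool uses a learned model and also handles renames, syntax errors, and more):

```python
# Toy illustration of one Smart Paste-style adjustment: re-indent pasted code
# to match the surrounding context. Purely illustrative, not the actual tool.
def reindent_paste(pasted, target_indent):
    lines = pasted.splitlines()
    # Strip the clipboard's original leading indent, then apply the target's.
    base = len(lines[0]) - len(lines[0].lstrip())
    return "\n".join(target_indent + line[base:] for line in lines)

pasted = "if ok:\n    do_thing()"
fixed = reindent_paste(pasted, " " * 8)
```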
Read about our recent work on ML-powered code completion models trained on the
@Google
codebase. A small but specialized LM trained on extremely high-quality data and backed by static analysis beats much larger models in production.
Learn more about how code completion is transforming the developer experience of internal
@Google
engineers! 👩‍💻
We measured an acceptance rate of 25-34% on >3% of production code, while reducing the coding iteration time by 6% (equating to hundreds of years of SWE hours saved).
GPT-4 makes big gains on coding (e.g. 48% -> 67% on HumanEval) but it's still a long way from 100% pass@1, not to mention writing a 1000-line program from scratch.
GPT-4 shows that scale won't solve everything. Models need to write and debug code iteratively, like humans do
Gemini 1.5 Pro is widely available now. Long context is great but it's also just a great model, better than GPT-4 on most of our metrics. And it's free!
We're starting to roll out API support for Gemini 1.5 Pro for developers. We're excited to see what you build with the 1M token context window!
We'll be onboarding people to the API slowly at first, and then we'll ramp it up. In the meantime, developers can try out Gemini 1.5
Full details are in our blog post here: . This was the culmination of years of work from
@dtarlow2
, Petros Maniatis, and a bunch of colleagues across Google. Please take a look!
I won’t be at ICLR this year, but it’s the 200th anniversary of the premiere of Beethoven’s 9th in Vienna and you should go! The Vienna Philharmonic and many other orchestras have concerts!
The Blueshift team has done awesome work pushing Hendrycks' MATH above 90%. MATH isn't the hardest dataset in the world but it's surprisingly tricky: some problems take me 5-10 minutes to solve. Getting an LLM to solve more than 90% feels meaningful. Try one yourself!
I'm excited about this! Our team has been working really hard to improve Gemini 1.5 capabilities significantly on multiple fronts and in particular MATH/STEM! Please see the report here:
New capabilities in Bard will help programmers and software developers with code generation, debugging and code explanation. It’s an exciting next step in how generative AI can accelerate innovation across industries.
@RichardMCNgo
I find many of these questions exhausting. I don't want to psychoanalyze, for a stranger at 3AM after a few beers, what it is about me that surprises people. Ask me 1:1 when it's appropriate.
One thing I'm proud of is how Google's gen media team has prioritized building tools for artists rather than text-to-X tools. GenAI can either replace or augment people, let's do the latter!
We put our cutting-edge video generation model Veo in the hands of filmmaker
@DonaldGlover
and his creative studio, Gilga.
Let’s take a look. ↓
#GoogleIO
FWIW I think this is how you make long-context economical. Long queries aren't all unique; they typically share the same source documents. Low-latency, low-cost full-repo completion can reuse the same KV caches
New study! We compared ChatGPT responses to people's medical questions with those of doctors. Healthcare professionals preferred ChatGPT 79% of the time, rating it as more empathetic and higher quality.
I'm excited to figure out how to use LLMs to help doctors!
*cracks knuckles*
and thus, we begin the "🌴PaLM v2" drinking game (but with coffee, tea, or your favorite caffeinated beverage of choice, as it's early! 😉)
#GoogleIO2023
#GoogleIO
Codex-style LLMs are trained on static code snapshots (GitHub files at HEAD) without history or context from the developer's environment (like their IDE or build system). We're throwing away all the data of how the software was built, and why! 2/n
UL2 is a new training objective with big implications for LLM training. UL2 combines the span corruption objective that gives T5 its exceptional finetuning ability with causal and prefix-LM objectives, which let UL2-trained LLMs outperform purely-causal LMs on few-shot tasks
Introducing UL2, a novel language pre-training paradigm that improves performance of language models across datasets and setups by using a mixture of training objectives, each with different configurations. Read more and grab model checkpoints at
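A toy sketch of the mixture-of-objectives idea (hedged: the real mixture rates, span lengths, and sentinel format follow the UL2 paper, not this):

```python
# Three denoising objectives over one token sequence; UL2 trains on a mixture.
import random

SENTINEL = "<X>"

def span_corruption(tokens, span_len=3, rng=random):
    # T5-style: mask a contiguous span; input keeps a sentinel, target is the span.
    start = rng.randrange(0, max(1, len(tokens) - span_len))
    inputs = tokens[:start] + [SENTINEL] + tokens[start + span_len:]
    targets = [SENTINEL] + tokens[start:start + span_len]
    return inputs, targets

def prefix_lm(tokens, rng=random):
    # Prefix-LM: split at a random point; predict the suffix from the prefix.
    split = rng.randrange(1, len(tokens))
    return tokens[:split], tokens[split:]

def causal_lm(tokens, rng=random):
    # Plain left-to-right: empty input, predict everything.
    return [], tokens

def ul2_example(tokens, rng=random):
    objective = rng.choice([span_corruption, prefix_lm, causal_lm])
    return objective(tokens, rng)
```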
Google developers work in a monorepo and build errors, test failures, code review comments, and resulting edits are all tracked. DIDACT models are trained on this data to build software iteratively *based on the history of a dev's work so far!* 3/n
There's so much hype around "LLMs as agents" and when building LLMs for software, i think that's exactly the right approach. Our LLMs can build software like humans, iteratively and using developer tools, and be immediately useful for real developers! 5/n
DIDACT powers a ton of cool dev tools, like our recently announced ML-powered code review tool and a bunch of others, like a tool to fix build errors, predict code review comments, and do GitHub Copilot-style completion conditioned on _your_ development history! 4/n
@EigenGender
This is absolutely not true. They could test the explosive design, the subcritical assembly, and the gun design. They could detonate the explosives and watch fast X-ray data. And then there was the Trinity test
Penzai is one of the coolest ML libraries out there. Not only can you inspect every weight matrix and attention head in a Colab, you can trivially knock out heads, skip or repeat layers, or extract intermediates with a one line change. A beautiful tool for interpretability.
Excited to share Penzai, a JAX research toolkit from
@GoogleDeepMind
for building, editing, and visualizing neural networks! Penzai makes it easy to see model internals and lets you inject custom logic anywhere.
Check it out on GitHub:
@denny_zhou
If true, this highlights one of the complexities of the half-open OpenAI/GPT-3 ecosystem. I'm a fan of the API, but it's v hard to know what DaVinci-002 is, whether it had a given eval set in its training data, etc.
Code LLMs are everywhere, but making them useful to real developers is hard. We trained an LLM on data from _real_ Google developers: fixing builds, performing code review, and editing files, then deployed it within the code-review UI! 2/n
More work from Google on AI for SWE, here automatically fixing build errors! The cool thing about fixing builds is you can check if the build succeeds before showing the user the fix. Results in a measurable shortening of code submission time too!
Excited to share a new blog on ML-based repair for build errors at Google!
We found that automatically repairing build errors in the IDE increases productivity as measured by overall task completion with no detectable negative impact on code safety!
1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applications (LaMDA). Coming soon: Bard, a new experimental conversational
#GoogleAI
service powered by LaMDA.
Happy to share our work on multilingual evals for code LLMs, led by
@GOrlanski
. We open-source BabelCode, a framework for running execution-based coding evals across >10 languages (including Rust and Julia) and study the effect of language balancing on low-resource languages 1/2
📢Measuring The Impact Of Programming Language Distribution
We present the BabelCode framework for multi-lingual code evaluation and an investigation into the impact of PL distributions in training data.
Paper:
Code:
🧵
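The core of execution-based eval, in miniature (illustrative only, not BabelCode's actual harness, which sandboxes and supports >10 languages):

```python
# A candidate solution "passes" iff it runs correctly against the test cases.
def run_candidate(candidate_src, test_cases):
    namespace = {}
    exec(candidate_src, namespace)  # trusted toy setting only; real harnesses sandbox!
    fn = namespace["solution"]
    return all(fn(*args) == expected for args, expected in test_cases)

candidate = "def solution(a, b):\n    return a + b\n"
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
assert run_candidate(candidate, tests)
```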
A couple lessons from this:
* IDE wars are coming. Collecting data in the same dev environment you deploy in is a huge advantage.
* LLMs make great demos but it's hard to trust them at complex tasks. Reviewing code is harder than writing it. High-precision, low-recall is OK!
A huge amount of credit goes to the UX team for helping us make model edits understandable, so developers can audit the code being changed. Model calibration also became surprisingly important: we build developer trust by only showing highly confident predictions
i found Oppenheimer, like most of Christopher Nolan’s movies, lacking in emotional resonance. Nolan seems to make films about concepts that interest him (time, space, a biography he just read), without worrying about their relevance to the present moment
@_jasonwei
Cost is an important drawback: generalist models will always be outperformed by smaller task-specific models when cost and latency are factored in, except for tasks only the largest models can do. With that said, distillation is likely to play a role
2290 tons of CO2 is a lot, but it's also roughly...38 flights from NYC to London on a 737. More CO2 was probably emitted by Meta employees flying back and forth during model development
So LLaMa 3's carbon footprint is... huge? 🤯
They estimate it to be 2,290 tons of CO2eq, compared to 550t for training GPT-3 and 66t for training *all* of the BLOOM models (1B-176B) 🌬️
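The back-of-the-envelope version (hedged: ~60 t CO2 for one full transatlantic flight is a rough public estimate, not an exact figure):

```python
# Sanity-check the "~38 flights" comparison above.
llama3_tco2 = 2290       # reported training footprint, tonnes CO2eq
per_flight_tco2 = 60     # rough total for one NYC -> London flight
flights = llama3_tco2 / per_flight_tco2
```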
Interested in Reasoning with Large Language Models?
We are hiring!
Internship:
Full-Time Research Scientist:
Full-Time Research Engineer:
Learn more about Blueshift Team:
Returning from NeurIPS, I flew an hour the wrong way to Fort Worth, and then missed my flight to NYC. Now I get to experience the cozy embrace of this hard airport floor
The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes, each best-in-class & optimized for different uses
You can find the paper here: . I think it's an awesome case study in applied LLM deployment. Huge shoutout to Peter Choy, Alex Frömmgen,
@lerakharatyan
,
@gssurita
, Kevin Villela,
@dtarlow2
, Maxim Tabachnyk, really too many people to list!
Bard is now available in the US and UK, w/more countries to come. It’s great to see early
@GoogleAI
work reflected in it—advances in sequence learning, large neural nets, Transformers, responsible AI techniques, dialog systems & more.
You can try it at
@TheXeophon
Yes, we have a DSL that decomposes the process of writing a PR into actions like "<run build [target]>" or "<make edit [location] [diff]>". The goal is to represent any action a developer could take as a small, local change, instead of making the LLM somehow output a big file
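A hedged sketch of that action-DSL idea (names and serialization format are illustrative, not DIDACT's actual one): a developer's work becomes a sequence of small, local actions the model predicts one at a time, rather than one big file dump.

```python
from dataclasses import dataclass

@dataclass
class RunBuild:
    target: str            # e.g. "//server:all"

@dataclass
class MakeEdit:
    location: str          # e.g. "server/auth.py:42"
    diff: str              # a small local diff, not a whole file

def render(action):
    # Serialize an action into a tagged-token form an LM could be trained on.
    if isinstance(action, RunBuild):
        return f"<run build {action.target}>"
    if isinstance(action, MakeEdit):
        return f"<make edit {action.location} {action.diff}>"
    raise TypeError(action)

history = [
    MakeEdit("server/auth.py:42", "-return None\n+return token"),
    RunBuild("//server:all"),
]
prompt = "\n".join(render(a) for a in history)
```

Conditioning on this rendered history is what lets the model predict the next small action instead of regenerating whole files.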
To be clear, I don't mean the "scale won't solve everything" line as a criticism of scaling. I just find it implausible that LLMs can solve arbitrary problems without decomposing them or adapting to feedback from an environment
Your new coding assistant is almost here! Check out these new Colab features: natural language to code generation, code completion, and an integrated chatbot. Read all about it in the post authored by
@thechrisperry
and
@shresbm
@DrJimFan
Big +1 here. The model is implicitly trained on a mixture of p(answer | evidence) and p(answer), so it interpolates between memorizing and looking for answers in-context (see )
@lauralondon_
@moultano
Desalination plants can't prevent flooding when sea-levels rise several meters due to Antarctic ice sheets melting. Burying power lines will reduce wildfire frequency at massive cost, but it won't stop them when rising temperatures lead to ever more arid conditions.
it’s frightening walking around Williamsburg hearing tech grifters talk about their “AI for media” startups. it feels better to work upstream of that, on core tech, but it’s not obvious if my hands are cleaner
@natfriedman
Is this toolformer? Toolformer seems specifically about using prompting + log-likelihood based filtering to enable tool use. The idea of tool use in this form has been around for years
Another aspect of this work to note: it (partly) solves the "specification" problem of program synthesis: how do we tell the computer what code we want it to write?
TLDR: rather than tell a model what to do, let it learn from context what you'll want to do next. A thread 1/n
Very happy to share our work on activating Google's software dev process as an engine for ML-powered dev tools.
A multi-year effort from many across Alphabet. Special shout-out to
@jacobaustin132
@blip42
@PManzagol
@dancherp
& Petros Maniatis.
See Jacob's🧵& the blog for more.
Smart Paste highlights the core UX challenge of AI for SWEs. The more context switching is required to verify a suggestion, the less useful it is. Tools like code completion and Smart Paste that make suggestions at the cursor and are instantly verifiable are the easiest to adopt
@_jasonwei
Character can make money without "getting something right". As you point out, exploiting loneliness/insecurity is lucrative. The fact that Character shamelessly monetizes a desire for connection (where OAI/Anthro refused) speaks badly, ironically, of their character
We first talked about this project in mid-2022 in a
@GoogleAI
blog post (here's a thread at the time: ), but this paper talks in much more detail about the model and the design process we went through.
Excited to see a blog post on one of the coolest projects I've worked on at Google: using LLMs to automatically resolve code-review comments for Google engineers! 1/n
I loved people like Anthony Bourdain for this reason. You can see him grappling with both the beauty and horror of his life and his art
I wish the AI world had more of this. We cannot know if what we make is good, no matter how well-intentioned we are
To grad school applicants: the single best advice I got was that you’re generally admitted by a single faculty member who’ll bet on you, not by the department. Pick a few people and target your application to them
@docmilanfar
@jaschasd
Strongly agree, I still find this one of the clearest explanations of dynamical systems and stochastic processes, it's quite a joy to read
Today,
@scale_AI
is launching our 2 major platforms to bolster government and enterprise:
🎖 Scale Donovan, the AI copilot for defense
🏙 Scale EGP, full-stack generative AI for global enterprise
👇 See Donovan in action below
🧵 on our platforms and why they are so critical
@urialon1
@_jasonwei
Reminds me of the Python-GSM8K results from the PaLM paper or MathQA-Python. Cool to see that intermediate natural language instructions are helpful!
I think rather soon, these models will be helpful for scientists and mathematicians. An LLM doesn't have to do super advanced math to be useful, there's value (to me at least) in instantly proving little lemmas that help keep you in a flow state. More to come!
i see a lot of people calling this "goodharting" but it's sort of not goodharting, it's just leaking the test set
esp. as existing evals are translated into more languages, removing them becomes increasingly hard
How much do LLMs overfit public benchmarks? Our team at
@scale_ai
SEAL lab studied this by creating a GSM8k-equivalent eval from scratch. The resulting performance gap reveals data contamination in some model families, while GPT, Claude, and Gemini show no signs of overfitting.
Which university has the best graduates?
A new paper using an earnings-based measure of graduate quality (qⱼ) provided the answer: the top of the list is dominated by Indian universities.
What about Harvard? Rank
#26
.
Two weeks in London and I managed to make it to Wigmore Hall twice, for
@jeremydenk
playing the Bach Partitas and tonight for the Handel Players. Wigmore Hall is special, like the 92Y in NYC: small, with fantastic acoustics, intimate in the best sense.
The culmination of this week's musical mini-binge – 2nd concert today
@wigmore_hall
– felt somehow fitting after so much marvellous stuff both old and very new:
@jeremydenk
performing (the entire session from memory!) all Bach's Partitas. Magic.
Our first model had a bunch of bad habits: it made low-confidence suggestions, addressed unrelated issues, and wasn't very visible to the change author. To fix this, we improved data quality, filtered for single-comment reviews, filtered by confidence, and added synthetic data.
@amanrsanger
Training smaller models on a single language alone (e.g. Python-only) can match the performance of Codex at smaller sizes on single language evals. The open source world can't match Codex without huge investment, but there are shortcuts!
Our first version was a lightly-finetuned version of Google's software engineering foundation model DIDACT, and made very plausible suggestions. But people didn't trust it: there's a big difference between a plausible edit and what the developer really wants
I’m excited to announce our new company, MatX, started with
@MikeGunter_
. We want to make AI better, faster, and cheaper by building more powerful hardware. Read on for a short introduction, or see our full announcement here: .
@Ted_Underwood
Over-training + instruction-tuning. As
@moultano
says, OpenAI can e.g. train a 12B model for 10x the "Chinchilla-optimal" compute budget and end up with the same loss as a 10x larger model trained for less time 1/2
All in all, we ended up improving user trust in the model and addressing around 7.5% of all code review comments at Google with an ML-suggested edit, all while keeping precision high (usually around 50%) to avoid wasting engineering time!
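The high-precision, low-recall gating amounts to something very simple at serving time (threshold value is illustrative; the real bar is tuned against measured acceptance rates):

```python
# Only surface an ML-suggested edit when the model's own score clears a bar
# tuned so that roughly half of shown suggestions get accepted (~50% precision).
def should_show(suggestion_score, threshold=0.8):
    return suggestion_score >= threshold

candidates = [("fix typo", 0.95), ("rewrite function", 0.40), ("rename var", 0.85)]
shown = [text for text, score in candidates if should_show(score)]
```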
The lesson of language models for me is that noise generation with language is painfully easy. You have to look at what you write and say “does this say anything new? Could GPT-3 have written this?”
@DynamicWebPaige
@DavidSacks
Fulfilling a request (in this case, to write a slogan) isn’t necessarily at odds with political neutrality in its own answers?
As scaling LLMs becomes harder, performance gains come more and more from clever prompting, bootstrapping, and chaining multiple LLMs together. Cascades is a PPL that makes inference & optimization on chained language models easy!
Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming.
paper:
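One simple instance of the kind of chain a cascades PPL generalizes, as a toy sketch (names and logic illustrative): a scratchpad model proposes answers, a verifier scores them, and we keep the best sample.

```python
def scratchpad_model(question, i):
    # Stand-in for sampling a chain of thought + answer from an LLM;
    # here it just cycles through candidate answers deterministically.
    answer = [3, 4, 5][i % 3]
    return f"{question} think step by step -> {answer}", answer

def verifier(reasoning, answer):
    # Stand-in for a learned verifier; here a hard-coded correctness check.
    return 1.0 if answer == 4 else 0.0

def cascade(question, n_samples=6):
    # Sample-then-rank: one of the interaction patterns the paper unifies.
    samples = [scratchpad_model(question, i) for i in range(n_samples)]
    return max(samples, key=lambda s: verifier(*s))

reasoning, best = cascade("What is 2+2?")
```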
idea: add a carbon offset option to every gas station and airline checkout webpage.
even if only 1% of people pay an extra 20% on gas, it creates a market for carbon offsets and puts the idea front and center in people’s minds
@jxmnop
@polynoamial
I think phd students have a pretty great opportunity to publish general-purpose ideas that industry can't publish right now: write a great paper on data selection, length generalization, self-improvement, RL, etc. and include clear scaling laws up to 1B and everyone'll love it
@ben11kehoe
@forrestbrazeal
For now, it's up to the author to approve the change, and yes, then the reviewer needs to re-approve (which they'd normally have to do anyway after the author addressed a comment). We're working on the "pre-approve" UX now, so they can flag that the ML edit is right
@_jasonwei
Why is emergence a useful thing to think about? Is there reason to think "emergence" is anything more than "log likelihood dropping below some critical threshold" (i.e. a function of model quality, not of size or compute)?
@RichardMCNgo
A counterargument (which you've made yourself) is that optimal strategies in primitive or partially-observed environments may not be optimal today, e.g. avoiding pork because of disease or stoning women for adultery in a society that functions without monogamy.
@nearcyan
The alignment crowd has tried to push the term as broadly as possible. Now they reap the rewards.
But LLMs are far more likely to harm society by undermining our notions of truth and creativity than by killing us all
Here's what the UI looks like for the reviewer. The ML suggested edit auto-updates in the code review UI as the reviewer is typing, and they can try to more clearly specify their intent in the comment to guide the ML model!
@andy_matuschak
Having taken piano lessons for 15 years, I think it's just because it's hard to fit 15 pianos in a room and impossible for them to play at the same time. We do group music lessons for elementary students, but it's mostly chaos. At least you can do math silently
1/n on classical music
yesterday i heard the Pavel Haas quartet playing the Brahms A Major piano quartet at Wigmore Hall with Boris Giltburg. it's the second of Brahms' 3 piano quartets and my favorite. it's tragic and warm, rich, very full, like a Mahler symphony