Austin Huang Profile Banner
Austin Huang Profile
Austin Huang

@austinvhuang

4,100
Followers
1,520
Following
282
Media
2,924
Statuses

R&D @answerdotai. Past: @GoogleDeepMind, MIT, Harvard, Berkeley.

Joined February 2017
Pinned Tweet
@austinvhuang
Austin Huang
15 days
Announcing: the initial release of my 1st project since joining the amazing team here at @answerdotai
gpu.cpp - Portable C++ GPU compute using WebGPU
Links + info + a few demos below 👇
22
183
1K
@austinvhuang
Austin Huang
6 months
I'm happy to share the release of gemma.cpp - a lightweight, standalone C++ inference engine for Google's Gemma models: Have to say, it’s one of the best project experiences of my career.
23
199
1K
@austinvhuang
Austin Huang
7 months
I think people are bad at understanding just how *small* LLMs are. LLMs will fit on a thumb drive you can buy for cheap on Amazon and stick in your pocket, and that’s before quantization. It’s compression so strong that we’re traveling back in time to when you could get the
@jam3scampbell
James Campbell
7 months
People are really bad at understanding just how big LLMs actually are. I think this is partly why they belittle them as 'just' next-word predictors
Tweet media one
98
439
3K
20
65
620
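A back-of-envelope sketch of the size claim above, in C++ for concreteness (the 7B parameter count and fp16 width are illustrative assumptions, not a specific model's specs):

```cpp
#include <cstdio>

int main() {
    // Assumed numbers: a 7B-parameter model stored in fp16 (2 bytes/weight).
    const double params = 7e9;
    const double bytes_per_weight = 2.0;
    const double gib = params * bytes_per_weight / (1024.0 * 1024.0 * 1024.0);
    std::printf("7B params @ fp16 ~= %.0f GiB\n", gib); // ~13 GiB, before quantization
    // A 4-bit quantized copy is ~3.3 GiB -- small enough for a cheap thumb drive.
    return 0;
}
```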
@austinvhuang
Austin Huang
2 years
9 out of 10 ML researchers are working on RLHF in 2023.
19
50
563
@austinvhuang
Austin Huang
9 months
There’s not enough creative exploration of the KV cache... researchers seem to dismiss it as just an engineering optimization. The KV cache is the global state of the transformer model. Saving, copying, and reusing KV cache state is like playing with snapshots of your brain.
@srush_nlp
Sasha Rush
9 months
@haozhangml The use case I am interested in is that I want to generate from 1000 different short suffixes that all use the same long prefix. (I can do this in transformers by setting the KV cache.)
5
1
24
4
22
195
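A minimal sketch of the "snapshots of your brain" idea, treating the KV cache as plain data (the struct, layout, and snapshot helper are hypothetical, not tied to any framework):

```cpp
#include <cstddef>
#include <vector>

// The KV cache is just the transformer's accumulated state: per layer,
// the keys and values for every position processed so far.
struct KVCache {
    std::vector<std::vector<float>> keys;   // [layer][pos * heads * head_dim]
    std::vector<std::vector<float>> values; // same layout as keys
    std::size_t seq_len = 0;
};

// Because it's plain data, a snapshot is just a value copy...
KVCache snapshot(const KVCache& cache) { return cache; }

int main() {
    KVCache cache; // imagine this was filled by running one long shared prefix
    KVCache saved = snapshot(cache);
    // ...and each of the 1000 short suffixes in Sasha's use case above can
    // start from `saved`, paying the prefix's compute cost exactly once.
    (void)saved;
    return 0;
}
```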
@austinvhuang
Austin Huang
2 years
A personal update - I recently joined Google Brain. I feel fortunate to join such a wonderful and collaborative research community during such an interesting time. Excited for what's next in this journey of accelerating AI progress. The coming years are going to be wild.
9
3
182
@austinvhuang
Austin Huang
2 months
Async* life update - I have joined Answer.AI, the "new old kind of R&D lab" founded by @jeremyhoward and @ericries. I feel lucky to embark on this journey, with this group of people. The old R&D labs are something I've had a relationship to my entire life... there's more to
8
8
173
@austinvhuang
Austin Huang
3 years
machine learning needs its own demoscene with constraints analogous to 64K and 4K compos: "4 MB of parameters max", "trained using only SIMD parallelism", "trained on 1 GPU in 15 minutes" ...
@Tim_Dettmers
Tim Dettmers
3 years
I am excited to share my latest work: 8-bit optimizers – a replacement for regular optimizers. Faster 🚀, 75% less memory 🪶, same performance📈, no hyperparam tuning needed 🔢. 🧵/n Paper: Library: Video:
Tweet media one
18
283
1K
3
27
168
@austinvhuang
Austin Huang
7 months
Relatedly, most people have no idea just how powerful our own personal computers are. I’d argue that discovering just what regular computers are capable of is one of the most impactful open research questions right now.
4
6
164
@austinvhuang
Austin Huang
1 year
If you've ever said "it's *just* matrix multiplication!", then you should probably read this article. Few people appreciate the insane amount of sophistication that goes into a matrix multiply.
1
23
157
@austinvhuang
Austin Huang
3 years
10 years. Respect to anyone who embarks on the lonely journey of deep work to create something like this.
@awwbees
Ryan Challinor
4 years
hello! for the past decade I've been building bespoke, a free modular synth environment with python livecoding support for mac/windows/linux you can find the code and get builds at and if you scroll through my feed, you'll find a bunch of videos of it
39
364
2K
2
9
141
@austinvhuang
Austin Huang
3 years
My @PyTorch Developer Day talk on Real-world Research to Production is here:
- How ML projects are changing
- Building models when labeled data is not available
- End-to-end considerations, neural network + user experience codesign
#FidelityAssociate #PTD2
1
22
138
@austinvhuang
Austin Huang
30 days
For a while I've had some vague sense that tinygrad's frontend was pytorch-like and internals were an optimizing compiler that spits out kernel code, but I didn't have much of an idea about the implementation. This writeup looks good and might even be a beginner-friendly entry
1
11
100
@austinvhuang
Austin Huang
4 years
@julien_c @huggingface @marksaroufim This was hilarious and has a lot of truth to it. @ykilcher
Tweet media one
2
11
85
@austinvhuang
Austin Huang
3 years
I wonder if applied ML practitioners appreciate that "training a model" is going to become a drastically different process within the next 2-4 years.
7
2
79
@austinvhuang
Austin Huang
6 months
gemma.cpp is on HN 🤖🙈
Tweet media one
@betterhn20
Hacker News 20
6 months
Gemma.cpp: lightweight, standalone C++ inference engine for Gemma models
0
1
6
5
8
79
@austinvhuang
Austin Huang
5 years
Finally, there's a paper to cite for @PyTorch ! From the NeurIPS 2019 pre-proceedings - h/t @junjihashimoto3
1
25
76
@austinvhuang
Austin Huang
1 year
Amazing that this works with a matmul implementation like this. OpenMP can be unreasonably effective sometimes.
Tweet media one
@karpathy
Andrej Karpathy
1 year
Yay, llama2.c can now load and inference the Meta released models! :) E.g. here inferencing the smallest 7B model at ~3 tokens/s on 96 OMP threads on a cloud Linux box. Still just CPU, fp32, one single .c file of 500 lines: expecting ~300 tok/s tomorrow :)
62
343
3K
3
5
74
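For reference, the "matmul implementation like this" is essentially a doubly nested loop with one OpenMP pragma. A sketch in the spirit of llama2.c's fp32 matvec (not its verbatim code):

```cpp
// Compile with: g++ -O3 -fopenmp matmul.cpp
#include <cstdio>
#include <vector>

// out (d,) = W (d,n) row-major * x (n,): each output row is independent,
// so a single pragma spreads the outer loop across all cores.
void matmul(float* out, const float* x, const float* w, int n, int d) {
    #pragma omp parallel for
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) val += w[i * n + j] * x[j];
        out[i] = val;
    }
}

int main() {
    const int n = 4, d = 2;
    std::vector<float> w = {1, 0, 0, 0,  0, 1, 0, 0}; // 2x4 toy weights
    std::vector<float> x = {1, 2, 3, 4}, out(d);
    matmul(out.data(), x.data(), w.data(), n, d);
    std::printf("%.0f %.0f\n", out[0], out[1]); // prints: 1 2
    return 0;
}
```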
@austinvhuang
Austin Huang
2 years
Insanely great to watch new generations of artists come into their own. I have fond memories of seeing Brandon do the goofy-yet-inspired weekly YT grind on the freddiew channel. @StressLevelZero is probably at the forefront of VR user interaction now.
@BrandonJLa
Brandon J Laatsch
2 years
The teaser for our 4th project, BONELAB.
217
417
2K
1
2
67
@austinvhuang
Austin Huang
3 years
"Any prior knowledge can be encoded as a data augmentation or data generation scheme." Yes! This is the right way to incorporate prior knowledge into machine learning models. The key factor is that data is inherently composable.
@pachyderminc
Pachyderm
3 years
Reinforcement Learning for Industrial AI with Pieter Abbeel - #476 by @twimlai
0
4
18
5
8
53
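A toy illustration of "prior knowledge as a data augmentation scheme" (the task and helper name are made up): if labels should be rotation-invariant, bake that prior into the data by emitting rotated copies of each example instead of changing the model. And because data is composable, stacking a second augmentation (say, noise) encodes both priors at once.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Point { float x, y; int label; };

// Encode a rotation-invariance prior as data generation: each training
// point becomes `copies` points with the same label, rotated about the origin.
std::vector<Point> augmentRotations(const Point& p, int copies) {
    std::vector<Point> out;
    for (int i = 0; i < copies; ++i) {
        float a = 2.0f * 3.14159265f * static_cast<float>(i) / copies;
        out.push_back({p.x * std::cos(a) - p.y * std::sin(a),
                       p.x * std::sin(a) + p.y * std::cos(a),
                       p.label});
    }
    return out;
}

int main() {
    for (const Point& q : augmentRotations({1.0f, 0.0f, 1}, 8))
        std::printf("(% .2f, % .2f) -> label %d\n", q.x, q.y, q.label);
    return 0;
}
```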
@austinvhuang
Austin Huang
15 days
Non-AI general purpose GPU compute can be fun too. Here's shadertui - a terminal-based shadertoy clone that live-loads WebGPU compute shaders. shadertui is only ~150 lines of code and compiles in a second.
2
2
54
@austinvhuang
Austin Huang
15 days
gpu.cpp helps open up this portable + hackable + ease-of-use design space for GPU programming. Here's a "hello world" program implementing a GELU activation. We use WebGPU as a portable native GPU API (no browser or web needed). Edit/compile/run cycles are 1-2 seconds.
Tweet media one
2
7
51
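A sketch of what that hello world looks like, reconstructed from memory of gpu.cpp's examples; treat the exact function and type names (createContext, createTensor, createKernel, dispatchKernel, toCPU) as assumptions that may not match the current release:

```cpp
#include <array>
#include <cstdio>
#include <future>
#include "gpu.h"

using namespace gpu;

// WGSL compute shader: elementwise tanh-approximation GELU.
static const char* kGelu = R"(
@group(0) @binding(0) var<storage, read_write> inp: array<f32>;
@group(0) @binding(1) var<storage, read_write> out: array<f32>;
@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i: u32 = gid.x;
    if (i < arrayLength(&inp)) {
        let x: f32 = inp[i];
        out[i] = 0.5 * x * (1.0 + tanh(0.7978845608 * (x + 0.044715 * x * x * x)));
    }
}
)";

int main() {
    constexpr size_t N = 10000;
    Context ctx = createContext();
    std::array<float, N> in{}, outHost{};
    for (size_t i = 0; i < N; ++i) in[i] = static_cast<float>(i) / 10.0f;
    Tensor input  = createTensor(ctx, Shape{N}, kf32, in.data());
    Tensor output = createTensor(ctx, Shape{N}, kf32);
    // One workgroup of 256 threads per 256 elements of the input.
    Kernel gelu = createKernel(ctx, {kGelu, 256, kf32},
                               Bindings{input, output}, {cdiv(N, 256), 1, 1});
    std::promise<void> done;
    dispatchKernel(ctx, gelu, done);
    wait(ctx, done.get_future());
    toCPU(ctx, output, outHost.data(), sizeof(outHost));
    std::printf("gelu(%.1f) = %.3f\n", in[1], outHost[1]);
    return 0;
}
```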
@austinvhuang
Austin Huang
15 days
GPUs are the most empowering technology in the world today. Currently, programming GPUs usually means either low-level platform-specific stacks (CUDA, ROCm) or high-level, portable frameworks (PyTorch/JAX) + ML compilers. There are good, practical reasons for this combo. Nonetheless..
1
1
51
@austinvhuang
Austin Huang
7 months
@typedfemale Nix: featuring 100% reproducibility on demand yet somehow requires constant maintenance.
2
1
50
@austinvhuang
Austin Huang
3 years
We had the privilege of presenting our Sim2Real Docs project at @NeurIPSConf 2021 DCAI. TLDR: Python library for synthetic images of documents in natural scenes using @Blender . It's open source: It's @nmaddikunta21 's first paper:
@AndrewYNg
Andrew Ng
3 years
Thanks to everyone that participated in the Data-centric AI NeurIPS workshop! I was surprised and delighted at the sheer amount of innovation in the field. I also share some of my reflections from the workshop in The Batch.
5
22
192
3
8
49
@austinvhuang
Austin Huang
3 years
@abhi1thakur A while back I took a stab at unrolling the call stack into one big figure (based on Sasha Rush's Annotated Transformer post).
Tweet media one
2
3
46
@austinvhuang
Austin Huang
15 days
I've always wished we could just do this with low-level GPU code though:
#include "gpu.h"
// do gpu stuff
No custom vendor tooling, massive build system, or fiddling w/ details like descriptor set layouts. Just a C++ compiler + instant edit/compile/run.
Tweet media one
2
2
47
@austinvhuang
Austin Huang
6 years
@fchollet The issue is "knowing the math" in the wrong sense - CS education fails by emphasizing "algorithms as sequences of operations" instead of computational models as mathematical representations. The math of PCA as a generative latent model elucidates what it does and why it works.
0
5
45
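For concreteness, the "PCA as a generative latent model" reading is Tipping & Bishop's probabilistic PCA:

```latex
% Latent z generates observed x through a linear map plus isotropic noise:
z \sim \mathcal{N}(0,\, I_k), \qquad
x \mid z \sim \mathcal{N}\!\left(W z + \mu,\ \sigma^2 I_d\right)
% Classical PCA is the maximum-likelihood W in the limit \sigma^2 \to 0,
% which says *what* PCA estimates, not just which operations compute it.
```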
@austinvhuang
Austin Huang
15 days
Many thanks to @jeremyphoward, Sarah Pan, and fellow @answerdotai colleagues for supporting this project and kicking the tires. Also thanks to early contributors like @junjihashimoto, Trevor, and Micheal at the gpu.cpp discord. **Join us** at the gpu.cpp channel in the @fastdotai
2
5
45
@austinvhuang
Austin Huang
4 years
Nice example of why one should never mistake implementation effort for a moat.
Tweet media one
@sjmielke
Sabrina J. Mielke
4 years
Another old bookmark from 2014 where some guy writes a fast and simple syntactic parser: ...wait ... @honnibal ? Omg, is this the spaCy origin story 😱
1
13
89
1
6
44
@austinvhuang
Austin Huang
3 months
Remember base rates - you can easily have 99% accuracy on a balanced test set and then be wrong 99% of the time when you deploy to the real world. OpenAI’s own AI text classifier was in this regime before they shut it down.
@petergyang
Peter Yang
3 months
AI detectors feel like total scams - sad that students have to deal with this
Tweet media one
762
3K
35K
1
6
42
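The regime the tweet describes, as arithmetic (a sketch; the 99% figures and 1-in-1000 prevalence are illustrative, not OpenAI's published numbers):

```cpp
#include <cstdio>

int main() {
    // A detector with 99% sensitivity and 99% specificity on a balanced test set...
    const double sens = 0.99, spec = 0.99;
    // ...deployed where only 1 in 1000 items is actually AI-generated.
    const double prevalence = 0.001;
    const double truePos  = prevalence * sens;
    const double falsePos = (1.0 - prevalence) * (1.0 - spec);
    // Precision collapses: ~9% of flags are correct, ~91% are false alarms.
    std::printf("P(actually AI | flagged) = %.2f\n", truePos / (truePos + falsePos));
    return 0;
}
```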
@austinvhuang
Austin Huang
4 months
@KLdivergence I read it as: - the prediction target of the neural network is the latent variable of interest. - the neural network itself is the estimator function.
5
1
41
@austinvhuang
Austin Huang
5 years
@jmhessel @murraygabriel Point taken ... but as a parent I'm definitely observing some form of sequential pre-training + latent dimension expansion with the baby.
1
0
39
@austinvhuang
Austin Huang
4 years
Finally gave HLS+ @code a spin after holding out with a syntax-highlighting neovim workflow. Haskell Language Server is fantastic. Developer ergonomics have really improved in this dimension.
Tweet media one
Tweet media two
Tweet media three
3
6
40
@austinvhuang
Austin Huang
4 years
@RadekPaszkowski @bookofshaders 🤎 @code as a GLSL playground running locally- Shader hacking really does put all other programming feedback loops to shame. There's nothing quite like it for a flow junkie.
0
9
38
@austinvhuang
Austin Huang
2 years
Let's find out just how far text-as-vision can go. #SimTex - I coded this up over the holiday as a data generator + perception challenge for language models. Have a look👇
@goodside
Riley Goodside
2 years
There's a thread going around claiming ChatGPT / GPT‑3 can recognize subjects (e.g. "a bird", "a person") in images rasterized as ASCII art. I'm skeptical. E.g., GPT‑3 identifies this ASCII-art MNIST digit as "8" 57% of the time and "4" doesn't appear in its top-5 choices:
Tweet media one
Tweet media two
Tweet media three
22
10
166
4
1
37
@austinvhuang
Austin Huang
15 days
We want to broaden the availability of GPU compute: just drop custom GPU algorithms inside applications, simulations, runtimes, etc. and get broad portability + ease of use. Here's a little physics sim - an ensemble of double pendulums doing their thing in ~100 LoC + compiles
1
2
37
@austinvhuang
Austin Huang
2 years
@typedfemale Aphex Twin was the original productivity grifter for this one.
Tweet media one
1
1
36
@austinvhuang
Austin Huang
3 years
@iamtrask The DPR paper opened my eyes to the central importance of neural retrieval. It was clear there's so many possibilities if you embed retrieval mechanisms w/ data as part of the model. I think we'll see more variations on fusing retrieval and inference in the next few years.
0
0
37
@austinvhuang
Austin Huang
6 months
What's next? There’s a lot of low-hanging fruit - we welcome external collaborators. I'm most excited to enable new research on co-design between models + inference engines. Stay tuned. “Now that things are so simple, there's so much to do.” - M. Feldman
0
2
36
@austinvhuang
Austin Huang
15 days
There are tradeoffs for portability but early experiments are promising. w/ @junjihashimoto, a naive-ish baseline matmul is around 2.5 TFLOPS on my M1 Max laptop, and there's plenty of room for further optimization. We'll be following the path paved by llm.c and plan to port
Tweet media one
2
1
36
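For reference, this is how a throughput number like "~2.5 TFLOPS" falls out of a matmul benchmark (the sizes and timing below are made-up illustrations, not the actual measurement):

```cpp
#include <cstdio>

int main() {
    // A matmul of (M,K) x (K,N) does one multiply + one add per term:
    const double M = 4096, K = 4096, N = 4096;
    const double flops = 2.0 * M * K * N;
    const double seconds = 0.055; // hypothetical measured kernel time
    std::printf("%.2f TFLOPS\n", flops / seconds / 1e12); // -> 2.50
    return 0;
}
```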
@austinvhuang
Austin Huang
4 months
@tobyshooters Nice, sort of maps to:
Abductive - artist
Inductive - scientist
Deductive - engineer
You can live a more expansive life journey if you don’t pigeonhole yourself as an artist, scientist, or engineer.
4
2
34
@austinvhuang
Austin Huang
2 months
Such a good talk, provides much-needed clarity on conceptualizing encoder vs decoder models.
@hwchung27
Hyung Won Chung
2 months
I gave a lecture at @Stanford CS 25. Lecture video: AI is moving so fast that it's hard to keep up. Instead of spending all our energy catching up with the latest development, we should study the change itself. First step is to identify and understand
Tweet media one
25
204
1K
0
4
34
@austinvhuang
Austin Huang
3 years
@thesephist Agreed - let's move away from high-maintenance tools for thought. Instead of hand-curated knowledge graphs scattered over markdown docs, #OpenMemex focuses on enabling automation with a SQLite event stream + neural network integrations. It’s open source -
4
1
33
@austinvhuang
Austin Huang
4 years
Weekend hack - proof-of-concept Haskell binding to @huggingface fast tokenizers. Anyone interested in seeing this expanded? Does @huggingface accept PRs for contributed language bindings, if this were further developed @ClementDelangue @srush_nlp ?
Tweet media one
2
4
32
@austinvhuang
Austin Huang
6 months
gemma.cpp is a minimalist implementation of Gemma 2B and 7B models: focusing on simplicity and directness rather than full generality, it takes inspiration from ggml, llama.c, and other "integrated" model implementations.
1
3
30
@austinvhuang
Austin Huang
3 years
@cmuratori Reminds me of @Jonathan_Blow 's comments on the ethics of wasting people's time at scale. Should have the engine team run everybody's code reviews. For that matter, non-gamedev industries would benefit from @mike_acton parachuting in as a drill sergeant.
0
1
29
@austinvhuang
Austin Huang
5 months
Efron and Morris’s non-technical intro to Stein’s paradox is a delight to read. Possibly even worldview-altering if you haven’t encountered the topic before.
@docmilanfar
Peyman Milanfar
5 months
One of the lesser known ways to compare estimators is "admissibility". An estimator θ* = g(y) of θ from data y is called *in*admissible if it is uniformly dominated by another estimator g'(y) for all values of θ, say in the MSE sense. 1/6
Tweet media one
8
32
291
0
6
29
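The canonical example behind the paradox, for readers who haven't seen it: for y ~ N(θ, σ²I_p) with p ≥ 3, the "obvious" estimator θ̂ = y is inadmissible, because the James–Stein estimator dominates it everywhere:

```latex
\hat{\theta}_{\mathrm{JS}}
  = \left(1 - \frac{(p-2)\,\sigma^2}{\lVert y \rVert^{2}}\right) y,
\qquad
\mathbb{E}\,\lVert \hat{\theta}_{\mathrm{JS}} - \theta \rVert^{2}
  < \mathbb{E}\,\lVert y - \theta \rVert^{2}
  \quad \text{for all } \theta,\ p \ge 3.
```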
@austinvhuang
Austin Huang
6 months
The goal of the project is to have a small inference engine for experimentation and research. The codebase has minimal dependencies and is portable, pure C++ (taking advantage of for portable SIMD).
1
1
30
@austinvhuang
Austin Huang
2 years
@0interestrates “This is because Amazon will penalize you…”? After fine tuning a language model to exhibit a quasi ego it somehow doesn’t feel right to put threats in the prompt.
0
0
28
@austinvhuang
Austin Huang
3 years
Last book on the summer reading list arrived a month late but it’s finally here. H/t @marksaroufim
Tweet media one
1
0
28
@austinvhuang
Austin Huang
5 years
Concepts, ranges, coroutines, and modules in C++20. Pretty big changes coming, perhaps comparable in scope to C++11, which kicked off the whole "modern" C++ thing.
0
7
28
@austinvhuang
Austin Huang
5 years
Really enjoyed the recent @ylecun lecture series discussing neural networks and physics at the @harvardphysics dept Loeb lectures. Looks like videos have been posted now -
0
14
28
@austinvhuang
Austin Huang
9 months
@ericjang11 Went from lighthearted research speculation to pure memes way too fast.
0
0
28
@austinvhuang
Austin Huang
4 years
pretty-simple 4.0 release - a simple Haskell pretty printing library. For me pretty-simple "just works" and I often find myself wishing ghci defaulted to pretty printing for the repl.
2
9
28
@austinvhuang
Austin Huang
6 years
@er_crema @rlmcelreath Bob Carpenter's tutorial on the beta binomial
3
5
28
@austinvhuang
Austin Huang
3 years
@jeremyphoward Did you ever see @gabeeegoooh 's SEIR(+) demo? It's beautifully done:
1
2
27
@austinvhuang
Austin Huang
6 years
Nice writeup by @tarantulae with an under-the-hood look at @PyTorch 1.0 internals + a js integration example
0
13
27
@austinvhuang
Austin Huang
3 years
@jeremyphoward @github @OpenAI This makes me wonder what would happen if there was a "conditioned" generation mode. Instead of "average all of github", you'd change a setting to bias the model towards writing code in the style of @jeremyphoward 's repos, for example.
3
2
27
@austinvhuang
Austin Huang
4 years
@twiecki @tiangolo makes excellent use of types for automation in FastAPI: For me types in python might not be a huge productivity boost, but they do help reduce the cognitive overhead of returning to a piece of code after a context switch.
1
0
27
@austinvhuang
Austin Huang
9 months
@Suhail Time to publish that paper on 64k bit integer quantization.
1
0
25
@austinvhuang
Austin Huang
5 years
@larsrosenquist I sympathize, but being locked into a complicated bespoke configuration language would’ve been worse. At least a simple format serves as an attractive compilation target for DSLs to explore and compete on. @dhall_lang + @nixos_org offer hope for a better future on this front.
0
0
25
@austinvhuang
Austin Huang
3 years
@wooldridgemike I was at a party having a conversation with an MIT CS professor around 2009. Hearing that I was working with modeling, simulations, and data, the professor just said point blank, "I don't see how this is computer science."
1
0
24
@austinvhuang
Austin Huang
1 year
Seeing LLMs / generative AI being casually deployed local-first to browsers thanks to two mostly unsung heroes:
- @ApacheTVM being ahead of the curve w/ WebGPU and WASM targets
- @googlechrome shipping WebGPU in Chrome 113 this month
1
6
25
@austinvhuang
Austin Huang
5 years
Look forward to meeting everyone at the @PyTorch conference #PTDC19 . Come chat with @apaszke and me about differentiable functional programming with @hasktorch :)
3
7
24
@austinvhuang
Austin Huang
4 years
The strategy is not "more refined", it's wrong. ~80% of the population is so many orders of magnitude larger than healthcare capacity that even shutting down now, in the *best* case, will barely be enough to avoid completely overwhelming the healthcare system. (1/2)
@iandonald_psych
Professor Ian Donald
4 years
1. The govt strategy on #Coronavirus is more refined than those used in other countries and potentially very effective. But it is also riskier and based on a number of assumptions. They need to be correct, and the measures they introduce need to work when they are supposed to.
3K
18K
41K
1
3
23
@austinvhuang
Austin Huang
4 months
When there's an MoE launch, MegaBlocks is there behind the scenes. Amazing project @Tgale96
@jefrankle
Jonathan Frankle
4 months
And we stood on the shoulders of giants in the community: * @TGale96 , creator of MegaBlocks * The @PyTorch team and FSDP * @nvidia and TensorRT-LLM * The vLLM project * @AiEleuther and their evaluation tools * @dsmilkov + @nsthorat of @lilac_ai * Our amazing friends at @allen_ai
1
6
66
1
4
23
@austinvhuang
Austin Huang
6 months
Beyond the interactive terminal ui for playing with the model, with near-instant model loading we can use gemma as a local-first command line LLM tool.
1
2
23
@austinvhuang
Austin Huang
5 years
Congrats @apaszke & #S4TF on releasing "Tensors Fitting Perfectly" - static analyzers are an exciting middle path b/w dependently typed and untyped tensor dimensions. @srush_nlp - this assert approach may be interesting given prior conversations :)
@s4tfnews
Swift for Tensorflow newsletter
5 years
ITI: Tensors Fitting Perfectly library by @apaszke has been open sourced. @DynamicWebPaige and @bsaeta visited @swiftbysundell podcast to talk about #MachineLearning and #S4TF . @gsoc '19 students presented projects they hacked on during the summer at the last Swift Design meeting.
0
8
22
2
5
23
@austinvhuang
Austin Huang
3 years
Looking forward to participating in the @ml_collective research jam this Wednesday. I'll be presenting a lightning talk on @hasktorch . Additional details should be up soon. Thanks to @savvyRL for organizing!
@ml_collective
ML Collective
3 years
Working on an ML research project but don’t have labmates to show? Show a plot or three in our Research Jam in two weeks! Also open to folks that want to bounce an idea off others or who just want to hang out and talk shop. Details:
1
30
96
0
5
23
@austinvhuang
Austin Huang
5 years
@GabrielG439 I'm a believer in these sand mandala rituals of development 😀 Throwing away code is not a waste if the internal state of the developer has been mutated by the process.
0
2
21
@austinvhuang
Austin Huang
11 months
Researchers often dismiss NN inference runtimes as merely deployment infrastructure. We should embrace runtime implementations as a form of research and discovery. This helps us understand models as systems, discover new methods, and enable new capabilities.
@karpathy
Andrej Karpathy
11 months
Speculative execution for LLMs is an excellent inference-time optimization. It hinges on the following unintuitive observation: forwarding an LLM on a single input token takes about as much time as forwarding an LLM on K input tokens in a batch (for larger K than you might
111
604
4K
2
2
22
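A shape-only sketch of the speculative loop Karpathy describes above. Both "models" below are hypothetical stubs, not any real library's API; only the draft/verify/accept structure is the point.

```cpp
#include <cstdio>
#include <vector>

using Token = int;

// Hypothetical cheap draft model: propose k candidate tokens.
std::vector<Token> draft(const std::vector<Token>& ctx, int k) {
    std::vector<Token> out;
    for (int i = 0; i < k; ++i)
        out.push_back(static_cast<Token>(ctx.size()) + i); // dummy proposals
    return out;
}

// Hypothetical big model: score all k drafts in ONE batched forward pass.
// The unintuitive observation above: scoring k tokens costs about the same
// as scoring 1, because single-stream decoding is memory-bandwidth bound.
std::vector<double> verify(const std::vector<Token>& ctx,
                           const std::vector<Token>& proposed) {
    return std::vector<double>(proposed.size(), 0.9); // dummy: accept everything
}

int main() {
    std::vector<Token> ctx = {1, 2, 3};
    const int k = 4;
    std::vector<Token> proposed = draft(ctx, k);
    std::vector<double> score = verify(ctx, proposed);
    const size_t before = ctx.size();
    for (int i = 0; i < k; ++i) {
        if (score[i] < 0.5) break;   // placeholder acceptance rule
        ctx.push_back(proposed[i]);  // keep the longest accepted prefix
    }
    std::printf("accepted %zu draft tokens this step\n", ctx.size() - before);
    return 0;
}
```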
@austinvhuang
Austin Huang
3 years
@a_cowley There are things to learn from UX/gamedev folks about engagement loops. How long does it take for a beginner to get to their first success (e.g. a working useful app)? Haskellers are usually orders of magnitude off as to what constitutes "acceptable"/"good" along this axis.
2
0
22
@austinvhuang
Austin Huang
5 years
@Thom_Wolf An unexpected, forward-thinking idea in PyTorch's list of design principles is that "worse is better". From
Tweet media one
0
1
21
@austinvhuang
Austin Huang
5 years
Nice writeup for those coming from python or other languages. People love to write about advanced topics in Haskell but this basic stuff is where a lot of the core value proposition is.
@oisdk
Oisín Kidney
5 years
New Post: What is good about #Haskell ?
5
33
80
1
6
20
@austinvhuang
Austin Huang
2 years
@BlackHC @rasbt @randal_olson If this were applied to a class where 4% of students were cheating, then 9 in 10 students flagged as "Likely AI-generated" would be false positives? Total flagged as AI: .96*.09 + .04*.26. Human-written but wrongly flagged as AI: .96*.09. So: .96*.09 / ((.96*.09) + (.04*.26)) = .89
1
3
21
@austinvhuang
Austin Huang
2 years
How has dwarf fortress not been turned into an RL environment?
1
1
20
@austinvhuang
Austin Huang
6 months
The core implementation is ~2K LOC, w/ ~4K LOC of supporting code. It’s meant to be both hackable and also embeddable as a library w/ cmake. Prototype your apps with local LLM inference as a C++ function call. Add runtime support for your own research with a few lines of code.
1
1
20
@austinvhuang
Austin Huang
1 year
@yoavgo Fwiw I don’t get it either. Everyone is talking about boilerplate and abstractions and agents but:
1) LLMs have 1 endpoint
2) afaict it adds boilerplate + cognitive overhead vs just … doing the thing
3) It’s perfectly doable to implement agents and API chaining as programs.
0
0
20
@austinvhuang
Austin Huang
3 months
@srush_nlp IMO being an incoherent discipline is a virtue. Once a field is coherent enough that there's a sharp delineation between in/out of scope, that adds an (imo unnecessary) constraint to search directions for progress.
1
0
18
@austinvhuang
Austin Huang
2 months
I've only been at Answer.AI a short while, but I can already say there are some amazing things coming. I'll be releasing something soon too. Stay tuned.
1
0
20
@austinvhuang
Austin Huang
6 years
@jeremyphoward @apaszke @BrabecJan91 @gregcons covers a beginner-safe modern-ish C++ subset (though not available as a book and the talk is aimed at convincing teachers) +1 @BrabecJan91 Effective [Modern] C++ comes closest (except for gamedev .. in which case @apaszke 's joke answer applies)
1
5
19
@austinvhuang
Austin Huang
4 months
@davidad Audiomulch had matrix routing as modules embedded within a topological graph. Worked incredibly well for re-routing patches as part of a live performance. Still one of my favorite HCIs 25 years on.
Tweet media one
0
1
19
@austinvhuang
Austin Huang
1 year
@dustinvtran Language is also much more forgiving than (1-ε)^n. I get where the diffusion process view comes from but I don't think it provides the correct intuition.
0
0
18
@austinvhuang
Austin Huang
6 months
Jan Wassenberg (author of ) and I started gemma.cpp as a small project just a few months ago. We were lucky to find amazing collaborators from around Google - @PhilCulliton , @dancherp , Paul Chang, and of course, the GDM Gemma team.
1
1
19
@austinvhuang
Austin Huang
2 months
Much appreciation to all my colleagues at @GoogleDeepMind . My time there was a life-changing experience and I was especially proud to be a part of the Gemma and open sourcing effort. Despite the Google criticism that's common nowadays (sometimes warranted), there are
1
0
19
@austinvhuang
Austin Huang
3 years
A large part of my own team's successes solving everyday applied ML problems can be boiled down to starting with "what program can I write to produce the desired model behavior?" instead of "I need labeled training data".
@maxjaderberg
Max Jaderberg
3 years
Very excited to release our new work: Open-Ended Learning Leads to Generally Capable Agents. tldr; algorithm that dynamically shapes task distributions to train agents on huge task space, resulting in surprisingly general behaviour Thread: (1/n)
10
216
874
2
3
19
@austinvhuang
Austin Huang
5 years
Hire this guy. Also, congrats @apaszke !
@apaszke
Adam Paszke
5 years
My graduation is approaching rapidly, and that means that I’ll be able to work full-time soon. If you’re looking for someone excited about building next generation tools and infra for scientific computing (and ML) then let's talk! Note: remote and Europe only 🚀
19
19
316
0
1
19
@austinvhuang
Austin Huang
2 months
Next generation AI in Japan extends beyond particular products; it’s a geopolitical imperative. I’m really happy to see Sakana AI and Shane x GDM Japan leading things there. It’s a bullish sign for a country’s institutional capacity that it’s able to discern technical
@shaneguML
Shane Gu
2 months
Whether you agree or not with $1.1B valuation of Sakana AI based on their outputs, I argue it was easy to raise $155M. Japan is a fascinating country with untapped market and talent opportunities. VCs/AI engineers interested in investing/working in Japan, feel free to DM me. My
Tweet media one
Tweet media two
Tweet media three
6
13
172
0
5
19
@austinvhuang
Austin Huang
4 years
I learn a lot from design notes of frameworks. These are great -
@DynamicWebPaige
👩‍💻 Paige Bailey
4 years
😁 Couldn't agree more! "Unlike the stateful pseudorandom number generators (PRNGs) that users of NumPy and SciPy may be accustomed to, JAX random functions all require an explicit PRNG state to be passed as a first argument." Learn about it here 👉
4
7
53
1
1
19
@austinvhuang
Austin Huang
3 years
Both can be simultaneously true:
1. Your model makes a mistake only 1 out of a trillion times in your test set.
2. The probability of your model exhibiting catastrophic failures shortly after prod deployment is nearly 1.
0
2
18
@austinvhuang
Austin Huang
3 years
Here's the final version, for now. High res svg/png/pdf:
Tweet media one
@austinvhuang
Austin Huang
3 years
First attempt at a visual which flattens the call hierarchy of the classic "Annotated Transformer" by @srush_nlp . Suggestions welcome.
Tweet media one
1
0
9
2
2
18
@austinvhuang
Austin Huang
1 year
@yoavgo @johnschulman2 It seems having an auxiliary model is useful to learn the knowledge boundary of the main model, and the reward model can do that (among other things) w/ RL. Still not obvious that this "I-don't-know" problem couldn't be somehow addressed using an auxiliary model + SL though.
2
1
18
@austinvhuang
Austin Huang
9 months
@jeremyphoward Summary of the 90s for people who didn't get to experience the magic: Use a computer? Must be an antisocial nerd. Produce electronic music? Sounds like video game music to everyone because real music has guitars. Friends is the most popular TV show.
3
0
17
@austinvhuang
Austin Huang
3 years
@jlongster Types are intended to be broken as you evolve a program. @Jonathan_Blow calls this "one of the most powerful programming techniques". In Haskell I don't spend time to plan types - I just start YOLO coding and use this technique to refactor as needed.
1
0
18
@austinvhuang
Austin Huang
2 years
@lexi_lambda I'm working on this backed by SQLite. Not there yet with direct manipulation but I am working towards a very different client. It is sad that there are many ideas that are technically possible yet don't cross the threshold of financial sustainability.
1
1
17