James Bradbury

@jekbradbury

11,218 Followers · 8,387 Following · 141 Media · 68,347 Statuses

Compute at @AnthropicAI! Previously JAX, TPUs, and LLMs at Google, MetaMind/@SFResearch, @Stanford Linguistics, @Caixin.

Joined October 2012
@jekbradbury
James Bradbury
6 years
A huge fan of putting this in papers (from the BigGAN paper)
Tweet media one
13
267
1K
@jekbradbury
James Bradbury
4 years
In 2016, when I was working on machine translation, it took me more than a week on a multi-GPU machine to train a competitive system on WMT English-German. Today, JAX on a TPU v3 supercomputer can train a better model on the same data in 16 seconds!
Tweet media one
10
148
907
@jekbradbury
James Bradbury
2 years
Can multi-100B param language models be served efficiently? We think so! Today we’re announcing the PaLM inference paper and releasing code for low-latency, high-throughput inference of 8B–540B models on TPU v4. Paper: Code: 1/5
Tweet media one
12
139
904
@jekbradbury
James Bradbury
7 years
Google Colab apparently now gives you one free K80 GPU for up to 12hrs at a time! Note that you have to go to "Runtime" --> "Change runtime type" to add a GPU.
@rctatman
Rachael Tatman
7 years
Ummmm, Colab now lets you use GPUs to accelerate your notebooks? In the cloud? For free? 😍🤓😍🤓😍🤓 Step-by-step how to:
8
283
721
4
207
590
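A quick way to confirm a GPU is actually attached after switching the runtime type; this is a minimal sketch (the free K80 offer and the 12-hour limit described above may have changed since the tweet):

```python
# Run in a Colab cell after Runtime -> Change runtime type -> GPU.
import subprocess
import torch

# nvidia-smi lists the attached GPU (a Tesla K80 at the time of this tweet).
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

# Or check from a framework:
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("no GPU attached")
```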
@jekbradbury
James Bradbury
6 years
Facebook is announcing a slew of new ML-related software projects and code releases this morning at F8, including Glow (a neural network compiler), PyTorch Translate (a public version of their production NMT code) and the roadmap for PyTorch 1.0
1
130
323
@jekbradbury
James Bradbury
4 years
If you missed it, our JAX/Cloud TPU talk is now up! We announced a new way to access Cloud TPUs, allowing direct SSH access and custom code on the TPU hosts, and gave FOUR demos showing how this supercharges JAX! Video: Slides:
3
56
317
@jekbradbury
James Bradbury
6 years
Automated essay grading that takes into account factual accuracy and content coherence is several years away. In the meantime, NLP and AI researchers not paid by Pearson should push back against school systems that rely on this deeply flawed technology for standardized testing.
@NPR
NPR
6 years
Computers already drive cars and detect cancer, so they can certainly handle grading students' essays, developers say.
113
60
135
8
73
301
@jekbradbury
James Bradbury
2 years
Incredibly excited that Sundar launched the public preview of Cloud TPU v4 Pods at I/O today, with a flythrough video of a datacenter filled with them: ! This is really three separate announcements:
Tweet media one
4
49
305
@jekbradbury
James Bradbury
2 years
@karpathy configdict () and fiddle ()
5
21
298
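The tweet above is pointing at the ml_collections ConfigDict and fiddle configuration libraries. A minimal ConfigDict sketch; the hyperparameter names are made up purely for illustration:

```python
from ml_collections import config_dict

def get_config():
    cfg = config_dict.ConfigDict()
    cfg.learning_rate = 3e-4              # hypothetical hyperparameters, for illustration only
    cfg.model = config_dict.ConfigDict()
    cfg.model.num_layers = 12
    cfg.model.d_model = 768
    return cfg

cfg = get_config()
cfg.learning_rate = 1e-4                  # attribute access; assignments are type-checked
print(cfg)
```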
@jekbradbury
James Bradbury
2 years
The Anthropic team is fantastic and I'm so excited to be working with them!
@AnthropicAI
Anthropic
2 years
We're excited to use Google Cloud to train our AI systems, including Claude!
21
105
935
2
6
297
@jekbradbury
James Bradbury
6 years
Something I’m pretty excited about :)
Tweet media one
Tweet media two
1
53
288
@jekbradbury
James Bradbury
4 years
JAX on Cloud TPUs is getting a big upgrade! Come to our NeurIPS demo Tue. Dec. 8 at 11AM PT/19 GMT to see it in action, plus catch a sneak peek of a new Flax-based library for language research on TPU pods. Link: ( is still open!)
Tweet media one
4
30
229
@jekbradbury
James Bradbury
7 years
Fun with @PyTorch and 😀
Tweet media one
7
60
212
@jekbradbury
James Bradbury
7 years
TensorFlow AutoGraph, to be demoed in a couple hours, compiles Python code with control flow directly to a TensorFlow graph (CC @broolucks)
0
64
209
@jekbradbury
James Bradbury
6 years
As for me? I’m excited to join Google Brain later this month to work at the intersection of ML and programming languages. Among other things, I want to help make it easier to build structured NLP models (like those in @gneubig ’s group’s 9 fantastic EMNLP papers) at Google scale.
11
7
204
@jekbradbury
James Bradbury
3 years
DeepMind shares (some of) its distributed RL secrets! Many of the authors on this paper were early and passionate advocates of JAX, and the Podracer architectures described here have helped inform the design of our parallelism APIs and distributed programming model.
@_akhaliq
AK
3 years
Podracer architectures for scalable Reinforcement Learning pdf: abs: "we argue that TPUs are particularly well suited for training RL agents in a scalable, efficient and reproducible way"
Tweet media one
Tweet media two
0
10
68
1
27
179
@jekbradbury
James Bradbury
7 years
@seb_ruder That's not the latest adaptive learning rate method any more 😉, the latest adaptive learning rate method is AdaFactor, quietly added three weeks ago to the Tensor2Tensor repository along with a note reading "TODO(noam): write a paper."
3
41
178
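Adafactor later became widely available outside Tensor2Tensor; a minimal sketch using the optax implementation (not the original T2T code), with made-up shapes:

```python
import jax
import jax.numpy as jnp
import optax

params = {"w": jnp.ones((512, 512))}
opt = optax.adafactor(learning_rate=1e-3)   # factored second-moment estimates cut optimizer memory
opt_state = opt.init(params)

def loss_fn(p, x):
    return jnp.mean((x @ p["w"]) ** 2)

grads = jax.grad(loss_fn)(params, jnp.ones((8, 512)))
updates, opt_state = opt.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```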
@jekbradbury
James Bradbury
4 years
@FitzTheReporter @RecParkSF Someone’s outright taking bites out of it now
21
23
158
@jekbradbury
James Bradbury
3 years
The paper describing the XLA SPMD automatic partitioning infrastructure (what’s behind JAX model parallelism APIs like sharded_jit and pjit) is out:
1
29
168
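For readers unfamiliar with these APIs: you annotate array shardings over a device mesh and the XLA SPMD partitioner inserts the necessary communication. A minimal sketch using the current jax.sharding spelling (the sharded_jit and pjit APIs named above have since been folded into jax.jit); it assumes 8 devices arranged as a 2x4 mesh:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Assumes 8 accelerator devices, arranged as a 2x4 (data, model) mesh.
mesh = Mesh(np.array(jax.devices()).reshape(2, 4), axis_names=("data", "model"))

x = jnp.ones((128, 1024))
w = jnp.ones((1024, 4096))

# Shard activations over the data axis and weights over the model axis;
# XLA SPMD propagates shardings through the jitted computation.
x = jax.device_put(x, NamedSharding(mesh, P("data", None)))
w = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

y = jax.jit(lambda a, b: a @ b)(x, w)
print(y.sharding)
```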
@jekbradbury
James Bradbury
6 years
Tweet media one
2
35
158
@jekbradbury
James Bradbury
7 years
. @benjaminwittes: I want to say a brief word about Dan Richman
Tweet media one
8
71
155
@jekbradbury
James Bradbury
3 years
The new way to use Cloud TPUs, enabling direct SSH access and custom code, is now in public preview for TF, PyTorch, and JAX! Read some testimonials from alpha users like @CohereAI and @KenoFischer or check it out for yourself with
@jekbradbury
James Bradbury
4 years
If you missed it, our JAX/Cloud TPU talk is now up! We announced a new way to access Cloud TPUs, allowing direct SSH access and custom code on the TPU hosts, and gave FOUR demos showing how this supercharges JAX! Video: Slides:
3
56
317
0
33
142
@jekbradbury
James Bradbury
6 years
Facebook's fairseq MT engine is really, really fast... Like, 50% faster than @marian_nmt (which is itself way faster than Sockeye/OpenNMT/Tensor2Tensor/xnmt/Nematus/etc) at generating from the same Transformer model
2
36
136
@jekbradbury
James Bradbury
7 years
“Image Transformer” from @nikiparmar09 and the rest of the Transformer team extends self-attention to 2D and provides a substantial quality improvement over the state of the art for image generation and super-resolution
2
30
129
@jekbradbury
James Bradbury
2 years
I’m honestly astounded how much cluster babysitting work this (and BigScience) took on GPU systems. TPUs have their own problems, but in my experience they’re MUCH easier from a “how many ways can things go wrong” perspective.
@yoavgo
(((ل()(ل() 'yoav))))👾
2 years
(amazing how much gpus keep hanging and ssh keeps failing and losses keep exploding...)
4
3
84
11
6
136
@jekbradbury
James Bradbury
6 years
Yesterday was my last day at Salesforce Research. I’m incredibly proud of what the team has accomplished: we built a world-class deep learning research team from scratch, and helped make Salesforce Einstein the most powerful set of AI capabilities in enterprise software.
3
5
136
@jekbradbury
James Bradbury
3 years
@karpathy Also fun: how this has changed over time! Maybe something like:
prehistory-500 BCE: ceramics
500 BCE-800 CE: metallurgy
800-1600: pipe organs
1600-1850: watches
1850-1950: engines
1950-1980: aircraft
1980-today: semiconductors
6
4
131
@jekbradbury
James Bradbury
4 years
Something like half the appendix of the DALL-E paper () describes work the authors had to do on GPUs that they wouldn't have had to do on TPUs:
- scaling fp16 mixed precision
- reducing gradient all-reduce comms w/ PowerSGD
- manual optimizer sharding
4
12
133
@jekbradbury
James Bradbury
2 years
Did you know? Reading a paper signed by the author doubles your learning rate! Today we are launching to share our beloved arXiv of signed machine learning papers with the world All proceeds go to charity 💖
5
8
128
@jekbradbury
James Bradbury
6 years
This is one of the most off-base threads I’ve ever seen on this hellhole of a website. 100s of researchers at Brain, FAIR, and other industry ML labs are doing science (not engineering, and not grad student descent, though there’s lots of that) without regard to corporate goals.
0
11
127
@jekbradbury
James Bradbury
2 years
A bit about PaLM () infrastructure:
- trained on 6144 TPU v4 chips across two pods, without pipelining
- first use of the Pathways runtime at scale
- achieves 46.2% end-to-end matmul FLOPs utilization (or 57.8% including rematerialization)
2
11
125
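For anyone who wants to reproduce that kind of utilization number: model FLOPs utilization is observed model FLOPs divided by aggregate peak hardware FLOPs. A rough sketch of the bookkeeping; the 6-FLOPs-per-parameter-per-token rule of thumb and the per-chip peak are standard assumptions on my part, and the example inputs are hypothetical, not figures from the thread:

```python
def model_flops_utilization(n_params, tokens_per_second, n_chips, peak_flops_per_chip=275e12):
    # ~6 FLOPs per parameter per token for a forward+backward pass (standard approximation),
    # divided by aggregate peak throughput; 275 TFLOP/s bf16 per v4 chip is assumed.
    return 6 * n_params * tokens_per_second / (n_chips * peak_flops_per_chip)

# Purely hypothetical numbers for illustration (not from the thread):
print(model_flops_utilization(540e9, tokens_per_second=240_000, n_chips=6144))  # ~0.46
```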
@jekbradbury
James Bradbury
7 years
. @jeffdean talk is standing room only here at the ML Systems Workshop with a livestream for overflow. Slides are up though!
4
36
117
@jekbradbury
James Bradbury
7 years
Martin Popel has been the most active non-Googler on the Tensor2Tensor repository for months, and has posted a series of very interesting experiments about training and convergence in issue comments. Very happy to see he's turned them into an arXiv paper!
0
24
114
@jekbradbury
James Bradbury
3 years
@ben_golub Because web programming at Google is unreasonably difficult due to 20 years of tech debt
4
0
107
@jekbradbury
James Bradbury
2 years
Anthropic arguably initiated the growing consensus around non-publication of “capabilities” research—they publish openly, but only on safety/interpretability. Don’t lump them with FAIR 😜
@RamaswmySridhar
sridhar
2 years
9/ Already, many other LLM players like Adept, Character, and Cohere have not published the details of their models. Just blog posts. FAIR and Anthropic might remain as the only large open research labs.
1
6
99
13
7
107
@jekbradbury
James Bradbury
7 years
And I’ll be speaking tomorrow at 2pm about Matchbox, my brand new package for automatic batching in PyTorch. As @JeffDean says, manual batching “makes my head hurt”—never worry about padding and masking again!
@jekbradbury
James Bradbury
7 years
. @JeffDean at #SysML : having to work directly with batching in ML models "kind of makes my head hurt sometimes"
1
7
44
5
29
94
@jekbradbury
James Bradbury
2 years
Second, we’re publicizing TPU v4 specs for the first time! In addition to what’s in this table, TPU v4 also has one logical core with a full 32 GiB of HBM (vs. two w/16) and all slices with 64+ chips have wraparound on all three ICI axes, improving collective throughput vs. v3.
Tweet media one
5
10
92
@jekbradbury
James Bradbury
2 years
First, we’re bringing eight TPU v4 Pods to Google Cloud, in a single datacenter with 90% carbon-free energy. If this were used as a single supercomputer (along the lines of PaLM multi-pod training) we think it’d be the world’s fastest public ML system (9 exaflops peak bfloat16)!
Tweet media one
2
14
81
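The "9 exaflops" figure falls out of simple arithmetic over per-chip peak throughput; the per-chip number below is my assumption, not something stated in the tweet:

```python
pods = 8
chips_per_v4_pod = 4096        # a full TPU v4 pod
peak_bf16_per_chip = 275e12    # assumed ~275 TFLOP/s bf16 per v4 chip (not from the tweet)

print(f"{pods * chips_per_v4_pod * peak_bf16_per_chip / 1e18:.1f} exaflops peak bf16")  # ~9.0
```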
@jekbradbury
James Bradbury
2 years
And we’re publishing some Transformer language model benchmarks we’ve been working on, which show that JAX + GSPMD + XLA + TPU v4 can achieve exceptionally high FLOPs utilization with two different scaling patterns (“optimal” here means Chinchilla-like):
Tweet media one
2
10
79
@jekbradbury
James Bradbury
7 years
NVIDIA chief scientist Bill Dally at #SysML : fast memory is expensive for the same reason that Palo Alto real estate is expensive―there isn’t much space close to where the compute happens
3
19
75
@jekbradbury
James Bradbury
7 years
Baidu's Deep Voice 2 paper has some of the best model diagrams and hyperparameter tables I’ve seen (+ great results)
1
9
69
@jekbradbury
James Bradbury
6 years
Researchers at Columbia and DeepMind have independently shown that artificial NNs can learn representations qualitatively similar to grid cells in biological brains
1
25
69
@jekbradbury
James Bradbury
8 years
This is why a fashion company has 80 PhD researchers. It's like seeing (a particularly optimistic version of) our overall economic future
@lafersty
Lindsay Ferstandig
8 years
A tour of the magic behind Stitch Fix Algorithms: via @stitchfix_algo #datascience #algotour
0
21
38
0
26
69
@jekbradbury
James Bradbury
7 years
@soumithchintala @PreferredNetJP @ChainerOfficial Something I just realized the other day is that, since version 2 when CuPy split into a separate package, Chainer is now 100% pure Python other than the NumPy dependency and therefore runs unmodified on iOS/Pythonista!
Tweet media one
1
29
65
@jekbradbury
James Bradbury
9 years
Definitely the most surreal Twitter exchange of the evening: Wikileaks vs Anonymous
Tweet media one
3
63
64
@jekbradbury
James Bradbury
6 years
And it's out, with insanely comprehensive ablations and bonus cameo from Mitchell Stern
@jekbradbury
James Bradbury
7 years
@seb_ruder That's not the latest adaptive learning rate method any more 😉, the latest adaptive learning rate method is AdaFactor, quietly added three weeks ago to the Tensor2Tensor repository along with a note reading "TODO(noam): write a paper."
3
41
178
0
18
64
@jekbradbury
James Bradbury
7 years
Great results from Bryan McCann and the team! We also have a trained model, code to use it, and a Docker image here:
@RichardSocher
Richard Socher
7 years
Contextualized word vectors from translation help #nlproc Blog Paper
Tweet media one
3
105
211
0
27
61
@jekbradbury
James Bradbury
3 years
@TaliaRinger PyTorch chose Python after trying very hard, over several years, to make Lua work instead. IIRC issues included lack of native OOP, a 32-bit JIT, poor support for large codebases, and the Lua core team’s preference against evolving the language for industry ML needs.
2
1
60
@jekbradbury
James Bradbury
3 years
Researchers at MSR seem to have localized the engram (memory image) of certain pieces of world knowledge in a handful of neurons in a pretrained Transformer:
3
5
59
@jekbradbury
James Bradbury
8 years
@brandondamos Chainer, MinPy, DyNet, and Autograd all can, but PyTorch's autodifferentiation engine is several times faster
3
14
56
@jekbradbury
James Bradbury
2 years
And Hugging Face is currently serving BLOOM in JAX on Cloud TPU: (although it was trained in PyTorch on GPUs)
@BigscienceW
BigScience Research Workshop
2 years
BLOOM is here. The largest open-access multilingual language model ever. Read more about it or get it at
Tweet media one
29
812
3K
3
13
51
@jekbradbury
James Bradbury
2 years
Respect (??) for Salesforce for simultaneously launching a bajillion OpenAI integrations and investing in their competitors
@Benioff
Marc Benioff
2 years
We’re excited to celebrate our investment in @AnthropicAI , @CohereAI , @hearthai_co , and @YouSearchEngine as part of this announcement. Thank you for your partnership. 🙌
22
27
148
3
0
51
@jekbradbury
James Bradbury
7 years
All-star panel on future hardware for ML at the NIPS supercomputing workshop, with Simon Knowles ( @graphcoreai cofounder), @jeffdean , @scottgray76 ( @OpenAI ), Michael James ( @CerebrasSystems cofounder), and Greg Diamos ( @BaiduResearch )
Tweet media one
2
22
50
@jekbradbury
James Bradbury
8 years
Had no idea that Geoff Hinton chose Canada over the US because he didn't want to take DOD funding!
@kchonyc
Kyunghyun Cho
8 years
Tomorrow Geoff Hinton will share the burden with me in my lecture
1
13
30
0
18
47
@jekbradbury
James Bradbury
1 year
I love these unhinged LK99 rumors
@brutalmog
ID_law
1 year
@orthonormalist Yep apparently kwon from the 1st paper got the news that China was starting to make rudimentary lk99 and published the paper even if he hasn’t been part of the lab since march. Apparently the process has since changed and all we got is the old recipe in the paper.
2
0
13
1
0
46
@jekbradbury
James Bradbury
7 years
at @iclr2017 all week -- come talk to me about QRNNs, tree-structured models, machine translation, or jobs at Salesforce Research 👋
1
8
46
@jekbradbury
James Bradbury
6 years
Really interesting thread about the recent grid cell results
@gershbrain
Sam Gershman
6 years
@AdaptiveAgents Can someone explain to me how this is different from the well-known result that doing linear PCA on place cells gives you grid cells? You can get this without the LSTM component.
3
28
137
0
13
45
@jekbradbury
James Bradbury
7 years
. @thoma_gu +I got non-autoregressive neural MT to work; we try to explain why that matters
1
14
43
@jekbradbury
James Bradbury
7 years
. @JeffDean at #SysML : having to work directly with batching in ML models "kind of makes my head hurt sometimes"
1
7
44
@jekbradbury
James Bradbury
6 years
This is also known as the dual number approach to forward-mode automatic differentiation 🙂
@goodfellow_ian
Ian Goodfellow
6 years
A math trick I like a lot is the approach to taking derivatives using hyperreal numbers. Thread:
15
168
711
3
8
44
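To make the connection concrete, here is a toy dual-number implementation of forward-mode AD, alongside the same derivative computed with JAX's forward-mode primitive; this is an illustration, not code from either thread:

```python
from dataclasses import dataclass
import jax

@dataclass
class Dual:
    val: float   # f(x)
    eps: float   # coefficient of the infinitesimal part, i.e. the carried derivative

    def __add__(self, o): return Dual(self.val + o.val, self.eps + o.eps)
    def __mul__(self, o): return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)

def f(x):
    return x * x * x + x      # f(x) = x^3 + x, so f'(x) = 3x^2 + 1

print(f(Dual(2.0, 1.0)).eps)                                  # 13.0
print(jax.jvp(lambda x: x ** 3 + x, (2.0,), (1.0,))[1])       # 13.0, forward mode in JAX
```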
@jekbradbury
James Bradbury
7 years
This is a FANTASTIC paper that's like nothing I've ever read: almost no experiments+very little math, just 8 pages of connections+intuition
@Miles_Brundage
Miles Brundage
7 years
"Adversarial Divergences are Good Task Losses for Generative Modeling," Huang et al.:
1
13
29
1
15
41
@jekbradbury
James Bradbury
6 years
The original TensorFlow control flow ops (the ones underlying `cond` and `while_loop`) are detailed in a fun new paper: But they were likely a mistake: if/for/while can be lowered to functional control flow instead, which is easier to implement+parallelize
2
7
41
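To illustrate what "functional control flow" means here: instead of graph-level Switch/Merge ops, the loop or branch is a combinator that threads explicit state, which is easier to trace, transform, and parallelize. A minimal sketch using JAX's structured control-flow primitives (JAX rather than TF, purely for illustration):

```python
import jax
import jax.numpy as jnp
from jax import lax

def count_down(x):
    # while x > 0: x = x - 1  -- expressed as (cond_fn, body_fn, init_state)
    return lax.while_loop(lambda v: v > 0, lambda v: v - 1, x)

def clamp_sign(x):
    # if x >= 0: +1 else -1  -- both branches are traced; one is selected at run time
    return lax.cond(x >= 0, lambda v: 1.0, lambda v: -1.0, x)

print(jax.jit(count_down)(jnp.int32(5)))   # 0
print(jax.jit(clamp_sign)(-3.0))           # -1.0
```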
@jekbradbury
James Bradbury
4 years
@MattHaneySF SF should:
- allow vaccination inside drugstores
- allow more people to give shots and pay them more
- open 24hr public sites
- protect good faith vaccinators
- throw an ice cream party this summer for the supe district that vaxxes the fastest
...and dare the governor to stop us
1
3
39
@jekbradbury
James Bradbury
3 years
Strong endorse: “When investing in terms of scaling in terms of data, model parameters and compute, we should think of an additional axis which is _data diversity_.” (Narrow self-supervision datasets cause downstream task performance to saturate.)
0
4
39
@jekbradbury
James Bradbury
7 years
I totally missed that the principal example for TensorFlow Eager was ported from my PyTorch SPINN code! It's impressive how one-to-one the conversion is; the framework convergence is real :)
0
5
38
@jekbradbury
James Bradbury
7 years
Jiatao Gu (interning with our group this fall!) augments NMT with an IR system that can access the whole training corpus and gets +5 BLEU
@kchonyc
Kyunghyun Cho
7 years
Jiatao has done an awesome job at building a fully non-parametric neural...
2
12
52
2
9
37
@jekbradbury
James Bradbury
1 year
@tszzl dojo has dramatically more interconnect bandwidth, letting you scale smaller batch sizes on larger systems with simpler/lower-overhead parallelism. I suspect it’s more expensive per flop though (even vs. A100 and almost definitely vs. H100), but I’m assuming I don’t have to pay.
4
0
36
@jekbradbury
James Bradbury
6 years
All the talks @ylecun has ever given:
@ylecun
Yann LeCun
6 years
Number of talks I've given by year (excluding teaching). I'm trying to cut down and get some real work done now.
2018 17 <- as of June 21. Doing better.
2017 56 <- having no life.
2016 54 <- over 1 talk/week
2015...
5
11
171
0
11
35
@jekbradbury
James Bradbury
5 years
@jeremyphoward @NvidiaAI Behind the scenes (starting ~a year ago) NVIDIA has also set up a dedicated engineering team to work on XLA:GPU. (Something Jeremy might get a kick out of is that one of them is Frederic Bastien, the creator of Theano! )
1
3
35
@jekbradbury
James Bradbury
4 years
We're also hosting Q&A sessions at #NeurIPS2020 with the JAX core team!
- Wednesday 9:30-10AM PST/17:30-18:00 GMT (Europe/Africa-friendly time)
- Thursday 6-6:30PM PST/Friday 02:00-02:30 GMT (Asia/Australia-friendly time)
Meet links at
@jekbradbury
James Bradbury
4 years
JAX on Cloud TPUs is getting a big upgrade! Come to our NeurIPS demo Tue. Dec. 8 at 11AM PT/19 GMT to see it in action, plus catch a sneak peek of a new Flax-based library for language research on TPU pods. Link: ( is still open!)
Tweet media one
4
30
229
0
3
36
@jekbradbury
James Bradbury
6 years
"most of MKL-DNN’s performance is lost during framework integration (Tensorflow in this case) for various reasons such as the lack of fusion, inefficient scratch memory allocation, or thread scheduling"
@Miles_Brundage
Miles Brundage
6 years
"Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures," Georganas et al., Intel: "This proves that CPUs can be a competitive alternative when training neural nets." 🧐
1
11
48
2
11
36
@jekbradbury
James Bradbury
6 years
I’m particularly thankful to @RichardSocher for taking a chance on me four years ago. The team is in good hands, and there’s much, much more to come.
1
1
35
@jekbradbury
James Bradbury
5 years
@eturner303 512 TPU chips is 128 TPU devices, or $61,440 for 2.5 days. The authors could also have meant 512 cores, which is 64 devices or $30,720.
2
2
35
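The arithmetic behind those dollar figures, assuming 4 chips (8 cores) per Cloud TPU v3-8 device and an on-demand rate of roughly $8 per device-hour; the rate is my assumption, not stated in the tweet:

```python
price_per_device_hour = 8.00      # assumed on-demand Cloud TPU v3-8 rate at the time
hours = 2.5 * 24                  # 2.5 days

devices_if_512_chips = 512 // 4   # 4 chips per v3-8 device -> 128 devices
devices_if_512_cores = 512 // 8   # 8 cores per v3-8 device -> 64 devices

print(devices_if_512_chips * hours * price_per_device_hour)  # 61440.0
print(devices_if_512_cores * hours * price_per_device_hour)  # 30720.0
```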
@jekbradbury
James Bradbury
3 years
@ClementDelangue @MSFTResearch @GoogleAI @nvidia @OpenAI @BigscienceW the TPU approach (matches f32 training) is to perform all matmuls in bf16*bf16->f32, perform all vector math in f32, and truncate every value stored to HBM to bf16 EXCEPT:
- optimizer state (incl. primary copy of params)
- layernorm intermediates
- attention logits
- final logits
2
0
35
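A minimal sketch of the "bf16 inputs, f32 accumulation" matmul pattern the tweet describes; this is an illustration of the idea in JAX, not the actual TPU training stack:

```python
import jax.numpy as jnp
from jax import lax

def mixed_precision_matmul(a_f32, b_f32):
    # Cast matmul inputs to bfloat16 but accumulate the product in float32,
    # mirroring the bf16*bf16->f32 recipe above; downstream vector math stays in f32.
    a = a_f32.astype(jnp.bfloat16)
    b = b_f32.astype(jnp.bfloat16)
    return lax.dot_general(a, b,
                           dimension_numbers=(((1,), (0,)), ((), ())),
                           preferred_element_type=jnp.float32)

x = jnp.ones((8, 16), jnp.float32)
w = jnp.ones((16, 4), jnp.float32)
print(mixed_precision_matmul(x, w).dtype)   # float32
```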
@jekbradbury
James Bradbury
2 years
Kinda weird to me to call this a “transition from Codex to GPT-3.5” when code-davinci-002 _is_ the big 3.5 base model…it feels more like a product safety decision (that I think I grudgingly support?) to not have a base model available
@mckaywrigley
Mckay Wrigley
2 years
OpenAI is discontinuing Codex. GPT-3.5 outperforms Codex, and GPT-4 blows it out of the water. I think the takeaway here is that eventually everything converges to one general purpose model.
Tweet media one
32
73
681
2
3
35
@jekbradbury
James Bradbury
11 months
I’m in Taipei this week! LMK if you are too.
5
0
34
@jekbradbury
James Bradbury
3 years
kinda sounds like copilot is trying to follow the license terms and people are ignoring it

seriously though, between gpt-3 using copyrighted books and codex using gpl’ed code, openai is tempting fate, and it would be pretty amusing if it’s linus rather than the authors guild
@eevee
eevee 💨
3 years
github copilot has, by their own admission, been trained on mountains of gpl code, so i'm unclear on how it's not a form of laundering open source code into commercial works. the handwave of "it usually doesn't reproduce exact chunks" is not very satisfying
Tweet media one
132
2K
5K
1
5
34
@jekbradbury
James Bradbury
2 years
@EigenGender My preferred counterfactual here is “IBM gets into convnets in the 90s and builds an ASIC-based NN training supercomputer instead of Deep Blue”
1
0
34
@jekbradbury
James Bradbury
3 years
@andy_l_jones This feels (to me) like a review by someone who feels like they’ve been left behind by the pace/changes in the modern ML conference ecosystem (and a strong reject is not a great way to react to that!) I think you’d have better luck at NeurIPS—IMO your paper meets their bar.
1
0
34
@jekbradbury
James Bradbury
2 years
Third, we use multidimensional partitioning with overlapped collective communication and other low-level optimizations, many of which we believe are new in the literature. Learn more in our paper or consider adapting our code to your own models! 4/5
Tweet media one
2
0
33
@jekbradbury
James Bradbury
7 years
It's pretty disappointing that Douglas Hofstadter—of all people!—is almost completely incurious about what deep learning is and what our goals and methods as MT researchers actually are.
@andersen
Ross Andersen
7 years
This morning—thanks to contributing editor extraordinaire @jsomers —we have a piece by *Douglas Hofstadter* in @TheAtlantic
0
14
56
3
8
33
@jekbradbury
James Bradbury
6 years
Slides from my talk last week at Uber Science Day: Skip about halfway through if you’re more interested in “future” than “past”... Thanks @savvyRL for the invitation!
1
6
33
@jekbradbury
James Bradbury
7 years
The requisite @PyTorch screenshot 😉
Tweet media one
2
8
32
@jekbradbury
James Bradbury
2 years
2
0
32
@jekbradbury
James Bradbury
5 years
Unfortunately Sally Lieber didn’t stand up for SB50 at the climate forum today—only Shelly Masur did. SB50 isn’t a radical bill; it’s table stakes. If you won’t support this first step in the place in the state that needs it most, you’re not a YIMBY.
@cafedujord
Jordan Grimes🚰
5 years
@penforeveryone @PaloAltoYimby @yimbyaction @cayimby Fmr Assembly Speaker Pro Tem Sally Lieber: "I consider myself a YIMBY. I think we all should be." Redwood City Councilmember and all-around badass Shelly Masur: IDs as a YIMBY, saying "The need for housing is critical to addressing climate change and displacement." 😍😍😍
1
4
41
1
4
31
@jekbradbury
James Bradbury
8 years
@brandondamos Yep, the PyTorch autograd codebase started with a fork from Chainer -- but then rewrote it in highly optimized pure C
2
16
31
@jekbradbury
James Bradbury
7 years
"Attention Solves Your [Traveling Salesman Problem]" from W. W. M. Kool and @wellingmax at UvA takes @IrwanBello 's work on neural combinatorial optimization and swaps the RNN for a Transformer—very cool results!
Tweet media one
0
5
31
@jekbradbury
James Bradbury
7 years
this is something basically every musician has been waiting for for decades
@fjord41
Curtis Hawthorne
7 years
New blog post about the project I've been working on for a while. Automatic piano music transcription (raw audio to MIDI) that works really well!
12
142
451
1
9
28
@jekbradbury
James Bradbury
1 year
> submit idea
> get money
> now you have to actually do the idea!
@kipperrii
kipply
1 year
25k fund so that sf can have whimsy, anything goes
Tweet media one
6
12
117
1
1
29
@jekbradbury
James Bradbury
7 years
This is the best article I've seen about how AI research in industry actually works, and the business case for openness and participation in the community
0
11
26
@jekbradbury
James Bradbury
6 years
A fun read from @KenoFischer about the last-minute push to get Celeste.jl scaling up on the Cori supercomputer for the 2017 Gordon Bell deadline
0
5
29
@jekbradbury
James Bradbury
6 years
This is a pretty incredible story: former Google eng "alleges that [Pinscreen] submitted false results to SIGGRAPH" and that he was fired and "Pinscreen employees, under [CEO] Li’s commands...physically attacked him" after he pointed out the fraud
2
13
29
@jekbradbury
James Bradbury
6 years
As @jjding writes in his latest newsletter, "let's dispel once and for all this fiction that there are no discussions of AI ethics happening in China"
@icracnet
ICRAC
6 years
Essay translation 🇨🇳➡️🇺🇸 h/t @jjding99 : Zhao Tingyang: "Near-term Worries" and "Long-term Concerns" of the Artificial Intelligence "Revolution": An Analysis of Ethics and Ontology.
2
6
13
1
7
29
@jekbradbury
James Bradbury
3 years
@giffmana @elonmusk @askerlee I think it’s pretty straightforwardly true at the hardware level: both Cerebras and Dojo, and to a lesser extent Graphcore, have very high bandwidth (relative to flops) for their parameter/activation memory, and more flexible matmul structure.
1
2
27
@jekbradbury
James Bradbury
2 years
@dmdohan
David Dohan
2 years
Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming. paper:
Tweet media one
3
101
678
1
0
28