In 2016, when I was working on machine translation, it took me more than a week on a multi-GPU machine to train a competitive system on WMT English-German.
Today, JAX on a TPU v3 supercomputer can train a better model on the same data in 16 seconds!
Can multi-100B param language models be served efficiently? We think so! Today we’re announcing the PaLM inference paper and releasing code for low-latency, high-throughput inference of 8B–540B models on TPU v4.
Paper:
Code: 1/5
Google Colab apparently now gives you one free K80 GPU for up to 12hrs at a time! Note that you have to go to "Runtime" --> "Change runtime type" to add a GPU.
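If you want to verify the GPU actually got attached, here's a quick sanity check from a notebook cell (a sketch assuming the default Colab image, which preinstalls TensorFlow):

```python
# Sanity check that Colab actually attached a GPU.
import subprocess
print(subprocess.check_output(["nvidia-smi"]).decode())  # lists the K80 if present

# Or via TensorFlow, which Colab preinstalls:
import tensorflow as tf
print(tf.test.gpu_device_name())  # '/device:GPU:0' when a GPU is attached
```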
Facebook is announcing a slew of new ML-related software projects and code releases this morning at F8, including Glow (a neural network compiler), PyTorch Translate (a public version of their production NMT code) and the roadmap for PyTorch 1.0
If you missed it, our JAX/Cloud TPU talk is now up!
We announced a new way to access Cloud TPUs, allowing direct SSH access and custom code on the TPU hosts, and gave FOUR demos showing how this supercharges JAX!
Video:
Slides:
Automated essay grading that takes into account factual accuracy and content coherence is several years away. In the meantime, NLP and AI researchers not paid by Pearson should push back against school systems that rely on this deeply flawed technology for standardized testing.
Incredibly excited that Sundar launched the public preview of Cloud TPU v4 Pods at I/O today, with a flythrough video of a datacenter filled with them: ! This is really three separate announcements:
JAX on Cloud TPUs is getting a big upgrade!
Come to our NeurIPS demo Tue. Dec. 8 at 11AM PT/19 GMT to see it in action, plus catch a sneak peek of a new Flax-based library for language research on TPU pods.
Link: ( is still open!)
As for me? I’m excited to join Google Brain later this month to work at the intersection of ML and programming languages. Among other things, I want to help make it easier to build structured NLP models (like those in @gneubig’s group’s 9 fantastic EMNLP papers) at Google scale.
DeepMind shares (some of) its distributed RL secrets!
Many of the authors on this paper were early and passionate advocates of JAX, and the Podracer architectures described here have helped inform the design of our parallelism APIs and distributed programming model.
Podracer architectures for scalable Reinforcement Learning
pdf:
abs:
"we argue that TPUs are particularly well suited for training RL agents in a scalable, efficient and reproducible way"
@seb_ruder
That's not the latest adaptive learning rate method any more 😉, the latest adaptive learning rate method is AdaFactor, quietly added three weeks ago to the Tensor2Tensor repository along with a note reading "TODO(noam): write a paper."
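The memory-saving core of AdaFactor, for the curious: factor the second-moment accumulator. A minimal numpy sketch of that one idea (update clipping, relative step sizes, and everything else in the T2T code are omitted):

```python
import numpy as np

def adafactor_second_moment(R, C, grad, beta2=0.999, eps=1e-30):
    """One step of the factored second-moment estimate.

    Adam stores a full matrix of squared-gradient EMAs; AdaFactor keeps only
    a per-row vector R and a per-column vector C and reconstructs the matrix
    as a rank-1 outer product -- O(n+m) memory instead of O(n*m).
    """
    g2 = grad * grad + eps
    R = beta2 * R + (1 - beta2) * g2.mean(axis=1)  # row stats, shape (n,)
    C = beta2 * C + (1 - beta2) * g2.mean(axis=0)  # column stats, shape (m,)
    V = np.outer(R, C) / R.mean()                  # rank-1 reconstruction
    return R, C, grad / np.sqrt(V)                 # V scales the update
```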
The new way to use Cloud TPUs, enabling direct SSH access and custom code, is now in public preview for TF, PyTorch, and JAX!
Read some testimonials from alpha users like @CohereAI and @KenoFischer, or check it out for yourself with
Facebook's fairseq MT engine is really, really fast... Like, 50% faster than @marian_nmt (which is itself way faster than Sockeye/OpenNMT/Tensor2Tensor/xnmt/Nematus/etc) at generating from the same Transformer model
“Image Transformer” from @nikiparmar09 and the rest of the Transformer team extends self-attention to 2D and provides a substantial quality improvement over the state of the art for image generation and super-resolution
I’m honestly astounded how much cluster babysitting work this (and BigScience) took on GPU systems. TPUs have their own problems, but in my experience they’re MUCH easier from a “how many ways can things go wrong” perspective.
Yesterday was my last day at Salesforce Research. I’m incredibly proud of what the team has accomplished: we built a world-class deep learning research team from scratch, and helped make Salesforce Einstein the most powerful set of AI capabilities in enterprise software.
Something like half the appendix of the DALL-E paper () describes work the authors had to do on GPUs that they wouldn't have had to do on TPUs:
- scaling fp16 mixed precision
- reducing gradient all-reduce comms w/ PowerSGD
- manual optimizer sharding
Did you know? Reading a paper signed by the author doubles your learning rate!
Today we are launching to share our beloved arXiv of signed machine learning papers with the world
All proceeds go to charity 💖
This is one of the most off-base threads I’ve ever seen on this hellhole of a website. 100s of researchers at Brain, FAIR, and other industry ML labs are doing science (not engineering, and not grad student descent, though there’s lots of that) without regard to corporate goals.
A bit about PaLM () infrastructure:
- trained on 6144 TPU v4 chips across two pods, without pipelining
- first use of the Pathways runtime at scale
- achieves 46.2% end-to-end matmul FLOPs utilization (or 57.8% including rematerialization; rough arithmetic below)
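For the curious, that utilization number is roughly the standard 6·N FLOPs-per-token estimate divided by aggregate peak throughput. A back-of-envelope sketch (only the chip and parameter counts come from this thread; the throughput and per-chip peak below are assumptions for illustration):

```python
# Back-of-envelope model FLOPs utilization (MFU) for a decoder-only LM.
n_params = 540e9          # PaLM parameters
n_chips = 6144            # TPU v4 chips (two pods)
peak_per_chip = 275e12    # assumed peak bf16 FLOP/s per v4 chip
tokens_per_sec = 240e3    # assumed aggregate training throughput

achieved = 6 * n_params * tokens_per_sec  # forward+backward matmul FLOPs
peak = n_chips * peak_per_chip
print(f"MFU ~ {achieved / peak:.1%}")     # ~46% with these assumptions
```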
Martin Popel has been the most active non-Googler on the Tensor2Tensor repository for months, and has posted a series of very interesting experiments about training and convergence in issue comments. Very happy to see he's turned them into an arXiv paper!
Anthropic arguably initiated the growing consensus around non-publication of “capabilities” research—they publish openly, but only on safety/interpretability. Don’t lump them with FAIR 😜
9/ Already, many other LLM players like Adept, Character, and Cohere have not published the details of their models.
Just blog posts.
FAIR and Anthropic might remain as the only large open research labs.
And I’ll be speaking tomorrow at 2pm about Matchbox, my brand new package for automatic batching in PyTorch. As @JeffDean says, manual batching “makes my head hurt”—never worry about padding and masking again!
Second, we’re publicizing TPU v4 specs for the first time! In addition to what’s in this table, TPU v4 also has one logical core with a full 32 GiB of HBM (vs. two w/16) and all slices with 64+ chips have wraparound on all three ICI axes, improving collective throughput vs. v3.
First, we’re bringing eight TPU v4 Pods to Google Cloud, in a single datacenter with 90% carbon-free energy. If this were used as a single supercomputer (along the lines of PaLM multi-pod training) we think it’d be the world’s fastest public ML system (9 exaflops peak bfloat16)!
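The arithmetic behind that headline number, assuming the public v4 figures of 4096 chips per pod and 275 TFLOP/s peak bf16 per chip:

```python
# 8 pods x 4096 chips/pod x 275 TFLOP/s peak bf16 per chip
pods, chips_per_pod, peak_per_chip = 8, 4096, 275e12
print(f"{pods * chips_per_pod * peak_per_chip / 1e18:.1f} exaflops")  # 9.0
```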
And we’re publishing some Transformer language model benchmarks we’ve been working on, which show that JAX + GSPMD + XLA + TPU v4 can achieve exceptionally high FLOPs utilization with two different scaling patterns (“optimal” here means Chinchilla-like):
NVIDIA chief scientist Bill Dally at #SysML: fast memory is expensive for the same reason that Palo Alto real estate is expensive―there isn’t much space close to where the compute happens
Researchers at Columbia and DeepMind have independently shown that artificial NNs can learn representations qualitatively similar to grid cells in biological brains
@soumithchintala @PreferredNetJP @ChainerOfficial
Something I just realized the other day is that, since version 2 when CuPy split into a separate package, Chainer is now 100% pure Python other than the NumPy dependency and therefore runs unmodified on iOS/Pythonista!
@TaliaRinger
PyTorch chose Python after trying very hard, over several years, to make Lua work instead. IIRC issues included lack of native OOP, a 32-bit JIT, poor support for large codebases, and the Lua core team’s preference against evolving the language for industry ML needs.
Researchers at MSR seem to have localized the engram (memory image) of certain pieces of world knowledge in a handful of neurons in a pretrained Transformer:
@orthonormalist
Yep, apparently Kwon from the 1st paper got the news that China was starting to make rudimentary LK-99 and published the paper even though he hasn’t been part of the lab since March. Apparently the process has since changed, and all we got is the old recipe in the paper.
@AdaptiveAgents
Can someone explain to me how this is different from the well-known result that doing linear PCA on place cells gives you grid cells? You can get this without the LSTM component.
The original TensorFlow control flow ops (the ones underlying `cond` and `while_loop`) are detailed in a fun new paper:
But they were likely a mistake: if/for/while can be lowered to functional control flow instead, which is easier to implement+parallelize
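For a concrete picture, here's what functional control flow looks like in JAX (a sketch for illustration; TF2's tf.cond/tf.while_loop take the same function-valued form):

```python
import jax
import jax.numpy as jnp
from jax import lax

def relu_ish(x):
    # Functional `if`: both branches are pure functions of the operand.
    return lax.cond(x > 0, lambda v: v, lambda v: jnp.zeros_like(v), x)

def count_halvings(x):
    # Functional `while`: loop state is threaded explicitly through pure
    # cond/body functions, so the op is easy to trace and parallelize.
    def keep_going(state):
        val, _ = state
        return val > 1.0
    def body(state):
        val, n = state
        return val / 2.0, n + 1
    _, n = lax.while_loop(keep_going, body, (x, 0))
    return n

print(jax.jit(relu_ish)(jnp.float32(-3.0)))       # 0.0
print(jax.jit(count_halvings)(jnp.float32(10.0))) # 4
```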
@MattHaneySF
SF should:
- allow vaccination inside drugstores
- allow more people to give shots and pay them more
- open 24hr public sites
- protect good faith vaccinators
- throw an ice cream party this summer for the supe district that vaxxes the fastest
...and dare the governor to stop us
Strong endorse: “When investing in terms of scaling in terms of data, model parameters and compute, we should think of an additional axis which is _data diversity_.”
(Narrow self-supervision datasets cause downstream task performance to saturate.)
I totally missed that the principal example for TensorFlow Eager was ported from my PyTorch SPINN code! It's impressive how one-to-one the conversion is; the framework convergence is real :)
@tszzl
dojo has dramatically more interconnect bandwidth, letting you scale smaller batch sizes on larger systems with simpler/lower-overhead parallelism. I suspect it’s more expensive per flop though (even vs. A100 and almost definitely vs. H100), but I’m assuming I don’t have to pay.
Number of talks I've given by year (excluding teaching).
I'm trying to cut down and get some real work done now.
2018 17 <- as of June 21. Doing better.
2017 56 <- having no life.
2016 54 <- over 1 talk/week
2015...
@jeremyphoward @NvidiaAI
Behind the scenes (starting ~a year ago) NVIDIA has also set up a dedicated engineering team to work on XLA:GPU. (Something Jeremy might get a kick out of is that one of them is Frederic Bastien, the creator of Theano! )
"most of MKL-DNN’s performance is lost during framework integration (Tensorflow in this case) for various reasons such as the lack of fusion, inefficient scratch memory allocation, or thread scheduling"
"Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures," Georganas et al., Intel:
"This proves that CPUs can be a competitive alternative when training neural nets." 🧐
@ClementDelangue @MSFTResearch @GoogleAI @nvidia @OpenAI @BigscienceW
The TPU approach (which matches f32 training; minimal sketch after this list) is to perform all matmuls in bf16*bf16->f32, perform all vector math in f32, and truncate every value stored to HBM to bf16 EXCEPT:
- optimizer state (incl. primary copy of params)
- layernorm intermediates
- attention logits
- final logits
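A minimal JAX sketch of that recipe (assumed shapes, and a layernorm-style stand-in for the vector math; not production code):

```python
import jax.numpy as jnp

def block(x_bf16, w_bf16):
    # Matmul takes bf16 inputs but accumulates and returns f32:
    y = jnp.matmul(x_bf16, w_bf16, preferred_element_type=jnp.float32)
    # Vector math (layernorm-style normalization here) stays in f32:
    mean = y.mean(axis=-1, keepdims=True)
    var = ((y - mean) ** 2).mean(axis=-1, keepdims=True)
    y = (y - mean) / jnp.sqrt(var + 1e-6)
    # Truncate to bf16 only when storing activations back to HBM; optimizer
    # state, layernorm intermediates, and logits stay f32 per the list above.
    return y.astype(jnp.bfloat16)
```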
Kinda weird to me to call this a “transition from Codex to GPT-3.5” when code-davinci-002 _is_ the big 3.5 base model…it feels more like a product safety decision (that I think I grudgingly support?) to not have a base model available
OpenAI is discontinuing Codex.
GPT-3.5 outperforms Codex, and GPT-4 blows it out of the water.
I think the takeaway here is that eventually everything converges to one general purpose model.
kinda sounds like copilot is trying to follow the license terms and people are ignoring it
seriously though, between gpt-3 using copyrighted books and codex using gpl’ed code, openai is tempting fate, and it would be pretty amusing if it’s linus rather than the authors guild
github copilot has, by their own admission, been trained on mountains of gpl code, so i'm unclear on how it's not a form of laundering open source code into commercial works. the handwave of "it usually doesn't reproduce exact chunks" is not very satisfying
@EigenGender
My preferred counterfactual here is “IBM gets into convnets in the 90s and builds an ASIC-based NN training supercomputer instead of Deep Blue”
@andy_l_jones
This reads (to me) like a review by someone who feels left behind by the pace of change in the modern ML conference ecosystem (and a strong reject is not a great way to react to that!)
I think you’d have better luck at NeurIPS—IMO your paper meets their bar.
Third, we use multidimensional partitioning with overlapped collective communication and other low-level optimizations, many of which we believe are new in the literature. Learn more in our paper or consider adapting our code to your own models! 4/5
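Not our serving code, but a toy JAX sketch of the multidimensional-partitioning flavor this builds on (assumes 8 local devices; the mesh shape and axis names are arbitrary):

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Lay devices out as a 2D mesh and shard a matmul across both axes.
devices = np.array(jax.devices()).reshape(2, 4)  # assumes 8 devices
mesh = Mesh(devices, axis_names=("data", "model"))

x = jax.device_put(jnp.ones((128, 512)),
                   NamedSharding(mesh, P("data", None)))   # rows over "data"
w = jax.device_put(jnp.ones((512, 1024)),
                   NamedSharding(mesh, P(None, "model")))  # cols over "model"

# XLA/GSPMD inserts the needed collectives (overlapping them with compute
# where it can); the result comes back sharded P("data", "model").
y = jax.jit(jnp.matmul)(x, w)
```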
It's pretty disappointing that Douglas Hofstadter—of all people!—is almost completely incurious about what deep learning is and what our goals and methods as MT researchers actually are.
Slides from my talk last week at Uber Science Day:
Skip about halfway through if you’re more interested in “future” than “past”...
Thanks @savvyRL for the invitation!
Unfortunately Sally Lieber didn’t stand up for SB50 at the climate forum today—only Shelly Masur did. SB50 isn’t a radical bill; it’s table stakes. If you won’t support this first step in the part of the state that needs it most, you’re not a YIMBY.
@penforeveryone @PaloAltoYimby @yimbyaction @cayimby
Fmr Assembly Speaker Pro Tem Sally Lieber: "I consider myself a YIMBY. I think we all should be."
Redwood City Councilmember and all-around badass Shelly Masur: IDs as a YIMBY, saying "The need for housing is critical to addressing climate change and displacement."
😍😍😍
"Attention Solves Your [Traveling Salesman Problem]" from W. W. M. Kool and
@wellingmax
at UvA takes
@IrwanBello
's work on neural combinatorial optimization and swaps the RNN for a Transformer—very cool results!
This is the best article I've seen about how AI research in industry actually works, and the business case for openness and participation in the community
This is a pretty incredible story: former Google eng "alleges that [Pinscreen] submitted false results to SIGGRAPH" and that he was fired and "Pinscreen employees, under [CEO] Li’s commands...physically attacked him" after he pointed out the fraud
Essay translation 🇨🇳➡️🇺🇸 h/t @jjding99: Zhao Tingyang: "Near-term Worries" and "Long-term Concerns" of the Artificial Intelligence "Revolution": An Analysis of Ethics and Ontology.
@giffmana @elonmusk @askerlee
I think it’s pretty straightforwardly true at the hardware level: both Cerebras and Dojo, and to a lesser extent Graphcore, have very high bandwidth (relative to flops) for their parameter/activation memory, and more flexible matmul structure.
Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming.
paper:
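A toy sketch of the probabilistic-programming view, where chain of thought is just a latent string-valued variable (sample_lm is a hypothetical stand-in, not an API from the paper or code):

```python
import random

def sample_lm(prompt: str) -> str:
    # Hypothetical stand-in for an LM sampler (NOT from the cascades code).
    return random.choice(["a sampled continuation"])

def chain_of_thought(question: str) -> str:
    # A cascade chains string-valued random variables through the LM:
    #   thought ~ p(thought | question)
    #   answer  ~ p(answer  | question, thought)
    thought = sample_lm(f"Q: {question}\nLet's think step by step:")
    return sample_lm(f"Q: {question}\nThought: {thought}\nA:")
```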