Tamay Besiroglu

@tamaybes

3,448 Followers
738 Following
138 Media
1,286 Statuses

Thinking about economics, computing and machine learning @EpochAIResearch . prev: @MIT_CSAIL , @Cambridge_Uni

Cambridge, MA
Joined May 2018
@tamaybes
Tamay Besiroglu
5 months
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
Tweet media one
17
138
918
@tamaybes
Tamay Besiroglu
1 year
We assess if AI will accelerate economic growth by as much as growth accelerated during the industrial revolution, digging into growth theory, bottlenecks, feasibility of regulation, AI reliability/alignment, etc. Takeaway: acceleration looks plausible
Tweet media one
22
108
475
@tamaybes
Tamay Besiroglu
1 year
Disappointed to see that GPT-4 fails the Monty Fall problem.
Tweet media one
49
33
456
@tamaybes
Tamay Besiroglu
2 years
Recent applications of deep learning in science and engineering, such as AlphaFold and Copilot, have been astonishing. What does standard economic growth theory say about the economic effects of its adoption in R&D? We sketch a simple picture:
4
68
416
@tamaybes
Tamay Besiroglu
4 years
A recent paper about innovation over the long run reveals a very neat snapshot of the composition of inventions over time. Using data on US patents, it identifies the following key waves:
Tweet media one
5
135
412
@tamaybes
Tamay Besiroglu
4 years
A few months ago, I wrote an economics dissertation on whether machine learning models are getting harder to find. Here’s a summary of what I found:
4
69
303
@tamaybes
Tamay Besiroglu
6 months
Language models have come a long way since 2012, when recurrent networks struggled to form coherent sentences. Our new paper finds that the compute needed to achieve a set performance level has been halving every 5 to 14 months on average. (1/10)
Tweet media one
8
55
297
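As a rough back-of-the-envelope illustration (mine, not from the paper), a compute-halving time translates into an annual efficiency multiplier as follows:

```python
# Convert a compute-halving time (in months) into the implied reduction in
# compute needed per year to reach a fixed performance level.
def annual_reduction_factor(halving_months: float) -> float:
    return 2 ** (12 / halving_months)

for months in (5, 14):  # the range of halving times reported in the thread
    print(f"halving every {months} months -> ~{annual_reduction_factor(months):.1f}x less compute per year")
```

At the fast end of the range that is roughly a 5x reduction per year; at the slow end, closer to 1.8x.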
@tamaybes
Tamay Besiroglu
2 months
This is misleading. The 1950 Census actually lists many occupations that have since been automated, including adding-machine operators, computers, switchboard operators, addressograph operators, lamplighters, and many more.
Tweet media one
@emollick
Ethan Mollick
1 year
Just one of the 270 jobs in the 1950 census has been eliminated by automation... elevator operator. Other jobs that were expected to be automated by tech, like bank tellers by ATMs, just shifted the nature of the job. Hopefully, AI follows this pattern.
Tweet media one
16
72
340
5
14
209
@tamaybes
Tamay Besiroglu
1 month
Submitted this to NeurIPS. I thought it would be suitable because it points out a flaw in a NeurIPS best-paper award. They didn't like it because, they point out, we should have just asked the authors for the data. Alas. If only we had thought of that.
@tamaybes
Tamay Besiroglu
5 months
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
Tweet media one
17
138
918
6
13
200
@tamaybes
Tamay Besiroglu
3 months
On an internal math problem dataset, Claude 3.5 performs better than Claude 3 Opus but substantially worse than GPT-4o.
Tweet media one
10
9
187
@tamaybes
Tamay Besiroglu
1 year
The reason this is a fantastic test of reasoning abilities is that the model needs to override the inclination to pattern-match it onto the very closely related Monty Hall problem, which it's undoubtedly seen in the training set many times over.
11
1
167
@tamaybes
Tamay Besiroglu
2 years
How much progress in machine learning has been due to advances in algorithms (architectures, optimisers, activation functions, etc.), and how much has been due to the scaling of compute or datasets? @EgeErdil2 and I provide new answers:
8
35
162
@tamaybes
Tamay Besiroglu
2 years
I recently organized a contest for @Metaculus on investigations into predictions of the future of AI. This resulted in two-dozen insightful analyses by forecasters into the prospects of transformatively advanced AI systems. Here are my short summaries of some that stood out:
6
30
134
@tamaybes
Tamay Besiroglu
1 year
Can we use scaling laws to estimate what is required to reach 'human level' on some arbitrary task? Our (speculative) framework suggests yes. We show that scaling laws provide insight into the *horizons* over which outputs are indistinguishable from human-generated outputs.
Tweet media one
4
17
132
@tamaybes
Tamay Besiroglu
5 months
We reconstructed the data by extracting the SVG from the paper, parsing out the point locations & colors, mapping the coordinates to model size & FLOP, and mapping the colors to loss values. This let us closely approximate their original dataset from just the figure. (2/9)
Tweet media one
4
2
120
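A minimal sketch of the kind of SVG scraping described above. The element pattern and axis calibration here are hypothetical; a real figure requires inspecting its own markup.

```python
# Recover plotted points from a figure's SVG: find each circle's pixel
# position and fill colour, then map pixels to data coordinates.
# The regex and calibration points below are hypothetical placeholders.
import re
import numpy as np

def parse_points(svg_text):
    """Return (x_px, y_px, fill) for each plotted circle in the SVG."""
    pattern = r'<circle[^>]*cx="([\d.]+)"[^>]*cy="([\d.]+)"[^>]*fill="(#[0-9a-fA-F]{6})"'
    return [(float(x), float(y), fill) for x, y, fill in re.findall(pattern, svg_text)]

def pixel_to_value(px, calibration):
    """Map a pixel coordinate to a value on a log-scaled axis, given two
    known tick marks: calibration = ((pixel_a, value_a), (pixel_b, value_b))."""
    (pa, va), (pb, vb) = calibration
    log_v = np.log10(va) + (px - pa) * (np.log10(vb) - np.log10(va)) / (pb - pa)
    return 10 ** log_v

# Model size and training FLOP come from the x/y calibrations; loss values
# would be recovered analogously by interpolating each point's fill colour
# against the figure's colour bar.
```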
@tamaybes
Tamay Besiroglu
4 months
A few weeks ago, we attempted to replicate the Chinchilla paper. We found that their estimated model fails to adequately fit the reconstructed data, that it implies inconsistent scaling policies, and that their confidence intervals are implausibly narrow.
@tamaybes
Tamay Besiroglu
5 months
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
Tweet media one
17
138
918
2
11
113
@tamaybes
Tamay Besiroglu
5 months
@TheSaddlePoint @drjwrae We asked for it several times.
0
1
100
@tamaybes
Tamay Besiroglu
6 months
A recent paper assesses whether AI could cause explosive growth and suggests it won't. It's good to have other economists seriously engage with the arguments that suggest that AI that substitutes for humans could accelerate growth, right?
Tweet media one
6
8
98
@tamaybes
Tamay Besiroglu
2 years
There seems to be evidence that members of the EA community are overindexing on recent advances in ML and forming unreasonable expectations of transformative AI this decade. @MatthewJBar and I counterbalance this by offering a $1,000 bet to the contrary.
7
10
88
@tamaybes
Tamay Besiroglu
5 months
You can reproduce all our work: Extracted data: Code to reproduce results: Code to extract data from SVG:
6
2
89
@tamaybes
Tamay Besiroglu
4 years
I found that the marginal returns of researchers are rapidly declining. There is what’s called a “standing on toes” effect: researcher productivity declines as the field grows. Because ML has recently grown very quickly, this makes better ML models much harder to find.
Tweet media one
7
26
84
@tamaybes
Tamay Besiroglu
5 months
When we fit their parametric scaling law, we get strikingly different estimates (Chi-squared p-value <1e-60!). The differences are significant for the data-scaling coefficient β and the irreducible loss E. (3/9)
Tweet media one
2
2
82
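For concreteness, here is a simplified sketch of fitting the parametric form L(N, D) = E + A/N^α + B/D^β, using a single initialisation and a plain Huber loss on log-loss; the original procedure additionally uses a grid of initialisations and a log-sum-exp parameterisation.

```python
# Simplified fit of L(N, D) = E + A / N**alpha + B / D**beta to
# (parameters, tokens, loss) observations, using a Huber loss on log-loss.
import numpy as np
from scipy.optimize import minimize

def huber(r, delta=1e-3):
    return np.where(np.abs(r) <= delta,
                    0.5 * r**2,
                    delta * (np.abs(r) - 0.5 * delta))

def fit_scaling_law(N, D, L):
    def objective(theta):
        log_A, log_B, log_E, alpha, beta = theta
        pred = np.exp(log_A) * N**(-alpha) + np.exp(log_B) * D**(-beta) + np.exp(log_E)
        return np.sum(huber(np.log(pred) - np.log(L)))

    theta0 = np.array([5.0, 7.0, 0.5, 0.3, 0.3])  # arbitrary starting point
    res = minimize(objective, theta0, method="L-BFGS-B")
    log_A, log_B, log_E, alpha, beta = res.x
    return dict(A=np.exp(log_A), B=np.exp(log_B), E=np.exp(log_E),
                alpha=alpha, beta=beta)
```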
@tamaybes
Tamay Besiroglu
5 months
We have asked the authors for assistance, but we haven’t been able to get a response. (8/9)
2
0
82
@tamaybes
Tamay Besiroglu
4 months
How much does a doubling of R&D effort increase innovation in software? Our new paper proposes new empirical techniques, applies them, and finds evidence of increasing returns to scale: doubling software R&D could more than double the rate of innovation.
@EpochAIResearch
Epoch AI
4 months
Could increasing returns to software R&D lead to explosive tech progress? Our new paper surveys estimation methods and finds evidence of increasing returns to scale in software R&D.
3
13
52
1
9
77
@tamaybes
Tamay Besiroglu
2 months
@justjoshinyou13 Seems like motivated reasoning.
6
0
77
@tamaybes
Tamay Besiroglu
2 years
@pmddomingos Thanks for sharing our work! Our interpretation is that there was something of a phase transition in the early 2010s coinciding with the advent of Deep Learning, rather than there having been superexponential growth in compute. See our paper:
Tweet media one
2
6
74
@tamaybes
Tamay Besiroglu
3 years
Guys, my Tweet where I draw a line on a graph that separates models into 'not conscious' vs. 'maybe slightly conscious' was tongue-in-cheek. I wish I had discovered the key to the question of consciousness, but I haven't—sorry to disappoint.
@futurism
Futurism
3 years
This debate just keeps getting spicier.
4
7
23
4
5
76
@tamaybes
Tamay Besiroglu
3 years
Seeing so many prominent ML folks ridiculing this idea is disappointing. It makes me less hopeful about the field's ability to seriously take on some of the profound, weird and important questions that they'll undoubtedly be faced with over the next few decades.
@ilyasut
Ilya Sutskever
3 years
it may be that today's large neural networks are slightly conscious
452
556
3K
7
6
72
@tamaybes
Tamay Besiroglu
4 years
Surprised to learn that Soros largely single-handedly bailed out Russian science following the collapse of the USSR. The funds significantly induced scientists to remain in the science sector, and had long-lasting impacts on Russian scientific output.
4
15
70
@tamaybes
Tamay Besiroglu
3 years
I just realised that I'm the most prolific forecast operationalizer on @Metaculus , having written 272 over the last 4 years. I find spelling out forecasts forces you to be more precise and empirically grounded about your beliefs. Here's a guide I wrote …
Tweet media one
6
4
69
@tamaybes
Tamay Besiroglu
9 months
How much compute did Google use to train Gemini Ultra? The paper unfortunately doesn't say, and there are few hints in the technical report. A speculative thing to do is to extrapolate how much is needed to match Gemini on benchmarks. Doing so yields this picture.
Tweet media one
3
6
67
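The "speculative thing" amounts to fitting benchmark performance against known training compute and inverting the fit at Gemini's reported score. A toy sketch with placeholder numbers (none of these are real benchmark or compute figures):

```python
# Toy extrapolation: regress benchmark accuracy on log10(training compute)
# for models with public compute estimates, then invert at Gemini's score.
# All values below are placeholders, not real data.
import numpy as np

log_compute = np.array([23.0, 24.0, 25.0, 25.4])   # hypothetical log10 FLOP
accuracy    = np.array([0.45, 0.58, 0.72, 0.78])   # hypothetical benchmark scores

slope, intercept = np.polyfit(log_compute, accuracy, 1)

gemini_score = 0.83                                 # placeholder score
implied_log_flop = (gemini_score - intercept) / slope
print(f"implied training compute ~1e{implied_log_flop:.1f} FLOP")
```

A saturating (e.g. logistic) fit would be more appropriate near the benchmark ceiling; a straight line keeps the sketch short.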
@tamaybes
Tamay Besiroglu
5 months
Hoffmann et al. also report extremely narrow confidence intervals for some key parameters. We calculate that you’d need about 600,000 data points to nail it down that precisely. By contrast, they likely had ~400. (5/9)
Tweet media one
1
2
68
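A rough way to see the size of the gap (my own framing of the arithmetic): interval widths shrink like 1/√n, so intervals narrow enough to require ~600,000 points are about √(600,000/400) ≈ 39 times tighter than ~400 points can support.

```python
import math

n_available = 400      # approximate number of runs behind the fit
n_required = 600_000   # points needed to justify the reported interval widths

# Interval width scales like 1/sqrt(n), so the precision gap is:
gap = math.sqrt(n_required / n_available)
print(f"reported intervals are ~{gap:.0f}x narrower than ~400 points support")
```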
@tamaybes
Tamay Besiroglu
1 year
A failure I notice among economists is conflating mental models of AI. They claim to consider, but seem not to fully internalize, the implications of "good AI" actually able to flexibly substitute for labor. "My lips say 'human-level', but my heart says 'a fancier version of GPT'." h/t @steve47285
Tweet media one
2
7
66
@tamaybes
Tamay Besiroglu
2 years
@jasonhickel This is so misleading. The paper compares states with similar levels of economic development. Socialist policies tend to depress econ development, so a comparison that controls for econ development will miss much (most?) of the welfare diff btwn socialist and non-socialist states
3
1
65
@tamaybes
Tamay Besiroglu
5 months
Moreover, Hoffmann et al.'s estimates imply a scaling policy inconsistent with their other results and the token-to-parameter ratio used for Chinchilla. Our estimates align better with these and have more reasonable uncertainty. (6/9)
Tweet media one
1
0
62
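The consistency check here follows the standard algebra: minimising L(N, D) = E + A/N^α + B/D^β subject to C ≈ 6ND gives N_opt = G·(C/6)^(β/(α+β)) and D_opt = G⁻¹·(C/6)^(α/(α+β)), with G = (αA/βB)^(1/(α+β)). A small sketch of the check; the parameter values are whichever fit you want to inspect, not hard-coded numbers.

```python
# Given fitted (A, B, alpha, beta), compute the compute-optimal allocation
# and the implied tokens-per-parameter ratio at a given budget C (in FLOP).
def optimal_allocation(C, A, B, alpha, beta):
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta))
    N_opt = G * (C / 6) ** (beta / (alpha + beta))
    D_opt = (1 / G) * (C / 6) ** (alpha / (alpha + beta))
    return N_opt, D_opt, D_opt / N_opt

# At Chinchilla's training budget (~5.76e23 FLOP), a self-consistent fit
# should imply roughly the ~20 tokens per parameter actually used.
```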
@tamaybes
Tamay Besiroglu
5 months
Hoffmann et al.'s estimated scaling law fits the reconstructed data very poorly compared to ours. Their residuals are not centered at 0 at all! Our model achieves a lower loss on 98% of data points. Clearly, their model does not fit the data. (4/9)
Tweet media one
1
2
62
@tamaybes
Tamay Besiroglu
6 months
A survey of thousands of AI experts shows that respondents believe the falling cost of computation was the most important driver of AI progress over the past ten years, with increased funding and progress in AI algorithms rated about on par. This seems mistaken.
Tweet media one
5
11
61
@tamaybes
Tamay Besiroglu
2 years
I’m very excited to announce Epoch. We’re working on investigating trends in Machine Learning and understanding the transition to a world with advanced AI.
@Jsevillamol
Jaime Sevilla
2 years
We are excited to announce our new research organization: Epoch! We are working on investigating AI developments and forecasting the development of Transformative AI. You can learn more in our announcement: Summary below 🧵⬇️
Tweet media one
7
19
116
2
6
54
@tamaybes
Tamay Besiroglu
1 month
@NeelNanda5 We heard back after we posted to arxiv and tweeted about it. However, we never got the data.
2
3
55
@tamaybes
Tamay Besiroglu
5 months
Hoffmann et al.’s paper has been highly influential in the language modeling community. Our analysis highlights some potential issues that warrant clarification. (7/9)
1
1
55
@tamaybes
Tamay Besiroglu
1 year
@max_paperclips It still fails when I append "Let's think this through step-by-step and explain your reasoning" to the prompt
Tweet media one
2
0
53
@tamaybes
Tamay Besiroglu
1 month
It's curious how Llama 405b's performance drops by 5 percentage points when using standard simple-evals prompts instead of its native Llama 3.1 prompts. Other models show much less sensitivity to this prompt change and fall nicely along the 45-degree line.
Tweet media one
@EpochAIResearch
Epoch AI
1 month
3/8 Evaluation settings make a difference in GPQA performance. We replicated Meta's results using the same settings they used (T=0, Llama 3.1 prompt), with average accuracy at 51.3%. But with default settings for the API we used (T=0.7, simple-evals prompts), it drops to 48.5%.
Tweet media one
1
0
11
5
7
55
@tamaybes
Tamay Besiroglu
2 years
If deep learning becomes good enough to be broadly adopted in the R&D sector, its adoption could induce an accumulation of relevant capital that could nearly double the productivity growth rate in the U.S.
Tweet media one
1
5
51
@tamaybes
Tamay Besiroglu
5 months
I'm thrilled to see that our work has apparently unified the Chinchilla scaling laws. It's great to hear that they're making the data open source!
@borgeaud_s
Sebastian Borgeaud
5 months
Great analysis, approach 3 is finally in agreement! The loss scale was too low in our paper, resulting in premature termination of L-BFGS, and leading to bad fits. After fixing this we can reproduce your findings! We're also open sourcing the data in the paper, stay tuned :)
8
37
245
0
7
53
@tamaybes
Tamay Besiroglu
9 months
Pleased about this work. We wanted to know how much compute is possible with current tech, and derived some bounds. Result: using the world's current energy consumption and maximally efficient GPUs yields ~1e35 FP16 FLOP (±0.7 OOMs), about 10B-fold more than GPT-4.
@ansonwhho
Anson Ho
9 months
What are the limits to the energy efficiency of CMOS microprocessors? In our new paper, published in the International Conference on Rebooting Computing, we propose a simple model to shed light on this question:
2
9
51
2
5
49
@tamaybes
Tamay Besiroglu
3 years
@ilyasut Our recent paper on compute trends in ML has some insights:
Tweet media one
@ohlennart
Lennart Heim
3 years
**ML training compute has been doubling every 6 months since 2010!** Our preprint "Compute Trends Across Three Eras of Machine Learning" is out. 🧵 Thread below ↓ 1/
Tweet media one
25
239
857
1
22
50
@tamaybes
Tamay Besiroglu
6 months
Unfortunately, that's not what this is. The authors rule out the possibility of AI broadly substituting for humans, asserting it's "science fiction", and dismiss the arguments premised on it.
Tweet media one
Tweet media two
3
2
49
@tamaybes
Tamay Besiroglu
11 months
Keeping tabs on what's happening in AI (who is scaling how fast, how much data or what architectures) is critical. Our database makes doing that much easier. It tracks info on key ML models, both historical and SOTA (GPT-4, Claude 2, PaLM 2, etc.)
2
8
48
@tamaybes
Tamay Besiroglu
2 years
We find that every 9 months, the introduction of better algorithms contributes the equivalent of a doubling of compute budgets. This is much faster than the gains from Moore's law! That said, there's uncertainty (our 95% CI spans 4 to 25 months).
Tweet media one
1
8
45
@tamaybes
Tamay Besiroglu
5 months
If it's possible to continue to trade off inference and training compute, we should expect similar amounts of compute to be spent on training large models and on running inference with them.
2
4
45
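One stylised way to see the equal-spend intuition (my own illustration, with made-up constants): if spending T on training buys a per-query inference cost proportional to 1/T and we serve Q queries, total cost T + kQ/T is minimised at T* = √(kQ), where training spend exactly equals inference spend.

```python
# Numerical check of the equal-spend intuition under the stylised assumption
# that per-query inference cost scales as k / (training compute T).
import numpy as np
from scipy.optimize import minimize_scalar

k, Q = 1e3, 1e6                                   # placeholder constants
total_cost = lambda T: T + k * Q / T

res = minimize_scalar(total_cost, bounds=(1.0, 1e9), method="bounded")
T_star = res.x
print(T_star, k * Q / T_star, np.sqrt(k * Q))     # all ~3.2e4: equal spend
```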
@tamaybes
Tamay Besiroglu
4 years
1990s onward: the Information Age continues: Koss patents the Excel Function, Bezos patents 1-click buying, Page creates Pagerank. ~80% of top patents now Electronics/IT related. Innovation has hardly ever before been this concentrated in so few sectors.
Tweet media one
4
8
46
@tamaybes
Tamay Besiroglu
4 months
Pleased to have been able to contribute to the first Intl Scientific Report on Advanced AI Safety. I think it's a comprehensive & balanced look at progress, risks & challenges, and a step towards a shared understanding of the trajectory of advanced AI.
0
8
46
@tamaybes
Tamay Besiroglu
2 years
You might have expected that with large ML models not being publicly accessible and very costly to train, it would become unclear whether key impressive results would replicate. However, the reproducibility situation for these models has arguably been surprisingly good so far.
2
1
43
@tamaybes
Tamay Besiroglu
6 months
Many AI experts clearly don't think this is science fiction, and AI labs are spending hundreds of billions to make it happen. Why do economists defer so little to AI experts about the topic of what AI can or can't do?
Tweet media one
2
3
42
@tamaybes
Tamay Besiroglu
6 months
While algorithmic progress has been rapid, our Shapley value analysis suggests that 60-95% of the performance improvements stem from increased computing power and training data, while novel algorithms account for only 5-40% of the progress. (4/10)
Tweet media one
1
8
42
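For readers unfamiliar with the attribution method, a generic Shapley decomposition over three factors looks like the sketch below. The coalition values are arbitrary placeholders, not the paper's specification or its numbers.

```python
# Generic Shapley attribution over {compute, data, algorithms}: average each
# factor's marginal contribution over all orderings of the factors.
from itertools import permutations

FACTORS = ("compute", "data", "algorithms")

def shapley(value):
    """value: dict mapping frozenset of factors -> total performance gain."""
    shares = {f: 0.0 for f in FACTORS}
    orderings = list(permutations(FACTORS))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            shares[f] += value[coalition | {f}] - value[coalition]
            coalition = coalition | {f}
    return {f: s / len(orderings) for f, s in shares.items()}

# Arbitrary placeholder coalition values (units of performance gain):
value = {
    frozenset(): 0.0,
    frozenset({"compute"}): 4.0,
    frozenset({"data"}): 3.0,
    frozenset({"algorithms"}): 2.0,
    frozenset({"compute", "data"}): 8.0,
    frozenset({"compute", "algorithms"}): 7.0,
    frozenset({"data", "algorithms"}): 6.0,
    frozenset(FACTORS): 12.0,
}
print(shapley(value))  # shares sum to value[frozenset(FACTORS)]
```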
@tamaybes
Tamay Besiroglu
4 months
The authors responded, clarifying that this was the result of their optimizer stopping early due to a bad loss scale choice. They plan to update their results and release the data. We appreciate @borgeaud_s and others' openness in addressing this issue.
@borgeaud_s
Sebastian Borgeaud
5 months
Great analysis, approach 3 is finally in agreement! The loss scale was too low in our paper, resulting in premature termination of L-BFGS, and leading to bad fits. After fixing this we can reproduce your findings! We're also open sourcing the data in the paper, stay tuned :)
8
37
245
1
4
40
@tamaybes
Tamay Besiroglu
3 months
The article presents a well-articulated case that by extrapolating current AI trends—rapidly increasing compute, consistent algorithmic efficiency gains, and techniques that unlock latent capabilities—we may develop "drop-in remote workers" by 2027.
@leopoldasch
Leopold Aschenbrenner
3 months
Full series as PDF: Read online:
37
148
806
2
5
41
@tamaybes
Tamay Besiroglu
8 months
I was quoted today in a Time article on the AI progress survey and in another on AI and growth. Some journalists write well and faithfully represent the views of those they speak with, which is great.
0
2
37
@tamaybes
Tamay Besiroglu
1 year
AI research is advancing rapidly, but compute usage, a key metric, is often overlooked due to a lack of established practices. This omission hinders model comparisons, reproducibility and governance. Discover why this matters and find our proposals here:
2
7
37
@tamaybes
Tamay Besiroglu
8 months
Neat exposition of our speculative proposal for estimating the compute required for a scaled-up GPT to perform tasks like science. Yuxi does a better job explaining some of the key ideas than we do.
Tweet media one
6
3
36
@tamaybes
Tamay Besiroglu
3 months
In early 2022 we wrote a paper finding a 4x/year rate of increase in the scale of training runs. Updated data, now 3x larger, shows this still holds. If the trend continues, we can expect further performance improvements surpassing current capabilities in the near future.
@EpochAIResearch
Epoch AI
3 months
1/ How quickly are state-of-the-art AI models growing? The amount of compute used in AI training is a critical driver of progress in AI. Our analysis of over 300 machine learning systems reveals that the amount of compute used in training is consistently being scaled up at
Tweet media one
23
306
2K
4
8
37
@tamaybes
Tamay Besiroglu
6 months
@romanyam Want to bet in a way that I pay you today and you pay me some multiple in a few years?
1
1
35
@tamaybes
Tamay Besiroglu
2 years
I deleted this tweet on ML compute spending, because I'm no longer confident that AlphaGo Zero was in fact the most expensive ML experiment to date. I still think the broader observation I point to is true, but I prefer to make claims only when I'm confident about factual accuracy.
Tweet media one
1
0
34
@tamaybes
Tamay Besiroglu
3 years
7.5 years of GAN progress on face generation.
Tweet media one
3
4
35
@tamaybes
Tamay Besiroglu
2 years
This was a thoughtful take on three reasons to be skeptical about catastrophic risk from AI: selection effects about who engages with the arguments, community epistemic problems, and issues with chains of reasoning involving imperfect concepts.
@NunoSempere
Nuño Sempere
2 years
Just posted this effort post: "My highly personal skepticism braindump on existential risk from artificial intelligence". Might be of interest to people here.
8
12
103
2
4
33
@tamaybes
Tamay Besiroglu
4 years
On the other hand, I find that progress now makes progress in the future easier. This is called a “standing-on-the-shoulders” effect (innovations today are bootstrapped by previous progress).
1
1
31
@tamaybes
Tamay Besiroglu
3 months
Data constraints will make scaling less efficient at around 1e29 FLOP, around 4 OOMs larger than GPT-4. This leaves a lot of room for continued scaling. However, combining massive scaling with intense overtraining might soon become a challenge.
@EpochAIResearch
Epoch AI
3 months
Are we running out of data to train language models? State-of-the-art LLMs use datasets with tens of trillions of words, and use 2-3x more per year. Our new ICML paper estimates when we might exhaust all text data on the internet. 1/12
Tweet media one
24
127
627
2
3
32
@tamaybes
Tamay Besiroglu
9 months
Dwarkesh provides a thoughtful analysis of why scaling LLM-like systems may or may not succeed. He concludes it's 70% likely that scaling + algorithmic progress + hardware advances over the next 20-ish years will suffice. Seems reasonable to me.
@dwarkesh_sp
Dwarkesh Patel
9 months
New post: Will scaling work? This is the crux in arguments about AI timelines. In order to think through my own position, I wrote the post as a debate between a skeptic and a believer. Skeptic point 1: Data bottlenecks won't be cleared by self-play/synthetic data:
Tweet media one
33
104
793
2
4
31
@tamaybes
Tamay Besiroglu
2 years
21 NeurIPS papers mentioned scaling laws in 2021, more than double the number in all previous proceedings. Yet 21 papers represent ~1% of 2021 papers, so in an absolute sense scaling laws receive fairly little attention from many top ML researchers.
Tweet media one
2
3
29
@tamaybes
Tamay Besiroglu
18 days
How feasible is it to continue scaling up AI training at its current pace? Our analysis of power, chips, data, and latency constraints suggests it is feasible through this decade. By 2030, models could likely exceed GPT-4 in scale to the same degree that GPT-4 exceeds GPT-2 in scale.
@EpochAIResearch
Epoch AI
18 days
1/ Can AI scaling continue through 2030? We examine whether constraints on power, chip manufacturing, training data, or data center latencies might hinder AI growth. Our analysis suggests that AI scaling can likely continue its current trend through 2030.
38
167
685
0
1
31
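For a sense of the magnitudes, using commonly cited third-party training-compute estimates (not official figures), GPT-2 is around 1.5e21 FLOP and GPT-4 around 2e25 FLOP; repeating that roughly 4-OOM jump lands near 3e29 FLOP:

```python
# Scale comparison using rough third-party estimates of training compute.
gpt2_flop, gpt4_flop = 1.5e21, 2e25          # estimates, not official figures
ratio = gpt4_flop / gpt2_flop                # ~1.3e4, i.e. roughly 4 OOMs
implied_2030_flop = gpt4_flop * ratio        # ~2.7e29 FLOP if the jump repeats
print(f"{ratio:.1e}x  ->  ~{implied_2030_flop:.1e} FLOP")
```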
@tamaybes
Tamay Besiroglu
1 year
I’m very excited to see this issue on AI; many essays that I can’t wait to dig into. I also have a small contribution in the form of a debate with @mattsclancy on explosive economic growth from advanced AI (I argue that it’ll likely happen, Matt argues it likely won’t).
@asteriskmgzn
Asterisk
1 year
Announcing Issue 03: AI AI is all anyone is talking about. Our writers have been thinking about it for years.
3
21
82
1
4
31
@tamaybes
Tamay Besiroglu
10 months
Google seems to have experimented with a 50,000+ TPU training run. "To give a sense of scale, this cluster of Cloud TPU v5e chips has more AI accelerators than the TOP1 Supercomputer Frontier at Oak Ridge National Laboratory, which featured 37,888 AMD MI250X GPUs"
@tamaybes
Tamay Besiroglu
1 year
Dylan is predicting a 100k GPU cluster next year or the year after, which would enable a ~$1bn training run. Seems plausible to me.
1
1
23
1
8
29
@tamaybes
Tamay Besiroglu
1 year
@ylecun This also applies to things humans write, e.g., mathematical proofs with many steps, but those aren't generally 'doomed'.
3
1
29
@tamaybes
Tamay Besiroglu
6 months
This rate of algorithmic progress is much faster than the two-year doubling time of Moore's Law for hardware improvements, and faster than other domains of software, like SAT-solvers, linear programs, etc. (2/10)
Tweet media one
1
2
28
@tamaybes
Tamay Besiroglu
3 years
@robertwiblin I agree with your conclusion. However, the correct comparison is not between AZ and no vaccine, but rather between AZ and the next-best vaccine administered at some delay. I expect your conclusion still to be true, but it's worth framing things carefully.
2
0
28
@tamaybes
Tamay Besiroglu
9 months
. @OpenAI could you please make sure to train on all publicly available LaTeX resources for an excessive number of epochs for GPT-5? GPT-4 isn't very good at LaTeX. Think of the few basis points boost in scientific productivity this would deliver.
1
0
26
@tamaybes
Tamay Besiroglu
6 months
Paper link: FWIW, it seems like a solid paper if you're for some reason interested in the effects of a type of AI that is forever incapable of automating R&D.
2
0
27
@tamaybes
Tamay Besiroglu
5 months
Cool to see our replication of Chinchilla amongst the top ML papers of the week in what was a packed week for AI.
@dair_ai
DAIR.AI
5 months
The Top ML Papers of the Week (April 15 - April 21): - Llama 3 - Mixtral 8x22B - A Survey on RAG - How Faithful are RAG Models? - Emerging AI Agent Architectures - Chinchilla Scaling: A replication attempt ...
5
86
552
2
1
27
@tamaybes
Tamay Besiroglu
2 years
Top forecaster Steven0461 describes plausible scenarios of 2050 that leave us without transformative AI. They suggest that if there is to be an obstacle, plateauing hardware improvements are the most likely culprit.
Tweet media one
3
2
27
@tamaybes
Tamay Besiroglu
3 years
Progress in the tools for engineering mirror-image molecules by dedicated mirror-image biology labs, such as that presented in this recent article, might soon enable the creation of mirror cells. This worries me.
1
3
27
@tamaybes
Tamay Besiroglu
4 years
In my dissertation, I explored how this story holds up for machine learning. I used a dataset on the top-performing ML models on 93 machine learning benchmarks—mostly related to computer vision and NLP—and data on research inputs derived from publication data.
3
2
25
@tamaybes
Tamay Besiroglu
3 years
Really excited to be joining @ProfNeilT 's lab at MIT to work at the intersection of Economics and Computer Science, focusing on AI and Computing.
3
1
26
@tamaybes
Tamay Besiroglu
4 years
Some background. @ChadJonesEcon , @johnvanreenen and others wrote an awesome article that found that ideas are getting harder to find: in semiconductors, agricultural production and medicine, research productivity has been declining steadily.
2
1
26
@tamaybes
Tamay Besiroglu
2 years
Standard endogenous growth theory predicts that capital-intensive R&D produces faster growth. More productive use of capital (K) → increased investment → accumulation of more K → increased productivity and output → increased investment → accumulation of more K, etc.
Tweet media one
2
1
25
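A toy simulation of that feedback loop (my own illustration with arbitrary parameters, not a calibrated model): output is produced from capital and productivity, a fixed share of output is reinvested, and productivity growth increases with the capital devoted to R&D.

```python
# Toy capital -> investment -> productivity feedback loop. All parameter
# values are arbitrary; this only illustrates the qualitative mechanism.
def simulate(rd_capital_share, T=50, s=0.2, alpha=0.3, phi=0.02):
    K, A = 1.0, 1.0
    output = []
    for _ in range(T):
        Y = A * K ** alpha                              # production
        K += s * Y                                      # reinvestment accumulates capital
        A *= 1 + phi * (rd_capital_share * K) ** 0.5    # R&D capital raises productivity
        output.append(Y)
    return output

low_rd, high_rd = simulate(0.01), simulate(0.10)
print(low_rd[-1], high_rd[-1])  # more capital-intensive R&D -> faster growth here
```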
@tamaybes
Tamay Besiroglu
4 years
A “standing-on-the-shoulders” effect in ML is on the whole not that surprising: it seems that finding one approach to solving one task can often be repurposed to solve other, related tasks (e.g. transformers, attention, etc.)
1
0
24
@tamaybes
Tamay Besiroglu
1 year
Dylan is predicting a 100k GPU cluster next year or the year after, which would enable a ~$1bn training run. Seems plausible to me.
@dylan522p
Dylan Patel
1 year
@tamaybes And faster GPUs/TPUs too. Google, Meta, Microsoft/OpenAI, Baidu, Tencent, Alibaba, ByteDance all have the capability (ignoring China GPU bans) 3 of those will do it.
1
0
9
1
1
23
@tamaybes
Tamay Besiroglu
2 years
Our estimates imply that AI-augmented R&D would involve capital investments much larger than all R&D sectors in the US (only ~6% of spending of NSF-funded STEM labs is dedicated to things we might consider physical capital).
1
2
24
@tamaybes
Tamay Besiroglu
2 years
An economy whose R&D is augmented by ML models relies more on physical capital (compute) and less on human scientists. Capital, unlike labor, is a well-behaved economic good: it can accumulate exponentially in line with economic growth. This has important implications.
1
0
24
@tamaybes
Tamay Besiroglu
2 years
@MikePFrank This article argues the kink is the (delayed) effect of the end of Dennard scaling: the largest HPC centres compensated for a while by increasing spending and parallelism, and this ended around 2013. Does that seem plausible/correct to you?
3
1
25
@tamaybes
Tamay Besiroglu
1 year
What is the consensus on how reliable the academic exam results for GPT-4 are? The GPT-4 paper's contamination study looks decent, but the model often just seems to fail fairly basic 2023 high school math problems.
4
1
23
@tamaybes
Tamay Besiroglu
3 years
Metaculus has built up what is likely the most comprehensive repository of carefully spelled out forecasts about AI and its impacts. I’m excited to organize this contest to explore how it could help ground discussions about the future of AI in terms of quantifiable predictions.
@metaculus
Metaculus
3 years
The AI Progress Essay Contest is open! Engage with the wealth of AI forecasts on Metaculus to construct an accurate picture of the timeline and impact of transformative AI. $6,500 and the Dreyfus Prize will go to the most insightful pieces:
1
4
20
0
3
25
@tamaybes
Tamay Besiroglu
6 months
We estimate the transformer architecture provided roughly 10x "compute-equivalent gain", though estimates vary by model specification. Chinchilla scaling laws provided around 2-4x gains over Kaplan depending on the scale. (6/10)
Tweet media one
Tweet media two
1
3
23
@tamaybes
Tamay Besiroglu
4 years
It turns out that the “standing-on-toes” effect dominates. I estimate that overall research productivity declined by between 4% and 26% (depending on the sub-field and the model).
2
1
23
@tamaybes
Tamay Besiroglu
3 months
Cool work: Predicting downstream performance based on compute could help us anticipate the capabilities of future models, but predictability has remained elusive. @RylanSchaeffer , @haileysch__ et al. explore why and suggest the possibility of "scaling-predictable evaluations".
@RylanSchaeffer
Rylan Schaeffer
3 months
❤️‍🔥❤️‍🔥Excited to share our new paper ❤️‍🔥❤️‍🔥 **Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?** w/ @haileysch__ @BrandoHablando @gabemukobi @varunrmadan @herbiebradley @ai_phd @BlancheMinerva @sanmikoyejo 1/N
Tweet media one
8
54
262
0
1
24
@tamaybes
Tamay Besiroglu
4 years
@kristjanmoore I also think that this is more likely than not (I’m around 60% confident). Here’s my track record on 220 questions.
Tweet media one
4
3
24