Tamay Besiroglu

@tamaybes

3,448 Followers
738 Following
138 Media
1,286 Statuses

Thinking about economics, computing and machine learning @EpochAIResearch . prev: @MIT_CSAIL , @Cambridge_Uni

Cambridge, MA
Joined May 2018
@tamaybes
Tamay Besiroglu
5 months
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
Tweet media one
17
138
918
@tamaybes
Tamay Besiroglu
1 year
We assess if AI will accelerate economic growth by as much as growth accelerated during the industrial revolution, digging into growth theory, bottlenecks, feasibility of regulation, AI reliability/alignment, etc. Takeaway: acceleration looks plausible
Tweet media one
22
108
475
@tamaybes
Tamay Besiroglu
1 year
Disappointed to see that GPT-4 fails the Monty Fall problem.
Tweet media one
49
33
456
@tamaybes
Tamay Besiroglu
2 years
Recent applications of deep learning in science and engineering, such as AlphaFold and Copilot, have been astonishing. What does standard economic growth theory say about the economic effects of its adoption in R&D? We sketch a simple picture:
4
68
416
@tamaybes
Tamay Besiroglu
4 years
A recent paper about innovation over the long run reveals a very neat snapshot of the composition of inventions over time. Using data on US patents, it identifies the following key waves:
Tweet media one
5
135
412
@tamaybes
Tamay Besiroglu
4 years
A few months ago, I wrote an economics dissertation on whether machine learning models are getting harder to find. Here’s a summary of what I found:
4
69
303
@tamaybes
Tamay Besiroglu
6 months
Language models have come a long way since 2012, when recurrent networks struggled to form coherent sentences. Our new paper finds that the compute needed to achieve a set performance level has been halving every 5 to 14 months on average. (1/10)
Tweet media one
8
55
297
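As a rough back-of-the-envelope illustration (mine, not from the paper), a compute-halving time translates into an annual efficiency multiplier as follows:

```python
# Convert a compute-halving time (in months) into the implied reduction in
# compute needed per year to reach a fixed performance level.
def annual_reduction_factor(halving_months: float) -> float:
    return 2 ** (12 / halving_months)

for months in (5, 14):  # the range of halving times reported in the thread
    print(f"halving every {months} months -> ~{annual_reduction_factor(months):.1f}x less compute per year")
```

At the fast end of the range that is roughly a 5x reduction per year; at the slow end, closer to 1.8x.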
@tamaybes
Tamay Besiroglu
2 months
This is misleading. The 1950 Census actually lists many occupations that have since been automated, including adding-machine operators, computers, switchboard operators, addressograph operators, lamplighters, and many more.
Tweet media one
@emollick
Ethan Mollick
1 year
Just one of the 270 jobs in the 1950 census has been eliminated by automation... elevator operator. Other jobs that were expected to be automated by tech, like bank tellers by ATMs, just shifted the nature of the job. Hopefully, AI follows this pattern.
Tweet media one
16
72
340
5
14
209
@tamaybes
Tamay Besiroglu
1 month
Submitted this to NeurIPS. I thought it would be suitable because it points out a flaw in a NeurIPS best-paper award. They didn't like it because, they point out, we should have just asked the authors for the data. Alas. If only we had thought of that.
@tamaybes
Tamay Besiroglu
5 months
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
Tweet media one
17
138
918
6
13
200
@tamaybes
Tamay Besiroglu
3 months
On an internal math problem dataset, Claude 3.5 performs better than Claude 3 Opus but substantially worse than GPT-4o.
Tweet media one
10
9
187
@tamaybes
Tamay Besiroglu
1 year
The reason this is a fantastic test of reasoning abilities is that the model needs to override the inclination to pattern-match it onto the very closely related Monty Hall problem, which it's undoubtedly seen in the training set many times over.
11
1
167
@tamaybes
Tamay Besiroglu
2 years
How much progress in machine learning has been due to advances in algorithms (architectures, optimisers, activation functions, etc.), and how much has been due to the scaling of compute or datasets? @EgeErdil2 and I provide new answers:
8
35
162
@tamaybes
Tamay Besiroglu
2 years
I recently organized a contest for @Metaculus on investigations into predictions of the future of AI. This resulted in two-dozen insightful analyses by forecasters into the prospects of transformatively advanced AI systems. Here are my short summaries of some that stood out:
6
30
134
@tamaybes
Tamay Besiroglu
1 year
Can we use scaling laws to estimate what is required to reach 'human level' on some arbitrary task? Our (speculative) framework suggests yes. We show that scaling laws provide insight into the *horizons* over which outputs are indistinguishable from human-generated outputs.
Tweet media one
4
17
132
@tamaybes
Tamay Besiroglu
5 months
We reconstructed the data by extracting the SVG from the paper, parsing out the point locations & colors, mapping the coordinates to model size & FLOP, and mapping the colors to loss values. This let us closely approximate their original dataset from just the figure. (2/9)
Tweet media one
4
2
120
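A minimal sketch of the kind of SVG scraping described above. The element pattern and axis calibration here are hypothetical; a real figure requires inspecting its own markup.

```python
# Recover plotted points from a figure's SVG: find each circle's pixel
# position and fill colour, then map pixels to data coordinates.
# The regex and calibration points below are hypothetical placeholders.
import re
import numpy as np

def parse_points(svg_text):
    """Return (x_px, y_px, fill) for each plotted circle in the SVG."""
    pattern = r'<circle[^>]*cx="([\d.]+)"[^>]*cy="([\d.]+)"[^>]*fill="(#[0-9a-fA-F]{6})"'
    return [(float(x), float(y), fill) for x, y, fill in re.findall(pattern, svg_text)]

def pixel_to_value(px, calibration):
    """Map a pixel coordinate to a value on a log-scaled axis, given two
    known tick marks: calibration = ((pixel_a, value_a), (pixel_b, value_b))."""
    (pa, va), (pb, vb) = calibration
    log_v = np.log10(va) + (px - pa) * (np.log10(vb) - np.log10(va)) / (pb - pa)
    return 10 ** log_v

# Model size and training FLOP come from the x/y calibrations; loss values
# would be recovered analogously by interpolating each point's fill colour
# against the figure's colour bar.
```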
@tamaybes
Tamay Besiroglu
4 months
A few weeks ago, we attempted to replicate the Chinchilla paper. We found that their estimated model fails to adequately fit the reconstructed data, that it implies inconsistent scaling policies, and that their confidence intervals are implausibly narrow.
@tamaybes
Tamay Besiroglu
5 months
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
Tweet media one
17
138
918
2
11
113
@tamaybes
Tamay Besiroglu
5 months
@TheSaddlePoint @drjwrae We asked for it several times.
0
1
100
@tamaybes
Tamay Besiroglu
6 months
A recent paper assesses whether AI could cause explosive growth and suggests it won't. It's good to have other economists seriously engage with the arguments that suggest that AI that substitutes for humans could accelerate growth, right?
Tweet media one
6
8
98
@tamaybes
Tamay Besiroglu
2 years
There seems to be evidence that members of the EA community are overindexing on recent advances in ML and forming unreasonable expectations of transformative AI this decade. @MatthewJBar and I counterbalance this by offering a $1,000 bet to the contrary.
7
10
88
@tamaybes
Tamay Besiroglu
5 months
You can reproduce all our work: Extracted data: Code to reproduce results: Code to extract data from SVG:
6
2
89
@tamaybes
Tamay Besiroglu
4 years
I found that the marginal returns of researchers are rapidly declining. There is what’s called a “standing on toes” effect: researcher productivity declines as the field grows. Because ML has recently grown very quickly, this makes better ML models much harder to find.
Tweet media one
7
26
84
@tamaybes
Tamay Besiroglu
5 months
When we fit their parametric scaling law, we get strikingly different estimates (Chi-squared p-value <1e-60!). The differences are significant for the data-scaling coefficient β and the irreducible loss E. (3/9)
Tweet media one
2
2
82
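For concreteness, here is a simplified sketch of fitting the parametric form L(N, D) = E + A/N^α + B/D^β, using a single initialisation and a plain Huber loss on log-loss; the original procedure additionally uses a grid of initialisations and a log-sum-exp parameterisation.

```python
# Simplified fit of L(N, D) = E + A / N**alpha + B / D**beta to
# (parameters, tokens, loss) observations, using a Huber loss on log-loss.
import numpy as np
from scipy.optimize import minimize

def huber(r, delta=1e-3):
    return np.where(np.abs(r) <= delta,
                    0.5 * r**2,
                    delta * (np.abs(r) - 0.5 * delta))

def fit_scaling_law(N, D, L):
    def objective(theta):
        log_A, log_B, log_E, alpha, beta = theta
        pred = np.exp(log_A) * N**(-alpha) + np.exp(log_B) * D**(-beta) + np.exp(log_E)
        return np.sum(huber(np.log(pred) - np.log(L)))

    theta0 = np.array([5.0, 7.0, 0.5, 0.3, 0.3])  # arbitrary starting point
    res = minimize(objective, theta0, method="L-BFGS-B")
    log_A, log_B, log_E, alpha, beta = res.x
    return dict(A=np.exp(log_A), B=np.exp(log_B), E=np.exp(log_E),
                alpha=alpha, beta=beta)
```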
@tamaybes
Tamay Besiroglu
5 months
We have asked the authors for assistance, but we haven’t been able to get a response. (8/9)
2
0
82
@tamaybes
Tamay Besiroglu
4 months
How much does a doubling of R&D effort increase innovation in software? Our new paper proposes new empirical techniques, applies them, and finds evidence of increasing returns to scale: doubling software R&D could more than double the rate of innovation.
@EpochAIResearch
Epoch AI
4 months
Could increasing returns to software R&D lead to explosive tech progress? Our new paper surveys estimation methods and finds evidence of increasing returns to scale in software R&D.
3
13
52
1
9
77
@tamaybes
Tamay Besiroglu
2 months
@justjoshinyou13 Seems like motivated reasoning.
6
0
77
@tamaybes
Tamay Besiroglu
2 years
@pmddomingos Thanks for sharing our work! Our interpretation is that there was something of a phase transition in the early 2010s coinciding with the advent of Deep Learning, rather than there having been superexponential growth in compute. See our paper:
Tweet media one
2
6
74
@tamaybes
Tamay Besiroglu
3 years
Guys, my Tweet where I draw a line on a graph that separates models into 'not conscious' vs. 'maybe slightly conscious' was tongue-in-cheek. I wish I had discovered the key to the question of consciousness, but I haven't—sorry to disappoint.
@futurism
Futurism
3 years
This debate just keeps getting spicier.
4
7
23
4
5
76
@tamaybes
Tamay Besiroglu
3 years
Seeing so many prominent ML folks ridiculing this idea is disappointing. It makes me less hopeful about the field's ability to seriously take on some of the profound, weird and important questions that they'll undoubtedly be faced with over the next few decades.
@ilyasut
Ilya Sutskever
3 years
it may be that today's large neural networks are slightly conscious
452
556
3K
7
6
72
@tamaybes
Tamay Besiroglu
4 years
Surprised to learn that Soros largely single-handedly bailed out Russian science following the collapse of the USSR. The funds significantly induced scientists to remain in the science sector, and had long-lasting impacts on Russian scientific output.
4
15
70
@tamaybes
Tamay Besiroglu
3 years
I just realised that I'm the most prolific forecast operationalizer on @Metaculus , having written 272 over the last 4 years. I find spelling out forecasts forces you to be more precise and empirically grounded about your beliefs. Here's a guide I wrote …
Tweet media one
6
4
69
@tamaybes
Tamay Besiroglu
9 months
How much compute did Google use to train Gemini Ultra? The paper unfortunately doesn't say, and there are few hints in the technical report. A speculative thing to do is to extrapolate how much is needed to match Gemini on benchmarks. Doing so yields this picture.
Tweet media one
3
6
67
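The "speculative thing" amounts to fitting benchmark performance against known training compute and inverting the fit at Gemini's reported score. A toy sketch with placeholder numbers (none of these are real benchmark or compute figures):

```python
# Toy extrapolation: regress benchmark accuracy on log10(training compute)
# for models with public compute estimates, then invert at Gemini's score.
# All values below are placeholders, not real data.
import numpy as np

log_compute = np.array([23.0, 24.0, 25.0, 25.4])   # hypothetical log10 FLOP
accuracy    = np.array([0.45, 0.58, 0.72, 0.78])   # hypothetical benchmark scores

slope, intercept = np.polyfit(log_compute, accuracy, 1)

gemini_score = 0.83                                 # placeholder score
implied_log_flop = (gemini_score - intercept) / slope
print(f"implied training compute ~1e{implied_log_flop:.1f} FLOP")
```

A saturating (e.g. logistic) fit would be more appropriate near the benchmark ceiling; a straight line keeps the sketch short.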
@tamaybes
Tamay Besiroglu
5 months
Hoffmann et al. also report extremely narrow confidence intervals for some key parameters. We calculate that you’d need about 600,000 data points to nail it down that precisely. By contrast, they likely had ~400. (5/9)
Tweet media one
1
2
68
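A rough way to see the size of the gap (my own framing of the arithmetic): interval widths shrink like 1/√n, so intervals narrow enough to require ~600,000 points are about √(600,000/400) ≈ 39 times tighter than ~400 points can support.

```python
import math

n_available = 400      # approximate number of runs behind the fit
n_required = 600_000   # points needed to justify the reported interval widths

# Interval width scales like 1/sqrt(n), so the precision gap is:
gap = math.sqrt(n_required / n_available)
print(f"reported intervals are ~{gap:.0f}x narrower than ~400 points support")
```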
@tamaybes
Tamay Besiroglu
1 year
A failure I notice among economists is conflating mental models of AI. They claim to consider, but seem not to fully internalize, the implications of "good AI" actually able to flexibly substitute for labor. "My lips say 'human-level', but my heart says 'a fancier version of GPT'." h/t @steve47285
Tweet media one
2
7
66
@tamaybes
Tamay Besiroglu
2 years
@jasonhickel This is so misleading. The paper compares states with similar levels of economic development. Socialist policies tend to depress econ development, so a comparison that controls for econ development will miss much (most?) of the welfare diff btwn socialist and non-socialist states
3
1
65
@tamaybes
Tamay Besiroglu
5 months
Moreover, Hoffmann et al.'s estimates imply a scaling policy inconsistent with their other results and the token-to-parameter ratio used for Chinchilla. Our estimates align better with these and have more reasonable uncertainty. (6/9)
Tweet media one
1
0
62
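The consistency check here follows the standard algebra: minimising L(N, D) = E + A/N^α + B/D^β subject to C ≈ 6ND gives N_opt = G·(C/6)^(β/(α+β)) and D_opt = G⁻¹·(C/6)^(α/(α+β)), with G = (αA/βB)^(1/(α+β)). A small sketch of the check; the parameter values are whichever fit you want to inspect, not hard-coded numbers.

```python
# Given fitted (A, B, alpha, beta), compute the compute-optimal allocation
# and the implied tokens-per-parameter ratio at a given budget C (in FLOP).
def optimal_allocation(C, A, B, alpha, beta):
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta))
    N_opt = G * (C / 6) ** (beta / (alpha + beta))
    D_opt = (1 / G) * (C / 6) ** (alpha / (alpha + beta))
    return N_opt, D_opt, D_opt / N_opt

# At Chinchilla's training budget (~5.76e23 FLOP), a self-consistent fit
# should imply roughly the ~20 tokens per parameter actually used.
```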
@tamaybes
Tamay Besiroglu
5 months
Hoffmann et al.'s estimated scaling law fits the reconstructed data very poorly compared to ours. Their residuals are not centered at 0 at all! Our model achieves a lower loss on 98% of data points. Clearly, their model does not fit the data. (4/9)
Tweet media one
1
2
62
@tamaybes
Tamay Besiroglu
6 months
A survey of thousands of AI experts shows that respondents believe the falling cost of computation was the most important driver of AI progress over the past ten years, with increased funding and progress in AI algorithms rated about on par. This seems mistaken.
Tweet media one
5
11
61
@tamaybes
Tamay Besiroglu
2 years
I’m very excited to announce Epoch. We’re working on investigating trends in Machine Learning and understanding the transition to a world with advanced AI.
@Jsevillamol
Jaime Sevilla
2 years
We are excited to announce our new research organization: Epoch! We are working on investigating AI developments and forecasting the development of Transformative AI. You can learn more in our announcement: Summary below 🧵⬇️
Tweet media one
7
19
116
2
6
54
@tamaybes
Tamay Besiroglu
1 month
@NeelNanda5 We heard back after we posted to arxiv and tweeted about it. However, we never got the data.
2
3
55
@tamaybes
Tamay Besiroglu
5 months
Hoffmann et al.’s paper has been highly influential in the language modeling community. Our analysis highlights some potential issues that warrant clarification. (7/9)
1
1
55
@tamaybes
Tamay Besiroglu
1 year
@max_paperclips It still fails when I append "Let's think this through step-by-step and explain your reasoning" to the prompt
Tweet media one
2
0
53
@tamaybes
Tamay Besiroglu
1 month
It's curious how Llama 405b's performance drops by 5 percentage points when using standard simple-evals prompts instead of its native Llama 3.1 prompts. Other models show much less sensitivity to this prompt change and fall nicely along the 45-degree line.
Tweet media one
@EpochAIResearch
Epoch AI
1 month
3/8 Evaluation settings make a difference in GPQA performance. We replicated Meta's results using the same settings they used (T=0, Llama 3.1 prompt), with average accuracy at 51.3%. But with default settings for the API we used (T=0.7, simple-evals prompts), it drops to 48.5%.
Tweet media one
1
0
11
5
7
55
@tamaybes
Tamay Besiroglu
2 years
If deep learning becomes good enough to be broadly adopted in the R&D sector, its adoption could induce an accumulation of relevant capital that could nearly double the productivity growth rate in the U.S.
Tweet media one
1
5
51
@tamaybes
Tamay Besiroglu
5 months
I'm thrilled to see that our work has apparently unified the Chinchilla scaling laws. It's great to hear that they're making the data open source!
@borgeaud_s
Sebastian Borgeaud
5 months
Great analysis, approach 3 is finally in agreement! The loss scale was too low in our paper, resulting in premature termination of L-BFGS, and leading to bad fits. After fixing this we can reproduce your findings! We're also open sourcing the data in the paper, stay tuned :)
8
37
245
0
7
53
@tamaybes
Tamay Besiroglu
9 months
Pleased about this work. We wanted to know how much compute is possible with current tech, and derived some bounds. Result: using the world's current energy consumption and maximally efficient GPUs yields ~1e35 FP16 FLOP (±0.7 OOMs), about 10B-fold more than GPT-4.
@ansonwhho
Anson Ho
9 months
What are the limits to the energy efficiency of CMOS microprocessors? In our new paper, published in the International Conference on Rebooting Computing, we propose a simple model to shed light on this question:
2
9
51
2
5
49
@tamaybes
Tamay Besiroglu
3 years
@ilyasut Our recent paper on compute trends in ML has some insights:
Tweet media one
@ohlennart
Lennart Heim
3 years
**ML training compute has been doubling every 6 months since 2010!** Our preprint "Compute Trends Across Three Eras of Machine Learning" is out. 🧵 Thread below ↓ 1/
Tweet media one
25
239
857
1
22
50
@tamaybes
Tamay Besiroglu
6 months
Unfortunately, that's not what this is. The authors rule out the possibility of AI broadly substituting for humans, asserting it's "science fiction", and dismiss the arguments premised on it.
Tweet media one
Tweet media two
3
2
49
@tamaybes
Tamay Besiroglu
11 months
Keeping tabs on what's happening in AI (who is scaling how fast, how much data or what architectures) is critical. Our database makes doing that much easier. It tracks info on key ML models, both historical and SOTA (GPT-4, Claude 2, PaLM 2, etc.)
2
8
48
@tamaybes
Tamay Besiroglu
2 years
We find that every 9 months, the introduction of better algorithms contributes the equivalent of a doubling of compute budgets. This is much faster than the gains from Moore's law! That said, there's uncertainty (our 95% CI spans 4 to 25 months).
Tweet media one
1
8
45
@tamaybes
Tamay Besiroglu
5 months
If it's possible to continue to trade off inference and training compute, we should expect similar amounts of compute to be spent on training large models and on running inference with them.
2
4
45
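One stylised way to see the equal-spend intuition (my own illustration, with made-up constants): if spending T on training buys a per-query inference cost proportional to 1/T and we serve Q queries, total cost T + kQ/T is minimised at T* = √(kQ), where training spend exactly equals inference spend.

```python
# Numerical check of the equal-spend intuition under the stylised assumption
# that per-query inference cost scales as k / (training compute T).
import numpy as np
from scipy.optimize import minimize_scalar

k, Q = 1e3, 1e6                                   # placeholder constants
total_cost = lambda T: T + k * Q / T

res = minimize_scalar(total_cost, bounds=(1.0, 1e9), method="bounded")
T_star = res.x
print(T_star, k * Q / T_star, np.sqrt(k * Q))     # all ~3.2e4: equal spend
```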
@tamaybes
Tamay Besiroglu
4 years
1990s onward: the Information Age continues: Koss patents the Excel Function, Bezos patents 1-click buying, Page creates Pagerank. ~80% of top patents now Electronics/IT related. Innovation has hardly ever before been this concentrated in so few sectors.
Tweet media one
4
8
46
@tamaybes
Tamay Besiroglu
4 months
Pleased to have been able to contribute to the first Intl Scientific Report on Advanced AI Safety. I think it's a comprehensive & balanced look at progress, risks & challenges, and a step towards a shared understanding of the trajectory of advanced AI.
0
8
46
@tamaybes
Tamay Besiroglu
2 years
You might have expected that with large ML models not being publicly accessible and very costly to train, it would become unclear whether key impressive results would replicate. However, the reproducibility situation for these models has arguably been surprisingly good so far.
2
1
43
@tamaybes
Tamay Besiroglu
6 months
Many AI experts clearly don't think this is science fiction, and AI labs are spending hundreds of billions to make it happen. Why do economists defer so little to AI experts about the topic of what AI can or can't do?
Tweet media one
2
3
42
@tamaybes
Tamay Besiroglu
6 months
While algorithmic progress has been rapid, our Shapley value analysis suggests that 60-95% of the performance improvements stem from increased computing power and training data, while novel algorithms account for only 5-40% of the progress. (4/10)
Tweet media one
1
8
42
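For readers unfamiliar with the attribution method, a generic Shapley decomposition over three factors looks like the sketch below. The coalition values are arbitrary placeholders, not the paper's specification or its numbers.

```python
# Generic Shapley attribution over {compute, data, algorithms}: average each
# factor's marginal contribution over all orderings of the factors.
from itertools import permutations

FACTORS = ("compute", "data", "algorithms")

def shapley(value):
    """value: dict mapping frozenset of factors -> total performance gain."""
    shares = {f: 0.0 for f in FACTORS}
    orderings = list(permutations(FACTORS))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            shares[f] += value[coalition | {f}] - value[coalition]
            coalition = coalition | {f}
    return {f: s / len(orderings) for f, s in shares.items()}

# Arbitrary placeholder coalition values (units of performance gain):
value = {
    frozenset(): 0.0,
    frozenset({"compute"}): 4.0,
    frozenset({"data"}): 3.0,
    frozenset({"algorithms"}): 2.0,
    frozenset({"compute", "data"}): 8.0,
    frozenset({"compute", "algorithms"}): 7.0,
    frozenset({"data", "algorithms"}): 6.0,
    frozenset(FACTORS): 12.0,
}
print(shapley(value))  # shares sum to value[frozenset(FACTORS)]
```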
@tamaybes
Tamay Besiroglu
4 months
The authors responded, clarifying that this was the result of their optimizer stopping early due to a bad loss scale choice. They plan to update their results and release the data. We appreciate @borgeaud_s and others' openness in addressing this issue.
@borgeaud_s
Sebastian Borgeaud
5 months
Great analysis, approach 3 is finally in agreement! The loss scale was too low in our paper, resulting in premature termination of L-BFGS, and leading to bad fits. After fixing this we can reproduce your findings! We're also open sourcing the data in the paper, stay tuned :)
8
37
245
1
4
40
@tamaybes
Tamay Besiroglu
3 months
The article presents a well-articulated case that by extrapolating current AI trends—rapidly increasing compute, consistent algorithmic efficiency gains, and techniques that unlock latent capabilities—we may develop "drop-in remote workers" by 2027.
@leopoldasch
Leopold Aschenbrenner
3 months
Full series as PDF: Read online:
37
148
806
2
5
41
@tamaybes
Tamay Besiroglu
8 months
I was quoted today in a Time article on the AI progress survey and in another on AI and growth. Some journalists write well and faithfully represent the views of those they speak with, which is great.
0
2
37
@tamaybes
Tamay Besiroglu
1 year
AI research is advancing rapidly, but compute usage, a key metric, is often overlooked due to a lack of established practices. This omission hinders model comparisons, reproducibility and governance. Discover why this matters and find our proposals here:
2
7
37
@tamaybes
Tamay Besiroglu
8 months
Neat exposition of our speculative proposal for estimating the compute required for a scaled-up GPT to perform tasks like science. Yuxi does a better job explaining some of the key ideas than we do.
Tweet media one
6
3
36
@tamaybes
Tamay Besiroglu
3 months
In early 2022 we wrote a paper finding a 4x/year rate of increase in the scale of training runs. Updated data, now 3x larger, shows this still holds. If the trend continues, we can expect further performance improvements surpassing current capabilities in the near future.
@EpochAIResearch
Epoch AI
3 months
1/ How quickly are state-of-the-art AI models growing? The amount of compute used in AI training is a critical driver of progress in AI. Our analysis of over 300 machine learning systems reveals that the amount of compute used in training is consistently being scaled up at
Tweet media one
23
306
2K
4
8
37
@tamaybes
Tamay Besiroglu
6 months
@romanyam Want to bet in a way that I pay you today and you pay me some multiple in a few years?
1
1
35
@tamaybes
Tamay Besiroglu
2 years
I deleted this tweet on ML compute spending, because I'm no longer confident that AlphaGo Zero was in fact the most expensive ML experiment to date. I still think the broader observation I point to is true, but I prefer to make claims only when I'm confident about factual accuracy.
Tweet media one
1
0
34
@tamaybes
Tamay Besiroglu
3 years
7.5 years of GAN progress on face generation.
Tweet media one
3
4
35
@tamaybes
Tamay Besiroglu
2 years
This was a thoughtful take on three reasons to be skeptical about catastrophic risk from AI: selection effects about who engages with the arguments, community epistemic problems, and issues with chains of reasoning involving imperfect concepts.
@NunoSempere
Nuño Sempere
2 years
Just posted this effort post: "My highly personal skepticism braindump on existential risk from artificial intelligence". Might be of interest to people here.
8
12
103
2
4
33
@tamaybes
Tamay Besiroglu
4 years
On the other hand, I find that progress now makes progress in the future easier. This is called a “standing-on-the-shoulders” effect (innovations today are bootstrapped by previous progress).
1
1
31
@tamaybes
Tamay Besiroglu
3 months
Data constraints will make scaling less efficient at around 1e29 FLOP, around 4 OOMs larger than GPT-4. This leaves a lot of room for continued scaling. However, combining massive scaling with intense overtraining might soon become a challenge.
@EpochAIResearch
Epoch AI
3 months
Are we running out of data to train language models? State-of-the-art LLMs use datasets with tens of trillions of words, and use 2-3x more per year. Our new ICML paper estimates when we might exhaust all text data on the internet. 1/12
Tweet media one
24
127
627
2
3
32
@tamaybes
Tamay Besiroglu
9 months
Dwarkesh provides a thoughtful analysis of why scaling LLM-like systems may or may not succeed. He concludes it's 70% likely that scaling + algorithmic progress + hardware advances over the next 20-ish years will suffice. Seems reasonable to me.
@dwarkesh_sp
Dwarkesh Patel
9 months
New post: Will scaling work? This is the crux in arguments about AI timelines. In order to think through my own position, I wrote the post as a debate between a skeptic and a believer. Skeptic point 1: Data bottlenecks won't be cleared by self-play/synthetic data:
Tweet media one
33
104
793
2
4
31
@tamaybes
Tamay Besiroglu
2 years
21 NeurIPS papers mentioned scaling laws in 2021, more than double the number in all previous proceedings. Yet 21 papers represent ~1% of 2021 papers, so in an absolute sense scaling laws receive fairly little attention from many top ML researchers.
Tweet media one
2
3
29
@tamaybes
Tamay Besiroglu
18 days
How feasible is it to continue scaling up AI training at its current pace? Our analysis of power, chips, data, and latency constraints suggests it is feasible through this decade. By 2030, models could likely exceed GPT-4 in scale to the same degree that GPT-4 exceeds GPT-2 in scale.
@EpochAIResearch
Epoch AI
18 days
1/ Can AI scaling continue through 2030? We examine whether constraints on power, chip manufacturing, training data, or data center latencies might hinder AI growth. Our analysis suggests that AI scaling can likely continue its current trend through 2030.
38
167
685
0
1
31
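For a sense of the magnitudes, using commonly cited third-party training-compute estimates (not official figures), GPT-2 is around 1.5e21 FLOP and GPT-4 around 2e25 FLOP; repeating that roughly 4-OOM jump lands near 3e29 FLOP:

```python
# Scale comparison using rough third-party estimates of training compute.
gpt2_flop, gpt4_flop = 1.5e21, 2e25          # estimates, not official figures
ratio = gpt4_flop / gpt2_flop                # ~1.3e4, i.e. roughly 4 OOMs
implied_2030_flop = gpt4_flop * ratio        # ~2.7e29 FLOP if the jump repeats
print(f"{ratio:.1e}x  ->  ~{implied_2030_flop:.1e} FLOP")
```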
@tamaybes
Tamay Besiroglu
1 year
I’m very excited to see this issue on AI; many essays that I can’t wait to dig into. I also have a small contribution in the form of a debate with @mattsclancy on explosive economic growth from advanced AI (I argue that it’ll likely happen, Matt argues it likely won’t).
@asteriskmgzn
Asterisk
1 year
Announcing Issue 03: AI AI is all anyone is talking about. Our writers have been thinking about it for years.
3
21
82
1
4
31
@tamaybes
Tamay Besiroglu
10 months
Google seems to have experimented with a 50,000+ TPU training run. "To give a sense of scale, this cluster of Cloud TPU v5e chips has more AI accelerators than the TOP1 Supercomputer Frontier at Oak Ridge National Laboratory, which featured 37,888 AMD MI250X GPUs"
@tamaybes
Tamay Besiroglu
1 year
Dylan is predicting a 100k GPU cluster next year or the year after, which would enable a ~$1bn training run. Seems plausible to me.
1
1
23
1
8
29
@tamaybes
Tamay Besiroglu
1 year
@ylecun This also applies to things humans write, e.g., mathematical proofs with many steps, but those aren't generally 'doomed'.
3
1
29
@tamaybes
Tamay Besiroglu
6 months
This rate of algorithmic progress is much faster than the two-year doubling time of Moore's Law for hardware improvements, and faster than other domains of software, like SAT-solvers, linear programs, etc. (2/10)
Tweet media one
1
2
28
@tamaybes
Tamay Besiroglu
3 years
@robertwiblin I agree with your conclusion. However, the correct comparison is not between AZ and no vaccine, but rather between AZ and the next-best vaccine administered at some delay. I expect your conclusion still to be true, but it's worth framing things carefully.
2
0
28
@tamaybes
Tamay Besiroglu
9 months
. @OpenAI could you please make sure to train on all publicly available LaTeX resources for an excessive number of epochs for GPT-5? GPT-4 isn't very good at LaTeX. Think of the few basis points boost in scientific productivity this would deliver.
1
0
26
@tamaybes
Tamay Besiroglu
6 months
Paper link: FWIW, it seems like a solid paper if you're for some reason interested in the effects of a type of AI that is forever incapable of automating R&D.
2
0
27
@tamaybes
Tamay Besiroglu
5 months
Cool to see our replication of Chinchilla amongst the top ML papers of the week in what was a packed week for AI.
@dair_ai
DAIR.AI
5 months
The Top ML Papers of the Week (April 15 - April 21): - Llama 3 - Mixtral 8x22B - A Survey on RAG - How Faithful are RAG Models? - Emerging AI Agent Architectures - Chinchilla Scaling: A replication attempt ...
5
86
552
2
1
27
@tamaybes
Tamay Besiroglu
2 years
Top forecaster Steven0461 describes plausible scenarios of 2050 that leave us without transformative AI. They suggest that if there is to be an obstacle, plateauing hardware improvements are the most likely culprit.
Tweet media one
3
2
27
@tamaybes
Tamay Besiroglu
3 years
Progress in the tools for engineering mirror-image molecules by dedicated mirror-image biology labs, such as that presented in this recent article, might soon enable the creation of mirror cells. This worries me.
1
3
27
@tamaybes
Tamay Besiroglu
4 years
In my dissertation, I explored how this story holds up for machine learning. I used a dataset on the top-performing ML models on 93 machine learning benchmarks—mostly related to computer vision and NLP—and data on research inputs derived from publication data.
3
2
25
@tamaybes
Tamay Besiroglu
3 years
Really excited to be joining @ProfNeilT 's lab at MIT to work at the intersection of Economics and Computer Science, focusing on AI and Computing.
3
1
26
@tamaybes
Tamay Besiroglu
4 years
Some background. @ChadJonesEcon , @johnvanreenen and others wrote an awesome article that found that ideas are getting harder to find: in semiconductors, agricultural production and medicine, research productivity has been declining steadily.
2
1
26
@tamaybes
Tamay Besiroglu
2 years
Standard endogenous growth theory predicts that capital-intensive R&D produces faster growth. More productive use of capital (K) → increased investment → accumulation of more K → increased productivity and output → increased investment → accumulation of more K, etc.
Tweet media one
2
1
25
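A toy simulation of that feedback loop (my own illustration with arbitrary parameters, not a calibrated model): output is produced from capital and productivity, a fixed share of output is reinvested, and productivity growth increases with the capital devoted to R&D.

```python
# Toy capital -> investment -> productivity feedback loop. All parameter
# values are arbitrary; this only illustrates the qualitative mechanism.
def simulate(rd_capital_share, T=50, s=0.2, alpha=0.3, phi=0.02):
    K, A = 1.0, 1.0
    output = []
    for _ in range(T):
        Y = A * K ** alpha                              # production
        K += s * Y                                      # reinvestment accumulates capital
        A *= 1 + phi * (rd_capital_share * K) ** 0.5    # R&D capital raises productivity
        output.append(Y)
    return output

low_rd, high_rd = simulate(0.01), simulate(0.10)
print(low_rd[-1], high_rd[-1])  # more capital-intensive R&D -> faster growth here
```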
@tamaybes
Tamay Besiroglu
4 years
A “standing-on-the-shoulders” effect in ML is on the whole not that surprising: it seems that finding one approach to solving one task can often be repurposed to solve other, related tasks (e.g. transformers, attention, etc.)
1
0
24
@tamaybes
Tamay Besiroglu
1 year
Dylan is predicting a 100k GPU cluster next year or the year after, which would enable a ~$1bn training run. Seems plausible to me.
@dylan522p
Dylan Patel
1 year
@tamaybes And faster GPUs/TPUs too. Google, Meta, Microsoft/OpenAI, Baidu, Tencent, Alibaba, ByteDance all have the capability (ignoring China GPU bans) 3 of those will do it.
1
0
9
1
1
23
@tamaybes
Tamay Besiroglu
2 years
Our estimates imply that AI-augmented R&D would involve capital investments much larger than all R&D sectors in the US (only ~6% of spending of NSF-funded STEM labs is dedicated to things we might consider physical capital).
1
2
24
@tamaybes
Tamay Besiroglu
2 years
An economy whose R&D is augmented by ML models relies more on physical capital (compute) and less on human scientists. Capital, unlike labor, is a well-behaved economic good: it can accumulate exponentially in line with economic growth. This has important implications.
1
0
24
@tamaybes
Tamay Besiroglu
2 years
@MikePFrank This article argues the kink is the (delayed) effect of the end of Dennard scaling: the largest HPC centres compensated for a while by increasing spending and parallelism, and this ended around 2013. Does that seem plausible/correct to you?
3
1
25
@tamaybes
Tamay Besiroglu
1 year
What is the consensus on how reliable the academic exam results for GPT-4 are? The GPT-4 paper's contamination study looks decent, but the model often just seems to fail fairly basic 2023 high school math problems.
4
1
23
@tamaybes
Tamay Besiroglu
3 years
Metaculus has built up what is likely the most comprehensive repository of carefully spelled out forecasts about AI and its impacts. I’m excited to organize this contest to explore how it could help ground discussions about the future of AI in terms of quantifiable predictions.
@metaculus
Metaculus
3 years
The AI Progress Essay Contest is open! Engage with the wealth of AI forecasts on Metaculus to construct an accurate picture of the timeline and impact of transformative AI. $6,500 and the Dreyfus Prize will go to the most insightful pieces:
1
4
20
0
3
25
@tamaybes
Tamay Besiroglu
6 months
We estimate the transformer architecture provided roughly 10x "compute-equivalent gain", though estimates vary by model specification. Chinchilla scaling laws provided around 2-4x gains over Kaplan depending on the scale. (6/10)
Tweet media one
Tweet media two
1
3
23
@tamaybes
Tamay Besiroglu
4 years
It turns out that the “standing-on-toes” effect dominates. I estimate that overall research productivity declined by between 4% and 26% (depending on the sub-field and the model).
2
1
23
@tamaybes
Tamay Besiroglu
3 months
Cool work: Predicting downstream performance based on compute could help us anticipate the capabilities of future models, but predictability has remained elusive. @RylanSchaeffer , @haileysch__ et al. explore why and suggest the possibility of "scaling-predictable evaluations".
@RylanSchaeffer
Rylan Schaeffer
3 months
❤️‍🔥❤️‍🔥Excited to share our new paper ❤️‍🔥❤️‍🔥 **Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?** w/ @haileysch__ @BrandoHablando @gabemukobi @varunrmadan @herbiebradley @ai_phd @BlancheMinerva @sanmikoyejo 1/N
Tweet media one
8
54
262
0
1
24
@tamaybes
Tamay Besiroglu
4 years
@kristjanmoore I also think that this is more likely than not (I’m around 60% confident). Here’s my track record on 220 questions.
Tweet media one
4
3
24