Something new 🐠. Prev: reasoning lead @metaai, LLaMA 2/3, @paperswithcode co-creator, Galactica LLM lead, Atlas ML (acq by Meta), sports betting, UK Treasury
Here’s my ICML talk on teaching LLMs to reason (video and slides).
Like everyone else, I can’t talk about what I’m working on right now, but I tried to provide a useful overview of the history of LLMs and reasoning, current areas of focus, and potential directions.
Enjoy!
I am the first author of the Galactica paper and have been quiet about it for a year. Maybe I will write a blog post talking about what actually happened, but if you want the TLDR:
1. Galactica was a base model trained on scientific literature and modalities.
2. We approached
One year ago — 2 weeks before @OpenAI released ChatGPT — @Meta released Galactica. The LLM was public for only 3 days, but its lessons led to decisions around Llama's release. Thanks to @jpineau1 for chatting w/ me and h/t to @ylecun
Read here: ⏬
I left Meta yesterday. Nothing but positive things to say: FAIR and GenAI are great places to do research and engineering. Will miss my colleagues!
LLMs have shown how magical deep learning can be in a data-rich regime. But many domains remain data-constrained, which prevents
Why are LLMs bad at reasoning?
One theory says this is due to weaknesses in maximum likelihood, where the probability mass “overgeneralises” to low-quality solutions.
Because our pretraining objective (likelihood) doesn’t transfer to our evaluation objective (accuracy), the
Controversial take: open LLM leaderboards have been a net negative to the field as they’ve encouraged leaderboard hacking, training on in-domain datasets (likely test sets too), GPT distillation and other practices that confound comparison.
On @paperswithcode we never allowed
Our mission at @paperswithcode is to index all scientific information and then convert this information into useful knowledge. The datasets index for ML is getting really comprehensive! What’s next? Stay tuned 🙃.
🎉 We've just crossed 5000 Datasets! 🎉
We now index and organize more than 5000 research datasets for machine learning. A huge thanks to the research community for their ongoing contributions.
Browse the full catalogue here:
A year’s journey; glad to get this out! The vision is to build a megafunction that models all of Nature. Small steps with this work. We broke a few rules about training LLMs on the way, with some great results.
🪐 Introducing Galactica. A large language model for science.
Can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.
Explore and get weights:
Fun fact: when Attention Is All You Need came out, Twitter was mostly focused on the SELU paper (and its appendix). Not a one-off either; there are countless other examples of Twitter picking the wrong “winner”.
The best way to predict the future is to do the work yourself and
Not sure we ever detailed our experience training the 120B Galactica model, but tldr: have a mechanism in place for skipping batches, lower LR after periods of sustained instability, use longer warmup to hedge against bad init, pray to the AI Gods 🙏.
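A minimal sketch of what that loop might look like (the thresholds, the skip-set mechanism, and the HF-style `.loss` access are illustrative assumptions, not the actual Galactica training code):

```python
# Sketch of the stability tricks above: skip known-bad batches, nudge the LR
# down after instability. Thresholds and names are illustrative guesses.
import torch

SPIKE_FACTOR = 3.0        # treat a loss > 3x the recent average as a spike
skip_batches = set()      # batch indices to skip on replay/restart
recent_losses = []

def training_step(step, batch, model, optimizer, scheduler):
    if step in skip_batches:
        return None  # mechanism for skipping batches

    loss = model(**batch).loss  # assumes an HF-style model output
    window = recent_losses[-100:]
    if window and loss.item() > SPIKE_FACTOR * (sum(window) / len(window)):
        skip_batches.add(step)            # record the offending batch index
        for group in optimizer.param_groups:
            group["lr"] *= 0.9            # lower LR after sustained instability
        optimizer.zero_grad()
        return None

    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    scheduler.step()  # scheduler assumed to include a long warmup
    optimizer.zero_grad()
    recent_losses.append(loss.item())
    return loss.item()
```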
What’s worse is that bad public evaluation leads to a race to the bottom. Because of the Prisoner’s Dilemma, if everyone else is benchmark hacking, you have a strong incentive to benchmark hack yourself.
Case in point:
- There’s no such thing as a “base model” anymore as people are
ICML is my first ever ML conference (I’m antisocial 😅). Some observations:
- People are really excited about LLaMA 2 and open source LLMs in general.
- RLHF seems to be working well for everyone in most LLM domains (chatbots, code, reasoning).
- Hawaiian shirt % is pretty low. I
It’s disingenuous to speak of the risks of open source AI without acknowledging the risks of closed source AI.
Is it wise to concentrate extreme power in a few large, unaccountable organisations? If there are guardians, who chooses the guardians?
Was not expecting a hastily written, early morning Galactica retrospective to bounce so much - given it happened 3000 years ago in AI time.
To close the topic, here’s a great talk below by Sean Murray on No Man’s Sky and how they “engoodened” things (after some initial missteps)
@DrJimFan Nice writeup! Quick correction though: ORMs still learn a per-token loss, and can learn to assign credit to intermediate steps. Covered an example in my talk:
100% the worst LLM take is that weights shouldn’t store knowledge. They might not store *all* knowledge, but their compiled knowledge in weight memory is the source of their creativity (and value) over retrieval approaches.
Internal reasoning tokens, circa 2022. <work> 🙂
(It’s a shame people aren’t exploring similar ideas for visual reasoning. Good reasoning requires a visual and a propositional code)
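For illustration only, the rough shape of a `<work>`-style prompt (this exact template is my guess at the format, not copied from the Galactica paper):

```python
# Hypothetical example of wrapping step-by-step reasoning in <work> tokens,
# in the spirit of Galactica's internal working memory.
prompt = (
    "Question: What is the derivative of x**3 + 2*x?\n"
    "<work>\n"
    "d/dx x**3 = 3*x**2 and d/dx 2*x = 2, so the derivative is 3*x**2 + 2.\n"
    "</work>\n"
    "Answer: 3*x**2 + 2"
)
```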
This was always going to happen given the hype and money in the field + poor evaluation standards.
People need to be way more sceptical of new model releases. Far too many instances of “benchmark hill-climbing grift” going on as a vehicle to create hype.
This case is
A story about fraud in the AI research community:
On September 5th, Matt Shumer, CEO of OthersideAI, announces to the world that they've made a breakthrough, allowing them to train a mid-size model to top-tier levels of performance. This is huge. If it's real.
It isn't.
@coppola_ai Good research, bad launch.
Bad launch not because of stupidity but because of loss of situational awareness due to excessive workload.
To fix: if your team is operating above capacity, then make sure to have good internal feedback sources that can help you see the wood for the trees.
@ChurchillMic Yes, to be clear I think the commentary was completely overblown. We were directionally right (and early) with what we were doing, but the way the demo was executed was wrong. So it’s not really apologetic, more like “here’s what happened in case you were wondering”
What people get wrong about foundation models for science: too much focus on generating new discoveries and ideas - way too little focus on instrumentation.
The biggest driver of scientific progress is new instruments. So the biggest impact of deep learning will be accelerating
Every time we tried this it had little benefit, yet plenty of people still asserted to me as fact that “training on code helps with reasoning”.
Reality: the biggest gains come from training on the same domain. If your downstream task involves LaTeX math, then the biggest
The fundamental tension of doing ambitious projects:
Long periods in the wilderness trying to make new things work. Meanwhile, the world continues to move around you, harvesting the low-hanging fruit.
Delayed gratification is hard, but worth it!
My 2024 wishes for open source / open science in ML:
- Less free-riding on GPT outputs; instead more open innovation in post-training to obtain outputs of a similar or better quality.
- More understanding that RLHF is not a capability “nerf” but a capability enhancer:
LLMs are trained to imitate human outputs, not human latents. This explains much of the alignment problem, as well as deficits in areas such as reasoning: we do not observe mental scaffolding (internal context) on the internet, only output context.
Congrats to @OpenAI on the amazing results!
o1-mini results particularly impressive. My guess: larger models are more “wasteful”, spending more forward-pass compute on easier tokens. Therefore it’s better to have a smaller model that can more efficiently allocate compute
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond.
These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
@deliprao To be fair to OpenAI, the early GPT papers were a gold mine and I’m not sure a lot of the open LLM efforts would have been possible without them. So I think “parasitic” is way too harsh!
(My bigger gripe: their fuelling of anon hype accounts which have polluted the X feed and
Thanks to everyone for the kind words on the Galactica paper. Will not be commenting on the wider release, but I am optimistic that LLMs will evolve from tools of association (creativity + search + idea generation) into aligned models that preserve factualness. Stay tuned!
I don’t subscribe to the view that “ideas are cheap, execution is everything”, in the sense that good intuition is hard to come by.
But what is true is that new ideas are incredibly fragile. At an organisation that prioritises reactive plays, these ideas will die without
Great paper. Last year we had a similar problem where Galactica could do SMILES -> IUPAC but not the inverse. The solution was to augment the data and shuffle the PubChem layout (lol).
Simple rephrasing of existing datasets is likely to yield large benefits for generalisation.
Does a language model trained on “A is B” generalize to “B is A”?
E.g. When trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?”
Our new paper shows they cannot!
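A minimal sketch of the augmentation fix mentioned in the Galactica anecdote above, applied to the SMILES ↔ IUPAC case (the templates and field names here are illustrative, not the actual PubChem preprocessing):

```python
# Emit both directions of each fact so the model sees "A is B" and "B is A",
# and shuffle record layouts so no single ordering is baked in. Illustrative.
import random

def bidirectional_examples(smiles: str, iupac: str) -> list[str]:
    return [
        f"[START_SMILES]{smiles}[END_SMILES] has the IUPAC name {iupac}.",
        f"The IUPAC name {iupac} has SMILES [START_SMILES]{smiles}[END_SMILES].",
    ]

def shuffled_layout(record: dict) -> str:
    fields = list(record.items())
    random.shuffle(fields)  # "shuffle the PubChem layout"
    return "\n".join(f"{key}: {value}" for key, value in fields)
```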
I don’t think anyone has a grasp on what society will look like with each person’s intelligence increased by an order of magnitude. But it does not immediately follow that restricting power and access to an elite minority is better than decentralising this power. Not obvious.
System 2 is fiction. It does not exist as an area in the brain. It might be a useful psychological abstraction, but it doesn’t tell you anything useful about how to achieve goal-directed behaviour computationally.
Great paper with lots of ablations breaking down the benefits of Mamba layers versus self-attention.
TLDR: Mamba layers worse than self-attention on in-context learning and context-based information retrieval. Hybrid model seems to get best of both worlds*.
Also nice tidbit:
An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset:
* 7% attention, the rest is Mamba2
* MMLU jumps from 50 to 53.6%
* Training efficiency is the same
* Inference cost is much less
I signed. Closed source AI is more dangerous than open source AI in the short-medium run due to lack of transparency about how models are built and developed.
Letting just a few companies develop advanced AI makes the problem worse, not better. We need more eyeballs on weights
What would you build if you took deep learning seriously?
Ten years ago the answer was AGI: a term that would make you look kooky in research circles. Now every post-ChatGPT startup has AGI as their mission. It’s no longer a sign of ambition for a new company.
Maybe a slightly
One of the great joys of life: meeting lots of talented people (new and old faces) working on big problems.
Thanks NY & SF - now back to London for a bit 🇬🇧.
Introducing Meta Llama 3: the most capable openly available LLM to date.
Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes.
Today's release includes the first two Llama 3
If you optimised AI releases for Twitter traction a few years ago, you could expect fairly targeted, well-informed feedback. But nowadays feedback is dominated by a low-quality hype factor. This leads to difficult tradeoffs about how you talk about your work, and
@_xjdr Everyone on X assumes that good LLM performance is due to “fancy secret method”, but more often than not it’s just solid execution of well-established recipes.
Credit rating agencies had misaligned incentives in the 2000s: the providers of the products they rated were the ones paying them. (My first job was regulating CDOs post-crisis, lol)
Similarly a company that sells data to frontier firms for LLMs is probably not the right one to
There is a surprising amount of value in taking common assertions people make, asking “Why?” a couple of times, and finding out they have no grounding other than the fact that lots of people keep saying them.
The introspective version of this: if you’ve believed anything for 5
@OctothorpeVoid Yep, similar to PaLM: when you have a bad spike, record the batch index and skip it. You can also sometimes reduce the LR by a very small magnitude and the spike won't happen (which suggests spikes are due to an unstable loss surface, not the data per se).
@JPobserver This is a popular way to explain things.
I’m a bit sceptical of system 1/system 2 as it’s a psychological abstraction that doesn’t actually tell you how the brain works.
I think you can get a better idea of what LLMs might be missing by looking at how the prefrontal cortex
AGI test: When will AI be able to work as a human-level wedding planner?
Specs:
- Talks to couple, understands their preferences
- Rings up venues, negotiates prices etc
- Hires musicians, catering, arranges transport
- Designs and pushes out the website
- Sends out invites
-
We evaluated regularly to see the effect of spikes. While val loss usually recovered quickly, downstream eval showed it would sometimes forget a task (e.g. Yes/No QA) and take a long while to recover. So when debugging spikes, don’t just look at val loss…
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
abs:
pdf:
- a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry;
- experts who have or are pursuing PhDs in the
X is a study on how most people form expectations adaptively rather than rationally. Explains everything from “e/acc” internal contradictions on AI safety, to “LLMs can’t do x” arguments that are outdated in a few months.
The big changes are when progress undermines current
People doing reasoning self-correction papers: please ablate against simple majority voting. Thanks 🙏
As with many reasoning papers this year, the bulk of the performance increase below comes from GPT-4 distillation, not the method that is introduced — which is likely beaten
Learning From Mistakes Makes LLM Better Reasoner
paper page:
Large language models (LLMs) recently exhibited remarkable reasoning capabilities on solving math problems. To further improve this capability, this work proposes Learning from Mistakes (LeMa),
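The baseline being asked for above is cheap to implement; a minimal sketch, where `sample_answer` is a placeholder for sampling your model at nonzero temperature:

```python
# Majority voting (self-consistency): sample k answers and keep the mode.
# Any self-correction method should beat this before claiming a gain.
from collections import Counter

def majority_vote(question, sample_answer, k=16):
    answers = [sample_answer(question) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```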
1/12 We are happy to announce the release of our language models for optimizing small organic molecules. Built on top of Galactica and Gemma, they come in three sizes: Chemlactica-125M, Chemlactica-1.3B, and Chemma-2B. The preprint is on arxiv:
We have trained ESM3 and we're excited to introduce EvolutionaryScale.
ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins.
Read more:
@ylecun Yes, I’m with you on the inference process (simulating consequences, evaluating them, and planning), and the idea we can use inference compute for policy improvement.
I think what irks me about system 1/2 as a metaphor is:
1) When we overuse two-factor metaphors, people end up
@AlbalakAlon Nice paper! I understand this through an NFL-type argument. There is no such thing as “data quality” because everything (even spam) is a task to be learnt.
However in practice there is a subset of tasks you care about more. That means there is a “free lunch” by reweighting the
@agihippo I wanted to scream when I saw that post this morning. Massive elephant in the room called “pretraining data” that influencers couldn’t seem to see 🐘…
@soumithchintala The Act of Creation by Arthur Koestler is the best philosophy read on this.
My favourite anecdote on how far away AI is: the structure of benzene was discovered by someone having a dream of a snake eating its own tail 🙂.
Update: it looks like the user posted the Claude response in another conversation with Pi, so it echoed the same response, per @inflectionAI.
Guessing game over…
What happened, three possibilities:
1. They both use the same annotation provider for instruction tuning, who has provided the same prompt/answer data to two different companies…
2. The annotators themselves are using the same language models to help annotate (eg GPT-4), which
Had a lot of fun talking to Nathan a few weeks back about various topics in LLM land!
Note: my thought-to-speech module was a little off due to jet lag — so the text transcript is good if you prefer that medium ☺️.
Finally got to chat with @rosstaylor90 -- exactly why I started this series. So much signal on the LLM life cycle from training to demos.
Reasoning, Galactica, demo backlash, post-training, sft vs rlhf, LLMs for science, realistic agents, PRMs, and other topics
> Ed Witten finds Claude's insights on quantum mechanics meh
Humans will reach superhuman performance in goalpost-moving before AI gets superhuman at science
One of the problems with LLM creativity is that new methods and tools have relatively few mentions in the literature, so creativity can’t rely as much on (flexible) weight memory; and instead must be in-context using some form of retrieval.
We often think of creativity as the
If you believe LLMs don’t have meaning because they are “ungrounded”, then you also believe there is no such thing as history. My take on the English civil war is “ungrounded” because it is based on inference from historical texts, not 17th century “world knowledge”.
@_xjdr Traditional CoT data doesn’t have steps like “alternatively, maybe we could try x” (branching) or “actually, that’s wrong” (self-correction) —> so the easiest thing people should try is making an SFT dataset with these types of steps, then do PPO on top, as the model will now have
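A hypothetical example of the kind of SFT record this suggests, with explicit branching and self-correction steps (the format and field names are illustrative assumptions):

```python
# One record for a branching/self-correction SFT dataset. Illustrative only.
sft_example = {
    "prompt": "What is 17 * 24?",
    "response": (
        "17 * 24 = 17 * 20 + 17 * 4 = 340 + 64 = 404.\n"
        "Actually, that's wrong: 17 * 4 = 68, so 340 + 68 = 408.\n"
        "Alternatively, maybe we could try 24 * 17 = 24 * 10 + 24 * 7 "
        "= 240 + 168 = 408.\n"
        "Both routes agree: the answer is 408."
    ),
}
```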
Finally got a chance to visit the Computer History Museum in Mountain View. So inspiring!
Was fun to think of the future sections that will be added. The GenAI boom of the 2020s…and the more important things that come after :)
Basically, it was determined to treat this problem in the same way as the famous 8x8-board-with-two-corners-removed problem, and nothing I said could shake its conviction that that was the style of the correct answer. 5/5
On point. Amazing that last November with Galactica some people thought “not always accurate” implied “has no use case” in math or science… The whole point is to have a human in the loop with an artificial association cortex that makes useful connections.
Terence Tao on his experience with GPT4 in mathematical research:
"The 2023-level AI can already generate suggestive hints and promising leads to a working mathematician and participate actively in the decision-making process."
@Francis_YAO_ We found with Galactica that a well-curated niche corpus can outperform OPT and BLOOM on general benchmarks… so something was up with the pre-training data for those models.
@_arohan_ @paperswithcode Canary questions are a great idea.
Maybe this is possible to measure with some of the existing benchmarks where there are (unintentionally) incorrect labels.
@tegmark @RishiSunak @vonderleyen Altman, Hassabis, and Amodei are the ones doing massive corporate lobbying at the moment.
They are the ones attempting a regulatory capture of the AI industry.
You, Geoff, and Yoshua are giving ammunition to those who are lobbying for a ban on open AI R&D.
If
New preprint post! We show that motor commands in the superior colliculus shift the internal representation of heading during REM sleep despite the immobility of sleeping mice. Thus, the brain simulates actions and their consequences during REM sleep.🧵1/7
For too long, users have lived under the software lottery tyranny of fused attention implementations.
No longer.
Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch.
1/10
The “CoT capability comes from code” theory makes no sense, especially for MMLU. The reasoning subset of that benchmark depends on equation recall and substitution/rearranging. Code is not relevant; similar pre-training data is. Hence strong Minerva and Galactica results…
People think domain-specific models -> smaller models, but actually it’s the opposite. If you are data-constrained, you need *larger* models for fixed compute. (Why we went for 120B Galactica, not 70B.)
Hallucination is a problem of metacognition - knowing what you don’t know. The solution is not designing models to know less…Talk about throwing the baby out with the bathwater.
@giffmana @BlancheMinerva Even if you exclude the WikiHow extracts in the test set, you still get a big boost from using the rest of the corpus.
I mention this example because it’s public with the Gemini paper, but there are lots of other examples like this where you can find a small, highly in-domain
Treat the births in each hospital as Poisson. Then B - G is Skellam-distributed for each hospital. The mean is 0 in both cases, as boys and girls are equally likely, but the variance (which grows with total births B + G) is higher for the larger hospital, so P(B - G = 0) is lower there. Hence the likelihood of more boys than girls is higher for the larger hospital.
A silly math question posed in our discord: There are two hospitals in a city, a small and a bigger one. In which of them is the likelihood higher of more boys than girls being born in one day? (Provided that girls and boys are on average born equally frequently).
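A quick numerical check of the Skellam argument above (the daily birth rates are made up for illustration):

```python
# P(more boys than girls) = (1 - P(B == G)) / 2 by symmetry; the tie
# probability shrinks as the rate grows, favouring the larger hospital.
from scipy.stats import skellam

for name, rate in [("small hospital", 10), ("large hospital", 100)]:
    p_tie = skellam.pmf(0, rate, rate)   # B - G ~ Skellam(rate, rate)
    print(f"{name}: P(B > G) = {(1 - p_tie) / 2:.3f}")
```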
Toolformer: Language Models Can Teach Themselves to Use Tools
Presents Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction.
abs:
@cunha_tristan I think it’s more productive to think in terms of benchmarks where you think some definition of reasoning is necessary.
For example, most people would agree mathematics requires reasoning. If LLMs do well at unseen mathematics tasks, then that would suggest they can at least
Btw this is not to dismiss efforts like @scale_AI’s SEAL leaderboard, which are welcome and well-intentioned.
But worth mentioning the incentive problem now, as it shows problems with evaluation are much deeper than they appear on the surface.
This is really cool. They use Galactica base with their LMX approach to perform symbolic regression. Unlike other SR methods, which require *a lot* of hand-crafted design choices, Galactica just acts as the generator and achieves comparable results. 🤙