Ross Taylor

@rosstaylor90

6,606
Followers
982
Following
23
Media
2,183
Statuses

Something new 🐠. Prev: reasoning lead @metaai , LLaMA 2/3, @paperswithcode co-creator, Galactica LLM lead, Atlas ML (acq by Meta), sports betting, UK Treasury

∇²f = 0
Joined March 2012
Pinned Tweet
@rosstaylor90
Ross Taylor
1 year
Here’s my ICML talk on teaching LLMs to reason (video and slides). Like everyone else, I can’t talk about what I’m working on right now, but I tried to provide a useful overview of the history of LLMs and reasoning, current areas of focus, and potential directions. Enjoy!
9
40
314
@rosstaylor90
Ross Taylor
11 months
I am the first author of the Galactica paper and have been quiet about it for a year. Maybe I will write a blog post talking about what actually happened, but if you want the TLDR: 1. Galactica was a base model trained on scientific literature and modalities. 2. We approached
@sharongoldman
Sharon Goldman
11 months
One year ago — 2 weeks before @OpenAI released ChatGPT — @Meta released Galactica. The LLM was public for only 3 days, but its lessons led to decisions around Llama's release. Thanks to @jpineau1 for chatting w/ me and h/t to @ylecun Read here: ⏬
10
80
486
94
325
3K
@rosstaylor90
Ross Taylor
6 months
I left Meta yesterday. Nothing but positive things to say: FAIR and GenAI are great places to do research and engineering. Will miss my colleagues! LLMs have shown how magical deep learning can be in a data-rich regime. But many domains remain data-constrained, which prevents
32
47
951
@rosstaylor90
Ross Taylor
11 months
Why are LLMs bad at reasoning? One theory says this is due to weaknesses in maximum likelihood, where the probability mass “overgeneralises” to low quality solutions. Because our pretraining objective (likelihood) doesn’t transfer to our evaluation objective (accuracy), the
51
40
409
@rosstaylor90
Ross Taylor
10 months
Controversial take: open LLM leaderboards have been a net negative to the field as they’ve encouraged leaderboard hacking, training on in-domain datasets (likely test sets too), GPT distillation and other practices that confound comparison. On @paperswithcode we never allowed
@agihippo
yi
10 months
"open source is catching up"
2
1
34
13
21
242
@rosstaylor90
Ross Taylor
3 years
Our mission at @paperswithcode is to index all scientific information and then convert this information into useful knowledge. The datasets index for ML is getting really comprehensive! What’s next? Stay tuned 🙃.
@paperswithcode
Papers with Code
3 years
🎉 We've just crossed 5000 Datasets! 🎉 We now index and organize more than 5000 research datasets for machine learning. A huge thanks to the research community for their ongoing contributions. Browse the full catalogue here:
Tweet media one
15
484
2K
2
31
239
@rosstaylor90
Ross Taylor
2 years
A year’s journey; glad to get this out! The vision is to build a megafunction that models all of Nature. Small steps with this work. We broke a few rules about training LLMs on the way, with some great results.
@paperswithcode
Papers with Code
2 years
🪐 Introducing Galactica. A large language model for science. Can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more. Explore and get weights:
285
2K
8K
8
28
242
@rosstaylor90
Ross Taylor
1 year
Fun fact, when Attention is All You Need came out, Twitter was mostly focused on the SeLU paper (and its appendix). Not a one off either; there are countless other examples of Twitter picking the wrong “winner”. The best way to predict the future is to do the work yourself and
6
18
199
@rosstaylor90
Ross Taylor
2 years
Not sure we ever detailed our experience training the 120B Galactica model, but tldr: have a mechanism in place for skipping batches, lower LR after periods of sustained instability, use longer warmup to hedge against bad init, pray to the AI Gods 🙏.
4
11
194
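A minimal sketch of the kind of guard described in the tweet above: skip a batch whose loss spikes, and cut the learning rate after sustained instability. The class name, thresholds, and demo values are illustrative assumptions, not the actual Galactica training code.

```python
# Illustrative sketch only -- not the actual Galactica training code.
# Two of the mitigations from the tweet: skip a batch whose loss spikes,
# and lower the LR after a period of sustained instability.
from collections import deque

class SpikeGuard:
    def __init__(self, window=100, spike_factor=3.0, sustained=5, lr_decay=0.8):
        self.history = deque(maxlen=window)   # recent "healthy" losses
        self.spike_factor = spike_factor      # how far above the running mean counts as a spike
        self.sustained = sustained            # consecutive spikes before the LR is touched
        self.lr_decay = lr_decay              # multiplicative LR cut on sustained instability
        self.consecutive_spikes = 0

    def should_skip(self, loss):
        """True if this batch's loss looks like a spike and the step should be skipped."""
        if len(self.history) < self.history.maxlen // 2:   # warm up the running statistics
            self.history.append(loss)
            return False
        mean = sum(self.history) / len(self.history)
        if loss > self.spike_factor * mean:
            self.consecutive_spikes += 1
            return True                        # don't record spikes in the "healthy" history
        self.consecutive_spikes = 0
        self.history.append(loss)
        return False

    def maybe_lower_lr(self, optimizer):
        """After `sustained` consecutive spikes, cut the LR (works with any object
        exposing PyTorch-style `param_groups`)."""
        if self.consecutive_spikes >= self.sustained:
            for group in optimizer.param_groups:
                group["lr"] *= self.lr_decay
            self.consecutive_spikes = 0

# Tiny demo with a fake optimizer; in a real loop you would call should_skip
# on loss.item() and skip backward/step for that batch.
class _FakeOpt:
    param_groups = [{"lr": 3e-4}]

guard, opt = SpikeGuard(window=6, sustained=2), _FakeOpt()
for loss in [2.1, 2.0, 1.9, 2.0, 9.5, 9.8, 2.0]:
    if guard.should_skip(loss):
        guard.maybe_lower_lr(opt)
print(opt.param_groups[0]["lr"])   # ~2.4e-4 after the sustained spikes
```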
@rosstaylor90
Ross Taylor
7 months
What’s worse is that bad public evaluation leads to a race to the bottom. Because of Prisoners’ Dilemma, if everyone else is benchmark hacking, you have a strong incentive to benchmark hack yourself. Case in point: - There’s no such thing as a “base model” anymore as people are
@aidangomez
Aidan Gomez
7 months
I don't think people have fully internalized just how broken public evaluation of models is.
37
43
501
14
8
114
@rosstaylor90
Ross Taylor
1 year
ICML is my first ever ML conference (I’m antisocial 😅). Some observations: - People are really excited about LLaMA 2 and open source LLMs in general. - RLHF seems to be working well for everyone in most LLM domains (chatbots, code, reasoning). - Hawaiian shirt % is pretty low. I
1
3
93
@rosstaylor90
Ross Taylor
1 year
It’s disingenuous to speak of the risks of open source AI without acknowledging the risks of closed source AI. Is it wise to have concentration of extreme power with a few large, unaccountable organisations? If there are guardians, who chooses the guardians?
3
11
84
@rosstaylor90
Ross Taylor
11 months
Was not expecting a hastily written, early morning Galactica retrospective to bounce so much - given it happened 3000 years ago in AI time. To close the topic, here’s a great talk below by Sean Murray on No Man’s Sky and how they “engoodened” things (after some initial missteps)
3
7
82
@rosstaylor90
Ross Taylor
11 months
@DrJimFan Nice writeup! Quick correction though: ORMs still learn a per token loss, and can learn to assign credit to intermediate steps. Covered an example here in my talk :
Tweet media one
2
4
80
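A rough sketch of the per-token ORM loss being referred to, in the spirit of GSM8K-style verifiers where the solution-level correctness label is broadcast to every token; the function, shapes, and scoring head are illustrative assumptions, not code from the talk.

```python
import torch
import torch.nn as nn

# Sketch of a per-token ORM loss: the solution-level correctness label is
# broadcast to every token, so the model gets a per-token signal and can learn
# which intermediate steps tend to lead to wrong final answers.
# Shapes and the scoring head are placeholders.

def orm_per_token_loss(hidden_states, is_correct, mask, score_head):
    # hidden_states: (batch, seq_len, d_model) from the language model
    # is_correct:    (batch,) 1.0 if the sampled solution's final answer is correct
    # mask:          (batch, seq_len) 1.0 for solution tokens, 0.0 for prompt/padding
    logits = score_head(hidden_states).squeeze(-1)        # (batch, seq_len)
    targets = is_correct.unsqueeze(1).expand_as(logits)   # broadcast the outcome label
    per_token = nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    return (per_token * mask).sum() / mask.sum()

# Dummy example:
batch, seq_len, d_model = 2, 16, 32
loss = orm_per_token_loss(torch.randn(batch, seq_len, d_model),
                          torch.tensor([1.0, 0.0]),
                          torch.ones(batch, seq_len),
                          nn.Linear(d_model, 1))
```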
@rosstaylor90
Ross Taylor
2 years
100% the worst LLM take is that weights shouldn’t store knowledge. They might not store *all* knowledge, but their compiled knowledge in weight memory is the source of their creativity (and value) over retrieval approaches.
6
3
79
@rosstaylor90
Ross Taylor
5 months
“Some guy” 👀 #mypension
@itsandrewgao
andrew gao
5 months
some guy is sitting on pip install agi
23
35
588
2
1
75
@rosstaylor90
Ross Taylor
1 month
Internal reasoning tokens, circa 2022. <work> 🙂 (It’s a shame people aren’t exploring similar ideas for visual reasoning. Good reasoning requires a visual and a propositional code)
Tweet media one
6
3
69
@rosstaylor90
Ross Taylor
1 month
This was always going to happen given the hype and money in the field + poor evaluation standards. People need to be way more sceptical of new model releases. Far too many instances of “benchmark hill-climbing grift” going on as a vehicle to create hype. This case is
@shinboson
𝞍 Shin Megami Boson 𝞍
1 month
A story about fraud in the AI research community: On September 5th, Matt Shumer, CEO of OthersideAI, announces to the world that they've made a breakthrough, allowing them to train a mid-size model to top-tier levels of performance. This is huge. If it's real. It isn't.
Tweet media one
115
731
7K
4
6
52
@rosstaylor90
Ross Taylor
11 months
@coppola_ai Good research, bad launch. Bad launch not because of stupidity but because of loss of situational awareness due to excessive workload. To fix: if your team is operating above capacity, then make sure to have good internal feedback sources that can help you see the wood from the
3
4
66
@rosstaylor90
Ross Taylor
11 months
@ChurchillMic Yes, to be clear I think the commentary was completely overblown. We were directionally right (and early) with what we were doing, but the way the demo was executed was wrong. So it’s not really apologetic, more like “here’s what happened in case you were wondering”
1
0
63
@rosstaylor90
Ross Taylor
6 months
What people get wrong about foundation models for science: too much focus on generating new discoveries and ideas - way too little focus on instrumentation. The biggest driver of scientific progress is new instruments. So the biggest impact of deep learning will be accelerating
5
4
62
@rosstaylor90
Ross Taylor
5 months
Every time we tried this it had little benefit, yet plenty of people still asserted to me as a fact that “training on code helps with reasoning”. Reality: the biggest gains come from training on the same domain. If your downstream task involves LaTeX math, then the biggest
@HeinrichKuttler
heiner
5 months
Transfer of skills (e.g., train on coding to help with 'reasoning') is more often asserted than demonstrated.
2
5
46
6
1
57
@rosstaylor90
Ross Taylor
7 months
What happened, three possibilities: 1. They both use the same annotation provider for instruction tuning, who has provided the same prompt/answer data to two different companies… 2. The annotators themselves are using the same language models to help annotate (eg GPT-4), which
@seshubon
seshu bonam
7 months
WHAT? @inflectionAI is just a claude-3-sonnet wrapper? care to explain? 🐒 Produces the exact same answer word to word for a custom query i asked 🤯
Tweet media one
Tweet media two
Tweet media three
Tweet media four
67
66
907
6
1
49
@rosstaylor90
Ross Taylor
3 months
The fundamental tension of doing ambitious projects: Long periods in the wilderness trying to make new things work. Meanwhile, the world continues to move around you, harvesting the low-hanging fruit. Delayed gratification is hard, but worth it!
0
3
46
@rosstaylor90
Ross Taylor
10 months
My 2024 wishes for open source / open science in ML: - Less free-riding on GPT outputs; instead more open innovation in post-training to obtain outputs of a similar or better quality. - More understanding that RLHF is not a capability “nerf” but a capability enhancer:
1
2
40
@rosstaylor90
Ross Taylor
3 months
LLMs are trained to imitate human outputs; not human latents. This explains the majority of the alignment problem, as well as deficits in areas such as reasoning - where we do not observe mental scaffolding (internal context) on the internet - we only observe output context.
6
4
40
@rosstaylor90
Ross Taylor
1 month
Congrats to @OpenAI on the amazing results! o1-mini results particularly impressive. My guess: larger models are more “wasteful” in spending more time in the forward pass on easier tokens. Therefore it’s better to have a smaller model that can more efficiently allocate compute
@OpenAI
OpenAI
1 month
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
987
4K
18K
1
3
39
@rosstaylor90
Ross Taylor
28 days
@deliprao To be fair to OpenAI, the early GPT papers were a gold mine and I’m not sure a lot of the open LLM efforts would have been possible without them. So I think “parasitic” is way too harsh! (My bigger gripe: their fuelling of anon hype accounts which have polluted the X feed and
2
1
37
@rosstaylor90
Ross Taylor
2 years
Thanks to everyone for the kind words on the Galactica paper. Will not be commenting on the wider release, but I am optimistic that LLMs will evolve from tools of association (creativity + search + idea generation) into aligned models that preserve factualness. Stay tuned!
1
1
33
@rosstaylor90
Ross Taylor
28 days
I don’t subscribe to the view that “ideas are cheap, execution is everything”, in the sense that good intuition is hard to come by. But what is true is that new ideas are incredibly fragile. At an organisation that prioritises reactive plays, these ideas will die without
3
4
32
@rosstaylor90
Ross Taylor
1 year
Great paper. Last year we had a similar problem where Galactica could do SMILES -> IUPAC but not the inverse. The solution was to augment the data and shuffle the PubChem layout (lol). Simple rephrasing of existing datasets is likely to yield large benefits for generalisation.
@OwainEvans_UK
Owain Evans
1 year
Does a language model trained on “A is B” generalize to “B is A”? E.g. When trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?” Our new paper shows they cannot!
Tweet media one
175
706
4K
3
3
31
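A toy sketch of the augmentation idea: emit each mapping in both directions (and with rephrasings) so the model sees “A is B” as well as “B is A”. The helper and example strings are made up for illustration.

```python
# Toy illustration of the augmentation idea; the helper and strings are made up.
def bidirectional_pairs(iupac_name, smiles):
    # Emit the mapping in both directions so the model sees "A is B" and "B is A".
    return [
        f"The IUPAC name of {smiles} is {iupac_name}.",
        f"{iupac_name} has the SMILES string {smiles}.",
    ]

print(bidirectional_pairs("ethanol", "CCO"))
```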
@rosstaylor90
Ross Taylor
1 year
I don’t think anyone has a grasp on what society will look like with each person’s intelligence increased by an order of magnitude. But it does not immediately follow that restricting power and access to an elite minority is better than decentralising this power. Not obvious.
4
5
30
@rosstaylor90
Ross Taylor
9 months
System 2 is fiction. It does not exist as an area in the brain. It might be a useful psychological abstraction, but it doesn’t tell you anything useful about how to achieve goal-directed behaviour computationally.
13
0
27
@rosstaylor90
Ross Taylor
4 months
Great paper with lots of ablations breaking down the benefits of Mamba layers versus self-attention. TLDR: Mamba layers worse than self-attention on in-context learning and context-based information retrieval. Hybrid model seems to get best of both worlds*. Also nice tidbit:
@ctnzr
Bryan Catanzaro
4 months
A 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset: * 7% attention, the rest is Mamba2 * MMLU jumps from 50 to 53.6% * Training efficiency is the same * Inference cost is much less
Tweet media one
18
77
451
1
3
28
@rosstaylor90
Ross Taylor
1 year
I signed. Closed source AI is more dangerous than open source AI in the short-medium run due to lack of transparency about how models are built and developed. Letting just a few companies develop advanced AI makes the problem worse, not better. We need more eyeballs on weights
0
5
26
@rosstaylor90
Ross Taylor
5 months
What would you build if you took deep learning seriously? Ten years ago the answer was AGI: a term that would make you look kooky in research circles. Now every post-ChatGPT startup has AGI as their mission. It’s no longer a sign of ambition for a new company. Maybe a slightly
3
0
27
@rosstaylor90
Ross Taylor
5 months
I’m pleased to release a new version of my popular Python library agi today: Enjoy! #feeltheagi
@rosstaylor90
Ross Taylor
5 months
“Some guy” 👀 #mypension
2
1
75
4
1
27
@rosstaylor90
Ross Taylor
5 months
One of the great joys of life: meeting lots of talented people (new and old faces) working on big problems. Thanks NY & SF - now back to London for a bit 🇬🇧.
Tweet media one
1
1
27
@rosstaylor90
Ross Taylor
3 months
Congrats to my former colleagues for this exciting launch! Special shoutout to @iliyanzarov , @ViktorKerkez , @tonyjhartshorn , @louvishh , anirudh and others for the reasoning work! 🙂
@AIatMeta
AI at Meta
6 months
Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes. Today's release includes the first two Llama 3
351
1K
6K
0
0
25
@rosstaylor90
Ross Taylor
16 days
If you optimised AI releases for Twitter traction a few years ago, you could expect fairly targeted, well-informed feedback. But nowadays feedback is dominated by a low-quality hype factor. This leads to difficult tradeoffs about how you talk about your work, and
2
1
23
@rosstaylor90
Ross Taylor
3 months
@_xjdr Everyone on X assumes that good LLM performance is due to “fancy secret method”, but more often than not it’s just solid execution of well-established recipes.
0
0
21
@rosstaylor90
Ross Taylor
5 months
Credit rating agencies had misaligned incentives in the 2000s: the providers of the products they rated were the ones paying them. (My first job was regulating CDOs post-crisis, lol) Similarly a company that sells data to frontier firms for LLMs is probably not the right one to
1
2
21
@rosstaylor90
Ross Taylor
5 months
At Google I/O!
Tweet media one
3
2
21
@rosstaylor90
Ross Taylor
5 months
There is a surprising amount of value in taking common assertions people make, asking “Why?” a couple of times, and finding that they have no grounding other than the fact that lots of people keep saying them. The introspective version of this: if you’ve believed anything for 5
1
1
20
@rosstaylor90
Ross Taylor
2 years
@OctothorpeVoid Yep similar to PaLM, when you have a bad spike, record the batch index and skip it. You can also sometimes reduce the LR by a very small magnitude and the spike won't happen (which suggests spikes are due to an unstable loss surface, not the data per se).
1
0
19
@rosstaylor90
Ross Taylor
10 months
@JPobserver This is a popular way to explain things. I’m a bit sceptical of system 1/system 2 as it’s a psychological abstraction that doesn’t actually tell you how the brain works. I think you can get a better idea of what LLMs might be missing by looking at how the prefrontal cortex
1
0
19
@rosstaylor90
Ross Taylor
8 months
AGI test: When will AI be able to work as a human-level wedding planner? Specs: - Talks to couple, understands their preferences - Rings up venues, negotiates prices etc - Hires musicians, catering, arranges transport - Designs and pushes out the website - Sends out invites -
4
0
18
@rosstaylor90
Ross Taylor
2 years
We evaluated regularly to see the effect of spikes. While val loss usually recovered quickly, downstream eval showed it would sometimes forget a task (e.g. Yes/No QA) and take a long while to recover. So when debugging spikes, don’t just look at val loss…
1
1
16
@rosstaylor90
Ross Taylor
11 months
@erhartford @ylecun @didntdrinkwater @paulsutter Nougat was another product of the Galactica team, by the way 🙂
0
0
14
@rosstaylor90
Ross Taylor
2 years
Please follow @misterkardas , who was the reason training ran so smoothly for us, for insights on LLM training.
1
0
14
@rosstaylor90
Ross Taylor
11 months
This is great! Sorely needed benchmark.
@tsawada_ml
Tom Sawada
11 months
GPQA: A Graduate-Level Google-Proof Q&A Benchmark abs: pdf: - a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry; - experts who have or are pursuing PhDs in the
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
6
33
0
1
13
@rosstaylor90
Ross Taylor
3 months
X is a study on how most people form expectations adaptively rather than rationally. Explains everything from “e/acc” internal contradictions on AI safety, to “LLMs can’t do x” arguments that are outdated in a few months. The big changes are when progress undermines current
0
0
14
@rosstaylor90
Ross Taylor
1 year
These analyses are disingenuous without comparisons to search engines.
@DrNikkiTeran
Nikki Teran
1 year
Will releasing the weights of large language models grant widespread access to pandemic agents? Turns out, yes, probably. 1/5
Tweet media one
60
103
452
1
1
14
@rosstaylor90
Ross Taylor
7 months
@BlancheMinerva As an example, WikiHow and HellaSwag.
3
0
14
@rosstaylor90
Ross Taylor
6 months
Rui is amazing and you guys should all follow him.
@magpie_rayhou
Rui Hou
6 months
Excited to release a preview version of Llama3 with superb performance to the community! More to come soon!
2
4
29
1
0
14
@rosstaylor90
Ross Taylor
1 year
People doing reasoning self-correction papers: please ablate against simple majority voting. Thanks 🙏 As with many reasoning papers this year, the bulk of the performance increase below comes from GPT-4 distillation, not the method that is introduced — which is likely beaten
@_akhaliq
AK
1 year
Learning From Mistakes Makes LLM Better Reasoner paper page: Large language models (LLMs) recently exhibited remarkable reasoning capabilities on solving math problems. To further improve this capability, this work proposes Learning from Mistakes (LeMa),
Tweet media one
1
83
351
0
2
14
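The baseline being asked for is plain majority voting (self-consistency) over sampled answers; a minimal sketch, assuming the final answers have already been extracted from each reasoning chain.

```python
from collections import Counter

def majority_vote(final_answers):
    """Self-consistency baseline: sample several reasoning chains, extract each
    final answer, and return the most common one."""
    counts = Counter(a for a in final_answers if a is not None)
    return counts.most_common(1)[0][0] if counts else None

# e.g. five sampled chains whose extracted final answers were:
print(majority_vote(["42", "41", "42", "42", "17"]))   # -> "42"
```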
@rosstaylor90
Ross Taylor
2 years
I’d like to thank the entire @paperswithcode team for their incredible work on this. @misterkardas @rbstojnic @g_cucurull @omarsar0 @ThomasScialom @tonyjhartshorn @ViktorKerkez @iliyanzarov , Lukas Blecher, Andrew Poulton, @andrewkuanop . Honored to work with them.
2
0
14
@rosstaylor90
Ross Taylor
2 months
Somehow missed this when it was originally posted - awesome work! #chemlactica
@YerevaNN
YerevaNN
3 months
1/12 We are happy to announce the release of our language models for optimizing small organic molecules. Built on top of Galactica and Gemma, they come in three sizes: Chemlactica-125M, Chemlactica-1.3B, and Chemma-2B. The preprint is on arxiv:
1
13
66
0
0
13
@rosstaylor90
Ross Taylor
4 months
Amazing work! Congrats to the team #tokenizenature
@alexrives
Alex Rives
4 months
We have trained ESM3 and we're excited to introduce EvolutionaryScale. ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins. Read more:
147
839
3K
0
0
13
@rosstaylor90
Ross Taylor
9 months
@ylecun Yes, I’m with you on the inference process (simulating consequences, evaluating them, and planning), and the idea we can use inference compute for policy improvement. I think what irks me about system 1/2 as a metaphor is: 1) When we overuse two-factor metaphors, people end up
3
1
12
@rosstaylor90
Ross Taylor
5 months
@AlbalakAlon Nice paper! I understand this through an NFL type argument. There is no such thing as “data quality” because everything (even spam) is a task to be learnt. However in practice there is a subset of tasks you care about more. That means there is a “free lunch” by reweighting the
1
1
13
@rosstaylor90
Ross Taylor
1 year
@agihippo I wanted to scream when I saw that post this morning. Massive elephant in the room called “pretraining data” that influencers couldn’t seem to see 🐘…
2
0
13
@rosstaylor90
Ross Taylor
1 year
@soumithchintala The Act of Creation by Arthur Koestler is the best philosophy read on this. My favourite anecdote on how far away AI is: the structure of benzene was discovered by someone having a dream of a snake eating its own tail 🙂.
0
1
11
@rosstaylor90
Ross Taylor
7 months
Update: it looks like the user posted the Claude response in another conversation with Pi, so it echoed the same response, per @inflectionAI . Guessing game over…
@rosstaylor90
Ross Taylor
7 months
What happened, three possibilities: 1. They both use the same annotation provider for instruction tuning, who has provided the same prompt/answer data to two different companies… 2. The annotators themselves are using the same language models to help annotate (eg GPT-4), which
6
1
49
3
0
12
@rosstaylor90
Ross Taylor
2 months
Had a lot of fun talking to Nathan a few weeks back about various topics in LLM land! Note: my thought-to-speech module was a little off due to jet lag — so the text transcript is good if you prefer that medium ☺️.
@natolambert
Nathan Lambert
2 months
Finally got to chat with @rosstaylor90 -- exactly why I started this series. So much signal on the LLM life cycle from training to demos. Reasoning, Galactica, demo backlash, post-training, sft vs rlhf, LLMs for science, realistic agents, PRMs, and other topics
2
11
62
0
0
11
@rosstaylor90
Ross Taylor
3 months
The real LLM benchmark: be useful for the best human experts in each field.
@teortaxesTex
Teortaxes▶️
3 months
> Ed Witten finds Claude's insights on quantum mechanics meh Humans will reach superhuman performance in goalpost-moving before AI gets superhuman at science
10
4
80
2
0
11
@rosstaylor90
Ross Taylor
8 months
One of the problems with LLM creativity is that new methods and tools have relatively few mentions in the literature, so creativity can’t rely as much on (flexible) weight memory; and instead must be in-context using some form of retrieval. We often think of creativity as the
0
0
11
@rosstaylor90
Ross Taylor
2 years
If you believe LLMs don’t have meaning because they are “ungrounded”, then you also believe there is no such thing as history. My take on the English civil war is “ungrounded” because it is based on inference from historical texts, not 17th century “world knowledge”.
3
0
10
@rosstaylor90
Ross Taylor
6 years
- The latest ML research with code implementations in several frameworks ( @TensorFlow , @PyTorch , ..). Also lots of pretty gifs...
1
4
10
@rosstaylor90
Ross Taylor
1 month
@_xjdr Traditional CoT data doesn’t have steps like “alternatively, maybe we could try x” (branching) or “actually, that’s wrong” (self-correction) —> so the easiest thing people should try is making a SFT dataset with these types of step, then do PPO on top, as the model will now have
3
0
10
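A purely illustrative example of the kind of SFT record being described: a chain of thought with explicit branching and self-correction steps rather than a single linear solution. The schema and wording are made up.

```python
# Made-up example of an SFT record with branching and self-correction steps.
example = {
    "prompt": "What is 17 * 24?",
    "steps": [
        "17 * 24 = 17 * 20 + 17 * 4.",
        "17 * 20 = 340 and 17 * 4 = 68, so the total is 398.",
        "Actually, that's wrong: 340 + 68 = 408, not 398.",             # self-correction
        "Alternatively, try 17 * 24 = 17 * 25 - 17 = 425 - 17 = 408.",  # branching
        "Both routes agree, so the answer is 408.",
    ],
    "answer": "408",
}
```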
@rosstaylor90
Ross Taylor
11 months
Next up, Code Llama in a rubber duck?
@Mascobot
Marco Mascorro
1 year
This Mistral Parrot 🦜 has Mistral 7B running locally and you can talk to it 😅 By @antimatter15
10
21
163
2
0
10
@rosstaylor90
Ross Taylor
5 months
Finally got a chance to visit the Computer History Museum in Mountain View. So inspiring! Was fun to think of the future sections that will be added. The GenAI boom of the 2020s…and the more important things that come after :)
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
0
9
@rosstaylor90
Ross Taylor
2 months
@rtk254 It’s not a question of focus but a consequence of Moravec’s paradox, no?
1
0
9
@rosstaylor90
Ross Taylor
1 month
LLMs cling too much to the training corpus. Or: knowledge is not the same as intelligence.
@wtgowers
Timothy Gowers @wtgowers
1 month
Basically, it was determined to treat this problem in the same way as the famous 8x8-board-with-two-corners-removed problem, and nothing I said could shake its conviction that that was the style of the correct answer. 5/5
4
5
94
1
0
9
@rosstaylor90
Ross Taylor
1 year
On point. Amazing that last November with Galactica some people thought “not always accurate” implied “has no use case” in math or science… The whole point is to have a human in the loop with an artificial association cortex that makes useful connections.
@blader
Siqi Chen
1 year
Terence Tao on his experience with GPT4 in mathematical research: "The 2023-level AI can already generate suggestive hints and promising leads to a working mathematician and participate actively in the decision-making process."
Tweet media one
29
299
1K
1
2
9
@rosstaylor90
Ross Taylor
2 years
@Francis_YAO_ We found that with a well curated niche corpus, with Galactica, you can outperform OPT and BLOOM in general benchmarks… so something was up with the pre-training data for those models.
0
0
9
@rosstaylor90
Ross Taylor
10 months
@_arohan_ @paperswithcode Canary questions are a great idea. Maybe this is possible to measure with some of the existing benchmarks where there are (unintentional) incorrect labels.
0
0
7
@rosstaylor90
Ross Taylor
1 year
Great post by @ylecun
@ylecun
Yann LeCun
1 year
@tegmark @RishiSunak @vonderleyen Altman, Hassabis, and Amodei are the ones doing massive corporate lobbying at the moment. They are the ones who are attempting to perform a regulatory capture of the AI industry. You, Geoff, and Yoshua are giving ammunition to those who are lobbying for a ban on open AI R&D. If
316
1K
6K
0
2
7
@rosstaylor90
Ross Taylor
2 months
Very cool
@yutasenzai
Yuta Senzai
2 months
New preprint post! We show that motor commands in the superior colliculus shift the internal representation of heading during REM sleep despite the immobility of sleeping mice. Thus, the brain simulates actions and their consequences during REM sleep.🧵1/7
14
114
451
0
1
7
@rosstaylor90
Ross Taylor
2 months
This is great!
@cHHillee
Horace He
2 months
For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. 1/10
Tweet media one
20
260
1K
0
0
7
@rosstaylor90
Ross Taylor
7 months
@agihippo The field gets highly correlated when things start to work (lol)
0
0
7
@rosstaylor90
Ross Taylor
2 years
The “CoT capability comes from code” theory makes no sense, especially for MMLU. The reasoning subset of that benchmark depends on equation recall and substitution/rearranging. Code is not relevant; similar pre-training data is. Hence strong Minerva and Galactica results…
1
0
7
@rosstaylor90
Ross Taylor
5 months
Thanks to @misterkardas for informing me that someone finally found my PyPi Easter egg — four years later.
1
0
7
@rosstaylor90
Ross Taylor
1 year
People think domain-specific models -> smaller models, but actually it’s the opposite. If you are data constrained you need to have *larger* models for fixed compute. (Why we went for 120B Galactica not 70B).
3
0
7
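The argument follows from the common C ≈ 6·N·D approximation for training FLOPs: fix the compute budget, cap the tokens, and the parameter count has to rise. A rough sketch with made-up numbers, not the actual Galactica budget:

```python
# Rough illustration using the common C ~= 6 * N * D approximation for training
# FLOPs. The budget and token counts are made-up numbers, not Galactica's.
def params_for_budget(compute_flops, tokens):
    return compute_flops / (6 * tokens)

C = 1e24                                      # fixed training compute budget (FLOPs)
print(params_for_budget(C, 1e12) / 1e9)       # data capped at 1T tokens -> ~167B params
print(params_for_budget(C, 4e12) / 1e9)       # 4T tokens available      -> ~42B params
```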
@rosstaylor90
Ross Taylor
5 months
@sudeeppillai @itsandrewgao This is what a fast takeoff looks like
Tweet media one
0
0
6
@rosstaylor90
Ross Taylor
2 years
Hallucination is a problem of metacognition - knowing what you don’t know. The solution is not designing models to know less…Talk about throwing the baby out with the bathwater.
1
0
6
@rosstaylor90
Ross Taylor
7 months
@giffmana @BlancheMinerva Even if you exclude the WikiHow extracts in the test set, you still get a big boost from using the rest of the corpus. I mention this example because it’s public with the Gemini paper, but there are lots of other examples like this where you can find a small, highly in-domain
1
0
4
@rosstaylor90
Ross Taylor
1 year
Treat the rate of each as a Poisson distribution. Then B - G is Skellam for each hospital. The mean is 0 for each hospital as boys and girls are equally likely, but the variance of the larger hospital (B + G) is higher, which means P(k=0) is lower. So the likelihood is higher for the larger
@wagieeacc
Martin Shkreli (e/acc)
1 year
A silly math question posed in our discord: There are two hospitals in a city, a small and a bigger one. In which of them is the likelihood higher of more boys than girls being born in one day? (Provided that girls and boys are on average born equally frequently).
171
21
953
1
0
5
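The reasoning can be checked numerically: with B, G ~ Poisson(λ), the difference B − G is Skellam(λ, λ) and, by symmetry, P(B > G) = (1 − P(B − G = 0))/2, which grows with λ. A quick sketch with made-up birth rates:

```python
from scipy.stats import skellam

# B, G ~ Poisson(lam) births per day, so D = B - G ~ Skellam(lam, lam).
# P(D = 0) shrinks as lam grows, so P(B > G) = (1 - P(D = 0)) / 2 rises:
# the larger hospital is more likely to see more boys than girls on a given day.
def p_more_boys(lam):
    return skellam.sf(0, lam, lam)   # P(D > 0)

print(p_more_boys(5))    # small hospital (made-up rate)  -> ~0.44
print(p_more_boys(50))   # large hospital (made-up rate)  -> ~0.48
```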
@rosstaylor90
Ross Taylor
2 years
This year we will see a lot of work pursuing the vision of Model-Machine Symbiosis. Great paper below!
@arankomatsuzaki
Aran Komatsuzaki
2 years
Toolformer: Language Models Can Teach Themselves to Use Tools Presents Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. abs:
Tweet media one
24
243
1K
1
0
6
@rosstaylor90
Ross Taylor
11 months
@cunha_tristan I think it’s more productive to think in terms of benchmarks where you think some definition of reasoning is necessary. For example, most people would agree mathematics requires reasoning. If LLMs do well at unseen mathematics tasks, then that would suggest they can at least
3
0
6
@rosstaylor90
Ross Taylor
5 months
Btw this is not to dismiss efforts like @scale_AI ‘s SEAL leaderboard, which are welcome and well-intentioned. But worth mentioning the incentive problem now as it shows problems with evaluation are much deeper than they appear on the surface.
0
0
5
@rosstaylor90
Ross Taylor
2 years
This is really cool. They use Galactica base with their LMX approach to perform symbolic regression. Unlike other SR methods, which require *a lot* of hand-crafted design choices, Galactica just acts as the generator and achieves comparable results. 🤙
Tweet media one
0
0
6