Shayne Longpre Profile Banner
Shayne Longpre Profile
Shayne Longpre

@ShayneRedford

4,861
Followers
1,052
Following
181
Media
2,000
Statuses

Lead the Data Provenance Initiative. PhD @MIT . 🇨🇦 Prev: @Google Brain, Apple, Stanford. Interests: AI/ML/NLP, Data-centric AI, transparency & societal impact

Boston
Joined February 2015
@ShayneRedford
Shayne Longpre
2 months
✨New Preprint✨ How are shifting norms on the web impacting AI? We find:
📉 A rapid decline in the consenting data commons (the web)
⚖️ Differing access to data by company, due to crawling restrictions (e.g. 🔻26% OpenAI, 🔻13% Anthropic)
⛔️ Robots.txt preference protocols
Tweet media one
10
94
256
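The crawling restrictions mentioned in the thread above are typically expressed in a site's robots.txt, keyed on the user-agent names of AI crawlers. A minimal sketch of checking those permissions with Python's standard-library parser; the user-agent strings and URLs below are illustrative assumptions, not taken from the preprint:

```python
# Minimal sketch: check which AI crawlers a site's robots.txt permits.
# The user-agent names and example URL are illustrative assumptions.
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"           # hypothetical site
PAGE = f"{SITE}/some/article.html"     # hypothetical page
AI_CRAWLERS = ["GPTBot", "anthropic-ai", "CCBot", "Google-Extended"]

rp = RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()  # fetches and parses the robots.txt file

for agent in AI_CRAWLERS:
    allowed = rp.can_fetch(agent, PAGE)
    print(f"{agent:16s} allowed={allowed}")
```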
@ShayneRedford
Shayne Longpre
2 years
✨New Paper✨ What’s the best completely public competitor to #ChatGPT ?
Flan-T5 beats all public models we tested: Flan-T5 3B ▶️ T0++ 3B ▶️ OPT-IML 175B ▶️ GLM-130B ▶️ Flan 2021 3B ▶️ NIv2 3B
We release the @GoogleAI 🌟Flan Collection🌟 data + methods for Instruction Tuning! 1/
Tweet media one
Tweet media two
24
249
1K
@ShayneRedford
Shayne Longpre
7 months
New Resource: Foundation Model Development Cheatsheet for best practices
We compiled 250+ resources & tools for:
🔭 sourcing data
🔍 documenting & audits
🌴 environmental impact
☢️ risks & harms eval
🌍 release & monitoring
With experts from @AiEleuther , @allen_ai ,
4
148
624
@ShayneRedford
Shayne Longpre
9 months
✏️ AI Terminology Updates 2022 ➡️ 2024
NLP ➡️ "language modeling"
Multi-task training ➡️ instruction tuning
Finetuning ➡️ "post-training"
Semantic parsing ➡️ API tool use
Robustness ➡️ Red teaming
Train/test split ➡️ train/train split
Transfer learning ➡️ "it's already in the
21
80
573
@ShayneRedford
Shayne Longpre
11 months
📢Announcing the🌟Data Provenance Initiative🌟 🧭A rigorous public audit of 1800+ instruct/align datasets 🔍Explore/filter sources, creators & license conditions ⚠️We see a rising divide between commercially open v closed licensed data 🌐: 1/
10
148
462
@ShayneRedford
Shayne Longpre
1 year
We're releasing our 2nd lecture playlist for Evaluating Generative AI/LLMs, taught at @MIT . Intro to 3 challenge areas: 1⃣ Bias & Toxicity 2⃣ Hallucination & Factuality 3⃣ Robustness & Consistency 📽️: 1/🧵
2
88
388
@ShayneRedford
Shayne Longpre
2 years
📢 A 🧵on the future of NLP model inputs. What are the options and where are we going? 🔭
1. Task-specific finetuning (FT)
2. Zero-shot prompting
3. Few-shot prompting
4. Chain of thought (CoT)
5. Parameter-efficient finetuning (PEFT)
6. Dialog
[1/]
Tweet media one
Tweet media two
10
83
375
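As a rough illustration of how options 2–4 in the thread above differ purely at the input level, here is a sketch of the same question framed as a zero-shot, few-shot, and chain-of-thought prompt. The wording is my own, not drawn from any paper in the thread:

```python
# Sketch: one task framed as zero-shot, few-shot, and CoT prompts.
# Prompt wording is illustrative only.
question = "If a train travels 60 km in 1.5 hours, what is its average speed?"

zero_shot = f"Q: {question}\nA:"

few_shot = (
    "Q: If a car travels 100 km in 2 hours, what is its average speed?\n"
    "A: 50 km/h\n"
    f"Q: {question}\nA:"
)

chain_of_thought = (
    "Q: If a car travels 100 km in 2 hours, what is its average speed?\n"
    "A: Speed = distance / time = 100 / 2 = 50 km/h. The answer is 50 km/h.\n"
    f"Q: {question}\nA: Let's think step by step."
)

for name, prompt in [("zero-shot", zero_shot),
                     ("few-shot", few_shot),
                     ("chain-of-thought", chain_of_thought)]:
    print(f"--- {name} ---\n{prompt}\n")
```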
@ShayneRedford
Shayne Longpre
1 year
#NewPaperAlert When and where does pretraining (PT) data matter? We conduct the largest published PT data study, varying: 1⃣ Corpus age 2⃣ Quality/toxicity filters 3⃣ Domain composition We have several recs for model creators… 📜: 1/ 🧵
Tweet media one
12
88
360
@ShayneRedford
Shayne Longpre
2 years
🔭 A 🧵 on @OpenAI LLM "Alignment" (e.g. #ChatGPT ) Q: How does this differ from publicly available "Instruction Tuning" (IT)? A: Proprietary Alignment is actually 3 separate components: 1⃣ Instruction tuning 2⃣ ➕ Open-ended generation/creative prompts 3⃣ ➕ Human feedback 1/
Tweet media one
2
48
250
@ShayneRedford
Shayne Longpre
7 months
Independent AI research should be valued and protected. In an open letter signed by over 100 researchers, journalists, and advocates, we explain how AI companies should support it going forward. 1/
Tweet media one
7
78
229
@ShayneRedford
Shayne Longpre
1 year
Excited to present the 🥮Flan Collection🥮 poster at #ICML2023 tomorrow. Come say hello at Exhibition Hall 1, #130 ! A highlight 🧵on Flan-PaLM 🌴, Flan-T5 and Flan data in research and production. 1/
Tweet media one
2
40
225
@ShayneRedford
Shayne Longpre
2 years
📢 A 🧵 on the Trends in NLP Datasets. What’s changed since SQuAD was all the rage in 2016? A: A LOT. 🔭
1. Generic ➡️ Niche Tasks
2. Task-specific Training+Eval ➡️ Eval Only
3. Dataset ➡️ Benchmark ➡️ Massive Collections
4. Datasets ➡️ Diagnostics
1/
Tweet media one
4
46
224
@ShayneRedford
Shayne Longpre
1 year
A wave of new work shows how **brittle** "Alignment"/RLHF safety methods are. ⛓️ Prompt jailbreaks are easy 🚂 Finetuning away safety (even #OpenAI API) is simple and likely undetectable 🤖 LLMs can auto-generate their own jailbreaks... 1/ 🧵
7
44
207
@ShayneRedford
Shayne Longpre
2 years
I started compiling training, inference, and data accessibility for the major LLMs. 1⃣ Is OSS? Has Playground? API? 2⃣ Is pretraining data open? searchable? 3⃣ Links, notes, etc Please take a look, and LMK what's missing! 1/
6
35
167
@ShayneRedford
Shayne Longpre
3 months
📢 Excited to announce the expanded 🌟Responsible Foundation Model Development Cheatsheet🌟 ➡️ A Survey & Review of Tools🛠️& Resources🧮🗝️📚. ➡️ We ask 1⃣ what responsible practices developers can adopt, & 2⃣ what tools are missing, misused, or under-used in the AI dev
Tweet media one
1
46
133
@ShayneRedford
Shayne Longpre
10 months
🚨OpenAI publicly revoked ByteDance’s API access for training on API data. What ripple effects could come from this precedent? We already know synthetic API-generated data has grown massively: Many startups depend on it… 1/
5
17
127
@ShayneRedford
Shayne Longpre
6 months
A 🧵 on my favorite, influential works on "Data Measurements" 🚂 Datasets drive AI progress 📚 But... massive datasets remain impenetrable & poorly understood for *years* 🔍 Data forensics uncover their mysteries 1/
1
30
128
@ShayneRedford
Shayne Longpre
2 years
📢 For those who missed @naacl , here’s a selection of papers I really enjoyed! 📚 Mainly multi-linguality, prompting, fairness and ethics, but some other ideas too! #NAACL2022 🧵 1/
1
20
122
@ShayneRedford
Shayne Longpre
5 months
🔭 New perspective piece at #ICML2024 & @MIT AI Impact Award winner🎉 🌟Data Authenticity, Consent, and Provenance for AI Are All Broken: What Will It Take to Fix Them?🌟 w/ @RobertMahari @naana_om @wwbrannon @TobinSouth @katyilonka @alex_pentland @jad_kabbara 🔗:
Tweet media one
10
37
121
@ShayneRedford
Shayne Longpre
2 years
Too many new papers to read this week? 📚 See the key take-aways of our new Flan-PaLM work in a 7⃣ minute video. ➕Scaling to 540B model ➕Scaling finetuning to 1.8K tasks ➕CoT tuning ➡️ 🌟MMLU SOTA 75.2% 🌟Better usability 🌟Better CoT Reasoning 🌐:
3
28
118
@ShayneRedford
Shayne Longpre
1 year
This semester my @CCCatMIT co-instructors and I taught #MIT 's first post- #ChatGPT Generative AI course, covering: ➡️Uses and new abilities ➡️LM Evaluation ➡️AI-mediated communication ➡️Societal challenges 📜 Syllabus + reading list 📚: 1/
Tweet media one
2
18
115
@ShayneRedford
Shayne Longpre
2 years
Sharing my *rough* slides from a @CCCatMIT February reading group. Covers "NLP Training Trends for Large Language Models" (LLM) and a survey of 4 new interesting papers: FLAN, T0, ExT5, MetaICL! 📚: [1/6]
Tweet media one
Tweet media two
2
23
113
@ShayneRedford
Shayne Longpre
3 years
📢📜 #NLPaperAlert 🌟Knowledge Conflicts in QA🌟- what happens when facts learned in training contradict facts given at inference time? 🤔 How can we mitigate hallucination + improve OOD generalization? 📈 Find out in our #EMNLP2021 paper! [1/n]
Tweet media one
4
16
114
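The knowledge-conflict setup described in the tweet above is built by substituting the gold answer entity in the inference-time context so that it contradicts what the model likely memorized during training. A toy sketch of that idea; the example text and field names are mine, not from the paper:

```python
# Toy sketch of creating a knowledge conflict via entity substitution:
# the context given at inference time contradicts training-time knowledge.
# Example text is illustrative, not from the EMNLP 2021 paper.
original = {
    "question": "Where was Barack Obama born?",
    "context": "Barack Obama was born in Honolulu, Hawaii.",
    "answer": "Honolulu",
}

def substitute_entity(example, old, new):
    """Swap the gold answer entity in the context to manufacture a conflict."""
    return {
        "question": example["question"],
        "context": example["context"].replace(old, new),
        "answer": new,  # a faithful reader should now predict the substituted entity
    }

conflicted = substitute_entity(original, "Honolulu", "Nairobi")
print(conflicted["context"])  # "Barack Obama was born in Nairobi, Hawaii."
```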
@ShayneRedford
Shayne Longpre
2 years
Best feeling: when you find out your work was presented in the 500+ person grad class you once took! 💪💪 Stanford CS224N – NLP w/ Deep Learning 📽️: #phdlife #NLProc
1
15
106
@ShayneRedford
Shayne Longpre
11 months
📢 We are expanding the instruct/align datasets in the 🌟Data Provenance Collection🌟 Are there any great/new ones not covered? Available at:
Tweet media one
8
29
105
@ShayneRedford
Shayne Longpre
1 year
We can now see why @OpenAI allegedly uses MoE models -- stronger performance📈 for fewer FLOPs 🚀. Awesome work led by @shengs1123 evaluates instruction tuned MoE models
@shengs1123
Sheng Shen
1 year
A Winning Combination for Large Language Models TL;DR: Did you find MoE models generalize worse than dense models on downstream tasks? Not any more at the age of instruction tuning! Surprisingly, we see the “1 + 1 > 2” effect when it comes to MoE + Instruction Tuning. [1/4]
Tweet media one
8
65
375
3
14
98
@ShayneRedford
Shayne Longpre
1 year
The Flan Collection is now available for direct download. 🌐: Thank you to the prolific @EnricoShippole for generating 10s of GBs of tasks, and @Hou_Le for several bug fixes & improvements!
1
12
99
@ShayneRedford
Shayne Longpre
10 months
Excited to moderate 2 fantastic panels today with @yizhongwyz and @qinyuan_ye at the #NeurIPS2023 @itif_workshop Instruction Workshop! Looking forward to chatting with you there! Schedule:
Tweet media one
Tweet media two
1
13
51
@ShayneRedford
Shayne Longpre
1 year
Recorded the 2nd part of my talk at @databricks , covering Results & Take-Aways from training Flan-T5 and Flan-PaLM 🍮 📽️: (11m) ➡️ In short, how far can we get just with academic datasets (w/o ChatGPT distillation or human preferences)? 1/
1
16
94
@ShayneRedford
Shayne Longpre
3 months
Super appreciative of the recognition from #NAACL2024 — our Pretrainer’s Guide won an 🌟Outstanding Paper Award🌟🏆 This was a year-long analysis into pretraining age, quality & toxicity data filters. Gratitude to our team 🙏🏼 @gyauney @emilyrreif @katherine1ee @ada_rob
@ShayneRedford
Shayne Longpre
1 year
#NewPaperAlert When and where does pretraining (PT) data matter? We conduct the largest published PT data study, varying: 1⃣ Corpus age 2⃣ Quality/toxicity filters 3⃣ Domain composition We have several recs for model creators… 📜: 1/ 🧵
Tweet media one
12
88
360
12
19
95
@ShayneRedford
Shayne Longpre
4 years
There are few options for evaluating multilingual question answering outside of English, and especially few for open domain question answering. Yi Lu, @jodaiber , and I are excited to release an Open QA evaluation set, spanning 26 languages. [1/n]
Tweet media one
1
35
86
@ShayneRedford
Shayne Longpre
2 months
Headed to 🛬🇦🇹 Vienna #ICML2024 Reach out if you'd like to chat or catch up! Work, together w/ collaborators:
- A Safe Harbor for AI Evaluation ⛴️() -- Tuesday 10:30 am Oral
- On the Societal Impact of Open Foundation Models () --
9
11
92
@ShayneRedford
Shayne Longpre
1 year
Recorded the 1st part of my talk on the Data and Methods used in the Flan Collection 🍮 (accepted to @icmlconf Honolulu 🏝️) 📽️: (12m) Thanks again to @vagabondjack and @databricks for inviting me! 1/
2
9
78
@ShayneRedford
Shayne Longpre
2 years
What dates📅 can @OpenAI , @AnthropicAI , @CohereAI models reliably answer questions for?🔭 I binary-search through "future" Wiki events to find out. Results ❌🟰❌ documentation:
#GPT4 ➡️ ~Dec 19 ('21)
#ChatGPT ➡️ ~Oct 24
Claude v1.2 ➡️ ~Oct 10
Cohere XL Nightly ➡️ ~Apr 24 ('22)
1/🧵
Tweet media one
3
11
79
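The binary-search procedure described in the tweet above can be sketched roughly as follows: given a chronologically sorted list of "future" events and a way to query the model, repeatedly probe the midpoint to find the latest dated event the model still answers correctly. The `events` list and `ask_model` function below are hypothetical placeholders, not the actual experimental setup:

```python
# Rough sketch of binary-searching for a model's effective knowledge cutoff.
# `events` and `ask_model` are placeholders; the real experiment queried
# models about dated Wikipedia events. Assumes the model's knowledge is
# roughly monotone in time (knows earlier events, not later ones).
from datetime import date
from typing import Optional

# (event date, question, expected answer), sorted by date
events = [
    (date(2021, 9, 1),   "Who won event X in Sept 2021?", "Alice"),
    (date(2021, 12, 19), "Who won event Y in Dec 2021?",  "Bob"),
    (date(2022, 4, 24),  "Who won event Z in Apr 2022?",  "Carol"),
]

def ask_model(question: str) -> str:
    """Placeholder: send `question` to an LLM API and return its answer."""
    raise NotImplementedError

def knows(event) -> bool:
    _, question, answer = event
    return answer.lower() in ask_model(question).lower()

def estimate_cutoff(events) -> Optional[date]:
    """Return the date of the latest event the model answers correctly."""
    lo, hi, best = 0, len(events) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if knows(events[mid]):
            best = events[mid][0]   # model knows this one; look later
            lo = mid + 1
        else:
            hi = mid - 1            # model doesn't know it; look earlier
    return best
```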
@ShayneRedford
Shayne Longpre
3 years
📢📜 #NLPaperAlert 🌟Active Learning over Multiple Domains in NLP🌟 In new NLP tasks, OOD unlabeled data sources can be useful. But which ones? We try active learning ♻️, domain shift 🧲, and multi-domain sampling🔦 methods to see what works [1/]
Tweet media one
2
18
74
@ShayneRedford
Shayne Longpre
2 years
🔭 How to reduce #LLM generation toxicity/bias? I'm surprised this finding hasn't received any attention: Instruction Tuning (e.g. Flan, T0) reduces toxic generations A LOT ✨ w/o any Human Feedback ✨. ➡️ I.e. #ChatGPT -esque Human values alignment w/o human feedback. 1/
Tweet media one
Tweet media two
4
14
74
@ShayneRedford
Shayne Longpre
1 month
Honored for the Data Provenance Initiative to be awarded the Infrastructure Grant Award, by @mozilla ! 🎉🎉🎉 As part of this grant, we were invited to present at MozFest House Amsterdam, where we gave an early look at trends in the AI data supply chain: 📽️
Tweet media one
7
13
74
@ShayneRedford
Shayne Longpre
10 months
Awesome round-up on synthetic data by @natolambert ! Related: we found a massive increase in synthetic data use in 2023, enabling training on more diverse tasks w/ longer outputs. 🔗:
Tweet media one
@natolambert
Nathan Lambert
10 months
The mega-post on synthetic data. I leave no stone unturned: * What synthetic data is for LLMs * Instructions vs preferences vs critiques * Constitutional AI review & insight * Mistral & other's use in pretraining * OpenAI Superalignment * Examples in the open * other topics
Tweet media one
12
88
531
2
9
72
@ShayneRedford
Shayne Longpre
1 year
This framework is very powerful. Forget Langchain; if you want to create a pipeline of modules, I would try DSPy (e.g. chaining multi-hop retrieval, w/ APIs, your own models, and prompting techniques)
@lateinteraction
Omar Khattab
1 year
🚨Announcing 𝗗𝗦𝗣𝘆, the framework for solving advanced tasks w/ LMs. Express *any* pipeline as clean, Pythonic control flow. Just ask DSPy to 𝗰𝗼𝗺𝗽𝗶𝗹𝗲 your modular code into auto-tuned chains of prompts or finetunes for GPT, Llama, and/or T5.🧵
Tweet media one
Tweet media two
24
138
644
0
5
70
@ShayneRedford
Shayne Longpre
2 years
Take a break from Twitter politics and checkout some new LLM responses! 🍰🌴 FLAN-PaLM 🍰🌴 Response of the Day #1 A pretty nice example of Complex Verbal Reasoning
Tweet media one
5
6
64
@ShayneRedford
Shayne Longpre
1 year
🔭 Use instruction tuning 🛠️ for better results 🎯 and reduced compute costs 🍃. A 🧵
Q: Communally, what is costlier, finetuning (FT) or pretraining (PT)?
A: FT. Why?
1⃣ Top 50 NLP models are 70% of all downloads ( @huggingface )
2⃣ 50 PTs ➡️ 290M+ FTs (assuming 1 FT/download)
1/
Tweet media one
1
12
63
@ShayneRedford
Shayne Longpre
2 years
🚨 Update: Super excited to work with @barret_zoph and @_jasonwei this summer as a @GoogleBrain Student Researcher. Feeling fortunate to work with such accomplished ML/NLP researchers and an awesome team!
3
3
59
@ShayneRedford
Shayne Longpre
2 years
Just diagnosed with a severe case of ChatGPT Twitter fatigue.
3
2
59
@ShayneRedford
Shayne Longpre
3 years
Some personal news. Heading to Boston this fall to start my MS/PhD with Prof. Deb Roy at the @medialab @MIT . Excited to apply NLP to social + civic challenges — please reach out if you have research ideas/readings! [1/3]
3
4
58
@ShayneRedford
Shayne Longpre
2 years
➡️ It's promising these results don't use any RLHF data, or human "alignment", which is expensive to collect and less publicly available. We hope this release supports the open source community, and improves instruction tuning methods and research! 7/
4
2
56
@ShayneRedford
Shayne Longpre
1 year
Recorded the final part of my talk at @databricks , covering Data Selection Trade-Offs for Instruction Tuning 📽️: (10m) Trade-offs: 1⃣ Permissive vs Restrictive Licensing 2⃣ Traditional NLP performance vs creative generation Are these correlated? 1/
2
11
55
@ShayneRedford
Shayne Longpre
2 months
Was a little nervous for this, but grateful to discuss our work with @juleshyman on @YahooFinance ! @RobertMahari and I discuss emerging challenges for data creators and developers in the AI supply chain—part of the Data Provenance Initiative's recent study:
@YahooFinance
Yahoo Finance
2 months
The content used to train AI models used to be plentiful on the internet, but now the sources of that data are cracking down on who has access, according to a study. @ShayneRedford and @RobertMahari discuss how this could impact AI moving forward:
4
7
21
5
6
54
@ShayneRedford
Shayne Longpre
2 years
📢 #NLPaperAlert 🌟Active Learning Over Multiple Domains in Natural Language Tasks 🌟 To appear at Workshop on Distribution Shifts (DistShift) at #NeurIPS2022 later this week! 📜: This is honestly the hardest ML problem I’ve ever worked on. 👉🧵
Tweet media one
2
13
53
@ShayneRedford
Shayne Longpre
9 months
ByteDance v OpenAI⚠️, LAION-5B CSAM☢️ & NYT v OpenAI🛑 illustrate rising lockdown + legal risk on data. Need more informed training data selection? 🔗 Detailed licenses, terms, sources, properties. 📢 Come help us build it! All open sourced. 1/ 🧵
1
19
52
@ShayneRedford
Shayne Longpre
5 months
🚨 New paper + models! Evaluating LLMs using closed-source LLMs has limited transparency, controllability, and affordability. Incredible work by @seungonekim significantly improves all these factors, w/ open models for either relative or absolute response scoring. ⬇️
@seungonekim
Seungone Kim
5 months
#NLProc Introducing 🔥Prometheus 2, an open-source LM specialized on evaluating other language models. ✅Supports both direct assessment & pairwise ranking. ✅ Improved evaluation capabilities compared to its predecessor. ✅Can assess based on user-defined evaluation criteria.
Tweet media one
3
43
168
1
11
52
@ShayneRedford
Shayne Longpre
6 months
The 🌟Data Provenance Initiative🌟 has added dozens of new math 🧮 dataset collections! @ArielNLee @manandey @mhamdy_res have added: 🦆Open Platypus (10) 🧑‍🔬MetaMathQA (8) 🦣Mammoth Math Instruct (13) 🔗 1/
1
17
51
@ShayneRedford
Shayne Longpre
6 months
Excited to see our 🍮Flan-PaLM🌴 work finally published in @JmlrOrg 2024! Looking back, I see this work as pushing hard on scaling: post-training data, models, prompting, & eval. We brought together the methods and findings of many awesome prior works, scaled them up, and
1
9
51
@ShayneRedford
Shayne Longpre
8 months
🌿Aya🌿 Dataset & mega-multilingual LLM is out!🚀 By far the most international & exciting collaboration I'm lucky to be part of. @CohereForAI meetings often spanned 12+ time zones Amazing leadership by Madeline, @mziizm @ahmetustun89 @sarahookr ++ 🔗:
@CohereForAI
Cohere For AI
8 months
Today, we’re launching Aya, a new open-source, massively multilingual LLM & dataset to help support under-represented languages. Aya outperforms existing open-source models and covers 101 different languages – more than double covered by previous models.
77
371
1K
1
11
51
@ShayneRedford
Shayne Longpre
2 years
The latest wave of moves to @OpenAI @AnthropicAI @CohereAI etc from Big Tech seems to herald a new chapter in the so-called AI/ML “academic brain drain”: Academia ➡️ Industry ➡️ Startup
1
5
49
@ShayneRedford
Shayne Longpre
1 year
In March, we were privileged to host @_jasonwei 's guest lecture on “Emergence in Large Language Models”, for broader audiences, covering: ➡️ Intuitions, abilities and limitations ➡️ Scaling & trends ➡️ Emergence, Chain-of-thought, Flan 📽️: 1/
1
6
49
@ShayneRedford
Shayne Longpre
1 year
Thanks for having me! Enjoyed talking to @vagabondjack and the awesome Databricks teams behind Dolly v2 about instruction tuning and open sourcing. Will share out my slides and a video soon!
@vagabondjack
Mike Conover
1 year
Happening now! @ShayneRedford speaking on LLM's as part of the Data Talks speaker series at Databricks. Will be tweeting insights and slides from our inspiring speaker as we go.
Tweet media one
4
40
223
1
5
47
@ShayneRedford
Shayne Longpre
1 year
📢 The 🌟Foundation Model Transparency Index🌟 scores 10 developers on 💯 indicators. 1⃣ All 10 score poorly, particularly on Data, Labor, Compute 2⃣ Transparency is possible! 82/100 indicators are satisfied by ≥1 developer 3⃣ Transparency is a precondition for informed & responsible AI policy. 1/
Tweet media one
4
15
47
@ShayneRedford
Shayne Longpre
2 years
If Flan-T5 XXL (11B) was too small, there’s a new and improved 20B Flan-UL2, open-sourced by @YiTayML ! The new best Open Source model on MMLU and Big Bench Hard
@YiTayML
Yi Tay
2 years
New open source Flan-UL2 20B checkpoints :) - Truly open source 😎 No forms! 🤭 Apache license 🔥 - Best OS model on MMLU/Big-Bench hard 🤩 - Better than Flan-T5 XXL & competitive to Flan-PaLM 62B. - Size ceiling of Flan family just got higher! Blog:
51
343
2K
0
10
47
@ShayneRedford
Shayne Longpre
7 months
🌟Data Selection🌟 for text has quickly become one of the most important choices for LLMs ✍️ @AlbalakAlon comprehensively formalizes & surveys data filtering ✂️, mixing🥣, deduplication 🧑‍🤝‍🧑, & selection for a variety of criteria. Loved contributing to the Multi-task/Instruction
@AlbalakAlon
Alon Albalak
7 months
{UCSB|AI2|UW|Stanford|MIT|UofT|Vector|Contextual AI} present a survey on🔎Data Selection for LLMs🔍 Training data is a closely guarded secret in industry🤫with this work we narrow the knowledge gap, advocating for open, responsible, collaborative progress
Tweet media one
10
77
307
0
4
46
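Of the selection steps listed in the tweet above, deduplication is the easiest to sketch. A toy exact-duplicate filter over normalized text hashes; real LLM data pipelines typically add fuzzy methods such as MinHash/LSH, and this version is my own illustration, not from the survey:

```python
# Toy exact-deduplication pass over a corpus of documents.
# Real pipelines usually add fuzzy dedup (e.g. MinHash/LSH); this sketch
# only drops exact duplicates after light normalization.
import hashlib

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants hash identically.
    return " ".join(text.lower().split())

def dedupe(docs):
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

corpus = ["The cat sat.", "the  cat sat.", "A different sentence."]
print(dedupe(corpus))  # -> ['The cat sat.', 'A different sentence.']
```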
@ShayneRedford
Shayne Longpre
2 years
#NLPaperAlert Proud of our new work! New findings, SOTA results, and, my personal favourite, 🌟 Open sourcing new models 🌟, significantly better than current T5s!
@quocleix
Quoc Le
2 years
New open-source language model from Google AI: Flan-T5 🍮 Flan-T5 is instruction-finetuned on 1,800+ language tasks, leading to dramatically improved prompting and multi-step reasoning abilities. Public models: Paper:
Tweet media one
40
488
2K
2
5
45
@ShayneRedford
Shayne Longpre
6 months
@KreutzerJulia et al. break down multilingual web data (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4), by language representation, quality, and characteristics: Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets 📜 10/
Tweet media one
Tweet media two
1
3
44
@ShayneRedford
Shayne Longpre
2 years
Is it just me or is the “AGI” misinfo really peaking lately? Below: repurposes and misquotes another speculative mis-infographic. And this tweet has massive engagement.
Tweet media one
5
5
43
@ShayneRedford
Shayne Longpre
2 years
The 🌟Flan Collection🌟 (1st used in Flan-PaLM ): ➕ Merges Flan 2021, P3, NIv2, CoT instruction-datasets into 1800+ dataset collection ➕ Data augmentations and mixing strategies ➕ 100s new templates 2/
Tweet media one
1
4
43
@ShayneRedford
Shayne Longpre
1 year
Excited to co-host the Instruction Tuning & Instruction Following (ITIF) Workshop at #NeurIPS2023 . 👷🛠️🤖 An incredible line up of topics and speakers. ➡️ Submission opening soon: Stay tuned!
@itif_workshop
Instruction Workshop, NeurIPS 2023
1 year
Excited to announce the Workshop on 👷🛠️🤖 Instruction Tuning & Instruction Following 👷🛠️🤖 (ITIF) at #NeurIPS2023 ! 📅 Join us on Dec 15 in New Orleans 🛠️ Submit by Oct 1 👷 See speaker lineup 🔗 1/
Tweet media one
Tweet media two
1
17
63
0
5
43
@ShayneRedford
Shayne Longpre
2 years
🍮🌴 FLAN-PaLM 🍮🌴 Response of the Day #2 Apparently it has a sense of humour!
Tweet media one
1
2
42
@ShayneRedford
Shayne Longpre
6 months
📢 Want to automatically generate your bibtex for 1000s of @huggingface text datasets? @chien_vu1692 just added this feature + data summaries for: ➡️ huge collections like Flan, P3, Aya... ➡️ popular OpenAI-generated datasets ➡️ ~2.5k+ datasets & growing 🔗:
Tweet media one
0
14
41
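The tweet above refers to a feature of the Data Provenance Explorer. As a generic illustration of the same idea (not the tool itself), the Hugging Face `datasets` library also exposes citation metadata programmatically; the dataset name below is just an example:

```python
# Generic illustration: retrieve the BibTeX citation a Hugging Face dataset
# ships with. This is not the Data Provenance tool; "squad" is an example.
from datasets import load_dataset_builder

builder = load_dataset_builder("squad")
info = builder.info
if info.citation:
    print(info.citation)  # BibTeX entry provided by the dataset card
else:
    print("No citation metadata available for this dataset.")
```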
@ShayneRedford
Shayne Longpre
2 years
Is it just me, or are a lot of the best new NLP papers on my feed all getting accepted to *Findings* of #EMNLP2022 ? Findings >= Full conference?
2
0
41
@ShayneRedford
Shayne Longpre
2 years
@soniajoseph_ Votes for: ➡️ Model compression/acceleration ➡️ Interpretability ➡️ Multimodalities ➡️ Human-AI interaction (unintended consequences, human health, etc)
1
1
40
@ShayneRedford
Shayne Longpre
1 year
✨New Paper, Model, Data, & Eval ✨ What’s the best completely public, commercially-viable code LLM? ➡️Introducing, 🐙OctoCoder 🐙, w/ a new instruction code dataset, and expanded task+language HumanEval. ➡️Led by @Muennighoff , these achieve (non-closed) 🌟SotA🌟! 1/
@Muennighoff
Niklas Muennighoff
1 year
How to instruction tune Code LLMs w/o #GPT4 data? Releasing 🐙🤖OctoCoder & OctoGeeX: 46.2 on HumanEval🌟SoTA🌟of commercial LLMs 🐙📚CommitPack: 4TB of Git Commits 🐙🎒HumanEvalPack: HumanEval extended to 3 tasks & 6 lang 📜 💻 1/9
Tweet media one
Tweet media two
7
88
292
1
8
39
@ShayneRedford
Shayne Longpre
2 years
Grateful to be featured on the @GoogleAI blog with @ada_rob !
@GoogleAI
Google AI
2 years
Today we’re releasing a new collection of tasks, templates and methods for instruction tuning of #ML models. Training on this collection can enable language models to reason more competently over arbitrary, unseen tasks. Learn all about it at:
Tweet media one
9
71
390
0
3
38
@ShayneRedford
Shayne Longpre
2 years
Heading to @naacl today 🛫 So excited for my 1st in-person conference since NeurIPS 2019! Please reach out to grab coffee and talk research! #NAACL2022
0
1
36
@ShayneRedford
Shayne Longpre
7 months
This work is unique in that every collaborator co-led a section or modality, bringing their expertise without which the cheatsheet wouldn’t be possible. Special thanks to @BlancheMinerva @soldni @YJernite @AlbalakAlon @gabriel_ilharco @sayashk @kevin_klyman @kylelostat
0
3
35
@ShayneRedford
Shayne Longpre
7 months
Looking for permissively licensed biomedical🧬text data? @mhamdy_res has added several data collections to the 🌟Data Provenance Initiative🌟: ➡️ MedInstruct ➡️ ChatDoctor ➡️ Medical Meadow ➡️ PMC-LLaMA Instructions 🔗 1/
2
7
35
@ShayneRedford
Shayne Longpre
2 years
Funny how we got it all flipped. AI in Science Fiction 📚🤖: highly analytical/factual, but struggles with humor and emotion. (eg Data from Star Trek) AI now 💬: great humor and creativity, but logically inconsistent/factually negligent (eg #ChatGPT )
2
4
35
@ShayneRedford
Shayne Longpre
10 months
@yizhongwyz & @qinyuan_ye moderating the 2nd @itif_workshop panel now @nazneenrajani @colinraffel @haozhangml and @tatsu_hashimoto provide their spicy🌶️ takes on evaluation, open vs closed, and how we close the performance gap!
Tweet media one
0
6
32
@ShayneRedford
Shayne Longpre
11 months
Context: A Crisis in Data Transparency ➡️Instruct/align finetuning often compiles 100s of datasets ➡️How can devs filter for datasets without legal/ethical risk, and understand the resulting data composition? 2/
Tweet media one
2
5
34
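A rough sketch of the kind of filtering step the tweet above is asking for: given per-dataset license annotations, keep only datasets whose licenses permit the intended use. The metadata schema and values below are hypothetical, not the actual Data Provenance Collection format:

```python
# Hypothetical sketch of filtering finetuning datasets by license category.
# Field names and entries are illustrative, not the real collection schema.
datasets = [
    {"name": "dataset_a", "license": "apache-2.0",   "commercial_use": True},
    {"name": "dataset_b", "license": "cc-by-nc-4.0", "commercial_use": False},
    {"name": "dataset_c", "license": "mit",          "commercial_use": True},
]

commercially_usable = [d for d in datasets if d["commercial_use"]]
print([d["name"] for d in commercially_usable])  # -> ['dataset_a', 'dataset_c']
```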
@ShayneRedford
Shayne Longpre
2 years
Q: But why are the results strong? Our breakdown of the Flan Collection shows *why* it works. The most important methods: 🌟Finding 1🌟 Fine-tuning on zero-shot and few-shot prompts together significantly improves both settings (not a trade-off)! 4/
Tweet media one
1
2
32
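A rough sketch of what "fine-tuning on zero-shot and few-shot prompts together" can look like when assembling a finetuning set. The templates, field names, and 50/50 mixing ratio are illustrative assumptions, not the Flan Collection's actual configuration:

```python
# Illustrative sketch: build a finetuning mixture containing both zero-shot
# and few-shot formatted examples. Templates and ratio are assumptions.
import random

def zero_shot_format(ex):
    return {"input": f"{ex['instruction']}\n{ex['question']}", "target": ex["answer"]}

def few_shot_format(ex, demos):
    shots = "\n\n".join(f"{d['question']}\n{d['answer']}" for d in demos)
    return {"input": f"{ex['instruction']}\n\n{shots}\n\n{ex['question']}",
            "target": ex["answer"]}

def build_mixture(examples, few_shot_fraction=0.5, k=2, seed=0):
    rng = random.Random(seed)
    mixture = []
    for ex in examples:
        if rng.random() < few_shot_fraction and len(examples) > k:
            demos = rng.sample([e for e in examples if e is not ex], k)
            mixture.append(few_shot_format(ex, demos))
        else:
            mixture.append(zero_shot_format(ex))
    return mixture

examples = [
    {"instruction": "Answer the question.", "question": "2+2?", "answer": "4"},
    {"instruction": "Answer the question.", "question": "Capital of France?", "answer": "Paris"},
    {"instruction": "Answer the question.", "question": "3*3?", "answer": "9"},
]
print(build_mixture(examples))
```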
@ShayneRedford
Shayne Longpre
5 months
🚨 New #ICML2024 position piece. The most overlooked risks of AI stem from autonomous weaponry, for 4 reasons:
1⃣ Arms race w/ ⬇️ human oversight
2⃣ Reduces cost of starting conflicts
3⃣ Evades accountability
4⃣ Battlefield errors aren’t considered costly
See our work led by
Tweet media one
3
13
33
@ShayneRedford
Shayne Longpre
1 year
@suchenzang Respectfully, this feels a little unfair. Three awesome, nuanced threads (by Aran, Yizhong, and the new paper Yi cites, linked below) all suggest different training data, architecture, and vocab each lead to different advantages, without any **one size fits all**. T0, Flan-T5,
1
0
32
@ShayneRedford
Shayne Longpre
1 year
In Honolulu for #ICML2023 23-29 🏝️🌺 Msg me if you'd like to meet/chat! Presenting the Flan Collection🥮 (Wedn 11-12:30), and a new initiative on Data Provenance at the Workshop, Saturday!
0
3
32
@ShayneRedford
Shayne Longpre
2 years
🌟Finding 3 🌟 The Flan-T5 model converges higher and faster than T5 on single-task fine-tuning. ➡️ Recommendation: Use Flan-T5 as your base model for new tasks. ✅Better computational-efficiency and performance! 6/
Tweet media one
Tweet media two
5
3
31
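A minimal sketch of acting on the recommendation above: start a new single-task finetune from a public Flan-T5 checkpoint rather than plain T5, here via the Hugging Face transformers library. The checkpoint name is the public "google/flan-t5-base"; the actual fine-tuning loop is omitted:

```python
# Minimal sketch: load Flan-T5 as the base model for a new task, then
# fine-tune on your (input, target) pairs as with any seq2seq model
# (e.g. via transformers' Seq2SeqTrainer). Quick generation check below.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

inputs = tokenizer("Translate to German: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```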
@ShayneRedford
Shayne Longpre
11 months
📢 We're planning to rapidly expand the coverage of finetuning datasets in the Data Provenance Initiative. ➡️Our Goal: provide detailed source+license annotations for all popular instruct/align datasets. 🌟If you'd like to contribute, reach out!🌟 1/
@ShayneRedford
Shayne Longpre
11 months
📢Announcing the🌟Data Provenance Initiative🌟 🧭A rigorous public audit of 1800+ instruct/align datasets 🔍Explore/filter sources, creators & license conditions ⚠️We see a rising divide between commercially open v closed licensed data 🌐: 1/
10
148
462
4
11
31
@ShayneRedford
Shayne Longpre
2 years
This yields the best performing instruction tuning collection that has been compiled and released into one repo. See our survey Figure of the prior works we built on to produce this compilation. 3/
Tweet media one
2
2
30
@ShayneRedford
Shayne Longpre
6 months
📢 Check out @_anthonychen and my invited talk at the @USC_ISI Natural Language Seminar: 📜 "The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI" Thank you @HJCH0 for hosting!
2
10
31
@ShayneRedford
Shayne Longpre
2 years
The new bible for Transformer architecture/objective choices!
@ada_rob
Adam Roberts
2 years
I've found myself referring to this table in the appendix of our recent paper () A LOT so I thought I'd point it out for those who might benefit from it. Basic upshot is that ED:MLM works really well, especially for classification tasks, even in 0shot.
Tweet media one
5
13
94
0
2
30
@ShayneRedford
Shayne Longpre
11 months
Really excited to see our Pretrainer's Guide work get a shoutout in this amazing new Dataset! "Quality" and "Toxicity" filters for pretraining data can have a massive impact and is important to get right.
@togethercompute
Together AI
11 months
We are excited to release RedPajama-Data-v2: 30 trillion filtered & de-duplicated tokens from 84 CommonCrawl dumps, 25x larger than our first dataset. It exposes a diverse range of quality annotations so you can slice & weight the data for LLM training.
Tweet media one
20
281
1K
2
4
30
@ShayneRedford
Shayne Longpre
5 months
Congrats to a phenomenal team of collaborators! 👏🏼🎉
@CohereForAI
Cohere For AI
5 months
🪴 We’re thrilled about the acceptance of "Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model" - major congrats to @ahmetustun89 , @viraataryabumi , @yong_zhengxin , @weiyinko_ml , @mrdanieldsouza , @lekeonilude , @NeelBhandari9 , @singhshiviii , Hui-Lee Ooi,
Tweet media one
1
11
33
0
6
29
@ShayneRedford
Shayne Longpre
4 years
QA models are surprisingly accurate on partial, OOD inputs. Yi, @chrisdubois , and I ask what factors affect this? Perplexingly we find this phenomenon invariant to random seed, architecture, pretraining, even training domain! See our short preprint [1/5]
1
8
30
@ShayneRedford
Shayne Longpre
1 year
Question: What common benchmarks should all Instruct/Chat-LLMs be evaluated on? (We're considering a special track for @itif_workshop #NeurIPS2023 to promote more fair comparison.) 1/
4
5
29
@ShayneRedford
Shayne Longpre
10 months
The importance of understanding what’s in your data 👈
@samleecole
Samantha Cole
10 months
big breaking news: LAION just removed its datasets, following a study from Stanford that found thousands of instances of suspected child sexual abuse material
2
593
2K
0
3
28
@ShayneRedford
Shayne Longpre
7 months
Our goals: ➡️ Lower barriers to newer community members ➡️ Bring awareness to thoughtful tools & good practices, not just speed of development ➡️ Iterate a live, evolving resource (🌟you can contribute!🌟) 2/
Tweet media one
1
2
27
@ShayneRedford
Shayne Longpre
1 year
📢 #NLPaperAlert Incredible work led by @seungonekim @jshin491 & Yejin Cho! ➡️ Ranking responses is subjective ➡️ Do you prefer short vs long, formal vs informal, creative vs precise responses? ➡️ 🔥Prometheus 🔥➡️Open LM trained to respect custom criteria Check it out!
@seungonekim
Seungone Kim
1 year
Excited to present 🔥Prometheus, a fully open-source evaluator LM that is on par with GPT-4 evaluation when the “appropriate” reference materials are appended! * Could generalize to customized score rubrics * Shows high correlation with both human evaluators & GPT-4 evaluation
9
51
348
0
6
28
@ShayneRedford
Shayne Longpre
4 months
Fantastic work, w/ a new, thoughtful, task-diverse, fine-grained eval benchmark, BigGen Bench. + Eval scaling laws + Task difficulty rankings + Task metric +/- from base ➡️ chat models + A trove of other insights! Hard to believe @seungonekim is an *incoming* PhD!
@seungonekim
Seungone Kim
4 months
🤔How can we systematically assess an LM's proficiency in a specific capability without using summary measures like helpfulness or simple proxy tasks like multiple-choice QA? Introducing the ✨BiGGen Bench, a benchmark that directly evaluates nine core capabilities of LMs.
Tweet media one
7
54
186
2
4
28
@ShayneRedford
Shayne Longpre
2 years
🌟Finding 2🌟 Input inversion and data source balancing (as proposed and corroborated by MetaICL, T0, OPT-IML and others...) are incredibly important for successful instruction tuning. See our ablations Table. 5/
Tweet media one
2
3
26
@ShayneRedford
Shayne Longpre
2 months
Lastly, this research was the result of incredible dedication from a team of international collaborators, from undergrads to professors, designers, social scientists, AI developers, and independent researchers, all on a volunteer basis. A heartfelt thank you to our awesome team
0
3
27
@ShayneRedford
Shayne Longpre
9 months
💯% this 👇 We don’t need any more bi-weekly model releases with 0.3% improvement on MMLU (or radar plot)
@soldni
Luca Soldaini 🎀
9 months
my one 2024 ML wish: more careful, thought out, open language model research—stop rushing! neither e/acc nor closed-door AI, sorry to disappoint 😌
1
4
81
1
1
27
@ShayneRedford
Shayne Longpre
4 years
Proud of our 1st EMNLP Findings paper!! 🎉🎉 w/ @LeBrave2016 and @chrisdubois Q: How effective are common NLP data augmentations with pre-trained transformers? A: Oftentimes, not very. (Negative result!) [1/4]
1
7
25