Shunyu Yao Profile
Shunyu Yao

@ShunyuYao12

10,580
Followers
954
Following
79
Media
660
Statuses

Language agents (ReAct, Reflexion, Tree of Thoughts) for digital automation (WebShop, SWE-bench, SWE-agent)

Joined June 2020
@ShunyuYao12
Shunyu Yao
2 months
Almost forgot to post: I've joined @OpenAI ! Time to convert the research vision into reality, and expect something exciting to drop :)
83
18
1K
@ShunyuYao12
Shunyu Yao
1 year
Still use ⛓️Chain-of-Thought (CoT) for all your prompting? May be underutilizing LLM capabilities🤠 Introducing 🌲Tree-of-Thought (ToT), a framework to unleash complex & general problem solving with LLMs, through a deliberate ‘System 2’ tree search.
Tweet media one
93
594
3K
@ShunyuYao12
Shunyu Yao
5 months
I've defended my PhD! "Language Agents: From Next-Token Prediction to Digital Automation" - Talk (WebShop, SWE-bench, ReAct, ToT, CoALA, and on the future of agents): - Thesis (covers even more):
Tweet media one
45
64
814
@ShunyuYao12
Shunyu Yao
5 months
I will present my thesis defense tomorrow! Language Agents: From Next-Token Prediction to Digital Automation - 10am EST on Thursday, May 2 - - WebShop, ReAct, ToT, CoALA - Briefly: SWE-bench/agent - Thoughts on the future of language agents
Tweet media one
24
54
670
@ShunyuYao12
Shunyu Yao
1 year
🧠🦾ReAct -> 🔥FireAct Most language agents prompt LMs - ReAct, AutoGPT, ToT, Generative Agents, ... - Which is expensive, slow, and non-robust😢 Most fine-tuned LMs not for agents... FireAct asks: WHY NOT? Paper, code, data, ckpts: (1/5)
Tweet media one
6
112
483
@ShunyuYao12
Shunyu Yao
1 year
Language Agents are cool & fast-moving, but no systematic way to understand & design them.. So we use classical CogSci & AI insights to propose Cognitive Architectures for Language Agents (🐨CoALA)! w/ great @tedsumers @karthik_r_n @cocosci_lab (1/6)
Tweet media one
10
116
464
@ShunyuYao12
Shunyu Yao
7 months
Solving >10% of our SWE-Bench () is THE most impressive result in 2024 so far, and a milestone for the research and application of AI agents. Congrats @cognition_labs !
@cognition_labs
Cognition
7 months
Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is
5K
11K
45K
14
37
444
@ShunyuYao12
Shunyu Yao
2 years
Large Language Models (LLM) are 🔥in 2 ways: 1.🧠Reason via internal thoughts (explain jokes, math reasoning..) 2.💪Act in external worlds (SayCan, ADEPT ACT-1, WebGPT..) But so far 🧠and💪 remain distinct methods/tasks... Why not 🧠+💪? In our new work ReAct, we show 1+1>>2!
@_akhaliq
AK
2 years
ReAct: Synergizing Reasoning and Acting in Language Models abs:
Tweet media one
3
26
150
10
66
389
@ShunyuYao12
Shunyu Yao
1 year
The art of programming is interactive. Why should coding benchmarks be "seq2seq"? Thrilled to present 🔄InterCode, next-gen framework of coding tasks as standard RL tasks (action=code, observation=execution feedback) paper, code, data, pip: (1/7)
Tweet media one
11
69
387
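The "action = code, observation = execution feedback" framing above maps naturally onto a Gym-style reset/step interface. The sketch below is a toy illustration of that idea under the standard convention, not InterCode's actual API (the class and method bodies are made up for illustration):

```python
class ToyCodeEnv:
    """Toy env in the spirit of "action = code, observation = execution feedback"."""

    def __init__(self, target):
        self.target = target  # value a correct program should evaluate to

    def reset(self):
        # Returns the initial task instruction, as in Gym-style envs.
        return f"Submit an expression that evaluates to {self.target}."

    def step(self, action):
        # `action` is a code snippet; the observation is execution feedback.
        try:
            result = eval(action)  # toy executor; a real env sandboxes execution
        except Exception as exc:
            return f"error: {exc}", 0.0, False
        reward = 1.0 if result == self.target else 0.0
        return f"returned {result!r}", reward, reward == 1.0

env = ToyCodeEnv(target=24)
print(env.reset())
print(env.step("(13 - 9) * (10 - 4)"))  # a game-of-24 style solution
```

An agent interacting with such an env gets a standard RL loop: submit code, read the execution feedback, revise, repeat.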
@ShunyuYao12
Shunyu Yao
1 year
Code released at , thanks for waiting! It's intentionally kept minimalistic (core ~ 100 lines), though some features (e.g. variable breadth across steps) can be easily added to improve perf & reduce cost. (1/2)
@ShunyuYao12
Shunyu Yao
1 year
Still use ⛓️Chain-of-Thought (CoT) for all your prompting? May be underutilizing LLM capabilities🤠 Introducing 🌲Tree-of-Thought (ToT), a framework to unleash complex & general problem solving with LLMs, through a deliberate ‘System 2’ tree search.
Tweet media one
93
594
3K
6
73
305
@ShunyuYao12
Shunyu Yao
5 months
Coding is the frontier of AI. Excited to push the two frontiers of AI coding: 1. SWE(-bench/agent) 2. Olympiad programming (this tweet) Introducing the USACO benchmark: * inference methods (RAG/reflect) help a bit: 9->20% * human feedback helps a lot: 0->86%!
Tweet media one
8
50
303
@ShunyuYao12
Shunyu Yao
5 months
SWE-agent paper: If you care about - SWE: check detailed design & analysis - agents: check principles & insights for agent-computer interface (ACI) - HCI/CogSci: check parallels in ACI/understanding LMs! chat w/ @jyangballin @_carlosejimenez @iclr_conf !
Tweet media one
0
29
201
@ShunyuYao12
Shunyu Yao
6 months
Extremely excited to open-source our SWE-agent that achieves SoTA on SWE-bench😃 Turns out ReAct + Agent-Computer Interface (ACI) can go a long way, very excited about the implications for SWE and beyond!
@jyangballin
John Yang
6 months
SWE-agent is our new system for autonomously solving issues in GitHub repos. It gets similar accuracy to Devin on SWE-bench, takes 93 seconds on avg + it's open source! We designed a new agent-computer interface to make it easy for GPT-4 to edit+run code
Tweet media one
64
426
2K
5
16
166
@ShunyuYao12
Shunyu Yao
1 year
Major update: we made a pip package for ToT! pip install tree-of-thoughts-llm Learn more about how to use ToT for your use cases:
Tweet media one
@ShunyuYao12
Shunyu Yao
1 year
Still use ⛓️Chain-of-Thought (CoT) for all your prompting? May be underutilizing LLM capabilities🤠 Introducing 🌲Tree-of-Thought (ToT), a framework to unleash complex & general problem solving with LLMs, through a deliberate ‘System 2’ tree search.
Tweet media one
93
594
3K
3
44
168
@ShunyuYao12
Shunyu Yao
1 month
We have confirmed a list of awesome speakers and panelists for our #NeurIPS2024 workshop on System 2 Reasoning at Scale! Come on Dec 15!
Tweet media one
3
9
136
@ShunyuYao12
Shunyu Yao
15 days
This is just the beginning of a new paradigm. So much more to come :)
@OpenAI
OpenAI
15 days
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
985
4K
18K
4
2
122
@ShunyuYao12
Shunyu Yao
1 year
A summary thread of our recent (i.e. after GPT-3) work in language agents, in tweets👇 ( also provides a nice summary --- I might be the first researcher to include a "tweet" column for publications?🤷)
4
17
120
@ShunyuYao12
Shunyu Yao
17 days
Will talk about the history and future of LLM agents next Monday - stay tuned!
@dawnsongtweets
Dawn Song
21 days
Large Language Model Agents is the next frontier. Really excited to announce our Berkeley course on LLM Agents, also available for anyone to join as a MOOC, starting Sep 9 (Mon) 3pm PT! 📢 Sign up & join us:
Tweet media one
26
254
1K
1
6
115
@ShunyuYao12
Shunyu Yao
10 months
I'll give an oral talk about Tree of Thoughts @NeurIPSConf at 3:45-4pm CST on Dec 13 (4C), with the poster session right after ( #410 ). I'm also on the faculty job market this year, so DM me if you wanna chat😃 (Other posters: InterCode #522 , Reflexion #508 , 5-7pm Dec 14)
@ShunyuYao12
Shunyu Yao
1 year
Still use ⛓️Chain-of-Thought (CoT) for all your prompting? May be underutilizing LLM capabilities🤠 Introducing 🌲Tree-of-Thought (ToT), a framework to unleash complex & general problem solving with LLMs, through a deliberate ‘System 2’ tree search.
Tweet media one
93
594
3K
2
13
100
@ShunyuYao12
Shunyu Yao
1 year
What to do if someone implemented my work (Tree of Thoughts) but failed to acknowledge the official repo, has more stars than the official repo, and might mislead people about the content of the work (i.e. the implementation might not reflect the paper's ideas)?
7
22
97
@ShunyuYao12
Shunyu Yao
2 months
Very excited for this convergence of interest :)
@OpenAI
OpenAI
2 months
We're releasing a new iteration of SWE-bench, in collaboration with the original authors, to more reliably evaluate AI models on their ability to solve real-world software issues.
409
522
3K
0
3
95
@ShunyuYao12
Shunyu Yao
6 months
People still surprised by such things across pairs among ReAct ToT Reflexion CoALA WebShop SWE-bench SWE-agent😂
Tweet media one
7
3
94
@ShunyuYao12
Shunyu Yao
1 year
Write a sentence with "dog, frisbee, catch, throw" 👉Too easy for 7B LM... Will (constrained) text generation (CTG) "die out" like many other NLP tasks, in face of LLM? 👉Excited to introduce 🐕COLLIE, next-gen CTG that even challenges GPT-4! (1/n)
Tweet media one
2
13
94
@ShunyuYao12
Shunyu Yao
1 year
Meme aside, check out SWE-bench, which hits many marks of a good benchmark - hard but useful to solve, easy to evaluate - automatically constructed from real GitHub issues and pull requests - challenges super-long context, retrieval, coding, etc. - can easily update with new instances
Tweet media one
@_carlosejimenez
carlos
1 year
Can LMs 🤖 replace programmers 🧑‍💻? - Not yet! Our new benchmark, SWE-bench, tests models on solving real issues from GitHub. Claude 2 & GPT-4 get <5% acc. 🔗 See our leaderboard, paper, code, data: 🧵
Tweet media one
12
90
474
1
10
88
@ShunyuYao12
Shunyu Yao
1 year
We show huge gains on 3 new tasks GPT-4 can't solve directly or with CoT (hard to find!) due to a need for planning / searching: game of 24, creative writing, crosswords.
Tweet media one
3
1
82
@ShunyuYao12
Shunyu Yao
1 year
Not at #ICML2023 but happy to finally release a @princeton_nlp blog post written by me and @karthik_r_n on the opportunities and risks of language agents. Should be a fun 10-min read! It's a very new subject, so please leave any comments here👇
Tweet media one
2
24
83
@ShunyuYao12
Shunyu Yao
7 days
Huge thanks to @dawnsongtweets @xinyun_chen_ for inviting me to such a timely, well-organized, and extremely popular class! Check out the recording for my talk on the history and overview of llm agents - hope u like it 😀
Tweet media one
@dawnsongtweets
Dawn Song
7 days
Join us for 3rd lecture on agent frameworks in our LLM Agents MOOC, @chi_wang_ @GoogleDeepMind & @jerryjliu0 @llama_index , 3:10pm Sep 23: , with 7K+ registered learners, 3.5K+ members in discord! 🎉📢 Huge thanks to @ShunyuYao12 @OpenAI for 2nd lecture
Tweet media one
6
38
199
3
2
84
@ShunyuYao12
Shunyu Yao
9 months
If you ever learn a bit of computer systems or programming, you know the most intriguing and magical idea in CS is memory. Same (will be true) for AI or at least the study of autonomous agents.
1
4
76
@ShunyuYao12
Shunyu Yao
1 year
ToT achieves 10x perf by leveraging LLM's ability to 1. generate diverse choices of intermediate "thoughts" toward problem solving 2. self-evaluate thoughts via deliberate reasoning With 3. search algorithms (e.g., bfs/dfs) that help systematically explore the problem space
Tweet media one
4
8
73
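The three-step loop in the tweet above (generate thoughts, self-evaluate, search) can be sketched in a few lines. Note `propose_thoughts` and `score_thought` below are hypothetical stand-ins for the two LLM calls, not the official tree-of-thoughts-llm API:

```python
def propose_thoughts(state):
    # Stand-in for step 1: an LM proposes diverse candidate next "thoughts"
    # given a partial solution (here, tuples extended by a token).
    return [state + (c,) for c in "abc"]

def score_thought(state):
    # Stand-in for step 2: an LM self-evaluates how promising a partial
    # solution is via deliberate reasoning. Here, a trivial placeholder.
    return len(state)

def tot_bfs(root, steps=3, breadth=2):
    # Step 3: a search algorithm (BFS here; DFS also works) systematically
    # explores the space, keeping the `breadth` best thoughts per level.
    frontier = [root]
    for _ in range(steps):
        candidates = [t for s in frontier for t in propose_thoughts(s)]
        frontier = sorted(candidates, key=score_thought, reverse=True)[:breadth]
    return frontier

print(tot_bfs(tuple()))
```

Swapping the two stand-ins for real LM calls recovers the ToT-BFS algorithm; variable breadth across steps is the kind of extension the code tweet below mentions.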
@ShunyuYao12
Shunyu Yao
6 months
When I first saw Tree of Thoughts I also asked myself this😀 Great exploration into whether next-token prediction can simulate search; if you're interested in this, you probably also wanna check out the last paragraph
@noahdgoodman
noahdgoodman
6 months
When I first saw Tree of Thoughts, I asked myself: If language models can reason better by searching, why don't they do it themselves during Chain of Thought? Some possible answers (and a new paper): 🧵
8
48
343
3
0
74
@ShunyuYao12
Shunyu Yao
1 year
New preprint time :) We propose Referral-Augmented Retrieval (RAR), an extremely simple augmentation technique that significantly improves zero-shot information retrieval. Led by awesome undergrad @_michaeltang_ , w/ @jyangballin @karthik_r_n
Tweet media one
1
11
66
@ShunyuYao12
Shunyu Yao
1 year
If intelligence is "emergent complex behavior", then Autonomous Language Agents (ALA) like BabyAGI and AutoGPT start to enter that arena? Will revise my slides & a blogpost draft about ALA w.r.t. recent progress and share soon Quick thoughts👇 (1/n)
@blader
Siqi Chen
1 year
the top three trending repos on github are all self-prompting “primitive agi” projects: 1) babyagi by @yoheinakajima 2) autogpt by @SigGravitas 3) jarvis by @Microsoft these + scaling gets you the rest of the way there.
Tweet media one
51
301
2K
2
11
62
@ShunyuYao12
Shunyu Yao
2 years
What if you had a bot you could just instruct in English to shop online for you? Check out our latest work 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents w/ @__howardchen @jyangballin , @karthik_r_n @princeton_nlp
1
14
61
@ShunyuYao12
Shunyu Yao
1 year
For example, on game of 24 ("4 9 10 13"->“(13-9)*(10-4)=24”), CoT only solves 4% games --- and already fails 60% of games after generating just first 3 words! Why? LM token-by-token decoding does not allow lookahead, backtrack, or exploration of different thoughts globally.
Tweet media one
2
3
58
@ShunyuYao12
Shunyu Yao
1 year
One of the most fundamental insights in CS: program is just a text. Two of the most amazing inventions in CS: LM and compiler. Both can turn program into some other text: reasoning or execution feedback Late night wild thoughts…
2
2
59
@ShunyuYao12
Shunyu Yao
1 year
Bonus 2: If you care more about "hardcore" language (instead of math/word) problems, or just enjoy CHAOS, check out our invented creative writing task! How would you find a way to chain random sentences into a coherent passage? Would you similarly plan and compare?😀
Tweet media one
9
1
57
@ShunyuYao12
Shunyu Yao
3 months
Excited to share what I did @SierraPlatform with @noahrshinn pedram and @karthik_r_n ! 𝜏-bench evaluates critical agent capabilities omitted by current benchmarks: robustness, complex rule following, and human interaction skills. Try it out!
@karthik_r_n
Karthik Narasimhan
3 months
Excited to release 𝜏-bench (TAU for Tool-Agent-User ⚒️-🤖-🧑), a new benchmark to evaluate AI agents' performance and reliability in real-world settings with dynamic user and tool interaction. Paper: , Blog:
4
26
117
2
11
57
@ShunyuYao12
Shunyu Yao
7 months
Is it just me, or is haystack too stupid for benchmarking long-context capabilities? If SWE-bench is too hard, at least we need tasks that require reasoning over multiple parts of a long context in multiple steps
8
1
53
@ShunyuYao12
Shunyu Yao
6 months
Will visit @agihouse_org for the first time this Saturday and talk about SWE-agent, Agent-Computer Interface (ACI), and answer questions😃
@jyangballin
John Yang
6 months
SWE-agent is our new system for autonomously solving issues in GitHub repos. It gets similar accuracy to Devin on SWE-bench, takes 93 seconds on avg + it's open source! We designed a new agent-computer interface to make it easy for GPT-4 to edit+run code
Tweet media one
64
426
2K
0
7
53
@ShunyuYao12
Shunyu Yao
5 months
@karthik_r_n @princeton_nlp @PrincetonPLI My thesis into 2 takeaways: 1. Language is the general-purpose representation for various external environments and internal thoughts. Thus, language agent is general. 2. Language reasoning can be seen as internal actions for agents. Thus, language agent is special.
0
4
52
@ShunyuYao12
Shunyu Yao
1 year
Read more about - how ToT modularizes thought decomposition/generation/evaluation & search algorithm to suit diverse tasks - formal framework (rare in prompting era) - many more experiments & findings - Inspirations from CogSci & Root of AI (eg )!
2
2
50
@ShunyuYao12
Shunyu Yao
2 years
Updates: - Jupyter notebooks to try out ReAct prompting with GPT-3: - 5-min video explaining ReAct: - Oral presentations at NeurIPS FMDM, EMNLP EvoNLP & NILLI workshops, happy to chat in New Orleans/Abu Dhabi and meet new friends!
@ShunyuYao12
Shunyu Yao
2 years
Large Language Models (LLM) are 🔥in 2 ways: 1.🧠Reason via internal thoughts (explain jokes, math reasoning..) 2.💪Act in external worlds (SayCan, ADEPT ACT-1, WebGPT..) But so far 🧠and💪 remain distinct methods/tasks... Why not 🧠+💪? In our new work ReAct, we show 1+1>>2!
10
66
389
7
7
50
@ShunyuYao12
Shunyu Yao
2 months
Will we use LLMs as text tools, or will LLMs use us as multimodal tools? After all, we might be good at and enjoy different things😀
2
0
48
@ShunyuYao12
Shunyu Yao
3 years
Hierarchical structure is a core aspect of language syntax. Recurrent networks can systematically process recursion by emulating stacks, but can self-attention networks? If so, how? Our #ACL2021 paper sheds light on this fundamental issue! (1/5)
Tweet media one
1
6
46
@ShunyuYao12
Shunyu Yao
6 months
in some sense, math is the first programming language, and mathematician's mind (+scratchpad) is the first compiler
3
1
45
@ShunyuYao12
Shunyu Yao
2 years
Going to NeurIPS to present my PhD papers in person for the FIRST time😀! Anyone interested in WebShop (), ReAct (), building language agents, language grounding/interaction/theory, NLP + RL... Let's DM and SCHEDULE A CHAT😆! (1/3)
2
2
42
@ShunyuYao12
Shunyu Yao
6 months
What we call reasoning in AI is algorithm in CS
2
3
41
@ShunyuYao12
Shunyu Yao
1 year
Had a great hour w/ @hwchase17 @charles_irl @mbusigin @yoheinakajima talking about autonomous language agents, ReAct, LangChain, BabyAGI, context management, critic, safety, and many more. Look forward to more @LangChainAI webinars, they're awesome! Replay at the same link 👇
@hwchase17
Harrison Chase
1 year
Our webinar on agents starts in 1 hour It's the most popular webinar we've hosted yet, so we had to bring in the best possible moderator: @charles_irl Come join Charles, myself, @ShunyuYao12 , @mbusigin and @yoheinakajima for some lively discussion :)
19
48
350
3
6
41
@ShunyuYao12
Shunyu Yao
6 months
SWE-agent led by the amazing @jyangballin @_carlosejimenez , first authors of SWE-bench Besides the code base, also check out our Discord
@_carlosejimenez
carlos
6 months
SWE-Agent is an open-source software engineering agent with a 12.3% resolve rate on SWE-Bench! Check out SWE-agent in action at Repo:
30
112
561
2
5
40
@ShunyuYao12
Shunyu Yao
1 year
Check out our new preprint led by @RTomMcCoy and thank him for getting me to know the word 'ember'🔥 Tldr: language models (LMs) are not humans, just like planes are not birds. So analyzing LMs shouldn't just use human behavior or performance tests!
@RTomMcCoy
Tom McCoy
1 year
🤖🧠NEW PAPER🧠🤖 Language models are so broadly useful that it's easy to forget what they are: next-word prediction systems Remembering this fact reveals surprising behavioral patterns: 🔥Embers of Autoregression🔥 (counterpart to "Sparks of AGI") 1/8
Tweet media one
36
292
1K
3
2
40
@ShunyuYao12
Shunyu Yao
1 year
Thanks @USC_ISI @HJCH0 ! I talked about Formulation (CoALA) and Evaluation (Collie/InterCode/WebShop) of language agents, two directions that I find important but understudied, and where academia can uniquely contribute! slides: video:
@HJCH0
Justin Cho 조현동
1 year
We had the pleasure of @ShunyuYao12 give us a talk at USC ISI's NL Seminar "On Formulating and Evaluating Language Agents🤖" Check out his recorded talk to learn about a unified taxonomy for work on language agents and the next steps forward on evaluating them for complex tasks!
1
0
11
1
4
38
@ShunyuYao12
Shunyu Yao
2 years
ICLR week! Finally mustered up a long-overdue tweet for our spotlight work: Linking Emergent and Natural Languages via Corpus Transfer paper: code: poster: Apr 27 13:30 - 15:30 EDT 1/n
Tweet media one
2
8
39
@ShunyuYao12
Shunyu Yao
1 year
Bonus: if you're a crosswords fan, check out how ToT plays 😀 We improve game success from 1% -> 20%, but incorporating better search algorithms (e.g. how you maintain your thoughts) and heuristics (e.g. how you prune) should further enhance LLM!
Tweet media one
1
3
37
@ShunyuYao12
Shunyu Yao
2 years
@AdeptAILabs This is super cool! We had a similar research idea in one domain (shopping), but it'd be much more powerful to train a multitask general language agent
1
4
37
@ShunyuYao12
Shunyu Yao
4 years
@ybisk @_jessethomason_ I find it hard too...
Tweet media one
1
0
36
@ShunyuYao12
Shunyu Yao
4 years
Happy to announce our new #emnlp2020 paper “Keep CALM and Explore: Language Models for Action Generation in Text-based Games” is online! w/ Rohan, @mhauskn , @karthik_r_n arxiv: code: more below (1/n)
1
9
36
@ShunyuYao12
Shunyu Yao
4 years
For autonomous tasks with language (e.g. text games), how much does an agent rely on language semantics vs. memorization? Our #NAACL2021 paper (, joint w/ @karthik_r_n , @mhauskn ) proposes ablation studies with surprising findings and useful insights! (1/3)
Tweet media one
1
2
32
@ShunyuYao12
Shunyu Yao
2 years
Flying to #EMNLP for the first #NLP conference in my life, despite being an OLD fourth year phd student😂😂 Would love to meet new friends and chat about language grounding and interaction!
2
0
32
@ShunyuYao12
Shunyu Yao
6 months
Attention is all we need. Memory is all we have.
0
2
31
@ShunyuYao12
Shunyu Yao
9 months
Just realized another analogy between humans and LLMs: we develop and evaluate them on what is easy to evaluate (SAT or MMLU), then set them out on what is hard to evaluate.
1
5
30
@ShunyuYao12
Shunyu Yao
1 year
@noahshinn024 et al did Reflexion in Mar 2023, and tons of LLM-critic projects since. Still, we worked on Reflexion v2 . What for? - clean & general conceptual framework via language agent/RL - strong empirical perf on more diverse & complex tasks (1/n)
Tweet media one
2
2
28
@ShunyuYao12
Shunyu Yao
11 months
Agent is not just future but now
0
3
27
@ShunyuYao12
Shunyu Yao
2 years
1. ReAct > 🧠/💪only methods, e.g. - On knowledge reasoning tasks, interacting with wiki API obtains new knowledge and avoids hallucination. - On decision making tasks, sparse+flexible thoughts can decompose goal, plan actions, induce commonsense, track progress, adjust plan..
Tweet media one
1
1
27
@ShunyuYao12
Shunyu Yao
7 months
This meme will fly high as swe agents fly high
@ShunyuYao12
Shunyu Yao
1 year
Meme aside, check out SWE-bench, which hits many marks of a good benchmark - hard but useful to solve, easy to evaluate - automatically constructed from real GitHub issues and pull requests - challenges super-long context, retrieval, coding, etc. - can easily update with new instances
Tweet media one
1
10
88
0
1
27
@ShunyuYao12
Shunyu Yao
10 months
Can someone tell me what's qstar
9
1
26
@ShunyuYao12
Shunyu Yao
1 year
Tree of Thoughts is a serious paper and serious research, not a GitHub star-chasing play. I appreciate any implementation of any of my work, but it should link to the official implementation to avoid confusion and abuse.
2
2
26
@ShunyuYao12
Shunyu Yao
7 months
@cognition_labs we will release some interesting research results on coding agents soon😀
0
0
26
@ShunyuYao12
Shunyu Yao
5 months
Congrats @andrewwhite01 and team! Still one of the coolest applications of ReAct and language agents, and one of my personal favorites!
@andrewwhite01
Andrew White 🐦‍⬛
5 months
ChemCrow is out today in @NatMachIntell ! ChemCrow is an agent that uses chem tools and a cloud-based robotic lab for open-ended chem tasks. It’s been a journey to get to publication and I’d like to share some history about it. It started back in 2022. 1/8
Tweet media one
16
74
460
0
2
25
@ShunyuYao12
Shunyu Yao
1 year
Great blogpost about recent advances in Autonomous Language Agents -- now can try them all in LangChain
@hwchase17
Harrison Chase
1 year
🤖Autonomous Agents & Agent Simulations🤖 Four agent-related projects (AutoGPT, BabyAGI, CAMEL, and Generative Agents) have exploded recently We wrote a blog on how they differ from previous @LangChainAI agents and how we've incorporated some key ideas
31
168
828
0
4
25
@ShunyuYao12
Shunyu Yao
10 months
Cool to see followup efforts using Tree of Thoughts () for important applications (LLM safety and jailbreaking) ... And a growth of Tree-of-x work😂
@aminkarbasi
Amin Karbasi
10 months
In collaboration with @robusthq , yesterday we shared "Tree of Attacks", a method that can jailbreak @OpenAI GPT-4 about 90% of the time. It was just covered in @wired
Tweet media one
4
50
236
1
0
25
@ShunyuYao12
Shunyu Yao
1 year
Excited about this work on emergent communication (EC)! EC's been a tricky subject (i.e. lots of toy papers), but IMO its true potential is unleashing soon. Simplest reason: we're running out of human-written language on the Internet. We'll have to use machines' self-generated language soon!
@YaoMarkMu1
Yao Mark Mu
1 year
Our new paper, EC^2, has been published at CVPR 2023. It presents a novel video-language pre-training scheme via emergent communication for few-shot embodied control. Project page: Paper:
Tweet media one
0
8
39
2
2
24
@ShunyuYao12
Shunyu Yao
1 year
@DrJimFan 's "no-gradient architecture" is exactly what we call "verbal reinforcement learning". Awesome progress in this direction using a great testbed! It is fair to say we haven't (by a wide margin) reached the capability limit of just calling GPT-4 APIs. Still much to do!
@DrJimFan
Jim Fan
1 year
What if we set GPT-4 free in Minecraft? ⛏️ I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library. GPT-4 unlocks
365
2K
9K
3
2
23
@ShunyuYao12
Shunyu Yao
9 months
Never done pair programming but did pair prompting for the first time today😂
0
0
22
@ShunyuYao12
Shunyu Yao
2 years
3. ReAct naturally produces more interpretable and trustworthy trajs, where humans can - inspect fact source (internal vs. external) - check reasoning basis of decisions - modify model thoughts for policy edit on-the-go, an exciting new paradigm for human-machine interaction!
Tweet media one
1
1
20
@ShunyuYao12
Shunyu Yao
2 years
2. ReAct generalizes strongly, both in few-shot prompting and finetuning. e.g. - On WebShop/AlfWorld, 1/2-shot ReAct outperforms imitation learning w/ 3k/100k samples by 10/34%! - Using LLM ReAct trajs, finetuned smaller LMs outperforms LLM and finetuned🧠/💪only models!
Tweet media one
1
1
19
@ShunyuYao12
Shunyu Yao
7 months
Tom is AWESOME to work with, from the most high-level ideas to the most low-level details. Being his postdoc would be great fun and a growth experience!
@RTomMcCoy
Tom McCoy
7 months
I am hoping to hire a postdoc who would start in Fall 2024. If you are interested in the intersection of linguistics, cognitive science, and AI, I encourage you to apply! Please see this link for details:
Tweet media one
2
53
164
1
1
19
@ShunyuYao12
Shunyu Yao
5 months
misalignment of ai and humans is not as dangerous as misalignment of our own behavior and desire, or misalignment among our various desires, or misalignment among various us
1
1
17
@ShunyuYao12
Shunyu Yao
6 months
Guys, before achieving AGI, we need to solve zork-1?
2
2
19
@ShunyuYao12
Shunyu Yao
2 years
ReAct = Synergize [Rea]soning and [Act]ing in LM How? ReAct prompts LLM with human task-solving trajectories with interleaving 🧠flexible thoughts and 💪domain-specific actions, so that it can generate both. Why? Strong generalization on VERY diverse tasks + ALIGNMENT benefits!
Tweet media one
1
0
19
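The interleaving of 🧠thoughts and 💪actions described above can be sketched as a toy loop. Here `fake_lm` and `wiki_search` are illustrative stand-ins for the LLM and the Wikipedia API used in the paper, not the released ReAct code:

```python
def wiki_search(query):
    # Stand-in for the external-world action (a Wikipedia API call in ReAct).
    kb = {"ReAct": "ReAct interleaves reasoning traces and actions."}
    return kb.get(query, "No result.")

def fake_lm(context):
    # Stand-in for the LLM: given the trajectory so far, emit a flexible
    # thought plus a domain-specific action.
    if "Observation" not in context:
        return ("I should look this up.", "Search[ReAct]")
    return ("The observation answers the question.",
            "Finish[ReAct interleaves reasoning traces and actions.]")

def react_loop(question, max_steps=5):
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        thought, action = fake_lm(context)
        context += f"Thought: {thought}\nAct: {action}\n"
        if action.startswith("Finish["):
            return action[len("Finish["):-1]
        if action.startswith("Search["):
            # Execution feedback is appended as an observation,
            # grounding the next round of reasoning.
            context += f"Observation: {wiki_search(action[len('Search['):-1])}\n"
    return None

print(react_loop("What is ReAct?"))
```

The key design point the tweet makes is visible in the loop: thoughts stay free-form while actions are constrained to the domain's tool calls, and each observation feeds back into the growing context.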
@ShunyuYao12
Shunyu Yao
2 years
As GPT context length keeps increasing, will all retrieval become in-context retrieval? Or is (traditional) retrieval the key to increasing context length...
2
2
18
@ShunyuYao12
Shunyu Yao
9 months
Anyone with updated research belief after #NeurIPS2023 week?
2
1
18
@ShunyuYao12
Shunyu Yao
1 year
Very cool work, analyzing the risks and robustness of ReAct agents across scenarios and base LLMs! This direction will be very important.
@YangjunR
Yangjun Ruan
1 year
Should you let LMs control your email? terminal? bank account? or even your smart home?🤔 🔥Introducing ToolEmu for identifying risks associated with LM agents at scale! 🛠️Featuring LM-emulation of tools & automated realistic risk detection 🚨GPT4 is risky in 40% of our cases!
6
48
173
0
4
17
@ShunyuYao12
Shunyu Yao
9 months
If #GPT4 is open sourced tmr, what would/could you even do with it? How is that gonna change AI?
7
0
17
@ShunyuYao12
Shunyu Yao
1 year
It's crazy that most web-agent work in academia is still doing MiniWoB...
1
0
17
@ShunyuYao12
Shunyu Yao
2 years
I guess the success of diffusion in vision and chain of thought in language shares something in common: solving things step by step is just good
2
1
17
@ShunyuYao12
Shunyu Yao
1 year
@srush_nlp 🤣i would say higher but not exponential, given search has heuristics (eg bfs prune breadth, dfs prune subtree). But hopefully we can (and should) use open and free models soon
1
1
17
@ShunyuYao12
Shunyu Yao
11 months
Is the prompt engineering mess after GPT-3 till now similar to "network architecture/loss engineering" mess after AlexNet till now?
2
0
17
@ShunyuYao12
Shunyu Yao
8 months
Cool work and a great extension of WebShop () --- we've been thinking about making it personalized for some years; it's great someone is finally doing it!
@yang_zonghan
Zonghan Yang
8 months
We all know that __alignment__ elicits the capability of foundation models (FMs), while __agents__ autonomize FMs as copilots. Well, I'd say the two are intrinsically intertwined! Excited to introduce the principles of unified alignment for agents: (1/N)
5
11
62
2
2
15
@ShunyuYao12
Shunyu Yao
7 months
Is the timing a coincidence or planned? I think planned😂
@var_epsilon
varepsilon
8 months
mogged again
78
1K
14K
2
0
16
@ShunyuYao12
Shunyu Yao
1 year
@kushirosea Our work explores simpler search algos like bfs/dfs, but mcts is def a natural todo!
2
0
16
@ShunyuYao12
Shunyu Yao
10 months
When is language model enough and when is language agent needed? Seems a good probe is whether humans speak or write. Eg. We can answer most questions fluently in everyday conversations, but need iterative revisions for writing math proof, blogposts, code, etc.
1
0
16
@ShunyuYao12
Shunyu Yao
2 years
Check the paper for MUCH MORE findings and insights! With @jezhao @Dian_Yu0 Nan Izhak @karthikn @caoyuan33 @googleai @princetonnlp
2
1
15
@ShunyuYao12
Shunyu Yao
1 year
Going to #ACL2023NLP next week (10-12), lmk if you want to chat😀
0
0
15
@ShunyuYao12
Shunyu Yao
1 year
I also first saw Langchain when it implemented ReAct at 0.0.3 with <100 stars. Now it's 0.0.131 with 20k+ stars. A lot of hard work! Great demos every day (a lot with super easy zero-shot-react-agent!). Congrats @hwchase17 and @LangChainAI and look forward to the future!
@milesgrimshaw
Miles Grimshaw
1 year
I first saw Langchain on Twitter when Harrison implemented the ReAct paper, exposing the LLM’s reasoning: . I was impressed w/ the elegant abstractions he wrote & captivated by the possibilities of orchestrating LLMs as an intelligence layer.
1
0
6
0
0
15