Aman Madaan

@aman_madaan

1,841 Followers · 523 Following · 40 Media · 432 Statuses

@xAI

Pittsburgh
Joined February 2010
@aman_madaan
Aman Madaan
1 year
Can LLMs enhance their own output without human guidance? In some cases, yes! With Self-Refine, LLMs generate feedback on their work, use it to improve the output, and repeat this process. Self-Refine improves GPT-3.5/4 outputs for a wide range of tasks.
Tweet media one
7
52
247
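To make the loop concrete, here is a minimal sketch of Self-Refine, assuming a hypothetical `llm(prompt) -> str` wrapper around any completion API; the prompts and the stopping check are illustrative, not the ones from the paper.

```python
# Minimal Self-Refine sketch; `llm` is a hypothetical text-in/text-out
# wrapper (plug in any chat/completion client).
def llm(prompt: str) -> str:
    raise NotImplementedError

def self_refine(task: str, max_iters: int = 4) -> str:
    output = llm(f"Task: {task}\nAnswer:")
    for _ in range(max_iters):
        # 1) the model critiques its own output...
        feedback = llm(
            f"Task: {task}\nAnswer: {output}\n"
            "Give concrete, actionable feedback on this answer."
        )
        if "no issues" in feedback.lower():  # illustrative stop condition
            break
        # 2) ...then rewrites the output using that feedback
        output = llm(
            f"Task: {task}\nAnswer: {output}\nFeedback: {feedback}\n"
            "Rewrite the answer, addressing the feedback:"
        )
    return output
```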
@aman_madaan
Aman Madaan
10 months
Language model APIs now come in all shapes and sizes ( @OpenAI , @AnthropicAI , @togethercompute ), with prices varying by up to 50x (Ada < Llama7b < Chatgpt < GPT-4). It makes sense to mix and match them, using smaller models for simpler queries and saving the $$ for the more
Tweet media one
4
32
180
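A minimal sketch of this mix-and-match idea, assuming placeholder model names and a simple yes/no self-check as the routing signal (real routers such as AutoMix, discussed later in this timeline, are more careful):

```python
# Cost-aware cascade: try the cheap model first, escalate only when a
# self-check is unconvinced. Model names and prompts are placeholders.
CHEAP, EXPENSIVE = "small-model", "gpt-4-class-model"

def llm(model: str, prompt: str) -> str:
    raise NotImplementedError  # wrap the corresponding provider's API

def answer(query: str) -> str:
    draft = llm(CHEAP, query)
    check = llm(CHEAP, f"Q: {query}\nA: {draft}\nIs this answer correct? yes/no:")
    if check.strip().lower().startswith("yes"):
        return draft              # simple query: keep the cheap answer
    return llm(EXPENSIVE, query)  # hard query: spend the $$
```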
@aman_madaan
Aman Madaan
9 months
Fun exercise with Self-Refine + GPT4-V. Generate TikZ code for an object using GPT-4 → render the image → use GPT4-V to get feedback on the image and improve the TikZ code → repeat! Colab: Also worth checking out if you just want to start playing with
@aman_madaan
Aman Madaan
1 year
Can LLMs enhance their own output without human guidance? In some cases, yes! With Self-Refine, LLMs generate feedback on their work, use it to improve the output, and repeat this process. Self-Refine improves GPT-3.5/4 outputs for a wide range of tasks.
Tweet media one
7
52
247
6
20
117
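A sketch of the TikZ loop above, assuming hypothetical `llm` (text) and `vlm` (image + text) wrappers and a local pdflatex install; everything else is illustrative:

```python
# Visual Self-Refine sketch: draft TikZ, compile it, ask a vision model
# for feedback on the render, repeat.
import pathlib
import subprocess

def llm(prompt: str) -> str:
    raise NotImplementedError

def vlm(prompt: str, image_path: str) -> str:
    raise NotImplementedError  # e.g., a GPT-4V-style image+text call

def render(tikz: str, stem: str = "fig") -> str:
    doc = ("\\documentclass[tikz]{standalone}\\begin{document}"
           + tikz + "\\end{document}")
    pathlib.Path(f"{stem}.tex").write_text(doc)
    subprocess.run(["pdflatex", "-interaction=nonstopmode", f"{stem}.tex"],
                   check=True)
    return f"{stem}.pdf"

def draw(obj: str, iters: int = 3) -> str:
    code = llm(f"Write TikZ code that draws {obj}.")
    for _ in range(iters):
        feedback = vlm(f"How can this drawing of {obj} be improved?",
                       render(code))
        code = llm(f"TikZ code:\n{code}\nFeedback: {feedback}\n"
                   "Rewrite the TikZ code to address the feedback.")
    return code
```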
@aman_madaan
Aman Madaan
9 months
Chain-of-thought prompting is amazing, but why does it work? Talk to @ayazdanb as he presents our #EMNLP2023 work on "What Makes Chain-of-Thought Prompting Effective?" We use counterfactual prompting to attempt to answer this question using LLMs + realistic reasoning datasets.
Tweet media one
Tweet media two
@ayazdanb
Amir Yazdanbakhsh
9 months
At @emnlpmeeting 2023, we are presenting our work on understanding the underlying mechanisms behind chain of thought. We designed a suite of counterfactual prompts, systematically manipulating different elements of examples and testing their consequences on model behavior. /1
1
1
17
0
21
105
@aman_madaan
Aman Madaan
9 months
Looking forward to discussing our recent work on using inference-time compute for effective reasoning at #NeurIPS2023 ! 🗓️ Self-Refine: Iterative Refinement with Self-Feedback, Wed 13 Dec 5 p.m., Great Hall & Hall B1+B2 (level 1) Poster #324 🗓️ AutoMix:
3
19
87
@aman_madaan
Aman Madaan
1 year
If you've lost count of all the cool prompting papers and want a 30-minute overview of the key techniques, come by tomorrow for the Complex Reasoning in Natural Language Tutorial @ #ACL2023 ! More @ Please join for Q/A + suggestions!
Tweet media one
@wzhao_nlp
Wenting Zhao
1 year
Heading to #ACL2023 🚀 My collaborators @megamor2 @billyuchenlin @michiyasunaga @aman_madaan @taoyds and I will be presenting a cutting-edge tutorial on Complex Reasoning in Natural Language - diving into recent methods for accurate, robust & trustworthy reasoning systems🤖 1/2
2
11
49
1
9
83
@aman_madaan
Aman Madaan
1 year
Join us today to discuss code generation, reasoning, LLMs, self-refinement, and their intersection as we present PaL at #ICML2023 ! Looking forward to seeing you! Exhibit Hall 1 @ 2pm local time
@urialon1
Uri Alon
2 years
📢 New Paper: Program-aided Language models Prompting methods such as chain-of-thought ( @_jasonwei ) employ LLM for decomposing the problem into steps *and* solving each step. Instead, PaL decomposes the problem into *programmatic* steps and solves using a Python interpreter. 1/4
Tweet media one
13
103
509
1
6
50
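The quoted tweet summarizes PaL well; a minimal sketch of the idea, assuming a hypothetical `llm` completion function and an illustrative prompt:

```python
# PaL-style solving: the model writes the reasoning as Python, and the
# interpreter (not the LM) computes the final answer.
def llm(prompt: str) -> str:
    raise NotImplementedError

def pal(question: str):
    code = llm(
        "Write Python that solves the problem step by step and stores "
        f"the final result in a variable `answer`.\nProblem: {question}"
    )
    scope = {}
    exec(code, scope)  # run the generated program (sandbox this in practice!)
    return scope["answer"]
```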
@aman_madaan
Aman Madaan
11 months
Sally leaves her marble in a basket, and Anne moves it to a box. GPT-4 can probably tell you that Sally *still* thinks the marble is in the basket. But then things get interesting. Can GPT-4 infer that in this situation, Sally is the person who might need help? Sounds easy
@peizNLP
Pei Zhou
11 months
Can LLMs translate reasoning into decision-making insights? Bad news: NO! Without any help, LLMs "thinking" doesn't really translate into "doing". Good news: A little bit of structure goes FaR! We present Foresee and Reflect (FaR), a 0-shot reasoning mechanism that boosts
Tweet media one
Tweet media two
9
59
254
2
10
48
@aman_madaan
Aman Madaan
4 months
Missing #ICLR2024 , but if you're interested in algorithmic optimization, do check out our work presented by @alex_shypula @ 10:45AM, Halle B, Poster #255 ! @urialon1 @ayazdanb @gneubig
Tweet media one
1
4
44
@aman_madaan
Aman Madaan
1 year
Self-consistency is a powerful way to push the performance of LLMs. The idea is simple: draw 40 samples and use the majority as the answer. But what if the first 10 outputs are the same? Do we still wait for all 40? In Adaptive-Consistency, we show that you can do much better! 1/3
@PranjalAggarw16
Pranjal Aggarwal
1 year
🤔Can we reduce the cost of reasoning in large language models while maintaining accuracy? Introducing our new paper: "Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs" a dynamic alternative to Self-Consistency! 🌐: 🧵
Tweet media one
1
11
60
1
5
43
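A sketch of the sampling loop described in the thread, assuming a hypothetical `sample_answer` call; the stability check is stubbed here, and a Dirichlet-based version is sketched next to the 2/3 tweet further down this page:

```python
# Adaptive-Consistency sketch: draw answers one at a time and stop as soon
# as the majority looks stable, instead of always paying for 40 samples.
from collections import Counter

def sample_answer(question: str) -> str:
    raise NotImplementedError  # one temperature-sampled model answer

def majority_is_stable(counts: Counter, threshold: float = 0.95) -> bool:
    raise NotImplementedError  # see the Dirichlet-based sketch below

def adaptive_consistency(question: str, max_samples: int = 40) -> str:
    counts = Counter()
    for i in range(max_samples):
        counts[sample_answer(question)] += 1
        # min-sample guard of 5 is an assumption, not from the paper
        if i >= 5 and majority_is_stable(counts):
            break  # skip the remaining, now-redundant samples
    return counts.most_common(1)[0][0]
```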
@aman_madaan
Aman Madaan
8 months
@abacaj Or automix them
1
4
41
@aman_madaan
Aman Madaan
2 years
Thanks Yannic for covering our work! It was fun to discuss the promise and limitations of MemPrompt. Deploying GPT-3-like models at scale will require lightweight models that can "fix" errors locally without retraining; MemPrompt shows one way to do it!
@ykilcher
Yannic Kilcher 🇸🇨
2 years
Check out this interview with Aman Madaan ( @aman_madaan ) and Niket Tandon on "Memory-assisted prompt editing to improve GPT-3 after deployment". Learn how to improve and personalize GPT-3 even after deployment, without any access to its internals!
Tweet media one
1
7
34
3
8
34
@aman_madaan
Aman Madaan
6 months
Really nice work! Part of it is also quite simple to implement in just a few (<100) lines with torch/hf. Here is a notebook that implements and runs algorithm 1 in the paper, and correctly guesses 4096 as one of the candidates for `h` for `mistralai/Mistral-7B-v0.1`. Works
Tweet media one
@arankomatsuzaki
Aran Komatsuzaki
6 months
Google presents: Stealing Part of a Production Language Model - Extracts the projection matrix of OpenAI’s ada and babbage LMs for <$20 - Confirms that their hidden dim is 1024 and 2048, respectively - Also recovers the exact hidden dim size of gpt-3.5-turbo
Tweet media one
16
148
957
1
2
35
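The attack is easy to sketch with torch/hf, as the tweet says: the final logits live in an h-dimensional subspace of vocab space, so the singular-value spectrum of stacked logit vectors collapses after h. The model name comes from the tweet; the random prompts, batch sizes, and spectral-gap heuristic below are my assumptions, not necessarily algorithm 1 verbatim.

```python
# Guessing the hidden dim h from output logits (needs n > h logit vectors).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)

rows = []
with torch.no_grad():
    for _ in range(100):  # 100 batches x 64 prompts = 6400 > 4096 queries
        ids = torch.randint(0, tok.vocab_size, (64, 8))  # random prompts
        rows.append(model(input_ids=ids).logits[:, -1, :])

Q = torch.cat(rows)                        # (6400, vocab), rank ~= h
s = torch.linalg.svdvals(Q)
h = int(torch.argmax(s[:-1] / s[1:])) + 1  # largest multiplicative gap
print(h)                                   # ~4096 for Mistral-7B
```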
@aman_madaan
Aman Madaan
11 months
Nice work, and echoes our findings in Self-Refine! Refining creative outputs works extremely well, but spotting errors in challenging reasoning problems is hit or miss. One of the most exciting areas right now! 1/2
Tweet media one
@jefffhj
Jie Huang
11 months
Can LLMs Self-Correct Their Reasoning? Recent studies (self-refine, self-critique, etc.) suggest LLMs possess a great ability to self-correct their responses. However, our research indicates LLMs cannot self-correct their reasoning intrinsically. [1/n]
Tweet media one
5
69
387
2
2
34
@aman_madaan
Aman Madaan
2 years
@peterjliu This is one of the two hypotheses we have to explain why LMs trained on code are so good at structured commonsense generation & reasoning (paper: ). The other is that the training data has tons of commonsense. I like yours better.
Tweet media one
Tweet media two
1
1
28
@aman_madaan
Aman Madaan
7 months
For many tasks, there is usually one correct answer, and *many* ways to be wrong. But mistakes can be informative, too! LEAP uses this idea to automatically draft a few "principles" for every task (e.g., two `not` operations cancel out in boolean algebra). These principles are
@urialon1
Uri Alon
7 months
📢New paper: "In-Context Principle Learning from Mistakes" Instead of prompting using only *correct* few-shot examples, we intentionally make *mistakes*, and then learn "principles" or "lessons" from them. Led by @tianjun_zhang @aman_madaan @luyu_gao
Tweet media one
Tweet media two
5
24
113
0
4
28
@aman_madaan
Aman Madaan
2 years
Stop by our poster today (session 7, 9:00 - 10:30 am) if you are interested in learning more about few-shot prompting + memory!
@allen_ai
Ai2
2 years
MemPrompt, appearing at #EMNLP2022 , is a new way to "fix" #GPT3 after deployment via user interaction 🧑‍🏫 This is new work from @aman_madaan , Niket Tandon, Peter Clark, and Yiming Yang in a collaboration between @ai2_aristo and @LTIatCMU – learn more:
0
9
52
0
4
20
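For readers new to MemPrompt, a rough sketch of the mechanism, assuming a hypothetical `llm` function and a deliberately naive word-overlap retriever (the paper's retrieval is better):

```python
# MemPrompt sketch: store user feedback keyed by the question, retrieve
# feedback for similar future questions, and prepend it to the prompt,
# "fixing" errors without retraining.
def llm(prompt: str) -> str:
    raise NotImplementedError

memory = []  # list of (question, feedback) pairs

def recall(question: str) -> str:
    words = set(question.lower().split())
    hits = [fb for q, fb in memory
            if len(words & set(q.lower().split())) > 3]  # naive similarity
    return "\n".join(hits)

def ask(question: str) -> str:
    return llm(f"{recall(question)}\n{question}")

def give_feedback(question: str, feedback: str) -> None:
    memory.append((question, feedback))  # helps future similar queries
```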
@aman_madaan
Aman Madaan
2 years
ChatGPT's ability to emulate a terminal is amazing! It's probably achieved by combining few-shot prompting with a "memory" of past interactions. Check out this notebook that replicates Python terminal examples using codex + standard prompting + memory: .
Tweet media one
@yoavgo
(((ل()(ل() 'yoav))))👾
2 years
this "terminal inside chatGPT" is crazy. is that an easter egg? how? i tried the prompt and it... worked? i was inside a shell. then i tweaked the prompt to be inside a python interpreter. also behaves creepily ok (see next tweet for screenshots). wtf. what game are they playing.
27
33
425
4
3
20
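A sketch of the few-shot-prompting-plus-memory trick from the tweet: the "memory" is just a transcript that grows with every command and is re-sent on each turn. The seed prompt and `llm` wrapper are illustrative assumptions.

```python
# Emulating a Python REPL with a seed prompt and a growing transcript.
def llm(prompt: str) -> str:
    raise NotImplementedError

SEED = ("You emulate a Python REPL. Print only the interpreter output.\n"
        ">>> x = 2\n>>> x + 3\n5\n")

transcript = SEED
while True:
    cmd = input(">>> ")
    transcript += f">>> {cmd}\n"
    out = llm(transcript)    # the model sees the full history every turn
    transcript += out + "\n"
    print(out)
```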
@aman_madaan
Aman Madaan
1 year
@Francis_YAO_ We have similar findings in selfrefine:
Tweet media one
1
2
19
@aman_madaan
Aman Madaan
2 years
@gneubig Makes sense! I think search engines are another useful analogy: prompt → query, and model weights → database. A better model (search engine) can work with more complex prompts (queries) to generate the answer (query the database). 1/2
Tweet media one
3
0
18
@aman_madaan
Aman Madaan
2 years
@janleike More examples of this phenomenon if you consider Python to be a non-English language (images from , )
Tweet media one
Tweet media two
0
0
14
@aman_madaan
Aman Madaan
4 years
Paper: Code: Joint work with @setlur_amrith , @tparekh97 , Barnabas Poczos, @gneubig , Yiming Yang, @rsalakhu , Alan W Black, @shrimai_ !
@TechCrunch
TechCrunch
4 years
CMU researchers develop an automatic politeness engine for text-based communications by @etherington
0
18
39
0
8
12
@aman_madaan
Aman Madaan
4 years
@TommSciortino There might be one cycle here: I've heard native Hindi speakers use "Farsi" as the unintelligible language (so Hindi → Persian should be an edge).
2
0
13
@aman_madaan
Aman Madaan
2 years
It's easier to tackle such problems if you first translate them to math/code
Tweet media one
Tweet media two
@harrison_ritz
Harrison Ritz
2 years
oh thank god
Tweet media one
101
282
6K
0
0
11
@aman_madaan
Aman Madaan
9 months
Check out the #EMNLP2023 talk by @PranjalAggarw16 on our work on Adaptive Consistency, a variant of self-consistency that dynamically adjusts the number of samples. TLDR: We find a way to quantify `P(enough samples have been drawn)` and stop when this probability is over a
@aman_madaan
Aman Madaan
1 year
Self-consistency is a powerful way to push the performance of LLMs. The idea is simple: draw 40 samples and use the majority as the answer. But what if the first 10 outputs are the same? Do we still wait for all 40? In Adaptive-Consistency, we show that you can do much better! 1/3
1
5
43
0
1
11
@aman_madaan
Aman Madaan
1 year
Turns out the answer is no for many tasks: these models _can_ Self-Refine! Paper: Research demo while credits last: Joint work with an amazing set of collaborators.
2
0
11
@aman_madaan
Aman Madaan
9 months
Apply to work with Ankush! He’ll be a great advisor and is going to work on exciting stuff, like the intersection of large language models, formal verification, and programming languages, among other things!
@Das8Ankush
Ankush Das
9 months
Thrilled to share that I will start a tenure-track assistant professor position in the CS department at Boston University! I am looking for PhD students in the area of programming languages with applications to distributed systems, cryptography, and machine learning.
15
29
214
0
2
10
@aman_madaan
Aman Madaan
11 months
Lots of interesting work happening:
- I enjoyed the idea in : hide part of the information in the original question and try to use an LLM's answer to recover it.
- Step-by-step annotation on errors:
Looking forward to more! 2/2
Tweet media one
1
0
10
@aman_madaan
Aman Madaan
1 year
@_jasonwei I think conveying task requirements is crucial too! CoT is effective if the prompt effectively spells out *what* has to be done, even with mistakes/odd symbols. Simple changes that remove task understanding hurt performance. We discuss more in our work: .
2
0
10
@aman_madaan
Aman Madaan
10 months
@rao2z Cool work! I like how you hit the nail on the head: the key issue for reasoning problems is self-verification/critique. Also worth noting are works that *do* find ways of doing self-verification for reasoning problems with LLMs: ,
0
1
9
@aman_madaan
Aman Madaan
10 months
1
0
9
@aman_madaan
Aman Madaan
2 years
@SergeyI49013776 For such problems, generating Python code is often much more effective. More at !
Tweet media one
0
0
9
@aman_madaan
Aman Madaan
4 years
Thanks @jeremyphoward ! I learned a lot from the amazing @fastdotai course by you and @math_rachel . The fastai library also helped us quickly implement the more than 6 text classifiers we needed for evaluating our model.
@jeremyphoward
Jeremy Howard
4 years
Amazing! 3 years ago, alum @aman_madaan asked us: "could we apply style transfer to text?" Now, he's done it, with @setlur_amrith @tparekh97 @gneubig @rsalakhu @shrimai_ ! A model to "politeify" your text :) Paper:
Tweet media one
6
117
485
0
1
8
@aman_madaan
Aman Madaan
9 months
My advisor Yiming () is actively hiring post-docs working on the areas I mentioned above, so please feel free to reach out/DM if you're interested in learning more.
0
2
6
@aman_madaan
Aman Madaan
1 year
Really like this line of work! Findings resonate with our work on counterfactual prompting: We find that syntax is important (CoT in Yoda doesn't work), rare entities are more challenging & more! My TL;DR: prompts help with *what* has to be done, not how.
Tweet media one
Tweet media two
@zhaofeng_wu
Zhaofeng Wu @ ACL
1 year
Language models show impressive performance on a wide variety of tasks, but are they overfitting to evaluation instances and specific task instantiations seen in their pretraining? How much of this performance represents general task/reasoning abilities? 1/4
Tweet media one
9
111
477
0
0
7
@aman_madaan
Aman Madaan
2 months
@arankomatsuzaki This is exactly what @SNAT02792153 did with Self-Imagine! Check out the work led by her here:
@SNAT02792153
Syeda Nahida Akter
7 months
When solving a difficult problem, we often draw a diagram to help us visualize. What if VLMs could do the same? Introducing Self-Imagine – a method that enhances the reasoning abilities of VLMs on text-only tasks through visualization. Paper: 🧵↓
Tweet media one
2
29
104
0
0
7
@aman_madaan
Aman Madaan
10 months
At the heart of AutoMix is the idea of self-verification with context. While open-ended self-verification is challenging, context can provide valuable signals for verification. But there is noise! So we add a second layer of meta-verification to double-check the verifier.
Tweet media one
2
1
7
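A minimal sketch of the two layers, assuming a hypothetical `llm` function; the prompts are illustrative, and the vote-based aggregation below is a simple stand-in for the paper's meta-verifier:

```python
# Context-grounded self-verification, double-checked by a meta layer.
def llm(prompt: str) -> str:
    raise NotImplementedError

def verify(context: str, question: str, answer: str) -> bool:
    v = llm(f"Context: {context}\nQ: {question}\nA: {answer}\n"
            "Is the answer supported by the context? yes/no:")
    return v.strip().lower().startswith("yes")

def meta_verify(context: str, question: str, answer: str, k: int = 5) -> float:
    # the verifier itself is noisy, so sample it several times and
    # aggregate the votes; route to a larger model when this score is low
    votes = [verify(context, question, answer) for _ in range(k)]
    return sum(votes) / k
```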
@aman_madaan
Aman Madaan
1 year
After playing with ChatGPT-ish models, it was clear that: i) they are pretty good at certain tasks, ii) it takes a bit of back-and-forth to get things right. If the model is so good, does it really need me to point out the basic mistakes in its output? 1/n
1
0
7
@aman_madaan
Aman Madaan
9 months
@srush_nlp makes a nice connection between complexity theory and the effectiveness of CoT, and gives a reasonable argument for the "CoT provides the model with more compute" hypothesis.
1
0
6
@aman_madaan
Aman Madaan
2 years
@gneubig Slides: Another cool interpretation of in-context learning: 2/2
0
0
6
@aman_madaan
Aman Madaan
2 years
@sivil_taram PaL provides a way to improve the chain-of-thought family of techniques. Calculator/other tools are ad-hoc fixes for specific problems, but generating Python code is more general (plus it opens up exciting possibilities like adding API calls in the generated reasoning!). [1/2]
1
0
5
@aman_madaan
Aman Madaan
2 years
More information on combining memory and prompting here: , paper: . We will also present this work at poster session 7 @ EMNLP 2022. [2/2]
Tweet media one
2
2
5
@aman_madaan
Aman Madaan
1 year
My advisor is hiring PhD students and Postdocs. Please feel free to ping me or @EdwardSun0909 (also at #ICML2023 ) if you are interested!
0
1
4
@aman_madaan
Aman Madaan
4 years
@yoavgo @jeremyphoward @rsalakhu @setlur_amrith @tparekh97 @gneubig @shrimai_ Thanks for pointing this out Yoav! I'm not familiar with the work but we'll be sure to appropriately cite it in an updated Arxiv version.
1
0
5
@aman_madaan
Aman Madaan
4 years
@TommSciortino I've also heard native speakers of Punjabi/Saraiki use "Pashto" to refer to incomprehensible utterances.
0
0
5
@aman_madaan
Aman Madaan
1 year
@ayazdanb Thanks so much for taking the time @ayazdanb ! Lots of fun + learning talking to you as always!
0
0
4
@aman_madaan
Aman Madaan
10 months
@gneubig
Graham Neubig
10 months
@pyoudeyer @DrJimFan @Francis_YAO_ @srush_nlp @Thom_Wolf We have some systematic examination in our paper here: But we were limited by what we can do with inscrutable OpenAI models. Definitely interested in the answer to this question!
Tweet media one
2
1
22
0
0
3
@aman_madaan
Aman Madaan
1 year
In practice, Adaptive-Consistency significantly boosts efficiency in models like vicuna-13b & codex, drawing up to 6x fewer samples while maintaining near-optimal performance. Check out more at . Joint work with @PranjalAggarw16 @mishumausam !
0
0
4
@aman_madaan
Aman Madaan
4 months
@shuyanzhxyc @DukeU @dukecompsci Congrats Shuyan! Looking forward to the cool work from your lab. Exciting news for @DukeU @dukecompsci !
1
0
4
@aman_madaan
Aman Madaan
2 years
@koustuvsinha @yoavgo This is already possible with code-davinci-002! All we need is a small seed prompt that "grows" with user interaction. Here's a notebook that replicates all of @yoavgo 's Python prompts with code-davinci-002:
Tweet media one
0
0
3
@aman_madaan
Aman Madaan
9 months
@universeinanegg In , we use a database/memory of previously made errors and feedback to improve later predictions for reasoning tasks. Also includes a generative IR trick which is now widely known: guess the answer using an LM, and use the guessed answer for retrieving
0
0
3
@aman_madaan
Aman Madaan
9 months
@melaniesclar @neuranna You may be interested in our ( @khermann_ @ayazdanb ) work from last year, where we show that seemingly benign changes to the prompt format can change the performance on an end task (but we also discuss why and when!). One of my favs is rewriting the prompts in Yodaspeak 😁
0
0
3
@aman_madaan
Aman Madaan
9 months
The fabled unicorn
0
0
3
@aman_madaan
Aman Madaan
1 year
@delliott Consider two tasks: Task 1: Make a given yelp review positive. Task 2: Pick which of two yelp reviews is more positive. You can have the same LLM do both these things/mix & match. Details matter (e.g., adding "chain of thought" for Task 2). Reporting correlation with humans is important.
0
0
3
@aman_madaan
Aman Madaan
11 months
Link to the paper, because the arXiv link is 404 for some reason
0
0
3
@aman_madaan
Aman Madaan
8 months
@billyuchenlin Hey Bill, this is interesting, but is it surprising? We are effectively sampling from the distribution `P(y | system prompt)`, which won't yield empty strings *unless* we finetune. > It might leak some GPTs' training data? Possibly! One way to leak pretraining data could be
1
0
3
@aman_madaan
Aman Madaan
9 months
More examples here: . I was surprised that TikZ can render these things in the first place.
1
0
3
@aman_madaan
Aman Madaan
11 months
@natolambert You might be interested in @afeyzaakyurek 's work on RL4F:
@afeyzaakyurek
Afra Feyza Akyürek
1 year
New paper! It’s not easy to fix the errors and misbehaviors of large black-box LMs📦 sitting behind APIs🔒. RL4F provides a way! Our new method prepares small-sized LMs to effectively critique and improve black-box LLMs via reinforcement learning. To appear at #ACL2023 🎉
2
16
94
0
0
3
@aman_madaan
Aman Madaan
7 months
@yanda_chen_ Cool idea, and congrats on getting this to work! We pretty much use the same idea (context-grounded self-verification) in
0
0
3
@aman_madaan
Aman Madaan
10 months
A typo and I missed the awesome @peizNLP ! I had lots of fun working with Pei during my internship @Google Bard and he will be on the job market soon!
0
0
3
@aman_madaan
Aman Madaan
3 years
Context is more important than the scale!
@dheerajgopal
dheeraj rajagopal
3 years
Our findings (1) for situational reasoning, context is more important than the scale of the pretrained language model (2) situation graphs generated from PLMs *do* improve downstream task accuracy
0
0
3
0
0
3
@aman_madaan
Aman Madaan
4 years
We have another QA session at 4:00 pm EST. Join us at: !
@gneubig
Graham Neubig
4 years
Now in #acl2020nlp QA Session: "Politeness Transfer: A Tag and Generate Approach" Happy to answer questions if you're interested in methods to make text more polite!
Tweet media one
0
4
34
0
1
3
@aman_madaan
Aman Madaan
1 year
@WenhuChen @denny_zhou "Annual" might be too slow for LLMs?
1
0
2
@aman_madaan
Aman Madaan
1 year
@DimitrisPapail In , we show that codex is also pretty effective at generating targeted patches for algorithmic improvements (like reducing the *complexity* from O(n^2) to O(n)). Parallel data+codegen gets decent finetuned models. There’s also Scalene by @emeryberger .
0
0
2
@aman_madaan
Aman Madaan
8 months
@AnsongNi I like the slides, thanks for sharing! Of course you can only cover so much in a lecture, but one important aspect to include could be techniques that propose using code for non-code tasks (precursors of PaL/PoT, etc.) Interesting discussion: ,
1
0
2
@aman_madaan
Aman Madaan
2 years
@sivil_taram Also in this particular case, the calculator doesn't really help much: [2/2]
@shuyanzhxyc
Shuyan Zhou
2 years
@sivil_taram Good point. PaL outperforms CoT+Codex/PaLM+calculator as well (+ ext. calc). The beauty of PaL is that we could get rid of all post-processing steps with tedious rules and heuristics — programs handle them all!
Tweet media one
2
0
5
0
0
2
@aman_madaan
Aman Madaan
1 year
We're rethinking self-consistency: each run is a sample from a Dirichlet distribution, giving us a quantifiable 'stability' measure for the majority element. This opens up possibilities for intuitive stopping thresholds, like stopping sampling once stability exceeds 95%. 2/3
1
0
2
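A minimal Monte-Carlo version of that stability measure (the add-one prior and the number of posterior draws are my assumptions):

```python
# Stability of the majority element: sample vote shares from a Dirichlet
# fit to the observed answer counts, and measure how often the current
# leader stays the leader.
from collections import Counter

import numpy as np

def majority_is_stable(counts: Counter, threshold: float = 0.95,
                       draws: int = 10_000) -> bool:
    alphas = np.asarray(list(counts.values()), dtype=float) + 1.0
    leader = int(np.argmax(alphas))
    shares = np.random.dirichlet(alphas, size=draws)  # (draws, n_answers)
    stability = float((shares.argmax(axis=1) == leader).mean())
    return stability >= threshold

# e.g., Counter({"42": 9, "41": 1}) passes at 0.95; a 6-vs-4 split does not
```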
@aman_madaan
Aman Madaan
10 months
@deliprao Exactly! Until the Monty Hall problem shows up... 😄
0
0
0
@aman_madaan
Aman Madaan
1 year
@kchonyc Seems more reasonable for Windows: os.path.join('C:\Users\username\Documents', 'C:\Users\username\Files', 'Photos') will produce 'C:\Users\username\Files\Photos'. 'C:\Users\username\Documents\C:\Users\username\Files\Photos' would be weird?
1
0
2
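The described behavior can be checked on any OS by using ntpath (the Windows flavor of os.path) directly:

```python
import ntpath  # what os.path resolves to on Windows

print(ntpath.join(r"C:\Users\username\Documents",
                  r"C:\Users\username\Files", "Photos"))
# C:\Users\username\Files\Photos  (a later absolute path resets the join)
```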
@aman_madaan
Aman Madaan
9 years
Tweet media one
0
0
2
@aman_madaan
Aman Madaan
6 months
@LChoshen @rtk254 I'm sure @wzhao_nlp will have some ideas
2
0
2
@aman_madaan
Aman Madaan
1 year
@omarsar0 @WizardLM_AI GPT-4 was finetuned on GSM-8k and MATH.
Tweet media one
1
0
2
@aman_madaan
Aman Madaan
4 months
@peizNLP Congrats Pei!
1
0
2
@aman_madaan
Aman Madaan
9 months
@gneubig <it ain't much meme>
1
0
2
@aman_madaan
Aman Madaan
2 years
@xiangrenNLP @WeijiaShi2 Cool work! > measuring the human utility of machine free-text rationale Our work investigated this question for defeasible reasoning: . Shows T5-xxl generated explanations help humans do better at defeasible reasoning! Followed by:
0
0
2
@aman_madaan
Aman Madaan
2 years
@douwekiela @Maxbartolo 4 / 5 Our recent study answers these and more questions! TLDR: text and patterns help each other – (i) text helps extract commonsense from the question to help patterns, and (ii) patterns enforce task understanding and direct text generation.
1
0
2
@aman_madaan
Aman Madaan
3 years
@swarnaNLP @prateeky2806 @lbauer119 @mohitban47 @uncnlp Hey @swarnaNLP , thanks for sharing your work; looks great! You may find our work on generative commonsense reasoning interesting: and the follow-up
2
0
2
@aman_madaan
Aman Madaan
7 years
Thanks for the names dataset, @notMiloBejda . Helped me in finishing
1
1
2
@aman_madaan
Aman Madaan
4 years
looks neat!
@gneubig
Graham Neubig
4 years
2021 version of CMU "Neural Networks for NLP" slides () and videos () are being posted in real time! Check it out for a comprehensive graduate-level class on NLP! New this year: assignment on implementing parts of your own NN toolkit.
Tweet media one
2
290
1K
0
0
2
@aman_madaan
Aman Madaan
7 months
@Swarooprm7 I like this perspective on the work! Essentially learning from a few (~5) mistakes to make instructions more informative.
1
0
2
@aman_madaan
Aman Madaan
1 year
@hardmaru @ycombinator @hn_frontpage Incidentally this was trending on HN yesterday
Tweet media one
0
0
2
@aman_madaan
Aman Madaan
8 months
@kaushikpatnaik @main_horse I think it depends on the context. See section 4.3.3 of . Briefly, we have an MoE that combines different nodes of the graph (one expert per node). We show that the MoE learns to leverage different parts of the graphs for different domains.
1
0
2
@aman_madaan
Aman Madaan
3 years
@dmort27 Great work! From the paper: "The first author, who had graduate training in phonetics and field linguistics, elicited an item from the speaker, then attempted to imitate it until the speaker confirmed that it was correctly produced, then transcribed it." 🙏
0
0
2
@aman_madaan
Aman Madaan
6 months
@shannonzshen @_akhaliq Nice work, you'll find AutoMix to be super relevant here!
0
0
2
@aman_madaan
Aman Madaan
10 months
@maithra_raghu @ericschmidt I really enjoyed reading this post! You might also be interested in our work on self-refinement, where we allow the model to refine its outputs over time using self-generated feedback.
@aman_madaan
Aman Madaan
1 year
Can LLMs enhance their own output without human guidance? In some cases, yes! With Self-Refine, LLMs generate feedback on their work, use it to improve the output, and repeat this process. Self-Refine improves GPT-3.5/4 outputs for a wide range of tasks.
Tweet media one
7
52
247
0
1
2
@aman_madaan
Aman Madaan
10 years
@spacetime29 Hmm. You mean since yesterday?
1
0
1
@aman_madaan
Aman Madaan
6 years
Finding Human Insights with Autonomous Analytics – via @OracleAnalytics #business #analytics #datadriven
0
1
1
@aman_madaan
Aman Madaan
4 months
More details/paper:
1
0
1
@aman_madaan
Aman Madaan
9 months
@xf1280 I got this by pasting the tldr from the website into the colab! ()
1
0
1