Okay so I only discuss hot takes with Jason privately, but with this one I feel obligated to disagree publicly: almost no one cares about levels/seniority in Gemini. Much of the real work is done and the real decisions are made by ICs with experiment results & TensorBoards, not levels.
One liberating thing about OpenAI (and presumably other small companies) is that there are no expectations of project scope being tied to your level. What I mean is that an ambitious junior engineer could take on a big project and be judged purely on their execution, without any
Congrats to @996roma for defending her thesis, ascending to become the first ever @Brown_NLP PhD graduate! And as per Brown CS tradition, our lab presents her with a chicken that is our best representation of her!
Levels may be important at Google, but not in Gemini. Probably easiest to just consider Gemini as a separate company/culture from Google. This is the best time ever for ambitious and productive ICs to thrive.
My co-authors have already highlighted the technical side of T0, so I will just chime in on the human side of things behind the scenes. (But really there are no “scenes” since even our meeting videos are public.) 1/7
Yet, our goal was to train a model that generalizes well to many more new tasks, not just some variant of QA.
The core hypothesis is that massive multi-task training on a large and diverse set of prompted datasets would improve generalization to unseen tasks.
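To make the setup concrete, here is a minimal sketch of what multi-task prompted training data construction could look like; the tasks, examples, and templates below are hypothetical stand-ins, not the actual T0 mixture.

```python
# Minimal sketch of multi-task prompted training data construction.
# Dataset contents and template wordings are made-up placeholders.
import random

# Each task contributes raw examples plus several prompt templates.
tasks = {
    "sentiment": {
        "examples": [{"text": "great movie", "label": "positive"}],
        "templates": [
            lambda ex: (f"Is the following review positive or negative? {ex['text']}", ex["label"]),
            lambda ex: (f"{ex['text']} Did the reviewer like it?", ex["label"]),
        ],
    },
    "paraphrase": {
        "examples": [{"s1": "a cat sat", "s2": "a cat was sitting", "label": "yes"}],
        "templates": [
            lambda ex: (f"Do these mean the same thing? {ex['s1']} / {ex['s2']}", ex["label"]),
        ],
    },
}

def build_mixture(tasks, seed=0):
    """Cross every example with every template, then shuffle tasks together."""
    mixture = []
    for task in tasks.values():
        for ex in task["examples"]:
            for template in task["templates"]:
                source, target = template(ex)
                mixture.append({"input": source, "target": target})
    random.Random(seed).shuffle(mixture)
    return mixture

# A single text-to-text model is then trained on this mixture and evaluated
# with prompts from tasks that were held out of `tasks` entirely.
for pair in build_mixture(tasks):
    print(pair["input"], "->", pair["target"])
```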
In my first “research” meeting with @qinan_yu, we extensively discussed all the available evidence we had on how our PhD advisor @Brown_NLP met her husband (plus some history of LLMs). In just one short year, she has grown from my student/co-author/avant-garde friend to…
Case in point: the people who made our infra/models so good are not L8s/L9s (though some have been promoted to L8 now!). They are the people you hang out with at microkitchens and lunch and whose desks you swing by to ask questions every day, not people who are in meetings 9 to 5.
The teams working on model serving infrastructure at Google are really impressive. This is something I particularly enjoy about the Google 2.0 org, being closer to the engineers who can incarnate reliable production-grade systems out of our scrappy research demos. Building this
Also, you may be surprised that some L9s/L10s/VPs/SVPs/Google Fellows will spend hours with you every week looking at your code and TensorBoards and Colabs, even reviewing (like, actually reviewing, not stamping) your code and running experiments themselves!
I was wrong about something on the Internet! Embarrassing, but it’s only right that we wrote another paper to right the wrong. Turns out that LMs are smarter than I thought, and humans are weirder than I thought.
Last year, we criticized LMs for performing “too well” with pathological prompts, and many papers have now shown similar results with corrupted ICL or CoT. In our new work, we find that *humans* also perform surprisingly well with irrelevant prompts! (But not misleading ones.) 1/5
I thought a lot about how to word my tweets, but words cannot describe how grateful I am to @kelvin_guu. He is an exceptional combination of knowing the literature, getting real work done, and being a trustworthy mentor. 1/4
Which training examples taught my LLM to do that? 🤔 New from Google Research: Simfluence tracks how much "smarter" your model gets after consuming each example. It can then simulate scenarios like “What if I removed X dataset from my training corpus?” 🧵
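As a rough illustration of the trajectory-simulation idea (not Simfluence's actual model or API), here is a toy sketch: fit a per-example effect on the loss delta at each step, then replay a counterfactual run with one subset of examples removed.

```python
# Toy sketch of trajectory simulation: synthetic data, least-squares fit of
# per-example loss effects, then a counterfactual replay without "dataset X".
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_steps = 20, 200

# Hypothetical training log: which single example was consumed at each step,
# and the loss observed after that step.
true_effect = rng.normal(-0.008, 0.004, n_examples)  # per-example loss change
consumed = rng.integers(0, n_examples, n_steps)
loss = np.empty(n_steps + 1)
loss[0] = 2.0
for t in range(n_steps):
    loss[t + 1] = loss[t] + true_effect[consumed[t]] + rng.normal(0, 0.01)

# Fit per-example effects by least squares on the observed loss deltas.
X = np.zeros((n_steps, n_examples))
X[np.arange(n_steps), consumed] = 1.0
beta, *_ = np.linalg.lstsq(X, np.diff(loss), rcond=None)

# Counterfactual: "what if examples 0-9 (dataset X) were removed?"
keep = consumed >= 10
sim_loss = loss[0] + np.cumsum(np.where(keep, beta[consumed], 0.0))
print(f"actual final loss:   {loss[-1]:.3f}")
print(f"simulated without X: {sim_loss[-1]:.3f}")
```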
@tallinzen @AndrewLampinen (and I) are far from scaling maximalists, but his paper () is by far the most rigorous I know and shows that (1) the correctness of the few-shot demonstrations does matter and (2) the instructions matter way less.
Also, a thousand thanks to my co-first-authors @alyssamloo and @qinan_yu, who are undergraduate students with competence far exceeding mine when I was a junior PhD. Both will apply to grad school soon so please hire them!
first authoring her own paper on mech interp (), and now a PhD in her own right at @stanfordnlp!
Meanwhile, when I first taught @alyssamloo and graded her papers, I was astonished by how this first-year student was already producing PhD-level work, and immediately after class, I poached her into our lab; now, after her graduation, I poached her into Google DeepMind (Gemini pretraining team)!
It’s somewhat cliché that highly accomplished people sometimes say their kids are their highest accomplishment, but in research,
We’ve probably all had the experience of trying a thousand different prompt variations, maybe in a Colab with convoluted loops and string manipulations. Using @hen_str’s UI is much more sane, insightful, and refreshing!
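For reference, the "convoluted Colab loop" pattern being compared against might look something like this hypothetical sketch, with a stubbed-out model call standing in for a real LM:

```python
# Brute-force prompt sweep over template variations; everything here
# (instructions, separators, label words, the model stub) is hypothetical.
from itertools import product

instructions = ["Classify the sentiment:", "Is this review positive or negative?"]
separators = ["\n", " || "]
label_words = [("positive", "negative"), ("good", "bad")]

def query_model(prompt: str) -> str:
    """Stand-in for a real LM call (e.g., an API request)."""
    return "positive"  # stub

eval_set = [("I loved every minute of it.", "positive")]

results = {}
for inst, sep, (pos, neg) in product(instructions, separators, label_words):
    correct = 0
    for text, gold in eval_set:
        prompt = f"{inst}{sep}{text}{sep}Answer ({pos}/{neg}):"
        pred = query_model(prompt)
        # Map the template's label word back to the canonical label.
        correct += (gold == ("positive" if pred == pos else "negative"))
    results[(inst, sep, pos)] = correct / len(eval_set)

# Rank prompt variants by accuracy on the tiny eval set.
for config, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{acc:.2f}  {config}")
```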
🎉 NEW! Finding an LLM prompt for new tasks can be tough. PromptIDE allows users to experiment with prompt variations, visualize prompt performance, and iteratively optimize prompts -- to appear at @ieeevis 2022.
Demo, Arxiv, Code:
#NLProc ❤️ #visualization
This figure needs a second to digest, but it’s a really cool & useful finding: adding as little as 10% few-shot training improves 0-shot eval, while adding 10% 0-shot training improves few-shot too! A theme here is to go beyond multi-task training to multi-prompting-paradigm training.
Q: But why are the results strong?
Our breakdown of the Flan Collection shows *why* it works. The most important methods:
🌟Finding 1🌟 Fine-tuning on zero-shot and few-shot prompts together significantly improves both settings (not a trade-off)!
4/
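A minimal sketch of what mixing the two prompt formats during fine-tuning could look like, in the spirit of Finding 1; all names and data below are hypothetical:

```python
# Format ~10% of training examples as few-shot and the rest as zero-shot,
# so one fine-tuning run covers both prompting paradigms.
import random

def zero_shot(example):
    return f"Q: {example['q']}\nA:", example["a"]

def few_shot(example, demos):
    shots = "\n".join(f"Q: {d['q']}\nA: {d['a']}" for d in demos)
    return f"{shots}\nQ: {example['q']}\nA:", example["a"]

data = [{"q": f"question {i}", "a": f"answer {i}"} for i in range(100)]
rng = random.Random(0)

def mixed_batch(data, few_shot_fraction=0.1, k=2):
    """Build (input, target) pairs in a mix of zero- and few-shot formats."""
    batch = []
    for ex in data:
        if rng.random() < few_shot_fraction:
            demos = rng.sample([d for d in data if d is not ex], k)
            batch.append(few_shot(ex, demos))
        else:
            batch.append(zero_shot(ex))
    return batch

print(mixed_batch(data)[0])
```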
while I’m far from being accomplished, my highest accomplishment is not my models or papers, but my students. Many of my models, papers, and lectures were flawed & wrong in retrospect, but that’s why I teach: so that my students can do better research and correct me in the future.
So… I accidentally made @ShriramKMurthi’s day? I can die… I mean, finish grad school happy now.
Also, me before taking his class: I should write unit tests more frequently than I floss.
Me now: write tests before writing the actual code.
incredibly patient with my opinions even though he is obviously much, much more experienced. And of course so are my other co-authors. A huge thank you to all of you!
@tallinzen @AndrewLampinen Oh I trust Sewon et al. that there is nothing wrong with their paper per se. Could be as simple as the LMs and the datasets being different from Andrew’s. Plus, in our upcoming paper, humans also exhibit some similar behaviors, so best not to draw strong conclusions quite yet.
After an early experiment that naively trained on all prompts failed, we needed to design a non-naive training and eval mixture. No one else seemed to want to or have time to do it, so I picked it up. I read through 700-something prompts and 200-something datasets and picked them by hand.
Moral of the story: you don’t need fancy credentials at @BigScienceW as long as you are willing to do good and honest work (many other members are busy!) and cooperate with others. Truly amazing that the LHC vision has kinda worked out! Join us here! 7/7
I rarely take pictures of myself, but I felt a moral calling when I saw these irresistibly cute probability distributions at the Data to Actionable Knowledge Lab at Harvard.
Had an epiphany that typing in all lowercase is evolving into the English equivalent of タメ口 (casual speech), whereas proper case & punctuation = 敬語 (polite speech). Like タメ口, all lowercase & omitted punctuation are no longer just a personal preference but socially preferred/required in some contexts.
Yes I’m so sorry for the confusion! I started the T0++ name in experiment code as a joke ref to C++, but we ran out of time for ICLR so it accidentally became the official name. But we couldn’t have “+” in a model var name so we ended up with T0pp. 1/3
@_jasonwei @stanfordnlp Well actually, @albertwebson tells me that he gave the checkpoints some ice cream nicknames when he ported them to JAX at Google:
T0 -> vanilla
T0+ -> strawberry
T0++ -> chocolate
When I was teaching myself programming in high school, I thought about 𝚜𝚎𝚕𝚏 a lot. Now that I’m a CS & philosophy grad student, I think about self a lot.
「big king」—my new definitive favorite restaurant in all of New England. A truly eye-opening experience at only $55 that rivals any $200 omakase in NYC or any ¥20,000 kaiseki in Kyoto.
I have probably driven @SanhEstPasMoi crazy by repeatedly changing the dataset mixtures, @stevebach on the prompt reviewing guidelines, and @srush on paper writing and figure making. Amazingly, not only did they not mute me, they added me as a co-first-author!
I won't get into a rabbit hole on “meaning” here, but suffice it to say that T5 always had a special place in my heart and wow, what an honor to work with @colinraffel. Just an unbelievably nice person and…
Interlude: if some dataset decisions seem arbitrary, it's 90% likely simply because we didn't have the prompts or a particular dataset split ready at the time of training, or because they don't have GPT-3 baselines to compare to.
You, a normal person: There is a flying mammal in your apartment.
Me, who tries to keep a sleep schedule: Must submit my compute grid jobs before bed. The bat can stay for tonight while she waits for her asylum hearing.
Meanwhile, I myself keep referring to the non-plus one as “T0 Vanilla”, and the ++ one as, naturally, “T0 Chocolate”. When I first presented T0 to a Google team, they had just seen FLAN, and @iftenney joked that all LMs should be named after desserts. 3/4
Political euphemisms like “undocumented workers” and “illegal aliens” denote roughly the same group of people, but they carry extremely different connotations. Popular pretrained models, however, often conflate the two in pernicious ways. 1/4
The new mixture (what we called Experiment D3 then, now T0++) was a success, and after leading some error analyses (which @BrownNLP taught me well; much more on those analyses in a follow-up paper) and continuing to participate in discussions (aka being annoyingly opinionated)…
Didn’t realize just how Very Online I am until, speaking to an international audience at my #emnlp2020 Q&A, my on-the-fly examples of lexical semantics were all about the 2020 polling errors from treating Hispanics as a monolith and the legislative history of the 2013 grand compromise.
Tangentially related: teams in the DC Think Tank Softball League also have very clever names. There is no official website so here is one roster from 2010. Obviously my favorite is Dynamic Scorers, bit of a deep cut here…
At a @BayAreaNLP Q&A on the Octopus Test of Bender & Koller 2020:
Audience: What if A and B speak different languages, say Japanese and German?
@emilymbender: 行きましょう (“let’s go”)
@alkoller: oh shit you know too much German now
“…four years of continuous exposure to the New England elements has left parts of the sculpture—includes painted and lacquered cast bronze… and a stainless steel interior framework—in need of restoration. The extent of the weathering effect on the sculpture was unanticipated.”
@YiTayML @_jasonwei I almost did this for our lab pre-pandemic. Depending on the city, though, it can be super expensive upfront to book a large Airbnb (with enough separate beds), so we need to plan it super early.
While people are learning to cut their own hair, I’m learning to sharpen my own knives. Handy that IRS Notice 1444, definitely not a piece of electoral campaign promotion, served as a unit test.
I was kinda proud of myself for having been all caught up with NLP Twitter for a week and still not having been scooped yet, and then I realized wait, I have been looking at the wrong timeline. Anyway, now is a good time to declare tweet bankruptcy and see you at @albert@sigmoid.social!
@ShriramKMurthi @jtompkin @mind_realms I did not see this tweet when I took this screenshot. I was honestly just super impressed by James during the CS143 final presentations.
@ShriramKMurthi I agree that they’re often overdone in the literature, but how would you rephrase this abstract? (Not trying to argue but seriously asking for advice here.)
Amazing that both our paper (which finds that LMs cannot understand instructions; ) and this paper (which finds that instruction-tuning improves zero-shot abilities) were out on arXiv on the same day! Some clarifications: 1/4
New paper:
We teach a 137B parameter language model to follow instructions using “instruction tuning”, which finetunes the model on a collection of tasks described via instructions.
This improves the zero-shot abilities of the model on unseen tasks.
I especially encourage people without a traditional CS background to apply. (I majored in political science and math in undergrad. No first-author paper.) AI & NLP will be fatally misguided if we don’t take domain knowledge from adjacent fields seriously.
I just want to debunk the very specific claim that multiple first-author papers are *necessary* for admissions to top PhD programs in NLP/AI, because I think this claim can be harmful (7/N)
@_jasonwei Also, as mentioned in my email to you and @tallinzen, I’m not sure whether we should have arithmetic as a necessary condition of general intelligence. Plenty of humans (esp. w/o formal education) cannot do arithmetic, but it seems wrong to say they’re not generally intelligent.
@AndrewLampinen Thanks so much Andrew! Needless to say, much of this paper’s discussion is heavily influenced by your group’s recent work. I also wanted to say more about your recursive grammar paper in Appx. A, although I haven’t figured out how to polish the writing to fit it into the main text.
In an old draft of our lab logo, I deliberately broke the “do not resize elements” and “do not reconstitute text” rules because the official guideline has some questionable choices on spacing and alignment.
PS: When T5 first came out, I was super impressed by some of its task prefixes being in natural language (e.g., "translate English to German: ", "summarize: "). That has profound implications for the meaning of words like "summarize", orthogonal to whether the model is grounded.
Increasingly, the pressing research questions are no longer whether LMs are able to represent linguistic feature X, but exactly when and how they actually use X. My labmate Charlie’s new paper sheds some light on this new direction. Take a look!
Check out our new @iclr_conf paper w/ @tallinzen: we found that models use “good” features (e.g., S-V agreement) over “bad” features (e.g., lexical overlap) when they’re easier to extract from the model’s pre-trained representations.
Paper:
(1/3)
A likely misinterpretation is that smaller models (our paper) can’t understand instructions while larger models (they claim ⩾ 68B) can. But note that our measurement of understanding relies not on zero-shot performance but on *few-shot learning speed*. 2/4
Concretely, models trained with intentionally irrelevant prompts and pathologically misleading prompts can learn just as fast as those trained with instructively “good” prompts. 3/4
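To illustrate the measurement (with entirely made-up accuracy numbers, not our results): compare how many shots each prompt condition needs to reach a target accuracy, rather than comparing zero-shot accuracy.

```python
# Compare prompt conditions by few-shot learning speed: the number of shots
# needed to reach a target accuracy. Curves below are placeholder data.
shots = [0, 4, 8, 16, 32, 64, 128, 256]
curves = {
    "instructive": [0.51, 0.60, 0.68, 0.75, 0.81, 0.85, 0.87, 0.88],
    "irrelevant":  [0.50, 0.58, 0.67, 0.74, 0.80, 0.84, 0.87, 0.88],
    "misleading":  [0.50, 0.59, 0.67, 0.73, 0.80, 0.84, 0.86, 0.88],
}

def shots_to_reach(curve, shots, target=0.80):
    """First number of shots at which accuracy reaches `target`."""
    for n, acc in zip(shots, curve):
        if acc >= target:
            return n
    return None  # never reached within the budget

for condition, curve in curves.items():
    print(f"{condition:12s} reaches 80% at {shots_to_reach(curve, shots)} shots")
```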
I know we’re supposed to tweet about ICLR right now, but sorry, I need to take a moment and brag about why my department is awesome. Also, what is even going on with @jtompkin and TWD here.
An example of why I love working at @BrownCSDept -- this is our second year of a virtual chorus/musical sendoff to graduates. While it isn't how we'd like to celebrate graduation, it does show part of what makes us a fun community.
@sleepinyourhat I have told/discussed this paper with every labmate/NLP friend of mine over the past year. Together with @a_stadt’s “RoBERTa eventually prefers linguistic generalizations” paper, I think these are among the most fascinating questions in NLP right now, and thank you all for writing them!