Alex Wang Profile
Alex Wang

@W4ngatang

1,672
Followers
418
Following
9
Media
133
Statuses

LLM Evaluation at Cohere

Joined November 2012
@W4ngatang
Alex Wang
2 years
Cohere is growing! If you’re passionate about building world-class LLMs and delivering them to customers, you should apply. I’m specifically looking for folks with experience in NLP data, eval, and annotation. Check out the roles here: DMs are open!
8
30
264
@W4ngatang
Alex Wang
6 years
Been working with BERT, but wish you could talk to it? Check out @kchonyc and my tech report on babbling from BERT, training free! Demo:
3
37
173
@W4ngatang
Alex Wang
5 years
Our new work "Asking and Answering Questions to Evaluate the Factual Consistency of Summaries" does exactly that. We use question generation and question answering models to evaluate whether summaries are factually consistent w/ the source text.
Tweet media one
4
39
168
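The QG/QA loop this tweet describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `answer_fn` stands in for a trained QA model and `questions` for the output of a question-generation model (both hypothetical here); only the SQuAD-style token-F1 agreement scoring is spelled out.

```python
from collections import Counter

def token_f1(pred: str, ref: str) -> float:
    """SQuAD-style token F1 between two answer strings."""
    p, r = pred.lower().split(), ref.lower().split()
    if not p or not r:
        return float(p == r)
    common = Counter(p) & Counter(r)      # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def consistency_score(questions, answer_fn, source, summary):
    """Average agreement when the same questions are answered
    from the source document and from the summary."""
    scores = [token_f1(answer_fn(q, summary), answer_fn(q, source))
              for q in questions]
    return sum(scores) / len(scores)
```

Questions the source and summary answer the same way score near 1; questions the summary answers differently (or not at all) pull the average down, flagging likely factual inconsistencies.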
@W4ngatang
Alex Wang
7 months
🎉🎉🎉 Also, I'm hiring for an MLE/SWE! If you want to build LLMs with @cohere and are interested in developing challenging model evaluation settings + curating high-quality data, please reach out! ICYMI: We also just opened our NYC office 👀
@lmarena_ai
lmarena.ai (formerly lmsys.org)
7 months
[Arena Update] @cohere 's Command R is now top-10 in Arena leaderboard🔥 It's now one of the best open models reaching the level of top proprietary models. We find the model great at handling longer context, which we plan to separate as a new category in Arena very soon.
Tweet media one
14
69
393
8
22
158
@W4ngatang
Alex Wang
2 years
Hello. I am popping up from Twitter lurking to claim the "Longest Time Between Life Update and Actually Announcing It" award: I graduated from NYU this May and started working at @CohereAI in August as a tech lead for Data+Evaluation!
6
4
108
@W4ngatang
Alex Wang
7 months
📣 We heard you liked the open weights we dropped last month, so we're doing it again, except more. 🎉 Introducing Command R+! 🎉 Really proud of what we've built and excited to see what y'all build on top of this!
@aidangomez
Aidan Gomez
7 months
⌘R+ Welcoming Command R+, our latest model focused on scalability, RAG, and Tool Use. Like last time, we're releasing the weights for research use, we hope they're useful to everyone!
26
187
982
6
10
105
@W4ngatang
Alex Wang
2 years
Already tired of months-old papers at ACL? Looking for a hot, new preprint? Check out SQuALITY 💨🍵! SQuALITY is a long document, question-focused summarization dataset. Unlike many existing summ. datasets, SQuALITY summaries are fully crowdsourced! (1/8)
1
12
94
@W4ngatang
Alex Wang
2 years
It's true, I successfully defended my dissertation yesterday! Big thanks to @hhexiy , @ml_perception , @JoaoSedoc for serving on the committee, and an especially big thank you to my advisors @sleepinyourhat and @kchonyc for advising and supporting me over the past five years.
@sleepinyourhat
Sam Bowman
2 years
Congrats to @W4ngatang for a successful dissertation defense today!
Tweet media one
Tweet media two
13
1
131
3
5
85
@W4ngatang
Alex Wang
1 year
there is an unreasonable number of "alex wang"s in the LLM space, between myself at Cohere, an Alex Wang at Perplexity, @alexandr_wang at Scale...truly blursed. s/o Alex L. Wang for once maintaining a disambiguation of "alex wang"s in ML
7
1
74
@W4ngatang
Alex Wang
8 months
Excited to share this work with the world, both the results and the actual model weights. Looking forward to seeing what the community will build with this! Stay tuned for more! ✍️details: ⚖️weights: 🤖chat:
@aidangomez
Aidan Gomez
8 months
⌘-R Introducing Command-R, a model focused on scalability, RAG, and Tool Use. We've also released the weights for research use, we hope they're useful to the community!
31
186
1K
0
8
64
@W4ngatang
Alex Wang
1 year
I live in Toronto now and #ACL2023NLP happens to be here too! If you want to chat about LLMs, where to eat/drink in Toronto, or opportunities at @cohere , feel free to reach out or stop by the Cohere booth!
3
1
66
@W4ngatang
Alex Wang
7 months
This drove me crazy for a while: We had internal experiments showing RM > LLM for evaluation, which felt really counterintuitive to me. Nice to get external confirmation, and thanks for building the benchmark @natolambert ! :)
@natolambert
Nathan Lambert
7 months
Thx to @cohere 's SOTA reward model, LLM-as-a-judge isn't SOTA on RewardBench :)
Tweet media one
4
17
114
1
6
66
@W4ngatang
Alex Wang
11 months
I'm at #NeurIPS2023 the whole week! Shoot me a message if you wanna catch up/chat about LLMs, evaluation, @CohereForAI , or catch me at the @cohere booth from 3-6pm CT today!
0
5
56
@W4ngatang
Alex Wang
7 months
If you're interested in working with us shoot me a DM or email about yourself and something cool you've worked on recently! I'm looking for people interested in LLM evaluation and data creation, but we have plenty of other roles. The new NYC office is sweet!!
Tweet media one
3
2
43
@W4ngatang
Alex Wang
7 months
Amazing release from @Nils_Reimers and many others! 🎉🎉
@aidangomez
Aidan Gomez
7 months
Introducing Rerank 3! Our latest model focused on powering much more complex and accurate search. It's the fastest, cheapest, and highest performance reranker that exists. We're really excited to see how this model influences RAG applications and search stacks.
Tweet media one
23
124
743
4
0
39
@W4ngatang
Alex Wang
5 years
The SustaiNLP2020 (at @emnlp2020 ) Call for Submissions is up at . The task evaluates on SuperGLUE and energy efficiency as measured by @PeterHndrsn 's library. Come develop more energy efficient NLP models! Deadline Aug 28 and baseline code available soon!
2
12
33
@W4ngatang
Alex Wang
5 years
New-ish paper for ACL2019 comparing a diverse set of tasks for pretraining sentence encoders and augmenting existing pretrained LMs, made possible by great collaborators from Brown, Google, JHU, and many more, as well as oodles of compute.
1
2
27
@W4ngatang
Alex Wang
5 years
Come hear me attempt to recap a couple years of progress in NLP in 5m, or tell me your favorite glue puns at the poster session immediately afterwards!
@AIatMeta
AI at Meta
5 years
#NeurIPS2019 , catch the spotlight on our recently created SuperGLUE benchmark which helps language understanding researchers set a new, higher bar for #NLP research. It's Wed 4:55-5:00 PM West Ballrooms A + B. Read more: Benchmark:
Tweet media one
0
28
109
0
1
24
@W4ngatang
Alex Wang
1 year
really excited for this work and honored to have helped mentor the first cohort of C4AI Scholars! Nice work Max!
@maxdoesresearch
Max Marion
1 year
📢New Pretraining Paper 📢 Delighted to share our new paper coming out of @forai_ml : "When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale" Paper: w/ @ahmetustun89 @luizapzbn @W4ngatang @mziizm @sarahookr
9
25
87
0
4
24
@W4ngatang
Alex Wang
2 years
Really cool work from @luizapzbn and @forai_ml ! Keep an eye on this lab, buy your stonks in it now!
@luizapzbn
Luiza Pozzobon
2 years
🚨PREPRINT ALERT🚨 "On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research" w/ Beyza Ermis, @PSH_Lewis , and @sarahookr Paper: Code:
Tweet media one
3
34
150
1
1
21
@W4ngatang
Alex Wang
4 years
Excited to be co-hosting a mentoring session on "establishing collaborations and networking" and "managing up" with @ryanzhumich and @yangfeng_ji for #acl2020nlp on 7/8 at 12pm ET. I imagine there will be a lot of learning from this session, especially by me😅
2
3
22
@W4ngatang
Alex Wang
2 years
Sam is an amazing advisor and human being! This is incredibly deserved. The lab is also looking to hire researchers at various levels, and that's a great opportunity to work with Sam and the rest of us!
@sleepinyourhat
Sam Bowman
2 years
I just got tenure! Wheee! Predictable-but-heartfelt gratitude thread:
131
17
2K
1
2
16
@W4ngatang
Alex Wang
6 years
Excited to share this work! We look at (1) methods for measuring bias in word embeddings applied to sentence encoders, and find that these methods don't straightforwardly apply (2) tests for nuanced social biases that are difficult or impossible to study at the word level
0
0
14
@W4ngatang
Alex Wang
4 years
The birds-of-a-feather session on generation at #acl2020nlp was awesome! Discussion was lively and spawned *multiple* followup discussions. Kudos to @sebgehr , @gh_marjan , and another moderator whose name I missed! There's another one in a few hours (5pm ET), highly recommended!
1
2
14
@W4ngatang
Alex Wang
7 months
@cohere Oh this is not an April Fool's thing 🫠
1
0
13
@W4ngatang
Alex Wang
4 years
I'll also be presenting "Asking and Answering Questions to Evaluate the Factual Consistency of Summaries" (joint work with @kchonyc and @ml_perception ) at sessions 9A (7/7, 1pm ET) and 10B (7/7, 5pm ET). Come chat and hang out!
1
5
13
@W4ngatang
Alex Wang
7 months
@PSH_Lewis Patrick Lewis is a legend
0
0
11
@W4ngatang
Alex Wang
2 years
The past four months have been a blitz of fast, fun, and cool projects. And I've been fortunate to learn from @egrefen @Nils_Reimers Phil Blunsom and many others. There's cool stuff from Cohere on the horizon that I'm excited to share soon.
1
0
11
@W4ngatang
Alex Wang
6 years
Come join our lab! Sam does really neat work (disclaimer: I am biased).
0
2
9
@W4ngatang
Alex Wang
2 years
Feel free to reach out if you want to talk about LLMs, Cohere, or anything else!
0
0
10
@W4ngatang
Alex Wang
2 years
This is being presented at @emnlpmeeting on Friday at Session 2! Sadly none of us ( @yzpang97 , @_angie_chen , @zhansheng , @sleepinyourhat ) could make it to Abu Dhabi, but feel free to reach out if you have questions or want to talk about summ., data quality, or crowdsourcing!
@W4ngatang
Alex Wang
2 years
Already tired of months-old papers at ACL? Looking for a hot, new preprint? Check out SQuALITY 💨🍵! SQuALITY is a long document, question-focused summarization dataset. Unlike many existing summ. datasets, SQuALITY summaries are fully crowdsourced! (1/8)
1
12
94
0
5
9
@W4ngatang
Alex Wang
7 months
Come build on Command R+ and Rerank 3 with us in our offices, especially in NYC!!
@aidangomez
Aidan Gomez
7 months
We're throwing 4 hackathons at each of our offices around the world!! If you're in NYC, London, Toronto, or SF come hang out with us and build with Command R and R+ 🛠️
13
27
228
0
0
8
@W4ngatang
Alex Wang
5 years
@phu_pmh @alfcnz Finishing up for the day and the workshop, panel with @Tetreault_NLP (who had a great talk right before this), @functiontelechy , @sleepinyourhat , @xkianteb , @phu_pmh , and Shubham Chandel on practical ways for beginners to get started in AI #nycaiworkshop
Tweet media one
1
2
8
@W4ngatang
Alex Wang
2 years
There's a lot to do in using the multi-references, developing efficient human evaluation of long texts, and enabling long-text summ. with prompting. If this sounds interesting to you, check out the links below: paper: data: (7/8)
1
0
7
@W4ngatang
Alex Wang
6 years
I participated in this and it was a fantastic introduction to doing research. Highly recommended
@ChrisGPotts
Christopher Potts
6 years
We're now accepting applications for the 6th CSLI Undergraduate Summer Internship Program, which places students in Stanford labs for 8 weeks of mentored research. Housing and a stipend provided. Prior research experience not required:
4
71
78
0
0
7
@W4ngatang
Alex Wang
6 years
Got a Google home mini, immediately hit it off with Alexa #ModernLove #Alexa #GoogleHome
0
0
7
@W4ngatang
Alex Wang
6 years
Nihilist duolingo @shitduosays
Tweet media one
1
0
6
@W4ngatang
Alex Wang
7 months
@jaa_campos Jon Ander is a legend
0
0
6
@W4ngatang
Alex Wang
8 months
Real cool title and really cool result!
@FrancesDing
Frances Ding
8 months
Protein language models (pLMs) can give protein sequences likelihood scores, which are commonly used as a proxy for fitness in protein engineering. But what do likelihoods encode? In a new paper (w/ @JacobSteinhardt ) we find that pLM likelihoods have a strong species bias! 1/
Tweet media one
10
59
249
0
1
5
@W4ngatang
Alex Wang
2 years
Also, while I have you here, consider taking the NLP Community Metasurvey! Having an opinion is fun and seeing how your opinion lines up with the rest of the community is extra fun!
1
0
3
@W4ngatang
Alex Wang
5 years
@vincentsunnchen @SnorkelML Thanks so much! We're big fans of @SnorkelML :)
0
0
4
@W4ngatang
Alex Wang
5 years
Inspecting the generated questions, we were surprised to find that they are often fluent, on-topic, and sensible. Nvidia has a great paper pushing on the question generation capabilities of existing models: .
1
1
4
@W4ngatang
Alex Wang
6 years
Into the Spiderverse confirms: grad student Spiderman is the best Spiderman #Spiderman #SpiderVerse #MarvelsSpiderMan #gradlyfe
0
0
4
@W4ngatang
Alex Wang
2 years
SQuALITY is question-focused and multi-reference: For each story there are 5 questions, and for each question there are 4 reference summaries. The responses are highly diverse, an aspect of summarization that isn't well-represented in existing single-reference datasets. (4/8)
Tweet media one
1
0
4
@W4ngatang
Alex Wang
5 years
This is work done while interning with @kchonyc and @ml_perception at FAIR, to be published at #acl2020 . Preprint available now, code to come soon!
0
0
4
@W4ngatang
Alex Wang
2 years
Probably one of the best decisions I've made in the past five years has been to do my PhD at NYU. It's a great place to do cutting-edge ML and NLP research. Not to mention it's in NYC!
1
0
3
@W4ngatang
Alex Wang
6 years
God is dead, I found him drowned in the 2nd Ave ice floes
0
0
4
@W4ngatang
Alex Wang
2 years
We spent several months working with Upwork writers and undergraduates to create summaries of Project Gutenberg stories (4-6k words long). We put a big focus on developing a protocol for collecting text responses that is cost-efficient while also maintaining quality. (3/8)
Tweet media one
1
0
3
@W4ngatang
Alex Wang
4 years
Outside of these sessions, I'd love to chat and flex my "establishing collaborations and networking" muscles. Feel free to reach out!
0
1
3
@W4ngatang
Alex Wang
2 years
SQuALITY is relatively small, but we think it's high-quality and a good benchmark test set for summarization. (6/8)
1
0
3
@W4ngatang
Alex Wang
5 years
This should be fun! Look out for more updates about this soon!
0
0
3
@W4ngatang
Alex Wang
2 years
@d_aumiller Email me at alexwang@cohere.com!
1
0
3
@W4ngatang
Alex Wang
2 years
Stay tuned for what I'll be doing next!
1
0
2
@W4ngatang
Alex Wang
4 years
#EMNLP2020 is great, but it can be challenging to engage with so much research when it's getting late and you've spent most of the day "at" the conf... shout out to @gregd_nlp for putting the Language Generation session on his back and keeping the questions+discussion flowing😅
0
0
3
@W4ngatang
Alex Wang
2 years
Human evaluators consider human-written summaries to be substantially better than summaries from state-of-the-art supervised summarization systems along several dimensions. Also, automatic metrics are a poor indicator of model quality for SQuALITY. (5/8)
Tweet media one
1
0
3
@W4ngatang
Alex Wang
2 years
The group is very collaborative and supportive, and is pursuing excitingly risky and fun lines of research. I highly recommend collaborating with the folks here and visiting whenever you're able!
1
0
2
@W4ngatang
Alex Wang
6 years
Mondays are dumb. Cloudy Mondays are dumb. Damn, cloudy Mondays are dumb. Those damn cloudy Mondays are dumb. Dominate those damn cloudy dumb Mondays. Tl;dr: dom dem dam dim dum days #MondayMotivation
0
0
3
@W4ngatang
Alex Wang
2 years
Common approaches for building summarization datasets (scraping, developing heuristics) have led to unexpected amounts of noise in the datasets. Crowdsourcing summaries is expensive (and subsequently understudied), but one way to mitigate noise, if done carefully. (2/8)
1
0
2
@W4ngatang
Alex Wang
5 years
Using NLP models to evaluate generated text is a promising direction, but it's clear there is a lot of (exciting!) work to be done to make these methods reliable.
1
0
2
@W4ngatang
Alex Wang
6 years
New lunchables are insane
Tweet media one
0
0
2
@W4ngatang
Alex Wang
5 years
This method correlates much better with human judgments of consistency than existing metrics on the XSUM and CNN/DM summarization datasets. Our method is especially effective on the latter, likely due to the somewhat extractive nature of the dataset.
1
0
2
@W4ngatang
Alex Wang
6 years
somewhere in Singapore, there's an actual real estate dynasty that thinks Crazy Rich Asians is the most terrifying and informative movie of 2018
0
0
2
@W4ngatang
Alex Wang
4 years
Hongyao and Eric TA'd several of my classes, and I can attest that they are super smart and kind people working on exciting problems. I remember talking with Hongyao about strategic Doodle voting, the lessons of which I continue to use today. Congrats @hongyaoma and Eric!
@AcmSIGecom
ACM SIGecom
4 years
The ACM SIGecom Dissertation Award for 2019 goes to Hongyao Ma @hongyaoma , with honorary mentions going to Rediet Abebe @red_abebe and Eric Balkanski. Read more about their dissertations here:
2
3
48
0
0
1
@W4ngatang
Alex Wang
7 months
@hu_yifei @CohereForAI Hmm, send me your prompt?
1
0
1
@W4ngatang
Alex Wang
7 months
@rajammanabrolu @natolambert Mostly believing in the magic of a general-purpose LLM working better than a smaller task-specific model, nothing especially principled
0
0
1
@W4ngatang
Alex Wang
2 years
by which I mean @yzpang97 and @zhansheng
0
0
1
@W4ngatang
Alex Wang
2 years
@joechoochoy @idavidrein Anecdotally, there were some activities (mostly cognitively intensive games) where I was starving afterwards even though I'd only really been thinking
0
0
1
@W4ngatang
Alex Wang
5 years
On the other hand, we find that the bottleneck in our metric is due to the QA model breaking down, despite the models being pretrained and finetuned on quite similar data sources as the test environment.
1
0
1