Freshly added to my .bash_profile:
alias cdd="cd ../.."
alias cddd="cd ../../.."
alias cdddd="cd ../../../.."
I'm not lazy, I'm... efficient... right??
How does quantization affect multilingual LLMs?
For wide adoption, multilingual LLMs must be highly performant *and* lightweight. 🪶 We analyze SOTA multilingual LLMs in 23 languages under various quantization techniques to find out!
(Mostly positive!) Reflections as a female AI researcher at #NeurIPS2023, part 1 of ??
A man from an unnamed-but-very-well-known AI company approached me near the booth. He asked about my research and the biggest challenges in multilingual NLP.
A rapid-fire back-and-forth ensued: 1/5 🧵
Just trained an MT model. The output for every test sentence is:
" I & amp ; apos ; m sorry . I & amp ; apos ; m sorry . I & amp ; apos ; m sorry . I & amp ; apos ; m sorry . I & amp ; apos ; m sorry ."
It's not your fault, little buddy! It's me, not you!
I was pleased to be treated as an equal, and for the opportunity to sharpen my intellectual battle sword ⚔️🤺
(and proud that I was deffo 💯💯💯 correct 🤪🕵️‍♀️) 5/5
I'm on the job market! (industry/post-doc/faculty) I work on multilinguality and low-resource NLP, with a focus on computational efficiency. Please don't hesitate to reach out with opportunities (DM/email)! Applying broadly, flexible location!
Engagement posts are sooo cliché -- so we trained a neural language model to write ours:
Engaged Engaged Engaged for her big big big move on the big fella hasn't even when we were top-notch! from the happiest and her unravel and her *<expletive>* today🤣🤣🤣🤣
#princesscut
Congratulations to Kelly Marchisio @cheeesio (advised by Philipp Koehn) on successfully defending her @JHUCompSci PhD thesis, "Multilinguality from Embedding Spaces: Algorithmic, Geometric and Data Considerations." Kelly will join @CohereAI. @HopkinsEngineer @mayhewsw
Expected date to run first inference is Sept 2 - we're currently setting up our eval suite, so it remains to be seen. I have high hopes for this one
Introducing ✨Mini-Model Adaptation✨ - a new parameter- and compute-efficient method for rapid adaptation of pretrained models to new languages! 🧵1/5
But they don't with me, and I don't with them. I have brilliant female computer scientist friends; we just don't tend to engage with each other this way.
I left thinking "WOAH, that was aggressive! But he'd do the same if I were male." 4/5
Multilinguality is something that is crucial for equitable utility of this technology. We want our models to work for as many people, organizations, and markets as possible. We perform strongly across 10 languages and we're eager to expand this further.
Amazon and @HopkinsEngineer announced the first recipients of PhD fellowships and faculty research awards as part of the JHU + Amazon Initiative for Interactive AI. Learn why Alexa AI VP @natarajan_prem says these projects will help drive new advances in AI. #ArtificialIntelligence
We all get a little *confused* sometimes 🫢🫨😵‍💫 - joint work with @seb_ruder @weiyinko_ml Alex Bérard, Théo Dehaze, hot off the press! ✨
Understanding and Mitigating Language Confusion 😵‍💫
User: ¿De qué trata nuestro artículo? ("What is our paper about?")
LLM: We analyze one of LLMs' most jarring errors: their failure to generate text in the user's desired language.
Our prioritization of multilinguality extends even to our tokenizer. Better tokenization -> better representations -> better cost-efficiency for you! 💸
One subtlety worth mentioning is how significant the tokenizer is to the cost of using models in non-English languages. Our tokenizer is meaningfully better than others on the 9 non-English languages, achieving up to a 2x effective cost reduction.
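(For the curious, a minimal sketch of how you could check this yourself by counting tokens; the tokenizer ids below are hypothetical placeholders, not our stack:)

```python
# Sketch: compare tokenizer "fertility" (token count for the same text).
# Fewer tokens per text -> fewer tokens billed -> lower effective cost.
from transformers import AutoTokenizer

text = "¿De qué trata nuestro artículo?"  # any non-English sample

tok_a = AutoTokenizer.from_pretrained("tokenizer-a")  # placeholder id
tok_b = AutoTokenizer.from_pretrained("tokenizer-b")  # placeholder id

n_a = len(tok_a.encode(text))
n_b = len(tok_b.encode(text))
print(f"A: {n_a} tokens, B: {n_b} tokens, relative cost: {n_b / n_a:.2f}x")
```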
The ability to extract accurate translation dictionaries from monolingual embedding spaces depends critically on their geometric similarity, or "degree of isomorphism." We address this root cause of faulty cross-lingual mapping with ✨IsoVec✨ #EMNLP2022 🧵1/N
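(For intuition, a toy sketch of one common proxy for "degree of isomorphism": correlate the pairwise-similarity structures of the two spaces over row-aligned translation pairs. A generic illustration, not IsoVec itself:)

```python
# Sketch: relational similarity as a rough isomorphism proxy.
import numpy as np
from scipy.stats import pearsonr

def relational_similarity(X, Y):
    """X, Y: (n, d) embeddings of n translation pairs, row-aligned."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    sim_x, sim_y = X @ X.T, Y @ Y.T    # within-space cosine similarities
    iu = np.triu_indices(len(X), k=1)  # upper triangle, no diagonal
    r, _ = pearsonr(sim_x[iu], sim_y[iu])
    return r                           # nearer 1.0 = more isomorphic

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))
Y = X + 0.1 * rng.normal(size=X.shape)  # a mildly perturbed copy of X
print(relational_similarity(X, Y))      # high here; drops as spaces diverge
```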
He left, then returned for sources. I left pleased that I'd stood my ground, and that the "battle" had *happened*. Let me explain:
Respectful intellectual argument is a valuable skill to be honed. My male colleagues do it with each other. They hang, they banter & spar, they hack. 3/5
Hilarious that this pops up now while I'm at EMNLP. Nine years ago (I'd coded my first "hello world" only about 6 weeks earlier)... my, my, how things have changed! 💻
C4AI Command R+ is a state-of-the-art RAG-optimized model with advanced tool use to automate sophisticated tasks, including multi-hop tool use. ✨
Command R+ is optimized for general reasoning and excels at multilingual performance, evaluated across 10 languages.
Headed to ✨NAACL 2022✨ tomorrow! Looking forward to an exciting week of chats about multilinguality and low-resource MT. Come say "hi" if you see me!
Might supervised and unsupervised MT be mutually beneficial? In our #NAACL2022 work, we ask whether the training methods result in systematically different output beyond what is visible via quality metrics like adequacy or BLEU. 🧵1/4
Arrived in New Orleans for #NeurIPS2023! I'll be at the Cohere booth tomorrow (Mon) 2:30-3:30pm, and 9-11:30am Tues-Thurs - come by if you want to chat about anything and everything multilingual NLP!
⌘-R
Introducing Command-R, a model focused on scalability, RAG, and Tool Use. We've also released the weights for research use; we hope they're useful to the community!
Took a break from thesis-writing on Monday to visit @esalesk at JHU's Edible Book Festival, presenting her edible rendition of our advisor's book! Cake recipe generated with Bard! @jhuclsp
I'm excited to announce that I've joined @cohere to help make LLMs more multilingual!
It's crazy how the capabilities of NLP models have evolved over the last few years. I'm thrilled to work with a team full of smart, dedicated, and kind individuals to push the boundaries of LLMs.
From Monday until early October, I'll be interning with @artetxem at Meta AI in London. If you'll be in 🇬🇧 in the next few months, let's meet up!
…
Him: "I don't believe that"
Me: cites sources
…
Him: "With infinite computation, that's not true"
Me: "Sure, but we live in reality. Infinity isn't real."
…
etc. etc. etc. 2/5
Is RLHF effective for aligning multilingual LLMs?
Our work studies multilingual preference optimization to train a new SOTA multilingual LLM, advancing the frontier of alignment techniques to 23 languages covering half the world's population! 🧵
(1) Automatic metrics severely underestimate damage from quantization. ⚠️
Automatic evals estimate the deterioration of a quantized model relative to FP16 across tasks at a modest -0.3% for French and -1.7% for Japanese, but humans report the drops as -16.6% and -16.0%.
I'm here in Toronto! #ACL2023NLP I'll present Mini-Model Adaptation in a virtual poster session tomorrow 11:00-12:30 Toronto time, and again in person at the RepL4NLP workshop on Thursday. Come say 👋!! @yihong_thu @PSH_Lewis @artetxem
Life-hack: I watched a 3-minute YouTube video about steaming milk and now I've been complimented at the office two days in a row and called a "pro." Please spam me with other ways I can fool others into thinking I'm competent in 3 mins or less!!!
(Voilà ☕️)
It's my decade codeaversary! Right around this time 10 years ago, I coded my first line: a "hello world" in C. My life has never been the same 🥰
I'll be presenting our recent work "How Does Quantization Affect Multilingual LLMs?" at the Cohere4AI ML Efficiency Group on Friday at noon Eastern (GMT-4). Come join in on the fun!
To join, please fill out the form:
At the ML Efficiency Group, we're excited to have @cheeesio present her work on "How does quantization affect multilingual LLMs?" Quantization is ever-present in the large model stack -- but it can have unintended impacts on quality. Join in to find out :)
Low-resourcedness + domain/script shift + noise dramatically ⬇️⬇️ the geometric similarity of word embedding spaces. #EMNLP2022 We improve BLI on non-isomorphic spaces using a new optimal transport-based graph-matching algorithm. 9am Sunday in Abu Dhabi! 1/4 🧵
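(For intuition only, a toy sketch of matching two embedding spaces with entropic optimal transport via the POT library; our paper's graph-matching algorithm is more involved than this:)

```python
# Sketch: induce a toy bilingual lexicon by OT-matching two embedding sets.
import numpy as np
import ot  # POT: pip install pot

def ot_match(X, Y, reg=0.05):
    """X: (n, d) source embeddings; Y: (n, d) target embeddings."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    M = 1.0 - X @ Y.T                  # cosine-distance cost matrix
    a = np.full(len(X), 1.0 / len(X))  # uniform source marginal
    b = np.full(len(Y), 1.0 / len(Y))  # uniform target marginal
    P = ot.sinkhorn(a, b, M, reg)      # entropic OT transport plan
    return P.argmax(axis=1)            # best target index per source word

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 64))
perm = rng.permutation(50)             # Y is a shuffled copy of X
print((ot_match(X, X[perm]) == np.argsort(perm)).mean())  # fraction recovered
```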
I'll be presenting two papers starting 30 mins from now at #EMNLP2022! ✨IsoVec✨ (below) as a poster, and "BLI... using Graph Matching via Optimal Transport" (tweeted yesterday) live in Hall B! Join me! @jhuclsp @n_verma1 @AliSaadEldin @kevinduh Carey Priebe, Philipp Koehn
The ability to extract accurate translation dictionaries from monolingual embedding spaces depends critically on their geometric similarity, or "degree of isomorphism." We address this root cause of faulty cross-lingual mapping with ✨IsoVec✨ #EMNLP2022 🧵1/N
Day 22-30ish: The full draft is complete! A few hours per week turned into all-day-every-day for a week or two, between adding the intro/abstract/future work/conclusion and making the edits requested by my committee. I defend *tomorrow* at 2pm Eastern at JHU!
With 1️⃣ week left in our MMLU Translation sprint, we are 22% through the task.
Korean, Arabic, Vietnamese, Amharic, German, Indonesian, Chinese, Sinhala, Nepali, and Swedish are all closing in on the goal! 🔥
Speak these languages? Join us:
(2) Languages are disparately affected by quantization: non-Latin-script languages are impacted worst 🥺
We knew they were poorly represented in training data & tokenization, causing ⏬ performance and ⏫ cost/latency. Now we know they're treated unfairly in quantization, too.
(3) Challenging tasks degrade fastest.
For example, mathematical reasoning (MGSM) and generative tasks as evaluated by humans and LLM-as-a-Judge suffer a large performance penalty under quantization.
Our Findings of EMNLP 2021 paper, "An Analysis of Euclidean vs. Graph-Based Framing for BLI from Word Embedding Spaces", is now public:
Code:
Paper:
*thread* 1/5
Computer Science @ JHU is hiring in ALL areas:
Apply early for flexible scheduling + a potential early offer.
Our department is expanding fast, especially in AI-adjacent fields.
Come join us!
Finally! Finished my 2019 New Year's resolution 🥳 What's next? I've got Hogben's Mathematics for the Million on the list. (And please excuse the crude coffee mug - a neural network named the color)
For our 4th date, Martin and I took apart a computer together. For our anniversary, he surprised me with this - the GPU from that night 😭😭😭
"Love you too much to process" -- not quite sure if he's referring to me or the GPU 🤷‍♀️
Days ~15-17: Defense date is set!
**Wednesday 7 June, 2-4pm**
I now have to deliver the full draft to my committee members 2 weeks early, by next Wednesday. I've been sending my advisor draft chapters every few days. Final research content chapter today!
Day 3: Today I read that Ernest Hemingway allegedly said, "write drunk, edit sober." Turns out he *didn't* actually say this, which is a real shame because for a moment there I thought he'd cured my writer's block.
Anywho, I copied the JHU thesis template today: it exists! 💻
The ability to serve low-compute models is *critical* for wide global adoption.
Even widely used W8 quantization leads to degradation detectable by humans for some languages, and W4 is even worse.
Consider multilinguality as a key evaluation criterion for efficient models!
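(If you want to poke at W8/W4 yourself, here's a minimal sketch using Hugging Face Transformers with bitsandbytes; the model id is a placeholder, and quality should be checked per language, per the thread above:)

```python
# Sketch: load a causal LM in 8-bit (W8) or 4-bit (W4) for comparison.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

cfg_w8 = BitsAndBytesConfig(load_in_8bit=True)
cfg_w4 = BitsAndBytesConfig(load_in_4bit=True,
                            bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained(
    "some-multilingual-llm",      # placeholder model id
    quantization_config=cfg_w8,   # swap in cfg_w4 to compare W4
    device_map="auto",
)
# Then run the same multilingual eval suite on FP16 / W8 / W4 checkpoints.
```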
If you're deciding which #NeurIPS23 poster to check out tomorrow, don't forget our forgetting paper! Visit poster #328 Thursday morning to dive into the world of active forgetting. Discover how it enhances language models with greater language plasticity. See you there!
@ahmetustun89 and I will be chatting about multilingual research at Cohere & C4AI today at #NeurIPS2023! Stop by and say "hello"! 👋
These are the types of questions & answers that get me excited about using ChatGPT -- the ones that are hard to ask traditional search engines, because punctuation/syntax really matters!
If you ever see me in person, please say hi. Please approach me at conferences and assume we are best friends. Yes, I want to get coffee or drinks or dinner and talk about your cool new project or hobby or family.
Legend. Many a late night spent watching Professor Strang's lectures on 2x speed to understand my linear algebra homework. The impact this man has had on budding scientists and mathematicians is astounding!
Professor Strang gave his last Linear Algebra lecture today after 66 years at MIT.
Strang was among the first to upload his classes to MIT OpenCourseWare when it first came online in the early 2000s. His 18.06 lectures have been viewed millions of times around the world
Re: Hybrid format -- I know some have felt bogged down by the amount of time it takes to make a recording + (poster / in-person talk) + paper. But I am "reading" *so* many more of your papers now! I hope that if/when we go back to in-person-only, the 10-min videos will stay
Just watched this very clear talk from EMNLP 2021 on Underline. It might help explain our findings in "When Does Unsupervised Machine Translation Work?", particularly Table 5 on instability in BLI ()
Just received the cutest little "Work From Home Intern" Android from @Google for my remote internship. Thanks to Google Translate Research @markuseful @GrangierDavid for hosting me this summer!
@fchollet 22, almost by accident, after a bachelor's in psychology/sociology. Changed my life and has brought me more excitement, joy, and fulfillment than I ever could have imagined from a career
Excited to be in New Orleans next week!
Very proud of the work we will be presenting, with many posters, talks, and presentations ahead.
Come chat with the @CohereForAI @cohere team. Happy to connect -- looking forward to catching up with friends old and new.
Day 1: Feeling energized after listening to Episode 151 of @marvettelacy's podcast: "Writing a shitty paragraph takes 10 minutes, tops." Let's gooooooo! 💪🏽 2/N
Day 10: Phew! No one tells you (...ok fine, plenty of people told me) that interviewing full-time at the end of a PhD means squeezing in writing in any spare energised moment. 1 hour til liftoff - can I crack out a couple sections?
Day 4: Printed out my relevant publications, and I'm deciding which parts will be moved to overall intro/background sections vs. which will stay in-chapter with research findings. These two will def need merging, as "BLI for Low Res..." was a follow-on paper to "An Analysis..."
Day 7: An unexpected gift from my past self: in many of my LaTeX docs, I'd commented out alternate phrasings, paragraphs that I didn't have space for, additional derivations, mathematical intuition, etc. Now, with unlimited space, these are given new life!
Alright, hats off to GitHub Copilot 🎩
I wrote only the comments and a tiny post-edit to specify the behavior of keep, but that's because my prompt was unclear.
(OK, I know it didn't actually 🖨️, but variable assignment is what I actually wanted so I could play with it myself.)
Day 11: Personally, I ❤️ the new required "Limitations" section for *ACL conferences. When written well, they clarify the work and (counterintuitively?) make the authors' main claims stronger. Keeping them in my thesis!
Our recent work on bilingual lexicon induction with small parallel corpora is now available online, with code (published at MT Summit 2021)
Paper:
Code:
@jhuclsp
Day 5: Decided that "BLI for Low Res..." and "An Analysis..." definitely belong together under the broader category of *Graph Matching Methods for Bilingual Lexicon Induction*. This morning, I spent an hour combining their setups into a "Shared Experimental Setup" section.
@FromPhDtoLife Take ~5 years between ugrad & PhD to work, make some money (invest!), have a blast in your early-mid 20s, re-evaluate whether a PhD is truly the path for you. If it is, go for it full-force!
Day 13: Now that I can talk freely about it, the final chapter is ✨Mini-Model Adaptation✨! I got feedback that I should "modernize" my thesis: what does multilinguality from embedding spaces look like in the age of LLMs? Here's a response! #ACL2023NLP
Day 14: Time to re-commit to a writing habit! Interviewing is a full-time job, and each interview requires prep, so I've fallen off the writing wagon recently. To defend in June, I'm re-committing to 1-hr writing sessions, 3x/week. Achievable, measurable!
All Aboard!! 🤪🤸🪩
📣 Announcing our new cross-institutional collaboration.
We've brought together researchers invested in improving multilingual benchmarks. We're starting with MMLU, a heavily translated dataset used for multilingual evals that doesn't capture cultural nuances.
Let's address this