The story of Nigel Richards, the man from New Zealand who memorized every French word in the French scrabble dictionary and won the French Scrabble Championship without speaking any French
If you’ve ever wanted to take a grubby Python project and turn it into something that looks more like a well-run open-source project (👋 ML researchers), here’s a guide I wrote on how to do it.
I was frustrated after Googling for hours, so hope it helps!
Tech report coming soon!
SSMs are an amazing fit for audio: perplexity numbers with our new architecture blow Transformer baselines out of the water. Look at this giant gap on training loss
🚀Excited to release Robustness Gym, a new Python toolkit for evaluating the robustness of NLP models, as part of a collaboration between Stanford, Salesforce Research and UNC Chapel Hill.
Paper:
Code:
pip install!
(Thread) I finally got GPT-3 access last week (shout out to
@gdb
), and took a stab at an experiment that I've been curious about for a while.
TLDR: training a model on a dataset entirely generated by GPT-3.
You can read my blog at .
Incredibly excited to be releasing our first model,
@cartesia_ai
Sonic today.
Sonic is a voice model based on a new state space model architecture we've developed that's blazing fast, efficient and high quality.
It's the first of many models we're building to bring cheap
Today, we’re excited to release the first step in our mission to build real time multimodal intelligence for every device: Sonic, a blazing fast (🚀 135ms model latency), lifelike generative voice model and API.
Read and try Sonic
We built an interactive data frame powered by foundation models that can wrangle your unstructured data (images, videos, text docs...)
Introducing 🔮 Meerkat!
📃
💻
🌐
There’s a weird dichotomy where all the AI researchers I interact with think there’s a lot left to do on designing new architectures that improve over Transformers — but everyone else seems to be entirely unaware that this is even a possibility left to consider
we're hiring research engineers / modelers to accelerate model development
ship multimodal models in a fast-paced team that's moulding the future of real-time architectures
@_albertgu
will give you your company-sponsored pet yoshi himself
2.5 months ago
@elevenlabsio
put up this comparison with our 10 day old Sonic model:
The team took it as a challenge, here's our new scorecard.
Higher quality, cheaper, & the fastest voice model, period.
Next 3 months will be fun.
Preprint alert!
"Model Patching: Closing the Subgroup Performance Gap with Data Augmentation" is now on arXiv!
📑Paper:
🧑💻Code:
📹Video:
✍️Blog:
Read on to learn more (1/9)
Writing a rebuttal for NeurIPS,
What I want to say 😏
“Your review is $%*€¥. Try again. 2/10.”
What I actually say 😒
“Thanks for the helpful feedback. Your wisdom and insight are truly wondrous and move my soul. I was touched that you think we don’t have enough baselines...”
Excited to release a new resource for Data Centric AI:
...with a great post by
@HazyResearch
about our lab's journey in this:
This is already a community effort with 20+ folks who have contributed discussion. Please send us PRs!
Excited to release Meerkat, a new data library for interactive machine learning! We've (
@jundesai
,
@EyubogluSabri
,
@HazyResearch
) been building this up over the last couple of months.
Read our blog post to learn more:
Awesome to see that our MLSys seminar series now has 3k subs on YouTube (and counting):
I’m constantly amazed by how many folks I interact with have watched, thanks for tuning in! (and subscribe)
@realDanFu
@w4nderlus7
@matei_zaharia
@HazyResearch
we'll be shipping another
@cartesia_ai
model next week
start working with us asap if you want to get early access to all the cool stuff that's coming, this team is 🚢 new models every 2-3 weeks
A while back, I wrote a Python library for handling YAML-based configuration in my ML projects.
I've been installing it (`pip install quinine`) for my own projects for a while; now you can use it too
README:
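To make the idea concrete, here's a minimal sketch of the pattern a YAML-config library like this enables: nested config values reachable by dot access. This is an illustration only, not quinine's actual API, and the inlined dict stands in for what would normally come from parsing a YAML file.

```python
class Config:
    """Wrap a nested dict so values can be read with dot access."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name)
        # Wrap nested dicts so access chains like cfg.model.lr work.
        return Config(value) if isinstance(value, dict) else value


# In practice the dict would come from yaml.safe_load(open("config.yaml"));
# it's inlined here to keep the sketch dependency-free.
cfg = Config({"model": {"name": "gpt2", "lr": 3e-4}, "seed": 42})

print(cfg.model.name)  # gpt2
print(cfg.seed)        # 42
```

The nice property of this pattern for ML projects is that the config file, not the code, becomes the single source of truth for hyperparameters.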
Really chuffed to see that we've crossed 5000 subs on our MLSys Seminar YouTube after 34 weeks of streaming ().
A big thanks to all our speakers and viewers, and the cast (
@realDanFu
,
@w4nderlus7
,
@HazyResearch
,
@matei_zaharia
, Fiodar)!
Want to use state space models (S4 -- ) and don't know where to start?
We just put up an example script () on how to build a simple S4 model backbone that crosses the previous SOTA on sequential CIFAR (81%) in 30 minutes on a V100!
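For readers who haven't seen a state space model before, here's a toy version of the linear recurrence that S4 builds on: x_k = A x_{k-1} + B u_k, y_k = C x_k. This is a pedagogical sketch in pure Python with a diagonal A, not the actual S4 layer (which uses structured matrices and a convolutional view for efficient parallel training).

```python
def ssm_scan(A, B, C, u):
    """Run the SSM recurrence over input sequence u, with diagonal A.

    A, B, C are length-n lists (the diagonal state matrix, input
    projection, and readout); u is the scalar input sequence.
    """
    n = len(A)            # state size
    x = [0.0] * n         # initial state
    ys = []
    for u_k in u:
        # State update: x = A*x + B*u (elementwise, since A is diagonal)
        x = [A[i] * x[i] + B[i] * u_k for i in range(n)]
        # Readout: y = C . x
        ys.append(sum(C[i] * x[i] for i in range(n)))
    return ys


# Example: a single-state leaky accumulator with decay 0.9.
# An impulse input decays geometrically: approximately [1.0, 0.9, 0.81].
print(ssm_scan([0.9], [1.0], [1.0], [1.0, 0.0, 0.0]))
```

The recurrent view shown here is what makes SSM inference fast (constant work per step); the real S4 trick is computing the same map as a convolution for training.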
Quadratic attention has been indispensable for information-dense modalities such as language... until now.
Announcing Mamba: a new SSM arch. that has linear-time scaling, ultra long context, and, most importantly, outperforms Transformers everywhere we've tried.
With
@tri_dao
1/
one of the best pieces of advice i ever got (in the context of going to ai conferences) was to spend time with your peers rather than chasing after senior or famous researchers
you have more fun, grow together and who knows, maybe some of your peers will be famous one day
our 3-part on-device release (edge, rene and sonic on-device) is out today
edge is our new open-source library for on-device SSM deployments with new kernels & models
this starts our journey to build truly efficient human-like AI that's detethered from the data center
Today, we’re unveiling a significant milestone in our journey toward ubiquitous artificial intelligence: AI On-Device.
Our team pioneered a radically more efficient architecture for AI with state space models (SSMs). Now, we’ve optimized and deployed them at the edge. We believe
excited to finally release Mamba-2!! 8x larger states, 50% faster training, and even more S's 🐍🐍
Mamba-2 aims to advance the theory of sequence models, developing a framework of connections between SSMs and (linear) attention that we call state space duality (SSD)
w/
@tri_dao
Indian society is cursed. The trope of the “qualified woman” whose sole purpose is marriage is frankly infuriating.
These idiotic “traditions” permeate even the most liberal parts of India. If you’re Indian, your family probably has people who clutch onto these ideals.
We built a data exploration dashboard that we shipped with
@togethercompute
's new Red Pajama LLM data release!
We embedded the entire Github subset of Red Pajama (releasing indexes + embeddings soon!).
Built in 100 lines of Python with
@MeerkatML
🚀
the amazing thing about building a business in 🇺🇸
your multilingual models are evaluated by native speakers that sit right next to you (we’ve got 10 languages covered)
An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset:
* 7% attention, the rest is Mamba2
* MMLU jumps from 50 to 53.6%
* Training efficiency is the same
* Inference cost is much less
we're now organizing some incredible efforts to push forward innovation on model architectures (we'll announce more on this soon)
how do we compress a decade of model architecture progress into a year?
if you're excited about this and doing your PhD right now, this is for you
We're now hiring PhD research interns for spring/summer 2025 to work on architecture research and model training at Cartesia.
You'll be part of a small team led by
@_albertgu
that's pushing the frontier of architecture research in AI.
Apply here
demo of our new localization feature, take one voice and localize to any language, accent or dialect
technically shipping next week, couldn't resist showing it off since it's already live on our playground
on our way to models that can control every aspect of voice perfectly
🚀 ChatGPT / GPT-4 for querying and asking questions on codebases
Point to any GitHub repo, and get an index that is used to answer questions. If you can only access GPT-4 via ChatGPT, use --prompt-only mode to copy-paste.
Built with
@MeerkatML
!
this is why I built Sonic; no wisdom teeth and I can still talk on my way from the hospital
doctors are amazing, but they could’ve thrown in a haircut for sure
very proud of the work the team’s done so far in building cheap, fast and high-quality voice, we’ve gotten to openai quality in 3 months
the next few updates that are coming are going to blow people’s minds
Users come to us all the time with questions around how to evaluate the best voice generation APIs. To help, we put together a systematic comparison on the important features to look at when comparing Cartesia Sonic to ElevenLabs (link below)
Another great resource is the
We're bringing you the 2nd episode of the Stanford MLSys Seminar tomorrow.
@matei_zaharia
will talk about lessons from
@databricks
in building and deploying
@MLflow
.
Tune in at 3pm PT Thursday at (and join our mailing list at )!
we worked with early partners to create telephony optimized versions of Sonic, and it's paying off
we realized how voice sounds on a phone is very different from how voice sounds in an audiobook, and the optimizations we do make a huge difference to our telephony customers
We're delivering high quality conversations with Sonic at the lowest latencies ever seen in voice with
@Vapi_AI
to their customers.
Amazing to partner with them!
A new tool in the Robustness Gym universe!
This work is prompted by a basic lesson we’re learning in the RG project: quantitative metrics are fuzzy measures of performance, and need to be supported by interactive tools that enable deeper inspection. Both are important!
great summary of the talk I gave today at
@aiDotEngineer
by
@kozerafilip
building intelligent machines in the image of humans is a long and hard road, new ideas are going to get us there in the long run
Great talk from
@krandiash
from
@cartesia_ai
at
@aiDotEngineer
on how State Space Models can enable real-time multimodal intelligence.
Let's dive in:
1. Real time on device intelligence will enable a multitude of different agents, doing things for you in the background, see
as promised, we shipped 2 models and 3 new features today
emotion control by API, timestamps on generations and no length limit anymore
new models sound really good and we'll keep updating them, SSMs work!
also working on wacky stuff with SSMs that I'm excited about 👾
Huge Sonic release!
🇺🇸 New English model reduces breathiness & artifacts.
🌎 New multilingual model improves pacing, loudness & word error rate by up to 50%.
💨😡😳😁 Voice control API to precisely control speed and emotion.
⏰ Word timestamps on generated audio, useful for captioning
New blogpost on
@StanfordCRFM
:
What will it take to put models like GPT-X into software and not have to worry about insane behavior and bugs?
We discuss making foundation models a reliable software abstraction: new programming tools are going to be key!
We're topping another third-party evals leaderboard with our Sonic model.
Sonic is high quality, ultra fast and cheap for speech generation, and we're seeing amazing adoption along pretty much every use case and sector imaginable.
And 3 new releases are coming.
Speech generation is a fascinating domain as it needs to be heard and felt in order to evaluate the true difference. We’re seeing a large variance in quality among the model providers. Generative AI companies like
@cartesia_ai
and
@elevenlabsio
put up impressive performance
intern applications are open at Cartesia
our interns get to work on projects that are actually important to us rather than on side quests
apply at the link below!
We're recruiting machine learning research interns for fall 2024, apply below by August 24th.
Join us to build and ship cutting-edge multimodal models, and have fun along the way!
More updates to Sonic!
📈Enhanced voice cloning to preserve speaker accents and tones even more
🗣️ Improved default voices on playground for loudness and clarity
🌎 New multilingual model reduces word error rate and improves prosody significantly
📞 Improved clarity and
Someone pointed me to this fragment from Jensen's Wired article -- amazing to see the support around SSMs (and really cool that he's so technically plugged in)
We've shipped continuations 🐍, our most requested API feature.
Sonic can now stream in text (e.g. LLM generations), and generate audio smoothly across chunks using the power of Sonic's state.
This unlocks long transcript use cases, and real-time conversational voice agents!
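The shape of this pattern — consume text chunks as they stream in (e.g. from an LLM) and synthesize audio per chunk while carrying state across chunk boundaries so prosody stays smooth — can be sketched generically. All names below are hypothetical stand-ins, not Cartesia's actual API; the "state" here is just a character offset standing in for the model's hidden state.

```python
def synthesize_stream(text_chunks, synthesize_chunk):
    """Yield audio for each text chunk, threading model state across chunks."""
    state = None
    for chunk in text_chunks:
        # The synthesizer returns audio plus updated state; passing the
        # state back in is what keeps generation smooth across boundaries.
        audio, state = synthesize_chunk(chunk, state)
        yield audio


# Stub synthesizer for illustration: "state" is the running character count.
def fake_synthesize(chunk, state):
    offset = 0 if state is None else state
    return f"<audio for {chunk!r} @ {offset}>", offset + len(chunk)


for audio in synthesize_stream(["Hello, ", "world."], fake_synthesize):
    print(audio)
```

The key design point is that the caller never buffers the full transcript: each chunk produces audio immediately, which is what makes the pattern usable for real-time voice agents.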
multilingual support is a big feature request for Sonic, we 🏃 very fast and shipped the first version in a few weeks, expect new updates here
we're cooking some insanely cool models that are further out that will be a step change in speed, quality and capability
Release day with 2 new models
🇺🇸 Sonic English
Improved pacing, voice cloning and pronunciation, same 135ms latency
🌎 Sonic Multilingual
6 new languages (German, French, Spanish, Portuguese, Chinese, Japanese) with a new multilingual voice library
And 🩺 HIPAA compliance
Wow, this went randomly viral and seems to have struck a chord.
In the spirit of self-promotion: check out our work on making ML models more robust.
Video:
Arxiv preprint:
Very excited about the future of AI!
Come by our Model Patching poster at
@iclr_conf
today!
We describe how data augmentation with a domain-translation model, combined with robust training, can improve worst-case performance.
Talk/Poster Link:
Time: Today (Monday 5/3) 5-7pm PT [Spot C1]
A very short blog post on 3 directions for data tools I’m personally excited about in the era of GPT-4.
We’re working on these in
@MeerkatML
(stay tuned for something cool coming soon!)
We've crowdsourced a ton of contributions to so far!
You can now get a broad overview of Data Centric AI there -- we've got discussion on weak supervision, self-supervision, robustness, data augmentation, privacy, data selection, and more.
Check it out!!
I miss the pre-mid-2022 days when my Twitter was a daily digest of ML research preprints
Now you can’t go 3 tweets without somebody trying to teach you a new incantation to yell into the magic box — it’s the AI equivalent of drugs and vegetables
Excited to see the RedPajama dataset released: check out the
@MeerkatML
data exploration dashboard we put together in a collaboration with
@togethercompute
as part of this release 🚀
We’ll continue to update and add to that in the RedPajama repo!
Announcing RedPajama — a project to create leading, fully open-source large language models, beginning with the release of a 1.2 trillion token dataset that follows the LLaMA recipe, available today!
More in 🧵 …
I’ve been in the AI trenches since 2009, and LLMs are certainly a game-changer. But they also seem to be a warm-up act for the main event—the next cycle of AI innovation, coming in the next 12-18 months.
Here are 3 areas we’re looking at to fuel this cycle, where founders can
@AstleDsa
SSMs generally crush on data derived from continuous signals -- we've observed this consistently across many applications and modalities (audio, video, EEG, EKG, other time series).
Lots more to learn and improve here
really fun to hang out with
@saranormous
and
@eladgil
and shoot this episode of the
@NoPriorsPod
we cover a lot of ground across research, engineering and the future of ai systems
and I preview some of our on device work with a demo
🔥 new
@NoPriorsPod
:
@krandiash
@_albertgu
from
@cartesia_ai
:
*state space models (SSMs)
*their advantages, disadvantages
*alternative architectures to transformers
*making AI real-time (demo!)
*sonic, audio applications
*multimodality and research directions
🔗 in comment
it’s been amazing to work with
@jordan_dearsley
&
@nikhilro_
from day one at
@cartesia_ai
they’re amazing founders – experts in voice, move at breakneck speed and always put their customers first
we’re working closely with
@Vapi_AI
to deliver exceptional voice agents to users
@Vapi_AI
switched to Cartesia as their default voice provider over 8 options after customers saw a 4x increase in call retention.
Read how we built the best API for realtime conversation with this leading voice platform
Try it at
With all the demos flowing for GPT-3, I thought it would be fun to speculate about what this means for the future of user interfaces.
I haven't blogged before, but I decided to take the plunge (it's short).
GPT-3 & The Natural Language Programmer
ChatGPT is pretty cool. Braindump:
It might make mistakes in reasoning and knowledge retrieval, but this is not worth overindexing on, in my opinion. This is certain to improve quickly, although it’s good to know what’s not working quite as well yet
Delighted to announce that our paper (with
@AIforHI
) on “Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure” has been accepted to ICLR 2019!
We’ve got new work out, appearing at NeurIPS this year!
We extended S4 beyond sequences to handle images and videos.
Our new S4ND layer is a drop-in replacement for ConvND in any architecture!
Amidst a barrage of great work released at frenetic pace, it's easy to feel like there's nothing left for you to do (esp. in academia).
I rarely worry about this now -- a trick I use is to imagine myself 3 years ago and then think about all the cool shit that has happened since.
can't wait until we have always-on assistants that live on-device, understand language, audio and video and have multi-year memories -- basically a human
💡Research Spotlight:
@avivbick
Aviv, one of our summer interns, co-authored MOHAWK, a new way to distill quadratic Transformers into subquadratic architectures like SSMs. We’re proud of him for the groundbreaking research he conducted with Prof.
@_albertgu
our Chief Scientist
we've opened up the private beta for Sonic on-device
Sonic is the fastest cloud voice gen model (), and we've squeezed all of these capabilities into a model you can run locally
fill out this form if you want to get early access
📲Sonic On-Device: The Sonic model you know and love, running on device in private beta. It’s the first ultra-realistic voice model of its kind to support real-time streaming on devices.
An Homage To Metal Gear Solid
a playable voice AI puzzle game
<overheard in slack>
me: I wrote some sample code to show how you switch out LLM context on the fly and why you might want to.
@JonPTaylor
: hold my beer ...
</>
Tech stack:
- input speech processing
@DeepgramAI
my anecdotal experience is that Chinese researchers have insane velocity in adopting and iterating on new research, there are 25 follow-ups to published work from the US in 3 months
the US needs to level up fast, the EU is the EU
So what proves that China has now become the world's foremost scientific power?
Firstly, China has now overtaken both the US and entire EU in number of high-impact scientific papers produced each year, including in the Nature Index which is virtually impossible to game.
Committed to UC Berkeley over Duke. Hardest decision of my life thus far. Here’s to hoping I get out of this alive (and with all my limbs intact). Go Bears! 🐻
Check out Mistral, our code base for training large LMs.
We’ve also released multiple random seeds, 600+ checkpoints per run for GPT-2 Small and Medium
We're excited to open-source Mistral 🚀 - a codebase for accessible large-scale LM training, built as part of Stanford's CRFM ().
We're releasing 10 GPT-2 Small & Medium models with different seeds & 600+ checkpoints per run!
[1/4]
Super excited to get this grant with
@HazyResearch
and Sharon Li on new directions for robust machine learning systems. Shout out to
@StephanZheng
and
@nazneenrajani
for their support!
We're thrilled to announce this year's
@SFResearch
Deep Learning Grant winners
@ChenhaoTan
@gregd_nlp
@pulkitology
Christopher Ré and Hung-yi Lee! 🎉👏 We're excited to work together to advance the state of AI. Read more about the winning proposals:
Excited to see this report on foundation models go out today, where I co-authored the data section led by
@laurel_orr1
:
Huge credit to
@percyliang
and
@RishiBommasani
for orchestrating this and making sure each section hit a pretty stringent quality bar.
My (17 yr old) brother just released his first product! It’s a Chrome extension that improves your exposure to news stories from other points of view. The design is great and it runs the latest and greatest NLP models for news recs.
Download and review!
Unslant, my browser extension to surface ideologically contrasting takes on the political news you’re reading, is live on
@ProductHunt
! First product release ever, can’t wait to see where this goes 😃
At 10:35 on Wed (Dec 5)
@krandiash
, Tong Mu,
@turingmusician
and Emma Brunskill will have a demo on “Automatic Curriculum Generation Applied to Teaching Novices a Short Bach Piano Segment” in Room 510 ABCD # D10
Playing with
@cartesia_ai
and I’m super impressed!
The voices feel natural and more human-like compared to anything I’ve seen before.
Check out the demo! There are two interesting moments around 1:06 and at the end – not sure what happened there 🤪
Still measuring things but