data management 🤝 human-computer interaction 🤝 machine learning PhD student
@Berkeley_EECS
@UCBEPIC
formerly ML engineer in many orgs and undergrad
@Stanford
Recently I realized that the biggest benefit of going to Stanford is not the high quality of education or the network of successful people. It is the entitlement we develop, which the industry mistakes for confidence, that allows us to aim high and actually achieve our goals.
I probably should have written this years ago, but here are some MLOps principles I think every ML platform (codebase, data management platform) should have: 1/n
Got my invite to the
@OpenAI
GPT-3 API from
@gdb
. I actually think it deserves more hype than it’s getting, but not necessarily for the magical reasons Twitter touts. Why? My quick thoughts and impressions: (1/11)
After many hours of retraining my brain to operate in this "priming" approach, I also now have a sick GPT-3 demo: English to LaTeX equations! I'm simultaneously impressed by its coherence and amused by its brittleness -- watch me test the fundamental theorem of calculus.
cc
@gdb
Visiting my family for the holidays, and my 17 y/o sister said that “everyone at school used chatGPT for their final essays” and asked me if I “have W riz”
once again I spent 30m pair programming with chatgpt to debug something only to find that the first google search result (stackoverflow link) immediately answered my question
thinking about how, in the last year, > 5 ML engineers have told me, unprompted, that they want to do less ML & more software engineering. not because it’s more lucrative to build ML platforms & devtools, but because models can be too unpredictable & make for a stressful job
Our understanding of MLOps is limited to a fragmented landscape of thought pieces, startup landing pages, & press releases. So we did an interview study of ML engineers to understand common practices & challenges across organizations & applications:
my best move in 2020 was to quit reading business self help books and start reading fiction. i don’t care if silicon valley tech bros disagree. if anyone tells me again to (re)read “The Subtle Art of Not Giving a F*ck” i will just shove a copy of The Goldfinch in their face
I'm not sure if the machine learning engineer role is very well-defined. IMO, a good MLE does "full-stack" work -- owning ML end-to-end, from model development to integration in production pipelines.
I interview for both MLE and data science roles. Here's what I look for:
Many bits of good news:
- I am vaccinated
- I am not burned out anymore
- I am going to start my PhD at
@Berkeley_EECS
in the fall
I'm thankful for friends and family who have dealt with my stress over the last few months. Now things are looking up 🙂
In good software practices, you version code. Use Git. Track changes. Code in master is ground truth.
In ML, code alone isn't ground truth. I can run the same SQL query today and tomorrow and get different results. How do you replicate this good software practice for ML? (1/7)
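One cheap way to notice that "the same query" returned different data is to content-hash each result snapshot and log the hash next to the git hash. A toy sketch (real setups would use a data-versioning tool; `snapshot_hash` is a hypothetical helper):

```python
import hashlib
import json

def snapshot_hash(rows):
    """Content-hash a query result (list of dicts) so you can tell
    when the 'same' query starts returning different data."""
    canonical = json.dumps(rows, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

If today's hash differs from yesterday's, the data moved even though the code didn't.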
I've always preferred latex-ing locally so I can have my text editor on one screen and the pdf on the other screen. However, research => collaboration => Overleaf 🙃
I finally got fed up today and wrote a Chrome extension to render the pdf in a new window & refresh on recompile
my college friend who also just started her PhD said "some people leave their jobs to go backpacking around the world for a few years. that's basically us except we're doing mental backpacking" and i can't stop thinking about how true that is
Been working on LLMs in production lately. Here is an initial thoughtdump on LLMOps trends I’ve observed, compared/contrasted with their MLOps counterparts (no, this thread was not written by chat gpt)
A few months ago, I started using Makefiles for my local Python ML projects. Ever since, I haven’t manually dealt with venv or pip installs.
It’s not life-changing, but I now can’t imagine starting a local ML project without a Makefile. Here’s a template:
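The template below is a minimal sketch of the idea (file names like `main.py` and the Python version are assumptions; adapt to your project):

```makefile
VENV = .venv
PYTHON = $(VENV)/bin/python

# Rebuild the venv whenever requirements.txt changes
$(VENV)/bin/activate: requirements.txt
	python3 -m venv $(VENV)
	$(PYTHON) -m pip install -r requirements.txt

.PHONY: install run clean

install: $(VENV)/bin/activate

run: install
	$(PYTHON) main.py

clean:
	rm -rf $(VENV) __pycache__
```

`make run` then handles venv creation and pip installs for you; you never activate anything by hand.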
Today is my last day at work. I am sad to leave but excited for some time off. This is a long personal essay about my experiences with predictive modeling. I am a bit nervous to publicly share, but I hope you find it worth your time.
in 5th grade i wrote a batch file with a command to log a user out. i renamed the file to “Internet Explorer,” saved it to all the desktops in my school’s library, changed its icon to the IE logo, and deleted the real IE shortcut from each desktop
Continuous Integration (CI) & testing for ML pipelines is hard and generally unsolved. I’ve been thinking about this for a while now — why it’s important, what it means, why current solutions are suboptimal, and what we can do about it. (1/10)
Beginning a thread on the ML engineer starter pack (please contribute):
- ”example spark config” stackoverflow post
- sklearn documentation
- hatred for Airflow DAGs
- awareness of k8s and containers but no idea how to actually use them
- “the illustrated transformer” blog post
Unit testing for ML pipelines is challenging given changing data, features, models, etc. Changing I/O makes it hard to have fixed unit tests.
To hackily get around this, I liberally use assert statements in scheduled tasks. These have saved me so many times. Thread: (1/11)
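The flavor of assert I mean is cheap sanity checks that run inside the scheduled task itself. A minimal sketch (the schema here — `user_id`, `ctr` — is hypothetical):

```python
def validate_features(rows):
    """Cheap sanity checks to run inside a scheduled task before
    writing features downstream. Fails loudly instead of silently."""
    assert len(rows) > 0, "feature table is empty -- upstream job likely failed"
    for row in rows:
        assert row["user_id"] is not None, "null user_id leaked into features"
        # A click-through rate outside [0, 1] means a join or division bug
        assert 0.0 <= row["ctr"] <= 1.0, f"ctr out of range: {row['ctr']}"
    return True
```

When one of these trips, you get paged at assert time instead of discovering garbage predictions a week later.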
It's been ~15 months since I switched fields (into DB) and started a PhD, so I did a bit of freewriting. I reflect on ML engineering and some uncomfortable learnings:
Honestly, the gap between academic and industry DS/ML feels larger than ever. This post is a good reality check -- most industry ML people mainly do exploratory data analysis & sklearn on flat files on local machines. OSS > proprietary tools. Jupyter & Tableau are dominant.
👩‍💻 What is a "data scientist" or "machine learning engineer", really?
📄:
Synthesizing responses from
@StackOverflow
, the
@PSF
Survey, the
@Kaggle
Survey, the
@AnacondaInc
survey, and more, I have taken a first stab at some common cohorts. Take a look!
I have been working on personal ML research projects every weekend for 3 months now. I feel like my consistency can be attributed to the following:
- TPU support in colab
- PyTorch is so easy to use
- PyTorch is so easy to use
- PyTorch is so easy to use
I’ve been frustrated for a while about the lack of diversity in engineering and data science roles at early-stage startups. So I’m starting a small mentorship circle for women and nonbinary people around this theme. Please send to anyone who might be interested! Details below:
hello Twitter, I present a fun intro to AI safety! these comics took longer than I thought, so I'm posting half the series today & the second half on Monday.
let me know what you think! or if you have any other ideas :-)
Anyways, as the nth peer in my undergrad cohort gets their non game-changing startup acquired, we could all use a reminder to believe in ourselves a bit more. Happy Monday 🙂
just skimmed the syllabus on the new coursera course on mlops...once again, i feel the need to reiterate that most production ML systems are NOT built around deep learning, AutoML, and/or NAS. and that’s okay; more power to you if simple models do the job
I just made the switch over to M1 / Apple Silicon. I'm currently running 3 docker containers, 2 React apps in dev, Safari & Chrome (>20 tabs), Spotify, RescueTime, Messages, VSCode, Slack, Fantastical...and the fan isn't making a sound. What a world we live in.
IMO the chatgpt discourse exposed just how many people believe writing and communication are only about adhering to some sentence/paragraph structure
three months into reading papers in my new field and all i've got for you is a big gut feeling that ML workflows will all be done in the DBMS by the time i graduate. models are just extensions of the data
trying to use LLMs in prod is so frustrating bc i can't apply traditional ML tricks to make progress, like cleaning training data. i'm throwing darts here adding \n, changing don't to do not, capitalizing NOT, formatting like markdown...then i feel guilty like i'm a bad engineer
Once I wanted to learn about NLP so I wrote a Transformer in Tensorflow & for the life of me couldn't figure out why it wasn't working. Then I shared my code with an NLP PhD student, who switched the optimizer from SGD to Adam, and it worked.
Now I am a PhD student in databases
I'm excited (and nervous) to post this thread: I've always known I wanted a partner but didn't know what a supportive one looked like (esp. as an ambitious woman who wants kids someday)! Now that I know, I'm so grateful for all the ways in which
@PreetumNakkiran
supports me:
every morning i wake up with more and more conviction that applied machine learning is turning into enterprise saas. i’m not sure if this is what we want (1/9)
Almost all CS academic labs need full-time SWEs to build/maintain infra (e.g. clusters, persistent storage). But they're hard to hire for many reasons -- lower salary, no "ladder" to climb, etc. Can't believe I'm genuinely asking, but why isn't there a startup solving this?
TLDR, if this takes off:
1) Expect the next generation of good ML practitioners to be way more creative. It’s taking me a while to wrap my head around how to prime this model to get cool demos, lol.
2) Startups will move away from training their own in-house models. (11/11)
Choosing between a PhD in machine learning and an industry role is an incredibly hard and personal decision. This essay, influenced by conversations I've had with ambitious new grads, has been in the works for a while.
this quote has been in the back of my mind for months, on why ML model developers might not always follow good software practices...excited for the full paper on our MLOps interview study to come out this week 🎉
Sometimes I am amazed by just the basics of deep learning. It’s a miracle that backpropagation + ReLU actually works on networks with many layers. You can specify an extremely underdetermined system of nonlinear equations and *gradient descent* your way to a solution. 1/3
I'm tired of hearing people blame data scientists for broken production ML pipelines because they don't have "good software engineering practices." We can have a more productive conversation. 1/7
I 100% recommend this
@karpathy
talk about multi-task learning at scale.
@ericjang11
covers some of the main points well.
But as the 1st ML engineer at a startup that sells an ML platform to automakers, I want to talk about other problems that many applied ML startups face:
This talk by
@karpathy
has convinced me that Tesla is several years ahead of most CV labs in regards to pushing the limits of DL. Commonplace questions like "how do you do early stopping for a multi-task model?" are non-trivial when at scale.
Machine learning is a tool to help build solutions, not the entire solution. Unfortunately, many of us seem to have forgotten the second part. Here I discuss the need to get rid of AI Saviorism and adopt an alternative framework to successfully apply ML.
Computing hardware is getting really freaking powerful! I think edge inference and ML will become more popular very quickly.
It's pretty exciting to think about what this means for industry ML development and some new cool problems we can work on: (1/5)
i wrote a small blog post on the plane last week, reflecting on my (unfortunately significant) experience with failed ML and AI projects, not necessarily due to technical reasons:
It’s a wacky time to be a PhD student. Feels next to impossible sometimes to stay on a long-term research direction that (1) has only me working on it, (2) won’t be obliterated by AI advances, and (3) the industry wouldn’t compete with me on
Today, I celebrate one year of working at
@viaduct_ai
! From writing ML research papers at
@GoogleAI
and
@StanfordAILab
to serving $-saving ML predictions at Viaduct, I reflect on differences in my work experiences and why ML is so hard to operationalize.
people who argue ML models shouldn’t need to be retrained either haven’t worked with time series data or ended a relationship & cursed at their phone keyboard’s predictive text suggestions
Yesterday, I found out I am approved to graduate. This is so emotional and important to me for many reasons. My Stanford experience has been extremely difficult yet rewarding. Thread:
Evals are arguably the hardest part of LLMOps. LLMs mess up, so we check them w/ other LLMs, but this feels icky. Who validates the validators??
We built an interface to align LLM-based evals with user preferences, learning a lot about why this is hard:
the biggest barrier to fine tuning LLMs is not cost or modeling or systems expertise anymore, but collecting high-quality data. it’s hard to do for custom tasks. people try to use gpt 4 as a data generator, which seems ok at a glance but is full of random mistakes at scale
a recent observation i made in group therapy: if you plot time on the x axis and intensity of emotion on the y axis, many people — myself included — focus on the integral f(t <= now), but the most resilient people have figured out how to only care about f(t = now)
People in the DB community who are interested in vector search should definitely watch this talk. The bit on binary quantization for document vectors is super cool; while doc vecs are binary, query vecs are still floats & they shard the query vec & optimize dot products
How does Exa serve billion-scale vector search?
We combine binary quantization, Matryoshka embeddings, SIMD, and IVF into a novel system that can beat alternatives like HNSW.
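The asymmetric trick — binary doc vectors, float query vectors — can be sketched in a few lines. This is a toy illustration of the general idea, not Exa's actual system; `binarize` and `asymmetric_dot` are hypothetical names:

```python
def binarize(vec):
    """Sign-quantize a float vector to bits: 32x smaller per dimension."""
    return [1 if x > 0 else 0 for x in vec]

def asymmetric_dot(query, doc_bits):
    """Keep the query in full floats; the doc is just bits, so the
    'dot product' is a sum of query components where the bit is set."""
    return sum(q for q, b in zip(query, doc_bits) if b)
```

Docs pay the quantization cost once at index time; queries keep full precision, which recovers much of the accuracy lost to binarization.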
@shreyas4_
gave a talk today at the
@aiDotEngineer
World's Fair explaining our approach! ⬇️
day in the life of building LLM applications: yesterday I changed 3 words in a prompt string in the codebase and the system behavior completely changed in a way that i could not have anticipated. but end-users liked it so I guess I won’t revert??
Over the last year, many people have told me that operationalizing machine learning isn’t a research problem. I disagree. In the final post of my ML monitoring series, I outline research challenges and solution ideas:
Jupyter notebooks are bad. Ad-hoc experimentation creates messy code and no versioning, making it hard to understand provenance for important results. Luckily with blockchain technology we can mint NFTs for each cell and
8 months of sub-60 degree open water swims, bike rides to Mill Valley, and runs through Golden Gate Park materialized in one piece of metal and many sunburns! Thankful for friends & family who came out to support 😍
We all know LLMs make mistakes. One simply cannot deploy LLM pipelines without assertions, yet writing good assertions is tedious & hard. So, we built SPADE, a system that analyzes prompts & auto-generates custom assertions in low-data settings:
We kind of bumble around for 4 years, think we're some hot shit, and actually believe we can do big crazy things. So many people in undergrad managed to raise venture $$ for dumb startup ideas. I used to think, if they can do it, I can too!
Over the four years, we grow to expect the industry to treat us for "what we're worth," and we subsequently disregard opportunities that don't meet our self-worth. We don't really "settle," even though we are not actually that much more intelligent or hardworking than others.
Honestly: sometimes I feel defeated because ML observability is so hard. All facets are hard -- detecting, diagnosing, reacting to bugs. We don't have realtime ground truth labels (except recsys) so we don't know asap when performance goes down. Lots of $$ left on the table (1/6)
Update: I'm now working on ML tooling!
When I did applied ML, it seemed like many tools I initially found interesting were divorced from the reality of data, ML, and systems. I don't want to follow that pattern, so I built an open toy ML pipeline: (1/7)
does anyone have LLM agents running in prod or at scale, automatically? forget about cost, how did you get the end-to-end latency low enough & the accuracy high enough?
What else is so hype? The API’s best model is 350 GB. Serving this monstrosity efficiently and cheaply is an entirely new software problem for the industry. If
@OpenAI
cracks this, they can become the AWS of modeling. (10/11)
So why is GPT-3 so hype? It’s amazingly powerful *if* you know how to prime the model well. It’s going to change the ML paradigm — instead of constructing giant train sets for models, we’ll be crafting a few examples for models to do “few-shot” extrapolation from. (5/11)
The ML research ecosystem can be amazing. A few hours ago, I wondered: do pruned neural networks converge to high accuracies faster than the original networks? I'm sure I can find an answer in one of many lottery ticket hypothesis papers, but I wanted to explore myself. (1/5)
IMO there's no substitute for the MLOps experience of building a pipeline that serves predictions at some endpoint (e.g., REST) and trying to sustain some performance over time. Some pointers & tutorials below:
.
@sh_reya
Hello Shreya, you may have already tweeted about this in the past, sorry if it's redundant.
Could you pls give some resources that you would recommend for getting into MLOps, ideally something that gets your hands dirty, not only conceptual stuff.
Open to anything ! thx.
Speaking to the undergrad CS experience: sure, the CS curriculum is top-notch, but Dijkstra's algo is the same everywhere. Most CS undergrads don't actually become close with their professors.
We need to start teaching this in class (data engineering, applied AI, etc). There should be some assignment that teaches people how to systematically build RAG apps—first assemble an evals set & metrics, then implement a baseline with BM25 retrieval + 1 llm call, then improve
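The BM25 baseline in such an assignment fits in a page. A toy scorer over pre-tokenized docs (pure Python, standard Okapi BM25 with the usual k1/b defaults — a teaching sketch, not a production retriever):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc (list of tokens) against a query (list of tokens)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

Students implement this, measure it against their evals set, then try to beat it — which makes "did my fancy retriever actually help?" an answerable question.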
has anyone created a taxonomy of the tasks people use LLMs for? not talking about prompting & training/fine-tuning strategies. just the tasks. eg question-answering, document summarization, writing code,…?
It was a lovely day to turn 23. I had many drinks and cakes. I learned how to hold 4 avocados + 1 lemon in my tiny hand. My house threw a pizza party, and we played reverse hide and seek.
I am so grateful for friends and family — thank you for going out of your way for me. 🥰
why do ML papers and tools still think that a single clean, labeled training dataset is the solution to real-world ML problems? holding a training dataset constant is absurd when you’re regularly releasing models. data comes in streams, not tables!
Beginner: always train models using *committed* code, even in development. This allows you to attach a git hash to every model. Don’t make ad hoc changes in Jupyter & train a model. Someday someone will want to know what code generated that model… 3/n
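Attaching the hash is a one-liner at training time. A sketch of the kind of helper I mean (hypothetical function name; degrade gracefully outside a repo):

```python
import subprocess

def get_commit_hash():
    """Return the current git commit hash to stamp on a trained model,
    or None if git / the repo isn't available."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None
```

Store the result in the model's metadata; a stricter version would also refuse to train if `git status --porcelain` reports uncommitted changes.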
I have been feeling tired lately when thinking about the differences between MLOps and DevOps. There are so many “gotchas” to keep track of in production ML systems, but I don't think ML systems are as different from traditional software systems as many people say. (1/13)
What many people do not know is how broken the startup ML / DS hiring pipeline is. It's really easy to laugh at this egregious mistake. For some additional context, here are some themes in ML job descriptions (JDs) I've seen:
I like to think of these language models as “children with infinite memory.” Children’s skills are not all that refined, but they have basic pattern-matching skills. Coupled with a superpower to memorize the entire world, well, couldn’t they be extremely useful? (9/11)
I've read a few blog posts & articles now that imply that MLOps success = maximizing the % of ML models that make it to production. Why is this the north star? IMO the goal is to maximize the % of data science projects that yield business value. Small nit but big difference
1+ year after initial release i just now found a bug in my feature generation code where i added epsilon to a denominator and epsilon was 1e7 instead of 1e-7. yay silent errors 🙃