Caleb Profile
Caleb

@calebfahlgren

1,121
Followers
772
Following
282
Media
1,633
Statuses

Product + Data @huggingface 🤗

Joined January 2018
Don't wanna be here? Send us removal request.
@calebfahlgren
Caleb
14 days
Qwen-2.5 on WebGPU 🏎️ • 42 tok/sec for Qwen2.5-Coder-1.5B on Mac ⚡ • Powered by MLC WebLLM and WebGPU 🔥 Watch Qwen2.5-Coder-1.5B build a website entirely in the browser!
13
54
393
@calebfahlgren
Caleb
5 months
Released a free tool on ChatDB: Parquet AI Query parquet files with natural language in the browser. ◆ Powered by @duckdb in the browser ◆ LLM from @GroqInc and Llama-3-70B Here's me querying the capybara-dpo dataset from @huggingface
10
32
232
@calebfahlgren
Caleb
8 months
Natural-Functions, a 7B function calling model, is out on @ollama !
13
35
203
@calebfahlgren
Caleb
8 months
Excited to release Natural-SQL-7B! A new, very strong Text to SQL model that is my best fine tune yet. You can see how it does on the SQL-Eval benchmark by @defogdata
Tweet media one
9
24
196
@calebfahlgren
Caleb
23 days
NEW SQL Console on @huggingface Datasets Viewer🤗 🔸Run SQL on any public dataset 🔸Powered by @duckdb WASM running entirely in the browser 🔸 Share your SQL Queries via URL with others! More coming soon!
13
56
194
@calebfahlgren
Caleb
3 months
Run SQL on any @huggingface Dataset in the Browser • Powered by @duckdb WASM 🦆 • Open Source Chrome Extension • Query different splits and configs
6
25
168
@calebfahlgren
Caleb
1 month
In 2024, it takes 5 min to write some SQL and Markdown and deploy a beautiful dashboard. Here's a dashboard of @huggingface hub stats powered by @evidence_dev and @duckdb
4
9
152
@calebfahlgren
Caleb
3 months
Who said Claude doesn’t have code interpreter :)
Tweet media one
8
14
149
@calebfahlgren
Caleb
6 months
I feel like the new CodeQwen-1.5 isn't being talked about enough for coding. Punching high above its weight for a 7B. Outperforming DeepSeek is impressive in itself.
Tweet media one
5
14
89
@calebfahlgren
Caleb
2 months
Excited to share I've joined @huggingface 🤗. I'm going to be building on the hub, making easier to work with datasets!
17
3
79
@calebfahlgren
Caleb
1 month
@ShaanVP There’s a whole book on this: The Defining Decade: Why Your Twenties Matter
4
5
78
@calebfahlgren
Caleb
29 days
Not many people know every @Gradio app that is a @huggingface space exposes an API. Here's all the code it takes for me send requests to the Llama guard space on ZeroGPU.
Tweet media one
3
16
71
@calebfahlgren
Caleb
3 months
The HF Data Explorer is on the Chrome Web Store 🥳 Drop in SQL on top of any @huggingface dataset powered by @duckdb wasm running entirely in the browser. Here's some fun things you can do with it 🔥
3
21
69
@calebfahlgren
Caleb
9 days
DuckDB Snippet of the Week 🦆📊 One of my favorite functions that is now part of @duckdb 1.1.0. You can plot beautiful histograms of different values with a single function in the SQL Console!
Tweet media one
2
5
70
@calebfahlgren
Caleb
3 months
Okay this is pretty fun. Created a @huggingface space for @fofrAI emoji LoRA since I couldn't find one. Here are some fun ones emojis :)
Tweet media one
@julien_c
Julien Chaumond
4 months
This is one of my favorite LoRAs ever. The seminal sdxl-emoji LoRA from @fofrAI is now also available on @huggingface
Tweet media one
2
2
53
1
9
51
@calebfahlgren
Caleb
2 months
add this to your .zshrc for a good time
Tweet media one
1
6
49
@calebfahlgren
Caleb
2 months
Bullish on @modal_labs - Great Docs + Examples - Healthy Free Plan (30$ free compute / month) - Never have to worry about infra / just Python
1
4
47
@calebfahlgren
Caleb
8 days
1M Models on @huggingface Hub 📈 Models are going exponential month over month and September isn't even over yet 🤯
Tweet media one
5
18
87
@calebfahlgren
Caleb
3 months
Find movie suggestions from the amazing new reddit dataset from @0xdrej 🎥 or use it to find high quality responses in a niche
Tweet media one
5
7
40
@calebfahlgren
Caleb
3 months
Wrote a blog post on how you can use the Datasets Explorer to find really interesting insights on @huggingface datasets 🔥 There's even a couple examples of the @duckdb spatial extension with some geospatial queries 🌎
0
9
36
@calebfahlgren
Caleb
2 months
Calendar heatmap of Open Source model releases for the big AI labs on @huggingface @Meta @MistralAI @Google all ship 🔥
Tweet media one
4
6
34
@calebfahlgren
Caleb
6 months
I have actually been getting noticeable traffic to @chatdb from the free parquet tools I made. Adding another free tool for reading parquet files in the browser Testing it with some @huggingface datasets. It's all powered by @duckdb wasm 🔥
Tweet media one
2
2
34
@calebfahlgren
Caleb
4 months
WOAH, Claude artifacts ships with - @shadcn ui - tailwindcss - recharts it's like v0 built in 🤯
Tweet media one
2
4
33
@calebfahlgren
Caleb
2 months
The @huggingface hub has been on 🔥lately. Models created on the hub each month is a stock you want to buy 📈
Tweet media one
2
8
33
@calebfahlgren
Caleb
1 year
@gdb
Greg Brockman
1 year
the best refactor is to delete the code
97
205
2K
0
1
28
@calebfahlgren
Caleb
1 month
getting started with @duckdb and @huggingface in <11s for @merm_bot the duckdb -c might be unfair though 😁
@archieemwood
Archie
1 month
getting started with the @duckdb CLI in <20s for @merm_bot
1
3
60
4
3
28
@calebfahlgren
Caleb
5 months
@yacineMTB If Llama-3 400B lands on Groq it will be insane
3
2
28
@calebfahlgren
Caleb
8 months
@steventey @dubdotco @chatdb , create charts and get answers with just natural language. Some new things coming soon as well!
Tweet media one
2
0
27
@calebfahlgren
Caleb
2 months
Is there anyone on @huggingface who has a more cracked heatmap than @bartowski1182 🤯
Tweet media one
3
2
24
@calebfahlgren
Caleb
3 months
Just uses pyodide from cdn js
@alexalbert__
Alex Albert
3 months
Artifacts pro tip: If you are running into unsupported library errors with NPM modules, just ask Claude to use the cdnjs link instead and it should work just fine.
Tweet media one
Tweet media two
44
87
837
0
3
23
@calebfahlgren
Caleb
2 months
New @supabase service can embed and do vector search all in the browser. @ElectricSQL for Postgres in WASM @xenovacom Transformers.js 🔥
Tweet media one
@kiwicopple
Paul Copplestone — e/postgres
2 months
@calebfahlgren @ElectricSQL it also uses Transformers.js by @xenovacom to create embeddings in the browser :)
1
0
3
2
1
23
@calebfahlgren
Caleb
9 months
@jmorgan Just compiled it for WebGPU, runs in the browser at 40+ ish tok/s. Is pretty awesome!
3
0
20
@calebfahlgren
Caleb
2 months
You can now find your @huggingface heatmap 10x easier! 1-Click Embed your heatmap anywhere!
2
3
21
@calebfahlgren
Caleb
2 months
Will have a fun dashboard to share here soon with some interesting stats from the Hub: @Gradio dominates 🔥. Custom Docker deployments and Streamlit sharing the rest of the pie.
Tweet media one
@calebfahlgren
Caleb
2 months
What interesting stats would you like to know about the @huggingface hub? Can be anything for: - spaces - models - datasets For example: ratio of different model licenses or most popular space sdks
3
0
2
1
4
21
@calebfahlgren
Caleb
8 months
@jxnlco I love when people say “AI trained on x” when it’s just RAG
1
0
19
@calebfahlgren
Caleb
4 months
@skirano @shadcn It’s amazing
@calebfahlgren
Caleb
4 months
WOAH, Claude artifacts ships with - @shadcn ui - tailwindcss - recharts it's like v0 built in 🤯
Tweet media one
2
4
33
1
2
20
@calebfahlgren
Caleb
2 months
Smol Instruct v0.2 is out! 🔥 You can run them in the browser with WebGPU with MLC WebLLM and Transformers.js bringing blazing fast intelligence to edge devices. The new v0.2 model was trained with synthetic data from Llama3.1 70B and a few other datasets like OpenHermes-2.5.
3
5
19
@calebfahlgren
Caleb
2 months
Base Models can be more fun than instruct models sometimes 😁 Took like 30 minutes to make and is a blast to use! It uses the SmolLM-360M base model by @huggingface for suggestions
@calebfahlgren
Caleb
2 months
have something fun I wanna try 😏
Tweet media one
2
0
1
3
5
18
@calebfahlgren
Caleb
8 months
Just add the function definition to the system prompt and natural-functions will call it when needed:
4
0
17
@calebfahlgren
Caleb
5 months
@rauchg @shadcn AGI is when v0 is generating framer motion animations
3
0
17
@calebfahlgren
Caleb
7 months
@yacineMTB I think there are some other key factors too: - Relative early age of the internet (Google etc in the 2000s) - Engineers move into Product / Management quickly
0
0
16
@calebfahlgren
Caleb
2 months
Finally seeing some fruits of the 405B beast and the newer license with magpie-ultra - 50k unfiltered rows - Instructions for Planning, reasoning, coding, math, planning etc 🤯 Excited to see some high-quality synthetic data! Thanks @argilla_io
Tweet media one
1
1
16
@calebfahlgren
Caleb
8 months
This is so cool! In less than 30 minutes, I had OpenHermes-16k labeling and summarizing my gmail
Tweet media one
@jxnlco
jason liu
8 months
Fully local Instructor with speculative decoding, Constrained Sampling, and in-process so theres no network dependency thanks to llama-cpp-python Go follow the maintainer @abetlen he's been cooking
12
27
183
3
2
16
@calebfahlgren
Caleb
25 days
I have a playground for the @huggingface datasets viewer server I am working on. It's the easiest way to build applications around datasets.
3
4
15
@calebfahlgren
Caleb
3 months
Just added the @huggingface embed to my EV Charge Finder app. It's pretty neat to view a dataset inside an app.
3
4
15
@calebfahlgren
Caleb
1 month
Running Phi-3.5-Mini in the browser at 67 tokens per second just feels right 🤗 🔸 Powered by MLC WebLLM + WebGPU 🔸 Fully Private / Running on Device
2
3
15
@calebfahlgren
Caleb
8 days
Run Llama 3.2 in the browser on WebGPU 🔥 • Llama 3.2 (1B + 3B)🤏 🦙 • Running 100% locally in the browser at 62 tok/sec 🏎️ • Powered by MLC WebLLM + WebGPU ⚡
6
9
59
@calebfahlgren
Caleb
2 months
The Datasets Explorer now supports the latest version of @duckdb You can summarize splits and how much duplicate data exists with the summarize command. Data leakage and duplicate data can be pretty common among datasets. Here's a really good blog post on it by @BdsLoick
Tweet media one
@calebfahlgren
Caleb
2 months
Two important things to look at for datasets: • Leakage from data in train and test sets etc. • Duplicate Data You can check these very easily with a simple SQL query
Tweet media one
1
0
3
1
5
15
@calebfahlgren
Caleb
3 months
This is so sick. Working with a CSV file within Artifacts on my phone in a Target parking lot 😂 cc: @alexalbert__
Tweet media one
@calebfahlgren
Caleb
3 months
Who said Claude doesn’t have code interpreter :)
Tweet media one
8
14
149
1
0
14
@calebfahlgren
Caleb
17 days
We recently released Founder Mode for Datasets as a feature on @huggingface . You can read more about it here.
1
2
14
@calebfahlgren
Caleb
4 months
@ImSh4yy Dang who’s making supabase, but with SQLite instead of Postgres
2
0
14
@calebfahlgren
Caleb
7 months
Using the beautiful instructor library and @NousResearch Hermes DPO for near realtime email categorization
Tweet media one
3
0
13
@calebfahlgren
Caleb
8 months
Just tried @SourcegraphCody . super cool! Once they support more ollama models outside of CodeLlama-7B it will be 🔥
2
2
12
@calebfahlgren
Caleb
8 months
I fine tuned on ~20k Text to SQL pairs. I created my own dataset and really focused on: - Tough Questions (complex, multi part) - Multiple Tables per schema (most datasets have 1-3) - Much more columns per table (most datasets have 3-7) This helped NaturalSQL be able to excel
@calebfahlgren
Caleb
8 months
Excited to release Natural-SQL-7B! A new, very strong Text to SQL model that is my best fine tune yet. You can see how it does on the SQL-Eval benchmark by @defogdata
Tweet media one
9
24
196
4
1
11
@calebfahlgren
Caleb
8 months
All the GGUFs for the people🤝
1
1
11
@calebfahlgren
Caleb
2 months
Wrote a blog post about remote Parquet files. Parquet files make up a lot of the datasets on the @huggingface hub. 🔸 HTTP Range Requests 🔸 Parquet Structure, Schema, and Metadata 🔸 What makes querying parquet files remotely so efficient
1
1
11
@calebfahlgren
Caleb
5 months
Okay transformer.js is sick!
Tweet media one
1
0
10
@calebfahlgren
Caleb
8 months
@derekcheungsa @ollama Sure thing, I added a demo notebook with a quick parser and tool I wrote. TIL that Langchain has OllamaFunctions as well
1
2
9
@calebfahlgren
Caleb
28 days
@logan_liffick We need more products that have a personality, love it.
1
1
9
@calebfahlgren
Caleb
8 months
Thanks @ivanfioravanti for pointing out how easy it is to push to Ollama🤝
@ivanfioravanti
ifioravanti
8 months
@calebfahlgren @ollama @FernandoNetoAi @erhartford You can publish nearly any models from HF using instructions here:
1
1
13
2
0
10
@calebfahlgren
Caleb
2 months
@osanseviero @huggingface Some people think about the Roman Empire. I think about @TheBlokeAI
0
0
10
@calebfahlgren
Caleb
6 months
@abacaj Yeah it’s all “OAI is screwed” until they drop GPT-5 and we realize everyone are 2-3 years behind haha
2
0
10
@calebfahlgren
Caleb
17 days
Not me testing DuckDB WASM on every possible mobile device in Walmart 🙈
1
0
10
@calebfahlgren
Caleb
3 months
@dharmesh I kinda like the UX of just forwarding the email instead of an app you have to upload etc etc. Give the user value and just get out of their way.
3
0
10
@calebfahlgren
Caleb
2 months
Super cool to see Postgres in the browser (wasm) with @ElectricSQL 🔥 Quickly and safely iterate super fast in the browser and then deploy when ready.
Tweet media one
@kiwicopple
Paul Copplestone — e/postgres
2 months
We’re launching a new @supabase service: It’s like if ChatGPT and Postgres had a love-child: launch as many databases as you want, build them with AI, create charts, create embeddings. 100% open source.
92
338
3K
1
0
9
@calebfahlgren
Caleb
8 months
Pretty cool to see NaturalSQL trending on @huggingface models page!
Tweet media one
0
0
9
@calebfahlgren
Caleb
1 year
Introducing: Querying CSV with SQL in the Browser 🤯 Get rich insights from large CSV files by querying it with SQL! Built with ◆ @vercel / Next.js / next/dynamic ◆ @duckdb / WebAssembly
1
3
9
@calebfahlgren
Caleb
7 months
@Ubunta @IbisData @duckdb @DataPolars Def recommend DuckDB. It is insanely fast even with large datasets. They just shipped a new, faster, csv parser recently too
@holanda_pe
Pedro Holanda
8 months
There's a new CSV Parser in town. I've totally revamped DuckDB's CSV Parser. And it has a bunch of cool optimizations, such as state-machine parsing, a new parallelism strategy, implicit casting, projection pushdown, etc... All these improvements can result in a significant
Tweet media one
5
27
180
0
0
9
@calebfahlgren
Caleb
6 months
lol
Tweet media one
1
0
8
@calebfahlgren
Caleb
3 months
Really liking the new @shadcn charts. Hacked together a quick space to try them. Something really fun about dynamic, snappy experiences running entirely on the client.
2
1
8
@calebfahlgren
Caleb
2 years
Used @ClerkDev for the first time. Built full user auth in 15 min. Pretty insane!
1
1
8
@calebfahlgren
Caleb
5 months
Really makes it interesting when you have: GPT-4-Turbo - $10/$30 per 1M Tokens LLama3-70B on @GroqInc - $0.59/$0.79 per 1M Tokens
@Teknium1
Teknium (e/λ)
5 months
Ok I guess we really do have gpt4 at home lol, well not my home because i broke my computer but, soon anyways
18
15
323
0
2
8
@calebfahlgren
Caleb
1 month
@cognitivecompai I'd love to see it run on webllm + webgpu
0
0
8
@calebfahlgren
Caleb
1 month
Made something fun for creating cool react apps with Llama 3.1 405B with some leftover cloud credits. Upvote / Downvote responses to help create the largest open React dataset on @huggingface 🤗
2
1
8
@calebfahlgren
Caleb
4 months
Super happy user of pgvector. Bullish
@avthars
Avthar
4 months
PGVECTOR IS NOW FASTER THAN PINECONE. And 75% cheaper thanks to a new open-source extension – introducing pgvectorscale. 🐘 What is pgvectorscale? Pgvectorscale is an open-source PostgreSQL extension that builds on pgvector, enabling greater performance and scalability (keep
Tweet media one
41
226
1K
1
3
8
@calebfahlgren
Caleb
4 months
Took the pup paddle boarding
Tweet media one
0
0
8
@calebfahlgren
Caleb
8 months
@abacaj Finding good seed tasks is very important. For the second iteration of NaturalSQL was able to generate 30k very high quality pairs. Creating seed instructions for non coding datasets is still hard though
1
0
8
@calebfahlgren
Caleb
14 days
The 72B is getting tons of love, but @Alibaba_Qwen is really killing it in the 7B to edge range as well with the code models being really intriguing!
@Alibaba_Qwen
Qwen
16 days
The prince of code LLM, Qwen2.5-Coder!
Tweet media one
1
2
22
0
3
8
@calebfahlgren
Caleb
1 month
I can simply write SQL in code block and throw the variable in a chart super easily which is nice. It's super nice not having to build out charts as well
Tweet media one
2
0
7
@calebfahlgren
Caleb
2 months
@abacaj I love how they are doing it without all the “coming soon” marketing too. Just ship, release, ship.
1
0
7
@calebfahlgren
Caleb
3 months
@alexalbert__ It’s great
@calebfahlgren
Caleb
3 months
Who said Claude doesn’t have code interpreter :)
Tweet media one
8
14
149
0
0
7
@calebfahlgren
Caleb
1 year
@Timb03 Talk to the user and see if they are having issues with cancellation flow. Maybe they don’t know how to cancel
1
0
7
@calebfahlgren
Caleb
23 days
Finally, you can share the analysis with friends with the URL. The syntax for DuckDB is here:
Tweet media one
1
0
6
@calebfahlgren
Caleb
1 month
brain was mush today and accidentally put parens in a text message lmao. anyone else ever done this?
Tweet media one
2
1
7
@calebfahlgren
Caleb
8 months
Benchmarks will say Bard is almost as good at GPT-4. However, when I use it I get this response half the time
Tweet media one
1
0
7
@calebfahlgren
Caleb
5 months
The amount of data I am storing in @motherduck rn for a new project is criminal 🤣
1
0
7
@calebfahlgren
Caleb
8 months
@ImSh4yy Would love to throw my NaturalSQL LLM on this when you open source it 🙌
2
0
7
@calebfahlgren
Caleb
11 days
Visiting SF for a few days. @Waymo is first on the list to try. Also let me know if you’re going to @smalldatasf
2
0
7
@calebfahlgren
Caleb
3 months
Converting Datasets Conversation Formats 🔄 Here's Alpaca -> ShareGPT you can't export results just yet :)
Tweet media one
1
1
7
@calebfahlgren
Caleb
11 months
@rishdotblog Super cool, how much worse would the accuracy be with that much quantization? Compiled sqlcoder-7B for WebGPU recently to run in the browser, but accuracy seemed to suffer a bit.
2
1
7
@calebfahlgren
Caleb
2 months
Been hacking around with a dashboard on top of the hub stats dataset - Powered by @duckdb WASM - Charts by @shadcn There are some other neat trends like spaces by sdk and licenses.
1
1
7
@calebfahlgren
Caleb
2 months
Really great read that goes in depth on synthetic data, working with small models, and the insights that went into creating data that wasn't too complex for small models.
@Thom_Wolf
Thomas Wolf
2 months
It’s Sunday morning we have some time with the coffee so let me tell you about some of our recent surprising journey in synthetic data and small language models. This post is prompted by the coming release of an instant, in-browser model called SmolLM360 (link at the end) The
Tweet media one
Tweet media two
Tweet media three
Tweet media four
14
112
517
1
0
6