Qwen-2.5 on WebGPU 🏎️
• 42 tok/sec for Qwen2.5-Coder-1.5B on Mac ⚡
• Powered by MLC WebLLM and WebGPU 🔥
Watch Qwen2.5-Coder-1.5B build a website entirely in the browser!
Released a free tool on ChatDB: Parquet AI
Query parquet files with natural language in the browser.
◆ Powered by @duckdb in the browser
◆ LLM from @GroqInc and Llama-3-70B
Here's me querying the capybara-dpo dataset from @huggingface
Excited to release Natural-SQL-7B!
A new, very strong Text-to-SQL model and my best fine-tune yet. You can see how it does on the SQL-Eval benchmark by @defogdata
NEW SQL Console on @huggingface Datasets Viewer 🤗
🔸Run SQL on any public dataset
🔸Powered by @duckdb WASM running entirely in the browser
🔸 Share your SQL Queries via URL with others!
More coming soon!
In 2024, it takes 5 min to write some SQL and Markdown and deploy a beautiful dashboard.
Here's a dashboard of @huggingface hub stats powered by @evidence_dev and @duckdb
I feel like the new CodeQwen-1.5 isn't being talked about enough for coding. Punching high above its weight for a 7B.
Outperforming DeepSeek is impressive in itself.
Not many people know that every @Gradio app that is a @huggingface Space exposes an API.
Here's all the code it takes for me to send requests to the Llama Guard space on ZeroGPU.
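For reference, the usual route is the `gradio_client` package, which wraps the HTTP API every Space exposes. A hedged sketch — the Space name and `api_name` below are placeholders; the real signature is listed under the Space's "Use via API" link:

```python
def moderate(prompt: str, space: str = "some-org/llama-guard-space") -> str:
    """Send a prompt to a (hypothetical) Llama Guard Space, return its verdict.

    Both the Space id and api_name are illustrative placeholders.
    """
    # pip install gradio_client -- imported lazily so the sketch stands alone
    from gradio_client import Client

    client = Client(space)  # resolves the Space's API endpoints
    return client.predict(prompt, api_name="/predict")
```

Calling `moderate("some user message")` would round-trip through the Space exactly as the web UI does.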
The HF Data Explorer is on the Chrome Web Store 🥳
Drop in SQL on top of any @huggingface dataset, powered by @duckdb WASM running entirely in the browser.
Here are some fun things you can do with it 🔥
DuckDB Snippet of the Week 🦆📊
One of my favorite functions that is now part of @duckdb 1.1.0.
You can plot beautiful histograms of different values with a single function in the SQL Console!
Wrote a blog post on how you can use the Datasets Explorer to find really interesting insights on @huggingface datasets 🔥
There are even a couple of examples of the @duckdb spatial extension with some geospatial queries 🌎
I have actually been getting noticeable traffic to @chatdb from the free parquet tools I made. Adding another free tool for reading parquet files in the browser
Testing it with some @huggingface datasets. It's all powered by @duckdb WASM 🔥
Artifacts pro tip:
If you are running into unsupported library errors with NPM modules, just ask Claude to use the cdnjs link instead and it should work just fine.
Will have a fun dashboard to share here soon with some interesting stats from the Hub:
@Gradio dominates 🔥. Custom Docker deployments and Streamlit share the rest of the pie.
What interesting stats would you like to know about the @huggingface hub?
It can be anything for:
- spaces
- models
- datasets
For example: ratio of different model licenses or most popular space sdks
Smol Instruct v0.2 is out! 🔥
You can run them in the browser on WebGPU with MLC WebLLM and Transformers.js, bringing blazing-fast intelligence to edge devices.
The new v0.2 model was trained with synthetic data from Llama3.1 70B and a few other datasets like OpenHermes-2.5.
Base Models can be more fun than instruct models sometimes 😁
Took like 30 minutes to make and is a blast to use! It uses the SmolLM-360M base model by @huggingface for suggestions
@yacineMTB
I think there are some other key factors too:
- Relative early age of the internet (Google etc in the 2000s)
- Engineers move into Product / Management quickly
Finally seeing some fruits of the 405B beast and the newer license with magpie-ultra
- 50k unfiltered rows
- Instructions for planning, reasoning, coding, math, etc. 🤯
Excited to see some high-quality synthetic data! Thanks @argilla_io
Fully local Instructor with speculative decoding and constrained sampling, running in-process so there's no network dependency, thanks to llama-cpp-python
Go follow the maintainer @abetlen, he's been cooking
Run Llama 3.2 in the browser on WebGPU 🔥
• Llama 3.2 (1B + 3B)🤏 🦙
• Running 100% locally in the browser at 62 tok/sec 🏎️
• Powered by MLC WebLLM + WebGPU ⚡
The Datasets Explorer now supports the latest version of @duckdb
You can summarize splits and see how much duplicate data exists with the SUMMARIZE command.
Data leakage and duplicate data can be pretty common among datasets. Here's a really good blog post on it by @BdsLoick
Two important things to look at for datasets:
• Leakage from data in train and test sets etc.
• Duplicate Data
You can check these very easily with a simple SQL query
I fine-tuned on ~20k Text-to-SQL pairs. I created my own dataset and really focused on:
- Tough questions (complex, multi-part)
- Multiple tables per schema (most datasets have 1-3)
- Many more columns per table (most datasets have 3-7)
This helped NaturalSQL excel
Wrote a blog post about remote Parquet files. Parquet files make up a lot of the datasets on the @huggingface hub.
🔸 HTTP Range Requests
🔸 Parquet Structure, Schema, and Metadata
🔸 What makes querying parquet files remotely so efficient
@dharmesh
I kinda like the UX of just forwarding the email instead of an app you have to upload to, etc.
Give the user value and just get out of their way.
We're launching a new @supabase service:
It’s like if ChatGPT and Postgres had a love-child: launch as many databases as you want, build them with AI, create charts, create embeddings. 100% open source.
Introducing: Querying CSV with SQL in the Browser 🤯
Get rich insights from large CSV files by querying them with SQL!
Built with
◆ @vercel / Next.js / next/dynamic
◆ @duckdb / WebAssembly
There's a new CSV Parser in town.
I've totally revamped DuckDB's CSV Parser. It has a bunch of cool optimizations, such as state-machine parsing, a new parallelism strategy, implicit casting, projection pushdown, etc.
All these improvements can result in a significant
Really liking the new @shadcn charts. Hacked together a quick space to try them.
Something really fun about dynamic, snappy experiences running entirely on the client.
Made something fun for creating cool react apps with Llama 3.1 405B with some leftover cloud credits.
Upvote / Downvote responses to help create the largest open React dataset on @huggingface 🤗
PGVECTOR IS NOW FASTER THAN PINECONE. And 75% cheaper thanks to a new open-source extension – introducing pgvectorscale.
🐘 What is pgvectorscale?
Pgvectorscale is an open-source PostgreSQL extension that builds on pgvector, enabling greater performance and scalability (keep
@abacaj
Finding good seed tasks is very important. For the second iteration of NaturalSQL, I was able to generate 30k very high quality pairs.
Creating seed instructions for non-coding datasets is still hard though
I can simply write SQL in a code block and throw the variable into a chart super easily, which is nice.
It's also great not having to build out charts by hand.
@rishdotblog
Super cool! How much worse would the accuracy be with that much quantization?
Compiled sqlcoder-7B for WebGPU recently to run in the browser, but accuracy seemed to suffer a bit.
Been hacking around with a dashboard on top of the hub stats dataset
- Powered by @duckdb WASM
- Charts by @shadcn
There are some other neat trends like spaces by sdk and licenses.
Really great read that goes in depth on synthetic data, working with small models, and the insights that went into creating data that wasn't too complex for small models.
It's Sunday morning, we have some time with the coffee, so let me tell you about our recent surprising journey in synthetic data and small language models.
This post is prompted by the coming release of an instant, in-browser model called SmolLM360 (link at the end)
The