I built a dashboard using @evidence_dev with data scraped from the SEC and a pipeline built with @duckdb.
I thought I'd write a quick guide on how to do this since it was super easy and fun.
DuckDB is my new best friend for analytics. It's so fast, makes it easy to manipulate dataframes, ingests anything, and exports to whatever format I want.
Parquet, Arrow dataframes, and DuckDB are the future for analytics.
After posting my project analyzing VC SEC filings using @duckdb and @evidence_dev, I got a ton of requests to open source it.
So I did!
Check out the repo below for the open source code. This is a great example of a DuckDB ETL and modern, simple data viz.
The @8vc build program is an incredible success story for venture incubation.
OpenGov
Saronic
Epirus
Standard Metrics
Affinity
Resilience
Opto
+ many more.
Billions of dollars of value creation for the world and their LPs.
Looking at VC fundraising data and the magnitude of @AngelList’s VC business is wild.
In the last 3 years, AngelList has filed more times with the SEC than every other VC fund combined.
New blog post diving deeper into building web apps for analytics use cases is now live! Link below
I write about how painful it is to write a user-facing analytics app using React as a data analyst, and how @duckdb + @evidence_dev make it easy and enjoyable.
DuckDB has so many great little quality of life features.
I had 20 years of data with a CSV file per quarter, and I was able to combine all of the CSVs into a DataFrame with one query.
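Roughly, that one query looks like this (the directory layout and file names here are just placeholders, not my actual data):

import duckdb

# One CSV per quarter, all sharing a schema; the glob pulls every file and
# union_by_name lines up columns even if their order drifts between years.
df = duckdb.sql("""
    SELECT *
    FROM read_csv_auto('data/quarterly/*.csv', union_by_name = true)
""").df()

print(df.shape)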
Some fun news. I built Nextrounds so those of us who can't afford PitchBook can still get access to data on VCs!
All of this data is sourced from the SEC for the period 2016 - Dec 2022. Don't be shy with feedback! Looking to get more data in this fun project.
The small data community is stronger than ever.
Dataframes are so back now that Rust + GPU acceleration have made them performant.
@duckdb allows for easy read/write to persistent storage, and @observablehq + @evidence_dev make it easy to turn data into a presentation.
Super cool project from @calebfahlgren.
Using @duckdb's read_parquet on a remote URL is so powerful. It reduces dev time in @evidence_dev to basically nothing.
In 2024, it takes 5 min to write some SQL and Markdown and deploy a beautiful dashboard.
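For anyone curious, here's a rough sketch of the read_parquet-over-HTTP pattern (the URL is a placeholder, not a real dataset):

import duckdb

con = duckdb.connect()
# httpfs lets DuckDB read Parquet straight over HTTP(S), no download step needed.
con.sql("INSTALL httpfs; LOAD httpfs;")
stats = con.sql("""
    SELECT *
    FROM read_parquet('https://example.com/hub_stats.parquet')
    LIMIT 10
""").df()
print(stats)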
Here's a dashboard of @huggingface hub stats powered by @evidence_dev and @duckdb.
We just launched Global Benchmarks at @metrics_co.
Every startup now gets a scorecard like the screenshot below to compare their performance against the market, split by sector and scale.
Today, we’re introducing a new product: Global Benchmarking on @metrics_co! 🎉
Investors and portfolio companies can now get real-time insights into private company performance with powerful market benchmarks on our platform.
Let me know what you think!
Thinking out loud a bit here:
To succeed in consumer, you need to have either a better price or better quality than the market, but runaway product-market fit and success come from having both.
My team at @metrics_co is one of the teams hiring!
We’re currently building a query engine using @duckdb for lightning-fast, user-facing analytics.
This is a tricky problem, and we’re taking a path at the cutting edge of small data.
Come solve it with us!
We've hired several super talented engineers over the past few months at @metrics_co... 🥁
... and I'm excited to share that we just opened up another full-stack software engineering role today.
Know anyone amazing? Please send them our way:
Recently discovered the @AcquiredFM podcast and it's an absolute gem! 🌟 Finished the NBA, Nike, Costco, and Microsoft episodes. Amazing insights and stories all around! Highly recommended!
Hit my 1-year today at @metrics_co. It’s been a crazy journey so far. I’ve learned so much in this last year.
Every day, the decision to bail on @amazon and stay here full time feels better and better.
Super stoked to now be an owner in the company.
I’ve been really impressed with @IbisData so far.
Modern data stacks need to be flexible about passing data between @PostgreSQL, dataframes, and analytics DBs like @duckdb or @ClickHouseDB.
It's super helpful to have one common query language connecting these data sources.
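Here's a small sketch of what that looks like with Ibis (the table and column names are made up for illustration; the same expression can compile against DuckDB, Postgres, or other backends):

import ibis

con = ibis.duckdb.connect()            # could just as well be ibis.postgres.connect(...)
deals = con.read_csv("deals.csv")      # hypothetical file registered as a table

# Build one expression; Ibis translates it into the backend's SQL dialect.
summary = (
    deals.group_by("sector")
         .aggregate(total_raised=deals.amount.sum())
         .order_by(ibis.desc("total_raised"))
)
print(summary.to_pandas())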
Set my evening aside to listen to the new @AcquiredFM episode.
I managed to bike the entire Burke-Gilman Trail, to the home of @Microsoft on the east side, and halfway around Lake Washington, but I still have just over an hour left 😂
@Suhail Well it hasn’t been 3 years yet (1.5 so far), but I picked @metrics_co over @amazon, and I can’t believe I even thought hard about it at the time.
The amount I’ve learned and the scope of my responsibility is 100x here. If you’re ambitious, it’s really a no-brainer.
There’s so much alpha in just emailing people who are interested in the same problems as you.
One of our big breakthroughs on how to structure our new query engine came from cold-emailing a speaker from the last @duckdb con to learn more about their presentation.
Super fun working on this post with @jmelaskyriazi.
Our product became so powerful when we opened it up with an API and an Excel plug-in.
Instead of being a point solution, we became a cornerstone of our customers' data strategy.
I wrote about how we’ve increasingly embraced integrating with horizontal data tools at @metrics_co, and how they can make vertical SaaS apps stronger, not weaker.
Read on, and I'm open to your feedback 👉
I’m super jealous of companies with beautiful API docs and hate our Swagger-generated docs.
What are some awesome developer documentation frameworks?
I know people have liked @mintlify and @docusaurus.
First off, you can check out the project on my website here:
As a venture capital nerd, I find it pretty fun to look at all of the SEC filings and slice and dice them. This is data from 2008 until June 30, 2024.
My team at @QuaestorTech is hiring! We’re looking for someone who loves startups and VC, has finance experience, and isn’t afraid to dive into data.
Here’s the link for the role:
Would love some referrals!
Collecting and structuring finance data in the notoriously opaque venture capital industry used to be hard.
With @metrics_co, VCs can spend more time analyzing their data and less time collecting it.
Queries with @DataPolars are so fast it literally feels free.
If you can get your data into memory to start with, basically anything you can do to it will be snappy and performant because Polars is so crazy fast.
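Something like this, as an illustration (file and column names are just placeholders):

import polars as pl

df = pl.read_csv("filings.csv")

# Group, aggregate, and sort entirely in memory; Polars parallelizes this for free.
top = (
    df.group_by("firm")
      .agg(pl.col("amount").sum().alias("total_raised"))
      .sort("total_raised", descending=True)
      .head(10)
)
print(top)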
I also wrote a blog post on my website walking through why @duckdb is awesome for these types of projects and how quickly I was able to ideate with @evidence_dev.
Check it out here:
@duckdb @szarnyasg My favorite trick is the recursive CSV importer.
With /data/*/file_name.csv, I can import the same CSV file from a list of folders in the path.
This is super helpful for merging a bunch of time-series CSVs.
DuckDB is excellent at pulling data from CSV files and putting them into a database or data frame.
There's a great syntax where you can pull files recursively from directories by using a * in the path. This pulls all of the CSVs and automatically concatenates them into one table, as shown in the sketch below!
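Roughly like this (the directory layout is an assumption: one folder per period, each holding a file with the same name):

import duckdb

# The * matches every folder under data/, and filename = true adds a column that
# tells you which file each row came from.
df = duckdb.sql("""
    SELECT *
    FROM read_csv_auto('data/*/file_name.csv', filename = true)
""").df()
print(df.head())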
This report is super interesting. In a bull market, investors care less about their companies' financial performance and mostly focus on traction + top-line growth.
In a bear market, investors need to stay on top of their portfolio companies' performance. Transparency matters!
Super interesting report from @juniper_square on the state of VC.
53% of VCs are looking to improve portfolio monitoring, making it the top area for forward-looking technology investment. 👀
Not surprised based on what we're seeing.
Time to build!
@metrics_co @mattmireles @cartainc @AngelList I’ve heard nothing but good things from AngelList customers and nothing but complaints from Carta customers. That $7B round is looking more like a curse than anything.
My first step was to download TSV files on VC financing from the SEC website. VCs must file Form Ds when they fundraise and this is a great source of information.
When a journalist cites an SEC filing on VC, this is it.
Link:
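For anyone following along at home, a rough sketch of pulling one of those SEC TSVs into DuckDB (the file name is a placeholder, not the real SEC file name):

import duckdb

# read_csv_auto sniffs the tab delimiter in .tsv files, so no extra options are needed.
form_d = duckdb.sql("""
    SELECT *
    FROM read_csv_auto('FORMDSUBMISSION.tsv')
""").df()
print(form_d.columns)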
In 100 lines of Python and another 100 lines of Markdown, I was able to put together this report on VC fundraising using SEC data.
If you want to do a similar analysis, definitely reach out! I love talking to venture capital and data nerds, particularly people who are both!
I had a login bug for my fun little side project today, and @anmolm_ texted me about it before I even knew it existed!
Pretty cool to have built something that people find enough value in to use.
Evidence is a great tool that uses DuckDB to read data from various sources and allows data scientists to easily create production notebooks using Markdown and JS.
As a Python and SQL guy who can get around a front-end, I found it really intuitive.
The most magical moment is at companies like @WarbyParker and @italic where they provide a higher-quality product for cheaper! It feels stupid not to purchase it. There is literally no trade-off in the mind of the customer.
By my count, ~1,650 unique GPs have raised on @AngelList, but as with many aspects of venture, there is a power law at play, with clear winners jumping off the chart in terms of dollars raised and number of funds raised on the platform.
There’s an asymmetric upside to luck.
Getting lucky once can have a cascading effect, whereas getting unlucky normally doesn't have much of an effect.
A good example is applying for jobs. One lucky break can parlay into a future career, and one rejection is meaningless.
This is an awesome read!
Columnar tables in Postgres solve the PG analytics problem. This is a much better solution than having a CDC ETL process and an entire second database.
Really cool to see where the market is moving here. Definitely the future of analytics.
SQL queries are expressed as DuckDB flavored SQL blocks in Markdown and display components are really simple JS components.
The code below is what I used for the data table. Just a simple SQL query and a couple lines of JS.
@AcquiredFM arena show was so much fun last night. Met some really awesome people.
From founders raising pre-seeds, to a crazy smart 18-year-old who already had a successful exit, and tons of other amazing people.
When’s the next one?
These API-connections-as-a-service companies are so cool. I really like companies like @tryfinch and @tellerapi. They really help companies focus time and energy on things that matter instead of reinventing the wheel over and over again.
We are now accepting applications for the summer 2023 cohort of the 8VC Fellowship Program! All software engineers and designers looking for internships at an early-stage startup for summer 2023 are welcome and strongly encouraged to apply.
My team at @metrics_co is hiring a senior backend engineer.
We’re building a private markets analytics engine to help VCs analyze their companies' performance. Super cool problem to solve and some incredible data to work with!
Q1 2023 was a very weak quarter for VC fundraising by my count. There were only 4 major VC fundraises in Q1 2023.
@8vc raised $869M
@felicis raised $825M
@ThriveCapital raised $745M
@KaszekVentures raised $975M (across 2 funds)
From my data, this is the lowest number since 2015 👀
I often think about what the world would be like if we had unlimited energy.
Our relationship with most resources would fundamentally change since we can use energy to change the state of resources.
We’d be able to turn ocean water to freshwater, hot to cold, etc. for free!
@realnatekp @metrics_co @max_muoto @gwevans6 @jmelaskyriazi Such a fun product to build! It’s been so cool seeing how much fun our users are having with it.
Those @8vc benchmarks were crazy to see. Reading that post was so eye-opening on how much of an impact this product is making.
ORMs are one of the most dangerous traps in backend development. The SQL they generate is often total garbage.
Writing SQL as f-strings is super ugly and janky, though.
Libraries like SQLGlot make writing SQL in Python much more Pythonic while still maintaining the performance of SQL.
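A tiny sketch of that builder style with SQLGlot (table and column names are placeholders):

from sqlglot import select

# Compose the query programmatically instead of concatenating f-strings,
# then render it in DuckDB's dialect.
query = (
    select("firm", "SUM(amount) AS total_raised")
    .from_("filings")
    .where("filed_at >= '2020-01-01'")
    .group_by("firm")
    .sql(dialect="duckdb")
)
print(query)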
I did some data cleaning on this data frame and exported it using the DuckDB SQL engine to create a database file.
With this database file, I could load the data into @evidence_dev and start plotting!
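Roughly what that export step looks like (the column names and file name are stand-ins, not the real scraped SEC data):

import duckdb
import pandas as pd

# Stand-in for the cleaned DataFrame from the scraping step.
filings_df = pd.DataFrame({"firm": ["Fund A", "Fund B"], "amount_raised": [100.0, None]})

con = duckdb.connect("vc_filings.duckdb")   # creates a persistent database file
con.sql("""
    CREATE OR REPLACE TABLE filings AS
    SELECT * FROM filings_df
    WHERE amount_raised IS NOT NULL
""")
con.close()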
Not to mention the impact these companies have on the world.
Saronic and Epirus are literally building the future of warfare.
Affinity, Standard Metrics, Anduin, and Opto are key components of private market infrastructure.
For the data nerds: this was a cool problem for us, and we solved it in a novel way!
Our benchmarks are live, meaning they don't run on an ETL process and are instead updated continuously as new data becomes available, so we're always showing the most impactful data.
Flight from Seattle to SF and of course the guy sitting next to me was a VC.
The ears of everyone around me perked up the second the words Series A left my mouth.
VCs love being “contrarian” but the second someone says startups are overvalued it’s all “RIP good times”, “haven’t invested a cent”, “have you thought of an acquisition?”
Can people chill and just focus on building great companies?
@archieemwood
Radar charts are actually really useful when comparing against a benchmark.
This radar from @scottjwillis shows a player benchmarked across key metrics. We can quickly see the player is an excellent passer and looks well-rounded for a 20-year-old.
Yáser Asprilla - Watford 2023-2024 Attacking Midfield & Winger radar and distribution
This looks like some solid production for a 20-year-old in the Championship. Seems like a player for the future rather than, say, a first-team-impact type of move based on the stats.
@AlexPrejean We literally just wrote a blog post about this at @metrics_co.
It took us a while to figure it out, but it's definitely optimal in our opinion. Fighting Excel is just weird. Your customers should be able to do what they want.
Added VC data going back to 2009 to Nextrounds just now, which is as far back as the SEC’s structured records go.
Pretty crazy that @sequoia has raised $50B since 2009!
Joke's on me since the token popped and everyone is rich now lol. Just goes to show the way to make money in 2021 is just putting your money into whatever your dumbest investment idea is.
Kind of funny that everyone posted about buying the Constitution on Twitter for like a week and just ended up losing a bunch of money to Ethereum’s ridiculous gas fees.
By passing a * to read_csv, DuckDB recursively went through each folder and grabbed the specified CSV, concatenating them all into a DataFrame.
Code:
db.read_csv("/*/FILE_NAME.tsv")
My old pandas implementation was ~7 lines long and had a much longer runtime and more complexity.
Let's go! Seattle-based reusable rocket company Stoke raising a huge round. Nothing is as cool as firing rockets into space, and making that process as economical and sustainable as possible is paramount to success. Super stoked to see this announcement!
Priorities in building a startup, from @bhorowitz’s The Hard Thing About Hard Things:
“Take care of the people, the product, and the profits - in that order”
Super cool to get my tweet featured in @TurnerNovak’s Substack post! I’ve got some more fun stats I’ll be posting about @AngelList in the next couple of days 👀
Recommend giving Turner’s piece a read
Started using Arc from @browsercompany today, and it’s pretty magical.
I’ve always been into alternative browsers (Firefox, Opera, Maxthon, Brave), and Arc is by far the best I’ve ever used.
Customer: “I need to know you can get my data in?”
Sales: “Oh Absolutely”
PM: “How might we automatically get this data?”
Engineering: “I can automate csv uploads in a month”
Customer: Sends a blurry screenshot of an incoherent Excel file
Implementation: 🤦‍♂️
@anmolm_ 2.75 MOIC in 8 years isn’t too bad. Guessing a lot of the deployed capital was in more recent years, so it's hard to judge.
$1B returned to investors is pretty impressive in that time frame, though.