Chris Riccomini @criccomini Twitter profile

Pinned Tweet

Chris Riccomini

30 days

New post! I talk with JP about FizzBee, TLA+, and writing stable software. This one’s got me thinking about FizzBee + DST/Antithesis.

FizzBee, TLA+, and (Practical) Formal Software Verification with JP Kadarkarai

JP, the creator of FizzBee, talks about formal methods, TLA+, distributed systems verification, and a better way forward.

materializedview.io

0

6

24

Chris Riccomini

@criccomini

7 months

I got a chance to sit in on some @ycombinator pitches this week. A few thoughts: 1⃣ I have AI fatigue--SO MUCH. Very little of it is deep tech; mostly applying OpenAI FM to stuff. Investors in this space: I have no idea how you do this. I feel like there's a lot of $ to be lost.

17

36

824

Chris Riccomini

@criccomini

5 years

Successful intern projects: 1. High value if completed. 2. Low risk if not completed. 3. Able to finish in allotted time (2-3 months). 4. Exciting to work on and talk about. Anything else I'm missing?

44

36

532

Chris Riccomini

@criccomini

2 years

Embedded DBs are having a renaissance. RDBMS: SQLite; OLAP: DuckDB; Graph: KuzuDB; Search: Chroma. The developer experience is so good on these. Things just work. Really cool to see.

10

66

453

Chris Riccomini

@criccomini

5 years

My @InfoQ talk 🎙️ on the "Future of Data Engineering" is up! I cover the six stages of data pipeline maturity: 0. None 1. Batch 2. Realtime 3. Integration 4. Automation 5. Decentralization Check it out! 👀 (I'm so sorry for the link picture)

Future of Data Engineering

Chris Riccomini talks about the current state-of-the-art in data pipelines and data warehousing, and shares some of the solutions to current problems dealing with data streaming and warehousing.

www.infoq.com

5

76

323

Chris Riccomini

@criccomini

2 months

It's out! I've been working with @paulgb , @vigneshc , the team @responsive_apps , and others to put together an LSM storage engine built on object storage. Contributors, users, and feedback would all be great!

GitHub - slatedb/slatedb: A cloud native embedded storage engine built on object storage.

A cloud native embedded storage engine built on object storage. - slatedb/slatedb

github.com

12

45

314

Chris Riccomini

@criccomini

1 year

Some interesting infra projects: WarpStream, Turbopuffer, LanceDB, Neon, AWS Neptune, TigerBeetle, Modal, Materialize, Tabular (Iceberg), DuckDB/Motherduck, Arrow Data Fusion/Substrait, gvisor, KIP-932 (Kafka), VeniceDB, Bauplan, Buf schema registry, Apicurio

19

22

275

Chris Riccomini

@criccomini

8 months

TIL about Apache DataFusion Comet. @apple has replaced @ApacheSpark 's guts with @ApacheArrow DataFusion. And they're donating it. 🤯 This is an alternative to @MetaOpenSource 's Velox Spark implementation. /ht @philippemnoel

6

57

264

Chris Riccomini

@criccomini

3 years

@sethrosen “Reddit’s database has two tables” “Instead, they keep a Thing Table and a Data Table. Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attribute like up/down votes, a type, and creation date” 🥴

12

16

233

Chris Riccomini

@criccomini

11 months

This is the future. Kafka writing Parquet to S3 (via tiered storage). Instant data lake.

Gunnar Morling 🌍

@gunnarmorling

11 months

"KIP-1008: ParKa - the Marriage of Parquet and Kafka" That's an interesting proposal: writing #Kafka segments as #Parquet files. Can see the appeal for data lake ingest; wondering though how well the columnar file structure plays with Kafka semantics 🤔.

4

32

185

6

23

223

Chris Riccomini

@criccomini

29 days

Uber's actually doing the thing. If they keep going, this could be a first-class reference architecture.

1

24

222

Chris Riccomini

@criccomini

2 years

DBs are getting totally ripped apart right now and I love it. Query engines (trino, duck), storage (s3, gcs), and indexing (iceberg, hudi) all separate.

Gunnar Morling 🌍

@gunnarmorling

2 years

"Querying SQLite databases with DuckDB" Enjoyed watching this fast-paced video by @markhneedham demoing how to use #DuckDB 's query engine to run analytics queries against data in a #SQLite file. 5:50 well spent 🦆!

1

19

98

15

20

206

Chris Riccomini

@criccomini

2 months

Big news: I'm helping @martinkl with a second edition of Designing Data-Intensive Applications! An early release of the first 3 chapters is now available (O'Reilly Learning subscribers only at this point) and we're hoping to finish it next year.

Designing Data-Intensive Applications, 2nd Edition

Data is at the center of many challenges in system design today. Difficult issues such as scalability, consistency, reliability, efficiency, and maintainability need to be resolved. In addition,...

www.oreilly.com

10

38

202

Chris Riccomini

@criccomini

2 years

I'm open sourcing Recap, a dead simple data catalog for engineers! Unlike traditional catalogs, Recap is built to power infrastructure and tools that need metadata. Read the docs: Or dive straight into the Github repo:

GitHub - gabledata/recap: Work with your web service, database, and streaming schemas in a single...

Work with your web service, database, and streaming schemas in a single format. - gabledata/recap

github.com

18

22

202

Chris Riccomini

@criccomini

2 years

Can someone explain DuckDB to me like I'm five? It's not clicking for me. It's like SQLite, but for OLAP, right? What's the big deal?

22

24

196

Chris Riccomini

@criccomini

6 years

Martin Fowler's blog post on schemaless data is super helpful when framing a discussion about when, why, and how to deal with this kind of data.

Schemaless Data Structures

Be wary of schemaless data structures, since they still have an implicit schema and, with a couple of exceptions, an explicit schema is better.

martinfowler.com

2

57

188

Chris Riccomini

@criccomini

2 years

I spent some time today comparing common schema format compatibility (Avro, Protobuf, JSON schema, Parquet, Arrow, CUE, and ANSI SQL). If you're interested, my Google sheet is here: (It's hand-wavy in a few areas especially ANSI SQL, still useful).

Schema Type Survey

docs.google.com

7

32

179

Chris Riccomini

@criccomini

5 years

Good morning! 👋🌄 I've written down some thoughts on the next 2-3 years of data engineering. Would love to hear your thoughts! 😃

10

49

173

Chris Riccomini

@criccomini

1 month

Many of you know I've been angel investing for a while now. Today I'm excited to announce the next step: I've started Materialized View Capital. MVC is a micro VC fund that lets me continue doing what I enjoy, and take some friends along for the ride.

Materialized View Capital

Materialized View Capital invests in early stage infrastructure startups.

materializedview.capital

21

8

172

Chris Riccomini

@criccomini

3 years

DWH trends 🔭 * Realtime DWHs * Analytics Engineering * Data Mesh * Data Catalogs * Reverse ETL * Headless BI * Data Quality * Data Lakehouses * DataOps * Data Products So, yeah, I'm thinkin' we have enough work to fill the next 10 years in data infra/engineering.

5

27

165

Chris Riccomini

@criccomini

4 years

"Kafka’s New Architecture" Truly a tour de force talk from @gwenshap . So many layers and concepts bundled so eloquently. Please watch this.

Keynote: Gwen Shapira, Confluent | Kafka’s New Architecture | Kafka...

Register for Kafka Summit: https://kafkasummit.io Let's begin a very unusual Kafka Summit by reflecting about change. Changes we've seen in the software engin...

www.youtube.com

2

38

165

Chris Riccomini

@criccomini

3 months

Didn't see this one coming. @bufbuild releasing a serverless @apachekafka product!

Bufstream: Kafka at 10x lower cost

We're excited to announce the public beta of Bufstream, a drop-in replacement for Apache Kafka that's 10x less expensive to operate and brings Protobuf-first data governance to the rest of us.

buf.build

11

34

162

Chris Riccomini

@criccomini

3 years

I am noticing a surprising dynamic. Remote work has actually made people feel MORE human, not less. I get to see their dogs, their kids, their work spaces. Interruptions during meetings are humanizing, not unprofessional.

4

13

153

Chris Riccomini

@criccomini

1 year

More interesting infra projects*: Responsive, Nile DB, Clickhouse, Boiling data, Quickwit, Databend, cr-sqlite, Litestream, Pravega, Restate, Inngest, Bacalhau, Roapi. * I have $ in some of these (and some in the list below)

Chris Riccomini

@criccomini

1 year

Some interesting infra projects: WarpStream, Turbopuffer, LanceDB, Neon, AWS Neptune, TigerBeetle, Modal, Materialize, Tabular (Iceberg), DuckDB/Motherduck, Arrow Data Fusion/Substrait, gvisor, KIP-932 (Kafka), VeniceDB, Bauplan, Buf schema registry, Apicurio

19

22

275

8

14

151

Chris Riccomini

@criccomini

2 years

This is slick: "a no dependency Python SQL parser, transpiler, and optimizer. It can be used to format SQL or translate between different dialects like DuckDB, Presto, Spark, Snowflake, and BigQuery." /ht 🐘jwills@data-folks.masto.host

GitHub - tobymao/sqlglot: Python SQL Parser and Transpiler

Python SQL Parser and Transpiler. Contribute to tobymao/sqlglot development by creating an account on GitHub.

github.com

3

20

147

Chris Riccomini

@criccomini

3 months

"Object Storage Native" ... Ok, this is what I'm calling it now.. So many good things in this post. Here's one:

7

19

144

Chris Riccomini

@criccomini

8 months

The sheer number of subprojects coming out of (and because of) @ApacheArrow is pretty staggering. It reminds me of the Hadoop ecosystem circa 2010 (the good parts 😉). Comet, DataFusion, Ballista, Flight, Substrait (via @VoltronData )...

8

16

142

Chris Riccomini

@criccomini

3 years

👋 The project I've been working on is now open source! Open Robo-Advisor is a Python library that acts as an advisor 🤖 for passive indexing (think Wealthfront). It's very basic, but I wanted to get it out early. Check it out and send feedback! 👀

GitHub - highwire-ai/open-robo-advisor: Open Robo-Advisor is a flexible robo-advisor library...

Open Robo-Advisor is a flexible robo-advisor library written in Python. 🤖 - highwire-ai/open-robo-advisor

github.com

7

20

141

Chris Riccomini

@criccomini

3 years

It bugs me that people use the word “stack” for what the data ecosystem is right now. Stack implies some kind of order or hierarchy, but there isn’t one. We have the DWH and then everything else. It’s more like a graph… Or just a mess… </get off my lawn>

22

10

138

Chris Riccomini

@criccomini

21 days

New post is up! Next-gen infrastructure must support flexible deployments. Embedded, single-node, clustered, BYOC, SaaS, and self-managed. We're finally able to do this with one codebase.

The New Era of Flexible Infrastructure Deployment

Flexible deployment is now table stakes. Infrastructure must run embedded, client-side, single-node, clustered, as SaaS, BYOC, and self-hosted.

materializedview.io

14

17

138

Chris Riccomini

@criccomini

5 years

1/ Post-Map/Reduce (second generation) data processing systems (Spark, Flink, Dataflow, Samza) have been about unifying batch and streaming. @confluentinc (with Kafka streams, KSQL) is focused on unifying streaming and databases.

7

45

136

Chris Riccomini

@criccomini

9 months

Latest is out. This is a longer post that goes in-depth on new query engine layers like @ApacheArrow Data Fusion and Velox. Hot takes: - DWH commoditized - Kafka threatened - HTAP coming

Databases Are Falling Apart: Database Disassembly and Its Implications

Why are engineers taking databases apart and putting them back together, again?

materializedview.io

5

31

137

Chris Riccomini

@criccomini

8 months

Databases are getting quite commoditized. - Velox/DataFusion/DataBend/Substrait/optd commoditize query engine - PostgreSQL commoditizes protocol and SQL dialect - S3/RocksDB/Arrow/Parquet commoditize storage layer WAL is next @jrdntgn talked about this, but it goes beyond perf

11

19

135

Chris Riccomini

@criccomini

8 months

I just got around to reading this. It’s really good.

Internal consistency in streaming systems

www.scattered-thoughts.net

1

19

135

Chris Riccomini

@criccomini

2 years

I don't think ETL/ELT is really a thing any more... between outbox pattern, CDC, kSQL, materialize, dbt, CDW, etc etc... data is extracted, transformed, and loaded all over the place... Maybe it always has been. Were we just lying to ourselves all these years? ETLELTETLELT....

16

18

132

Chris Riccomini

@criccomini

4 years

Recent data engineering themes in my Twitter timeline: * Data gateway/mesh * Data ops * ML pipelines * Compliance, privacy, deletion * Data catalogs Some great 🔗 below... 🧵 1/6

1

43

130

Chris Riccomini

@criccomini

1 year

I've got some news! I’m launching Materialized View, a software infrastructure newsletter. Sign up now to get software infra hot takes, projects, papers, developer interviews, stack deep dives, and more. First post coming soon.

Hello, World!

Introducing Materialized View, a newsletter offering software infrastructure news, developer interviews, stack deep dives, and project and paper highlights.

materializedview.io

15

24

131

Chris Riccomini

@criccomini

4 years

Workplace survival tip: don’t be good at things you hate doing. (Is this a tech brain tweet? 😛)

6

12

129

Chris Riccomini

@criccomini

1 year

Three companies with popular products that are under attack: - dbt labs/dbt - temporal - databricks/spark Tons of startups going after these.

23

2

127

Chris Riccomini

@criccomini

2 years

We implemented some parts of Chad's post () at @wepayeng . Moira Tagle (Staff SWE @ WePay) wrote a schema checker that verified that all DB changes were bw/fw compatible before the change could be merged. 1/n

The Rise of Data Contracts

And Why Your Data Pipelines Don't Scale

dataproducts.substack.com

6

11

123

Chris Riccomini

@criccomini

8 months

Solid rainy afternoon read. "If SQL is considered a programming language, then relational databases function as virtual machines that execute SQL, similar to how the JVM executes Java."

What I Talk About When I Talk About Query Optimizer (Part 1): IR Design

An infrastructure engineer, focused on distributed storage system

xuanwo.io

3

23

119

Chris Riccomini

@criccomini

3 years

What’s the current state of the art for managing Python environments? venv seems dead. Should I be using conda?

63

3

118

Chris Riccomini

@criccomini

2 months

New post! We open sourced SlateDB a week ago. I wrote down some notes about its origin, what it's good for, and where it's headed. And, of course, the obligatory Github ⭐️'s vanity metric. 😃

SlateDB: An Embedded Storage Engine Built on Object Storage

We open sourced SlateDB a week ago. Let's look at where we're at and where we're headed.

materializedview.io

3

28

119

Chris Riccomini

@criccomini

7 months

My latest post is up! Apache Kafka is an aging open source project. It's time to accept that Kafka's protocol is what matters.

Ce n'est pas un Kafka: Kafka is a Protocol

Apache Kafka is an aging open source project. It's time to accept that Kafka's protocol is what matters.

materializedview.io

15

19

118

Chris Riccomini

@criccomini

7 months

8⃣ I don't understand young founders with multiple failed startups that still have extreme conviction and boundless energy for their latest idea. This is just a mentality I don't have.

3

1

115

Chris Riccomini

@criccomini

3 years

Dmitriy ( @squarecog ) and I wrote a book for new software engineers. This is the book we've always wanted to give to new hires, the stuff tech leads and managers wish their new hires knew. It's available for pre-order today! Buy your copy now. 🛒

The Missing README

The Missing README gives new engineers a masterclass in coding practices, technical skills, and tips for workplace success.

nostarch.com

6

20

114

Chris Riccomini

@criccomini

4 months

Incredibly clear description of async IO from @kingprotty . This talk is worth watching regardless of whether you care about Zig or not.

Zig's I/O and Concurrency Story - King Protty - Software You Can Love...

0:00 Talk 30:13 Q&A

www.youtube.com

0

19

113

Chris Riccomini

@criccomini

2 years

I think a lot of what’s happening in the data space right now is explained by looking through the lens of “data engineer” vs “analytics engineer”. Each has a different (but overlapping) set of tools/skills/responsibilities. I’m not convinced “analytics engineers” need to exist.

18

17

110

Chris Riccomini

@criccomini

8 months

New post! Picking at some of my stream processing scar tissue. Why Samza failed, how it led to Kafka Streams and Kafka Connect, and why I'm skeptical of Apache Flink.

From Samza to Flink: A Decade of Stream Processing

Why Samza failed, how it led to Kafka Streams and Kafka Connect, and why I'm skeptical of Apache Flink.

materializedview.io

6

13

110

Chris Riccomini

@criccomini

1 year

Like Parquet, but with vector similarity indexes. /ht @DSJayatillake

GitHub - lancedb/lance: Modern columnar data format for ML and LLMs implemented in Rust. Convert...

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du...

github.com

3

15

108

Chris Riccomini

@criccomini

2 months

Part 2 of my real-time OLAP series is out now! Hope you enjoy. 😊 “For each of these use cases, there are a different but overlapping set of requirements. Each needs different query latency, data freshness, data correctness, and query throughput.”

15 Years of Realtime OLAP (Part 2)

Realtime OLAP is colliding with search, observability, and data warehouses.

materializedview.io

1

8

104

Chris Riccomini

@criccomini

2 years

I got asked recently for some interesting projects in the data space. Here are some of my favs. 👇 @datafoldcom data-diff @inkandswitch cambria @TigerBeetleDB tigerbeetle @aerialfly buz @mycelial mycelial ( @sarahcat21 pointed me to .. all of these?)

4

15

103

Chris Riccomini

@criccomini

2 months

A post a month in the making! I'm finally writing about ClickHouse. 😀 I break down what makes it great and the challenges ahead.

Unpacking the Buzz around ClickHouse

A look at the excitement around ClickHouse. I break down what makes it great, and look at the challenges ahead.

materializedview.io

3

9

102

Chris Riccomini

@criccomini

23 days

So if big data is dead, is big data integration dead as well? Wondering if ETL/data integration changes at all with this paradigm. Not sure it follows, but dbt + duckdb seems like a signal. (brain dump to follow) 1/5

Big Data is Dead

Big data is dead. Long live easy data.

motherduck.com

6

15

104

Chris Riccomini

@criccomini

7 months

2⃣ Most silly ideas have been filtered out; everything is reasonable. Everything is very early. Consequently, most differentiation is around founding team, background, location, and where investors can add value.

2

0

99

Chris Riccomini

@criccomini

3 months

Took me all the way to Friday to get this one out. Started as a ClickHouse post, but felt I needed to motivate my perspective a bit. Now it's going to be 3 posts. 😅 A walk down memory lane: A brief history of Avatara, Apache Pinot, and Apache Druid.

15 Years of Realtime OLAP (Part 1)

A brief history of Avatara, Apache Pinot, and Apache Druid.

materializedview.io

4

14

96

Chris Riccomini

@criccomini

8 months

Hello, Monday. Here’s my latest Materialized View, just for you! “It's silly to have applications generate text-based SQL; they should be allowed to pass query plans to the database.”

Databases Should Speak Substrait

It's silly to have applications generate text-based SQL; they should be allowed to pass query plans to the database.

materializedview.io

4

12

96

Chris Riccomini

@criccomini

5 months

Here it is! A walkthrough of @lancedb 's Lance V2 and Meta's Nimble storage formats. There's a lot to like. I'm very bullish, though I do have a few concerns.

Nimble and Lance: The Parquet Killers

Everything you need to know about the new AI/ML storage formats from Meta and LanceDB.

materializedview.io

6

25

95

Chris Riccomini

@criccomini

10 days

This is a great post to share with that relative that keeps asking how LLMs work.

An Intuitive Guide to How LLMs Work

Chatting by chance

www.jlowin.dev

2

16

94

Chris Riccomini

@criccomini

3 years

Such a good post from @NotionHQ on their Postgres migration. * Origin of the name "shard" * Why they chose 480 logical shards * Migration process (double write, backfill, verify, switch) * Migration and verification written by different people

Herding elephants: lessons learned from sharding Postgres at Notion

With an effort to make Notion faster and more reliable for years to come — we migrated Notion’s PostgreSQL monolith into a horizontally-partitioned database fleet.

www.notion.so

1

23

90

Chris Riccomini

@criccomini

1 year

This is a pretty excellent writeup on Python's async/await stuff.

Concurrency and async / await - FastAPI

FastAPI framework, high performance, easy to learn, fast to code, ready for production

fastapi.tiangolo.com

0

32

89

Chris Riccomini

@criccomini

1 month

We just released 0.2.0! 🎉 This release has: - in-memory block cache - on-disk object cache - garbage collection - compressed bloom filters Next up: admin CLI, range queries, and more cache improvements.

1

12

92

Chris Riccomini

@criccomini

2 years

What are notable public posts in the data engineering space over the last ~10 years? * * * What else?

12

89

Chris Riccomini

@criccomini

3 months

Woah.. TIL "my rule of thumb for pandas is that you should have 5 to 10 times as much RAM as the size of your dataset."

Querying 1TB on a laptop with Python dataframes – Ibis

the portable Python dataframe library

ibis-project.org

1

13

89

Chris Riccomini

@criccomini

5 months

It feels good to get this off my chest. "Notably, S3 has no compare-and-swap (CAS) operation—something every single other competitor has. It also lacks multi-region buckets and object appends. Even S3 Express is proving to be lackluster."

S3 Is Showing Its Age

I'm squarely in the trough of disillusionment with S3.

materializedview.io

6

12

88

Chris Riccomini

@criccomini

4 months

Love this strategy from @databricks folks. Honestly, brilliant.

6

4

87

Chris Riccomini

@criccomini

2 months

“The CacheLib Caching Engine: Design and Experiences at Scale” Great paper. We’re debating whether to do an LOC cache (section 4 in paper) for SlateDB blocks or do object caching like Alluxio.

2

7

85

Chris Riccomini

@criccomini

3 years

I think centralizing transformation in the data warehouse is a dead end. We're doing it because it's convenient, not because it's right. The trend is to use the DWH more. We should instead be building the convenience of the DWH into the app and streaming layers.

18

11

85

Chris Riccomini

@criccomini

1 month

Put another way: Writing books () Writing code () Writing checks () Writing newsletters ()

Materialized View | Chris Riccomini | Substack

Software infrastructure hot takes, projects, papers, developer interviews, and deep dives. Brought to you by Chris Riccomini. Click to read Materialized View, by Chris Riccomini, a Substack publica...

materializedview.io

Chris Riccomini

@criccomini

1 month

Last 20 days for me: * Designing Data-Intensive Apps 2nd edition with @martinkl * SlateDB with @_RohanDesai , @vigneshc , @paulgb * Materialized View Capital with friends Feeling very lucky. And busy. 😃

1

57

1

8

84

Chris Riccomini

@criccomini

2 months

SlateDB 0.1.4 is out! Biggest update is that we now have real compaction thanks to @_RohanDesai ! Next release, SlateDB will have an in-memory block cache and an on-disk cache (inspired by JuiceFS, Alluxio, and Rockset). 🚀

2

6

83

Chris Riccomini

@criccomini

11 months

Ok, y'all. I'm pretty excited to get this post out. From @temporalio to an overflowing market, durable execution is having a moment. The space is too crowded and frameworks are hard to use. I talk about what needs to change.

Durable Execution: Justifying the Bubble

There’s been a surge in durable execution frameworks over the past 6 to 12 months. Temporal has been the go-to for a while and now challengers are emerging.

materializedview.io

6

16

81

Chris Riccomini

@criccomini

2 months

And there it is!

Amazon S3 now supports conditional writes - AWS

Discover more about what's new at AWS with Amazon S3 now supports conditional writes

aws.amazon.com

4

11

80

Chris Riccomini

@criccomini

7 months

4⃣ Most startups are 2 founders. Nearly everyone is in SF.

3

0

79

Chris Riccomini

@criccomini

8 months

"Solving durable execution’s immutability problem" is a solid read. Their overview of the current state is really helpful. Sounds like changing code on long-running workflows remains difficult.

Solving durable execution’s immutability problem

The hardest problem in durable execution, as in many areas of infrastructure, is safe updates.

restate.dev

1

17

78

Chris Riccomini

@criccomini

15 days

SlateDB on 🍊 front page 😅

5

78

Chris Riccomini

@criccomini

1 year

I think not enough people know about Ambry, LinkedIn's open source BLOB store. @sriramsubram and his team built it and it's still actively worked on.

GitHub - linkedin/ambry: Distributed object store

Distributed object store. Contribute to linkedin/ambry development by creating an account on GitHub.

github.com

1

17

76

Chris Riccomini

@criccomini

6 months

Bombs away! 💣 Latency, cost, durability: pick two. "I recently began hacking on a project to test this theory out. The project—dubbed SlateDB—is a cloud-native log-structured merge tree (LSM) embedded key-value database."

The Cloud Storage Triad: Latency, Cost, Durability

A new theorem for primary persistence on object stores.

materializedview.io

4

19

75

Chris Riccomini

@criccomini

6 months

There are weeks where decades happen.. - @supabase @tembo_io @neondatabase go GA - @lancedb 's Lance2 unveiled - @ApacheArrow DataFusion graduates to TLP in Apache - @auto_dba emerges from stealth .. and what else? I feel like I forgot some stuff ..

3

11

74

Chris Riccomini

@criccomini

3 years

I've been thinking of a survey post on modern/next-gen analytics, including: * Headless BI * Analytics Engineering (DBT 'n stuff) * Data Mesh (I know..) * Reverse ETL * White-label data viz (a la @TopcoatData ) What else should I cover?

19

6

73

Chris Riccomini

@criccomini

5 months

I'm squarely in the trough of disillusionment with S3. It's really showing its age. - No preconditions/CAS (Literally everyone else) - No multi-region buckets (GCS) - No append (ABS) - S3E1Z is expensive and lacks a ton of S3 features

8

5

73

Chris Riccomini

@criccomini

2 years

My mental model for monitoring "data quality" has 3 different categories: · Equality checks (a la @datafoldcom ) · Assertions (a la @expectgreatdata ) · Anomaly detection (a la @anomalo_hq ) Does it match? Does it match my expectations? Does it look weird?

GitHub - datafold/data-diff: Compare tables within or across databases

Compare tables within or across databases. Contribute to datafold/data-diff development by creating an account on GitHub.

github.com

3

16

72

Chris Riccomini

@criccomini

4 months

Lots of activity around Kafka proxies: "Reliably Processing Trillions of Kafka Messages Per Day" "Enabling Seamless Kafka Async Queuing with Consumer Proxy" (2021) Kroxylicious, proxy for Apache Kafka

2

9

72

Chris Riccomini

@criccomini

4 months

My second post about data lakehouse catalogs is up. Trying not to be too salty. 😅 "Databricks and Snowflake are talking a big game. So far, they've given us empty Github repositories and rewrites."

Data Lakehouse Catalog Reality Check

Databricks and Snowflake are talking a big game, but they've given us empty Github repositories and partial implementations.

materializedview.io

5

8

71

Chris Riccomini

@criccomini

8 years

Really excited to share this post! We've been streaming MySQL changes into Kafka. Pretty neat stuff.

WePay Engineering

@wepayeng

8 years

How we stream database changes in realtime with @MySQL , @debezium , and @apachekafka #kafka #mysql #bigdata ..

0

31

41

2

39

70

Chris Riccomini

@criccomini

7 months

Shower thought: DuckDB is an edge database.

11

3

71

Chris Riccomini

@criccomini

2 years

Next up, "Serverless Computing: One Step Forward, Two Steps Back."

2

11

70

Chris Riccomini

@criccomini

7 months

3⃣ Most startups are somewhere between pre-revenue and 100k ARR.

1

69

Chris Riccomini

@criccomini

7 months

5⃣ I mostly sat in on B2B, SaaS, DevTools, and AI pitches. I was surprised (and excited) that several pitches were planning to sell straight to enterprise. Not just SMB/open source GTM.

1

0

69

Chris Riccomini

@criccomini

3 months

We had this exact problem at WePay. The table was called “disbursement_history”. Hot platforms & WePay’s own fee accounts were touched on almost every transaction. Led to a ton of prod issues. Really wish TB had existed then.

0

8

70

Chris Riccomini

@criccomini

6 months

Pretty brutal takedown of tiered storage from the @warpstream_labs folks.

Richard Artoul

@richardartoul

6 months

Tiered storage for Kafka is a classic tarpit idea. It makes all the sense in the world, but it doesn't work in practice. Check out our latest blog post to learn why.

2

10

82

7

8

68

Chris Riccomini

@criccomini

1 month

New post! I talk with @philippemnoel about @paradedb , the experience of building as a PostgreSQL extension, pg_duckdb, pg_lakehouse, and more...

Search on PostgreSQL, Building Extensions, and pg_analytics with Philippe Noël

Philippe Noël is CEO and co-founder of ParadeDB. In this post, Philippe and I discuss ParadeDB, the experience of building as a PostgreSQL extension, pg_duckdb, pg_lakehouse, and more ...

materializedview.io

2

9

66

Chris Riccomini

@criccomini

1 year

I'd love to see someone take DuckDB and use it to kill both ELK and Splunk. Put DuckDB everywhere, so it's more scalable than ELK and more cost efficient than Splunk. Anyone? Anyone?

10

2

66

Chris Riccomini

@criccomini

9 months

I had a random DB thought this morning: Are there any DBs that have an interface to send query plans over the wire rather than SQL? What I'm thinking is essentially a protobuf of something like a substrait plan.

16

3

66

Chris Riccomini

@criccomini

5 years

"Building Financial Systems on Eventually Consistent DBs" Surprising talk by @netflix about how to use Cassandra to build a billing system.

"Building Financial Systems on Eventually Consistent DBs" by Rahul...

Netflix operates in 190 countries worldwide and we have 125M+ customers and growing. At any moment, around the globe, millions of customers are getting charg...

www.youtube.com

1

14

65

Chris Riccomini

@criccomini

3 months

Digging into @ClickHouseDB a bit. It seems to fill a similar use case as Druid, Pinot, and Materialize. What are its differentiators? It doesn't have differential data flow (materialize) or startree indexes (pinot). They do have materialized views, though...

13

2

66

Chris Riccomini

@criccomini

1 month

New Friday, new post! Reflecting on the two monolith to microservice migrations I've survived, I think there's a better way.

Modular Monoliths Are a Good Idea, Actually

Microservices aren't the only way to get high cohesion and low coupling.

materializedview.io

1

13

65

Chris Riccomini

@criccomini

5 years

Comparison of the Open Source OLAP Systems for Big Data: ClickHouse, Druid, and Pinot

5

25

64

Chris Riccomini

@criccomini

3 years

Today marks my last day at @wepayeng . 🤗 Over the 6½ years that I've been at WePay, I've watched the engineering team grow from 20 to 250; I'm most proud of this. We built a talented team, but also a team with GREAT culture. I will miss them.

7

4

65

Chris Riccomini

@criccomini

1 year

Buried in today's @motherduck announcement was that they're already doing hybrid execution between local and remote DuckDB instances. The logical query plan is broken up between local and remote and executed seamlessly.

Chris Riccomini

@criccomini

1 year

@ananthdurai @__AlexMonahan__ @neelesh_salian @peterabcz @thetinot This. 👇 The (L) is local and the (R) is remote. (source: )

3

16

2

11

64