Zach Wilson

@EcZachly

32,315
Followers
940
Following
269
Media
3,901
Statuses

Founder @ $1m ARR | ADHD | 800k+ followers on all platforms | 10 yrs DE experience | ex @facebook, @netflix, and @airbnb

A free 60k+ DE newsletter 👉
Joined July 2014
Pinned Tweet
@EcZachly
Zach Wilson
5 months
When I worked at Netflix, I built pipelines that processed over 2000 terabytes per day. Data pipelines play by different rules when you get to this scale. I go into more detail in this 2 min YouTube video you should check out! #dataengineering
8
109
860
@EcZachly
Zach Wilson
2 years
Getting into #dataengineering is actually pretty easy
- learn SQL
- learn Python
- learn Snowflake/BigQuery/Databricks
- learn data modeling
- learn data pipelines with Airflow
If you learn these 5 things, you’ll be interview-ready for a junior position for sure
89
728
4K
@EcZachly
Zach Wilson
10 months
You’re a great engineer if you know the definition of:
- idempotent
- monoid
- decoupled
- dependency injection
- unit
- functional programming
- asynchronous vs parallel programming
- thread locking
- eventual consistency
- exactly-once semantics
- lambda vs kappa
238
458
3K
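The first term in the tweet above, idempotent, can be illustrated for a pipeline step: running the same load twice leaves the table in the same state as running it once. This is only a minimal sketch. The `daily_revenue` table, the `load_partition` helper, and the delete-then-insert approach are assumptions made for illustration, not something the tweet describes.

```python
import sqlite3

def load_partition(conn, ds, rows):
    """Idempotent load: replace the whole partition for date `ds`.

    Deleting the partition first means a rerun overwrites instead of appending.
    """
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM daily_revenue WHERE ds = ?", (ds,))
        conn.executemany(
            "INSERT INTO daily_revenue (ds, department, revenue) VALUES (?, ?, ?)",
            [(ds, dept, rev) for dept, rev in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (ds TEXT, department TEXT, revenue REAL)")

rows = [("toys", 100.0), ("books", 50.0)]  # hypothetical data
load_partition(conn, "2024-01-01", rows)
load_partition(conn, "2024-01-01", rows)  # rerun, e.g. after a retry

row_count = conn.execute("SELECT COUNT(1) FROM daily_revenue").fetchone()[0]
print(row_count)
```

A plain append-only insert would leave four rows after the rerun. Replacing the partition keeps it at two.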
@EcZachly
Zach Wilson
9 months
I created a public GitHub repo with all the resources, books, companies, and social media accounts you should be following to stay current on data engineering topics. I'm accepting PRs so we can crowdsource this effort! #dataengineering
22
471
2K
@EcZachly
Zach Wilson
2 months
If I had to start learning #dataengineering all over again, I’d follow this plan, mostly in order: - Learn SQL — Aggregations with GROUP BY — Joins (INNER, LEFT, FULL OUTER) — Window functions — Common table expressions - Learn about data modeling — read about data
16
339
2K
@EcZachly
Zach Wilson
11 months
I worked 2 years each at Meta, Airbnb and Netflix. Their engineering stacks are different and cultures have pros and cons. - Meta Stack I used: Hive, Spark, HDFS, Dataswarm, Unidash, Deltoid Pros: Tons of motivated people willing to help you Great social events to make
35
210
2K
@EcZachly
Zach Wilson
7 months
The data engineer interview has 4-5 pieces: - the SQL interview Make sure you know: Window functions, self-joins, common table expressions and SQL fundamentals - the data modeling interview Make sure you know: Fact data modeling, dimensional data modeling, aggregate tables
13
329
2K
@EcZachly
Zach Wilson
10 months
I know data engineers who know just Python and SQL who make $500k at Netflix. You don’t need to know the high performance languages to make a killing as a data engineer!
36
157
2K
@EcZachly
Zach Wilson
9 months
The best tech for each task: - batch pipeline: Apache Spark - data visualization: Apache Superset - web api: NextJS (spring boot close second) - SQL database: Postgres - NoSQL database: DynamoDB - Graph database: Neo4j - front end web: React - front end mobile: React
45
259
1K
@EcZachly
Zach Wilson
10 months
When I was at Airbnb, I reduced the pricing and availability data sets to 3% their original size! This removed a few petabytes from the cloud and made Jeff Bezos cry. How did I do this? 1. I recognized that listing and listing night information should be in one table not
28
105
1K
@EcZachly
Zach Wilson
9 months
Seven months ago, I decided to leave my big tech job to build something on my own. I was inspired by @thejustinwelsh 's solopreneur content and believed I could attain a similar life! I was making $600k/year at my data engineering job at Airbnb. I made $600k in seven months as an
39
105
1K
@EcZachly
Zach Wilson
2 years
Data engineering is like taking all the frustrating parts of being a data analyst and combining them with all the frustrating parts of being a software engineer
29
196
1K
@EcZachly
Zach Wilson
6 months
Every SQL concept you should know to ace data engineering interviews: - Basics SELECT, FROM, WHERE, GROUP BY, ORDER BY and HAVING - Window functions Know the difference between RANK vs DENSE_RANK vs ROW_NUMBER Know how PARTITION BY and ORDER BY work in the OVER clause
17
233
1K
@EcZachly
Zach Wilson
5 months
AI isn't the cause of the tech hiring slow down! There was a law that went into effect in 2022 that updated Section 174 of the tax code. Here are two scenarios to illustrate this: - In 2021, you could found a startup and hire an engineer and pay them $100,000. Say your company
32
253
1K
@EcZachly
Zach Wilson
1 month
The data engineer interview has 4-5 pieces: - the SQL interview Make sure you know: Window functions, self-joins, common table expressions and SQL fundamentals - the data modeling interview Make sure you know: Fact data modeling, dimensional data modeling, aggregate tables
9
196
1K
@EcZachly
Zach Wilson
1 year
SQL interviews are common in data engineering. They’re even more common in big tech. I wrote an article today revealing everything I know about them in my nine years of data engineering experience! Link in my bio since Elon would bury it otherwise! #dataengineering
7
192
1K
@EcZachly
Zach Wilson
11 months
Breaking into data engineering can be 100% free and 100% project-based! Here are the steps: - find a REST API you like as a data source. Maybe stocks, sports games, Pokémon, etc. - learn Python to build a short script that reads that REST API and initially dumps to a CSV
13
174
1K
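The first two steps in the tweet above (read a REST API, dump to a CSV) could look roughly like the sketch below. The endpoint URL is hypothetical and is never called here. The helper names `fetch_json` and `records_to_csv` and the sample records are also assumptions, and the API is assumed to return a JSON array of flat objects.

```python
import csv
import io
import json
import urllib.request

def fetch_json(url):
    """Read a JSON array of objects from a REST endpoint (network call)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def records_to_csv(records, out):
    """Write a list of flat dicts to a CSV file-like object."""
    fieldnames = sorted({key for record in records for key in record})
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)

# Stand-in for fetch_json("https://example.com/api/pokemon"), a hypothetical URL.
sample = [
    {"name": "bulbasaur", "height": 7},
    {"name": "pikachu", "height": 4},
]
buffer = io.StringIO()
records_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real script, `buffer` would be a file opened with `open("pokemon.csv", "w", newline="")`, and `sample` would come from `fetch_json`.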
@EcZachly
Zach Wilson
9 months
Please never use COUNT(*) in your SQL. It’s bad and unnecessarily selects all the columns. Use COUNT(1) for a basic row count. Or COUNT(column) for the count of a specific column. #dataengineering
36
115
973
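To see what each of the three forms in the tweet above returns, here is a small sketch run against SQLite, chosen only because it ships with Python. The `users` table and its rows are made up. The sketch shows results only; it does not measure performance, which depends on the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com")],
)

count_star, count_one, count_email = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(email) FROM users"
).fetchone()
print(count_star, count_one, count_email)
```

In SQLite, `COUNT(*)` and `COUNT(1)` both return the number of rows. `COUNT(email)` returns a smaller number because it skips rows where `email` is NULL.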
@EcZachly
Zach Wilson
9 months
Breaking into data engineering can be very confusing! Should I learn Spark or Snowflake? Python or Scala? Airflow or Argo? Flink or Spark Streaming? AWS or GCP? Superset or Tableau? Fundamentals are more important than technologies: - understanding distributed
22
156
965
@EcZachly
Zach Wilson
9 months
For the next week only, I’m removing the paywall on my data engineering interview articles. I wrote four in depth articles on passing the following four big tech interviews: - data structures and algos - data modeling - data architecture - SQL Link in bio since Elon would
9
128
954
@EcZachly
Zach Wilson
2 months
My favorite stack to build a data analytics product - Apache Spark (for processing) - Amazon S3 (for storage) - Apache Iceberg (for metadata) - Apache Airflow (for scheduling) - Apache Superset (for visualization) - Great Expectations (for data quality) #dataengineering
14
127
966
@EcZachly
Zach Wilson
9 months
SQL is deceptively complex. The order in which things apply isn’t that intuitive and can be frustrating when debugging queries. Let’s talk about the ordering of a query and when each step is executed. Here’s the query we’ll deconstruct. SELECT city, SUM(weed_smoked) as
19
140
935
@EcZachly
Zach Wilson
8 months
Happy new year everybody! Here’s a 2024 learning data engineering roadmap. 1. The basics: - learn SQL — SELECT, FROM, WHERE, GROUP BY, JOIN, HAVING, etc - learn Python — data structures: objects, arrays, tuples, namedtuples — algorithms: recursion, loops 2. Intermediate
15
183
873
@EcZachly
Zach Wilson
11 months
I wrote a new article on passing data engineering data structures and algorithms interviews! I cover: - how to prepare in the interviews - what to do on the day of the interview - the exact leetcode questions I’ve seen in my career and more! Check out the link in my bio
8
145
867
@EcZachly
Zach Wilson
2 years
Don’t stop at SQL and Python when learning #dataengineering
Add:
- distributed computation
- data modeling
- bash/docker/dev ops
- a statically typed language like Java
You’ll make a lot more money if you do this
24
160
854
@EcZachly
Zach Wilson
6 months
Breaking into data engineering can be 100% free and 100% project-based! Here are the steps: - find a REST API you like as a data source. Maybe stocks, sports games, Pokémon, etc. - learn Python to build a short script that reads that REST API and initially dumps to a CSV
12
152
834
@EcZachly
Zach Wilson
5 months
Data engineers come in a few levels: - level 1 Knows Python and SQL. Can move data from point A to point B so long as it’s not too big - level 2 Knows distributed compute basics like BigQuery and Spark. Can move data around on the order of single terabytes - level 3
12
134
812
@EcZachly
Zach Wilson
7 days
Data engineering is like taking all the frustrating parts of being a data analyst and combining them with all the frustrating parts of being a software engineer
21
111
778
@EcZachly
Zach Wilson
2 years
How I went from junior data engineer (L3) at Facebook to staff data engineer (L6) at Airbnb in 4 years. - I got hired at Facebook in 2016 as a junior data engineer. I had 2 years of experience and I realized that I probably got hired at the wrong level. (1/13)
19
85
736
@EcZachly
Zach Wilson
11 months
Being a data engineer is 50% building pipelines and 50% thinking about becoming a data scientist or software engineer
25
56
738
@EcZachly
Zach Wilson
6 months
Python, SQL and Airflow will get you to $125k as a data engineer. If you want more, you’ll need to adopt a software engineering mindset. - how do you make these pipelines scalable to arbitrary sizes of data? - how do you make data sets that are adaptable to inevitable
6
84
618
@EcZachly
Zach Wilson
10 months
The S tier data engineering stack is: - S3 and Apache Iceberg for storage - Spark and Flink for compute - Airflow or Mage or Prefect for orchestration - Great Expectations for data quality - Druid for fast columnar storage for dashboards - AWS as the cloud platform What’s
46
87
725
@EcZachly
Zach Wilson
1 year
Quick guide to go from 0 to #dataengineering hero: - learn SQL Data Lemur is a great resource here - learn Python Do like… 30-40 leetcode easy and medium questions - distributed compute Get a trial of Databricks or Snowflake and find a training to learn about it 1/3
11
128
705
@EcZachly
Zach Wilson
2 years
Starting out in the data field can be overwhelming. Should you be a data scientist? A data engineer? A data analyst? An ML engineer? The number of role options is overwhelming! Here's some high-level guidance on how to pick between some of these roles. 1/5
16
163
689
@EcZachly
Zach Wilson
1 month
If you’re a data engineer that knows about: - data lakes - file formats - compression techniques - distributed compute You’re crushing it!
13
58
699
@EcZachly
Zach Wilson
4 months
You should pick SQL over Python for all pipelines that can use it! Here’s why: - SQL pipelines are going to be closer to the database and more likely to be optimized by default - SQL is the common denominator language of data professionals allowing analysts to more easily
21
90
664
@EcZachly
Zach Wilson
16 days
Fundamental concepts every data engineer should know because they don’t really change - ANSI SQL - distributed compute - OLTP vs OLAP - CAP theorem - slowly-changing dimensional modeling - fact data modeling - logging best practices - AVRO / Thrift schemas - idempotent
4
101
634
@EcZachly
Zach Wilson
4 months
By changing the sort order of one of my Parquet tables at Airbnb, I was able to reduce its size from 35 GBs to 1 GB! Since there are 365 partitions of this data, it goes from being 12.2 TBs of data to 0.3 TBs. Remember when sorting your Parquet data that you should start with
18
85
629
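One reason sort order can shrink a columnar file is that formats like Parquet can run-length encode repeated values, and sorting puts equal values next to each other. The toy sketch below counts runs before and after sorting. The column values are invented, `run_length_encode` is a hypothetical helper, and this is a simplification of what a real Parquet writer does, not a reproduction of the Airbnb table.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

# Hypothetical low-cardinality column, e.g. a country code per row.
unsorted_column = ["US", "FR", "US", "FR", "US", "FR", "US", "FR"]
sorted_column = sorted(unsorted_column)

unsorted_runs = run_length_encode(unsorted_column)
sorted_runs = run_length_encode(sorted_column)
print(len(unsorted_runs), len(sorted_runs))
```

The unsorted column needs one run per row, while the sorted column collapses into one run per distinct value.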
@EcZachly
Zach Wilson
8 months
Here is a picture of how my resume transformed between 2014 and 2023. You'll see I didn't even list SQL or Python on my 2014 resume! You're allowed to change your mind on the trajectory and direction of your technical career! I realized I didn't like mobile app development
14
69
614
@EcZachly
Zach Wilson
1 year
It’s wild how many SQL-killers SQL has withstood in the last 50 years
29
64
613
@EcZachly
Zach Wilson
7 months
3 months ago, I created a public GitHub repo with all the resources, books, companies, and social media accounts you should be following to stay current on data engineering topics. This repo has ~6k stars now! I'm still accepting PRs so we can crowdsource this effort and make
7
93
591
@EcZachly
Zach Wilson
5 months
Data engineers often become bored of data engineering! After a while of SQL + Python + Airflow, you start thinking all pipelines are the same and it’s copy and paste work. Some strategies to help with this: - become more end-to-end Maybe that means building a dashboard. Maybe
13
74
592
@EcZachly
Zach Wilson
19 days
Data products are what is going to elevate data engineering into the stratosphere! They power everything you could imagine in the big tech companies! - At Airbnb, I worked on a data product that helped detect "bad hosts" to increase guest satisfaction - At Netflix, I worked
4
80
593
@EcZachly
Zach Wilson
2 years
Data engineering without SQL is like pizza without cheese. Sure you can do it but it’ll be weird! #dataengineering
12
69
553
@EcZachly
Zach Wilson
2 years
Most companies need the following data roles: - Data engineer for master data management - Data scientist for model development and experimentation - Analytics engineer for KPI development and visualization - Machine learning engineer for model development, deployment, monitor
13
117
553
@EcZachly
Zach Wilson
7 months
Top 4 reasons why data engineering is the best data profession: 1. highest pay for the least education Machine learning engineers and data scientists make 10-15% more but spend 30% more time in college. Data analysts make less than data engineers but require less schooling.
11
71
510
@EcZachly
Zach Wilson
7 months
Picking the right storage technology depends on a lot of factors! Picking the wrong one will always result in pain and migrations down the line! These constraints are around: - latency Low latency is dominated by queues and caches. Data access in those data structures is
8
111
550
@EcZachly
Zach Wilson
11 months
Window functions are critical in SQL interviews. Here's every piece dissected. An example query for the question "Give me the rolling 30-day sum of revenue by department" SELECT SUM(revenue) OVER (PARTITION BY department ORDER BY date ROWS BETWEEN 30 PRECEDING AND CURRENT ROW)
12
77
540
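The window function from the tweet above can be run against a tiny table to see how `PARTITION BY`, `ORDER BY`, and the frame interact. This sketch uses SQLite, which needs version 3.25 or newer for window functions. The `sales` table and its rows are assumptions. A `ROWS` frame counts rows rather than calendar days, so `30 PRECEDING AND CURRENT ROW` covers the current row plus up to 30 earlier rows in the same department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (department TEXT, date TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("toys", "2024-01-01", 10),
        ("toys", "2024-01-02", 20),
        ("toys", "2024-01-03", 30),
        ("books", "2024-01-01", 5),
        ("books", "2024-01-02", 15),
    ],
)

rolling = conn.execute(
    """
    SELECT department, date,
           SUM(revenue) OVER (
               PARTITION BY department          -- restart the sum per department
               ORDER BY date                    -- walk rows in date order
               ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
           ) AS rolling_revenue
    FROM sales
    ORDER BY department, date
    """
).fetchall()

for row in rolling:
    print(row)
```

With so few rows per department the frame never fills up, so each value is simply the running total within its department.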
@EcZachly
Zach Wilson
8 months
Data educators who know their stuff are hard to come by! Here’s a list of a few that inspire: - @v_vashishta - teaches AI strategy - @Alex_TheAnalyst - teaches data analytics - @andreaskayy - teaches data engineering - @NickSinghTech - teaches SQL - @alexxubyte - teaches
11
123
539
@EcZachly
Zach Wilson
9 months
Data engineering has many "this or that" questions - Python or Scala? If you don't know either, start with Python. If you want to transition to the software/data engineer archetype, pick up Scala later. - Streaming or Batch? A vast majority of data engineering jobs are batch
15
91
525
@EcZachly
Zach Wilson
2 years
My favorite stack to build a data analytics product
- Apache Spark (for processing)
- Amazon S3 (for storage)
- Apache Iceberg (for metadata)
- Apache Airflow (for scheduling)
- Apache Superset (for visualization)
- Great Expectations (for data quality)
#dataengineering
14
85
525
@EcZachly
Zach Wilson
11 months
The data engineer journey has a few levels: - level 1 Am I an analyst or a data engineer? At this level you’re probably doing a mixture of pipeline work and reporting. You like pipeline work more. - level 2 Why are pipelines so complicated? Here you learn about
3
91
516
@EcZachly
Zach Wilson
7 months
Breaking into data engineering can feel overwhelming! Here’s a path forward that takes 6-9 months to truly complete! #dataengineering
3
125
519
@EcZachly
Zach Wilson
7 months
Level 1 data engineers: I use SQL
Level 2 data engineers: SQL is hard to test, you need TDD in your pipelines, data frames only!
Level 3 data engineers: I use SQL and dbt
#dataengineering
10
59
493
@EcZachly
Zach Wilson
8 months
Data engineers spend weeks of their lives grinding out data pipelines just for a data analyst to display it with a pie chart! #dataengineering
29
67
496
@EcZachly
Zach Wilson
5 months
Data analysts don’t need to learn that much more SQL to become data engineers! Data analysts have a mastery of the SELECT query! This is 80% of data engineering SQL tools! Adding in a few other SQL commands will make it much easier to go from data analyst to data engineer! -
6
101
491
@EcZachly
Zach Wilson
9 months
I nearly tripled my salary in a year by transitioning from data analyst to data engineer! I started my career as a data analyst in 2014 making $30k. I decided I needed to upskill more. I learned Linux, Hadoop fundamentals, Java MapReduce, and got more depth in my software
22
49
487
@EcZachly
Zach Wilson
11 months
Python, SQL and Airflow will get you to $125k as a data engineer. If you want more, you’ll need to adopt a software engineering mindset. - how do you make these pipelines scalable to arbitrary sizes of data? - how do you make data sets that are adaptable to inevitable
8
81
485
@EcZachly
Zach Wilson
6 months
When I was in my early 20s, I believed that making $250k was going to be my "late career" earnings. This belief changed in 2017 after working at Facebook for a year. Working for a year with people whose parents paid $250k+ for their college made me realize that either:
12
56
477
@EcZachly
Zach Wilson
3 months
Data engineering != data science != software engineering So many companies have data engineers writing REST APIs, data scientists building pipelines and software engineers building models. Hire your specialists for their special skills. Don’t push them into inefficient
12
80
478
@EcZachly
Zach Wilson
2 months
Some people have been asking for sample lectures from the boot camp content. Here's the very first data modeling lecture at full length to give you an idea if the boot camp is for you or not! I hope you enjoy the 48 minutes of data engineering bliss!
5
86
450
@EcZachly
Zach Wilson
3 months
Every SQL keyword and its corresponding cloud cost: - SELECT: EC2 compute cost - FROM: S3 egress cost - JOIN: S3 egress cost, EC2 compute cost, shuffle and restart costs - ORDER BY/GROUP BY: EC2 compute cost, shuffle and restart costs - HAVING / WHERE: EC2 compute cost,
5
68
440
@EcZachly
Zach Wilson
8 months
Job requirements are mostly wishlists. I applied to a staff data engineer role at Airbnb that required 10+ years of experience when I had 6 years of experience. I got the job though! Apply to jobs you don’t think you’re ready for! You might surprise yourself!
8
45
437
@EcZachly
Zach Wilson
4 months
In 2024, my favorite technologies to learn are: - NextJS - Apache Spark - Snowflake - BigQuery - Apache Iceberg - Apache Airflow - Spring Boot Any that I’m missing? #softwareengineering #dataengineering
32
46
441
@EcZachly
Zach Wilson
11 months
The data architecture interview is often the thing that stands between you and a fancy senior+ data engineering role in big tech! I wrote a newsletter article covering the pieces that you need to remember to excel in these interviews! Link in the bio since Elon would downrank
2
56
431
@EcZachly
Zach Wilson
6 months
Slow ETL slaps data engineers on a daily basis! If you want to speed up your ETL 10x, try these things out: 1. Cumulatively build your dimensions Facebook keeps track of 30 days of user activity in an array. This makes calculating monthly active users much easier! You no
11
69
427
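The cumulative idea in the tweet above, keeping the last 30 days of activity per user in an array, can be sketched in a few lines. The state layout, the helper names `roll_forward` and `is_monthly_active`, and the users are all hypothetical. This is one possible reading of the technique, not a description of how Facebook implements it.

```python
WINDOW_DAYS = 30

def roll_forward(history, active_today):
    """Prepend today's activity flag and drop the oldest day beyond the window."""
    return ([1 if active_today else 0] + history)[:WINDOW_DAYS]

def is_monthly_active(history):
    """A user is monthly active if any day in the window is flagged."""
    return any(history)

# Hypothetical state: user_id -> activity flags, most recent day first.
state = {"alice": [0] * WINDOW_DAYS, "bob": [0] * WINDOW_DAYS}
todays_active_users = {"alice"}

# Daily job: yesterday's cumulative state plus today's activity only.
state = {
    user: roll_forward(history, user in todays_active_users)
    for user, history in state.items()
}
monthly_active_users = sum(is_monthly_active(h) for h in state.values())
print(monthly_active_users)
```

Each daily run reads only the previous state and one day of new activity, so it never has to rescan 30 days of raw events.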
@EcZachly
Zach Wilson
7 months
Please stop using sub queries in your pipelines! #dataengineering
13
50
414
@EcZachly
Zach Wilson
17 days
What people think breaking into data engineering looks like: - processing hundreds of terabytes at scale - mastering Spark, Iceberg, Airflow - knowing everything about data lakes and data architecture - burning thousands of dollars on AWS compute just to get a job What breaking
6
61
418
@EcZachly
Zach Wilson
1 year
Data engineers with strong software engineering skills will be in very high demand for the next 5 years! Building end-to-end data products and not just data pipelines will unlock outsized value for companies! Data products are full stack so DEs should upskill here: 1/2
9
65
408
@EcZachly
Zach Wilson
4 months
Here's what the average data engineering interview looks like in 2024: - 1 hour algorithms in Python Here you will be asked irrelevant questions about dynamic programming, linked lists, and inverting trees - 1 hour SQL Here you will be asked niche questions about recursive CTEs
10
65
404
@EcZachly
Zach Wilson
7 months
After you’ve been in data for a while you realize tooling doesn’t matter that much! - whether it’s Snowflake vs BigQuery vs Spark It’s all distributed compute underneath the hood. - whether it’s Airflow vs Prefect vs Mage vs Dagster It’s all CRON underneath the hood -
8
72
405
@EcZachly
Zach Wilson
7 months
My bold 5 year predictions about #dataengineering - Streaming data eng jobs account for 15-20% of all data eng jobs, but pay the most - Rust becomes a mainstream data engineering infrastructure language like Scala - Spark starts looking like Hive does now - Data engineers
16
52
402
@EcZachly
Zach Wilson
9 months
When I was 17, I ran away from home and ultimately got tackled by my 300 pound step dad. He screamed at me, “Zach you’re a drug addict!” My journey since then has been kind of crazy. I spent 17-22 lost. Going in and out of rehabs and feeling dejected and anxious. My one
20
20
402
@EcZachly
Zach Wilson
2 months
When I worked at Netflix, I built a graph database that had over 40 different vertex types and 50 different edge types! This extreme variety of data needs to be handled with care! I wrote a detailed blog post about everything you should consider here:
4
53
407
@EcZachly
Zach Wilson
10 months
Distributed SQL is not the same as regular SQL! These keywords cause shuffling in distributed environments: - GROUP BY - JOIN - ORDER BY - PARTITION BY These keywords behave mostly the same everywhere: - WHERE - HAVING - FROM - SELECT You’ll notice the word “BY”
5
57
392
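Why `GROUP BY` shuffles in a distributed engine can be shown with a toy simulation: rows with the same key start on different partitions and have to be moved together before they can be aggregated. Everything below is a simplified assumption (two partitions, a CRC32 hash, in-memory lists) and does not model how Spark or any specific engine implements its shuffle.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 2

def partition_for(key):
    """Stable hash so the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def shuffle(input_partitions):
    """Move every (key, value) row to the partition that owns its key."""
    shuffled = defaultdict(list)
    for rows in input_partitions:
        for key, value in rows:
            shuffled[partition_for(key)].append((key, value))
    return shuffled

def group_by_sum(input_partitions):
    """GROUP BY key with SUM(value): shuffle first, then aggregate locally."""
    totals = {}
    for rows in shuffle(input_partitions).values():
        for key, value in rows:
            totals[key] = totals.get(key, 0) + value
    return totals

# Rows for the same city start out on different partitions.
input_partitions = [
    [("austin", 1), ("denver", 2)],
    [("austin", 3), ("denver", 4)],
]
totals = group_by_sum(input_partitions)
print(totals)
```

A `WHERE` filter, by contrast, can be applied to each input partition independently, with no rows moving between partitions.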
@EcZachly
Zach Wilson
10 months
Mid-level engineers often fall into the trap that doing more gets you promoted faster! This bias sounds correct though. Senior engineers write more code that’s why they’re senior right? I remember at Facebook I fell into this trap. I became the main DE owning notifications,
11
41
397
@EcZachly
Zach Wilson
1 year
Refusing to grow beyond SQL and Python will limit your career growth as a data engineer! Growing in the following areas will get you more money: - data modeling Knowing when to use cumulative table design to model your dimensions is critical. Knowing how to efficiently model
16
65
393
@EcZachly
Zach Wilson
5 months
Data modeling has evolved beyond Kimball’s book Here’s why: - Kimball modeling didn’t think about distributed compute environments or large scale data - Splitting everything up into tables that can’t be broadcast JOIN’d in Spark is expensive. - Doing JOINs with extremely
10
64
392
@EcZachly
Zach Wilson
11 months
Top five skills to break into data engineering: - data modeling Dimensional data modeling - what analysts use Relational data modeling - what software engineers use One Big Table data modeling - a new cutting edge way that is appropriate sometimes - distributed compute The
5
85
388
@EcZachly
Zach Wilson
5 months
Data engineering SQL interviews always have a silly RANK question. Should you use RANK, DENSE_RANK, or ROW_NUMBER? Here’s a refresher! For more free data engineering interview content, subscribe to my blog: #dataengineering
3
60
392
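The difference between the three ranking functions shows up only when there are ties, so a small example with one tie makes it visible. The `scores` table and its rows are made up, and SQLite (3.25 or newer) is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 100), ("b", 90), ("c", 90), ("d", 80)],
)

ranked = conn.execute(
    """
    SELECT player,
           RANK()       OVER (ORDER BY score DESC) AS rnk,        -- ties share a rank, then a gap
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk,  -- ties share a rank, no gap
           ROW_NUMBER() OVER (ORDER BY score DESC, player) AS row_num  -- always unique
    FROM scores
    ORDER BY score DESC, player
    """
).fetchall()

for row in ranked:
    print(row)
```

Players `b` and `c` tie at 90. `RANK` gives both 2 and then skips to 4, `DENSE_RANK` gives both 2 and continues with 3, and `ROW_NUMBER` numbers every row 1 through 4, using `player` as a tiebreaker so the result is deterministic.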
@EcZachly
Zach Wilson
5 months
Every engineer has one of two tech stacks: - stack one: MacBook, Discord, AWS, JavaScript, React, Jenkins, GitHub, FaceTime - stack two: Windows, Slack, Azure, Python, Vue, GitHub actions, GitHub, Zoom Which stack are you?
72
36
378
@EcZachly
Zach Wilson
4 months
Data analytics is going to become more "Kafka-first" for a variety of reasons - Relying on a data engineer to ETL the data is a bottleneck that a lot of companies don't want to worry about - Technologies like Apache Pinot sit on top of Kafka and enable real-time analytics
11
70
384
@EcZachly
Zach Wilson
4 months
Breaking into data engineering can feel complicated and overwhelming! You need to learn the languages of the trade: SQL and Python. You need to learn the tools of the trade: Spark, BigQuery, Airflow, Databricks, etc. Then you need to show that you actually know this stuff! I go
2
72
387
@EcZachly
Zach Wilson
6 months
Once you’ve been in analytics long enough you realize there’s only like… 6 patterns - Aggregation Count things by other things - Experimentation / Segmentation Split people into groups and test product changes - Accumulation vs Derivative Think rolling sum or YoY
5
32
383
@EcZachly
Zach Wilson
1 year
If you use Excel for data analytics, you’re a data analyst. You don’t have to know SQL and Python. Don’t belittle others for using tools that are different from yours! It’s very impressive how far business can go with just Excel.
11
89
359
@EcZachly
Zach Wilson
9 months
Data engineering interviews are frustrating because: - some treat DE like software eng and give you ridiculous data structures and algorithms questions - some treat DE like analytics eng and expect extremely in depth knowledge of dbt and metrics - some treat DE like being a
7
43
370
@EcZachly
Zach Wilson
5 months
The perfect data engineering portfolio project has the following things: - a data modeling diagram This shows you know how to build usable data tables. - a live visualization people can view from the web This is probably the thing people will look at and share. Without this
4
59
370
@EcZachly
Zach Wilson
6 months
I intentionally don’t monetize my long form YouTube videos so y’all can have the best learning experience even if you can’t afford YouTube Premium! Here are my best ad-free hits: Data Lakes, Apache Iceberg and Parquet compression in 60 minutes:
6
65
365
@EcZachly
Zach Wilson
7 months
I turn 29 + 1 today at 9:02 PM Pacific. As I desperately cling to my 20s, here are 29 + 1 things I’ve learned during my time on this planet that have led to success 1. Always ask questions! The stupider the question the better! 2. Don’t ask what’s the least I can do. Ask
39
51
359
@EcZachly
Zach Wilson
10 months
Linear regression is still more important than LLMs for 95%+ of data science jobs!
14
47
348
@EcZachly
Zach Wilson
8 months
For the holidays, I'm offering ten full-ride scholarships to V4 boot camp. If you get selected, you'll get immediate access to V3 material and get a free seat in the V4 boot camp in the spring! Here's the link to apply for the scholarship:
49
163
344
@EcZachly
Zach Wilson
2 months
Do you want to get better at data engineering? Here are some free YouTube videos you should watch:
- Data Modeling 100TBs to 5 TBs
- Data Lake fundamentals (Iceberg and Parquet)
- Dimensional Data Modeling
4
80
356
@EcZachly
Zach Wilson
2 years
Data engineering compensation can get kind of crazy as you climb the ladder in big tech!
- junior DEs usually make $180-200k
- mid-level makes $250-275k
- senior makes $300-350k
- staff makes $500-600k
Climbing the ladder is definitely worth it! #dataengineering
21
37
347
@EcZachly
Zach Wilson
11 months
The data modeling round in big tech interviews weeds out the DEs who can't solve vague business problems! I wrote a free article about everything you need to know to pass these interviews! Link in my bio since Elon would downrank otherwise! #dataengineering
1
55
349
@EcZachly
Zach Wilson
10 months
Follow these accounts to level up your data skills! - @SeattleDataGuy - talks about data engineering and consulting - @andreaskayy - talks about data engineering - @startdataeng - talks about data engineering and architecture - @NickSinghTech - talks about SQL - @DalianaLiu
4
73
341
@EcZachly
Zach Wilson
11 months
Fundamentals matter more in #dataengineering than tooling! Knowing distributed compute is more important than knowing Spark vs Snowflake Knowing data modeling is more important than Iceberg vs Delta Lake Knowing job orchestration is more important than knowing Mage vs Airflow!
5
51
334
@EcZachly
Zach Wilson
6 months
OLTP and OLAP data modeling and querying are fundamentally different! OLTP: - focused on latest state of data - normalization and 3rd normal form are powerful - optimized for point queries or single user queries - query latency matters a lot - using CTEs can sometimes produce
6
58
336
@EcZachly
Zach Wilson
1 year
After taking my 65-hour self-paced data engineering course you will know how to: - Build ETL in Spark and SQL end-to-end with confidence - Test your ETLs end-to-end and have them tested in CI/CD using Chispa, PyTest, and PySpark - Create data quality checks to prevent data in
8
62
330
@EcZachly
Zach Wilson
1 year
Evaluating the data quality of a pipeline is complicated. There's business value to consider. There's maintenance. There's ROI I wrote a new newsletter covering all these topics. If you go to my profile you can catch it since Elon would downrank it otherwise!
2
63
324