List: Data Engineering | Curated by Yaakov Bressler

Feb 17, 2025
88 stories
2 saves
Data Engineering
Yaakov's list of worthwhile data engineering reads.
Netflix Technology Blog in Netflix TechBlog
Introducing Impressions at Netflix
Part 1: Creating the Source of Truth for Impressions
Feb 15
9
Zhenzhong Xu
The Four Innovation Phases of Netflix’s Trillions Scale Real-time Data Infrastructure
Feb 1, 2022
20
Vu Trinh in Data Engineer Things
We might not fully understand the column store!
A note on the Row, Column, and Hybrid storage model
Dec 14, 2024
2
Cai Parry-Jones in TDS Archive
Data Quality Doesn’t Need to Be Complicated
Three Zero-Cost Solutions That Take Hours, Not Months
Dec 10, 2024
11
Arman Hossen in Level Up Coding
11 Python Libraries That Will 10x Your Development Speed in 2024: A Data-Driven Analysis
11 Game-Changing Python Libraries You’ve Been Missing in 2024
Dec 3, 2024
10
Hugo Lu
Managing Snowflake Infrastructure with dbt and Terraform
Understanding where to draw the boundaries around dbt is important
Oct 6, 2024
Analytics at Meta
How Facebook Sets Goals
Author: Morgan Henry
Nov 22, 2024
3
Oscar Pulido in Google Cloud - Community
Stop Thinking in Data Pipelines, Think in Data Platforms: Introducing the Analytics Engineering…
Imagine a world where you could deploy your entire enterprise-ready data platform in minutes and empower your data practitioners to…
Oct 28, 2024
5
Prem Vishnoi (cloudvala)
TikTok Data Engineer Interview Process
TikTok, with its rapidly growing user base of over 1 billion and new businesses such as e-commerce, offers an exciting environment for…
Sep 21, 2024
3
Netflix Technology Blog
ETL development life-cycle with Dataflow
by Rishika Idnani and Olek Gorajek
Aug 2, 2024
6
Cai Parry-Jones in TDS Archive
Radical Simplicity in Data Engineering
Learn from Software Engineers and Discover the Joy of ‘Worse is Better’ Thinking
Jul 26, 2024
4
Netflix Technology Blog in Netflix TechBlog
Maestro: Netflix’s Workflow Orchestrator
By Jun He, Natallia Dzenisenka, Praneeth Yenugutala, Yingyi Zhang, and Anjali Norwood
Jul 22, 2024
12
Dave Flynn in In the Pipeline
How to use a Makefile to speed up your dbt project workflow
Learn how to use a makefile to reduce dbt command fatigue and group related commands for easy reuse and sharing.
Jun 13, 2024
2
 This story is no longer available
Patrick Hoefler in TDS Archive
Dask DataFrame is Fast Now
How Dask enables processing data at terabyte scale efficiently
May 27, 2024
Dario Radečić in TDS Archive
Python One Billion Row Challenge — From 10 Minutes to 4 Seconds
The one billion row challenge is exploding in popularity. How well does Python stack up?
May 8, 2024
60
Vu Trinh in Data Engineer Things
I spent 5 hours understanding more about the Delta Lake table format
All insights from the paper: Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores
May 4, 2024
2
Sean Coyne
Data Engineering: Practice System Design Question and Solution: Batch
One common type of data engineering system design question is the batch process pipeline, with any system design question there are many…
Nov 8, 2023
1
João Pedro in TDS Archive
My First Billion (of Rows) in DuckDB
First Impressions of DuckDB handling 450Gb in a real project
May 1, 2024
12
Dario Radečić in TDS Archive
DuckDB and AWS — How to Aggregate 100 Million Rows in 1 Minute
Process huge volumes of data with Python and DuckDB — An AWS S3 example.
Apr 25, 2024
8