The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.
SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.
SEASON 2 DATA BROS
In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.
For inquiries contact
[email protected]
Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan
October 31, 2024 • 28 MIN
This is a special episode of The Data Engineering Show, and joining the Bros is not one guest, nor even two – instead they’re revisiting the best bits from three different fascinating episodes. In each, they spotlight essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist.
The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn
September 24, 2024 • 32 MIN
In this episode of The Data Engineering Show, Ryanne Dolan from LinkedIn joins the Bros to discuss LinkedIn's Hoptimator project. Ryanne explains how they’re simplifying complex data workflows by automating them through SQL queries, integrating Kubernetes, Kafka, and Flink. The conversation highlights the shift towards a consumer-driven data model and the future of data engineering.
Vector Databases Won’t Replace SQL - Andy Pavlo
June 4, 2024 • 42 MIN
SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later when the hype dies down that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production databases safely, and why SQL is here to stay.
How ZoomInfo transitioned from data graveyards to ROI-driven data projects
April 16, 2024 • 39 MIN
Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects have value, and to provide an unfiltered peek into ZoomInfo’s data stack and unique tech culture.
Matthew Weingarten from Disney Streaming about Data Quality Best Practices
March 26, 2024 • 27 MIN
Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies.
Joseph Machado, Senior Data Engineer @ LinkedIn talks best practices
February 29, 2024 • 25 MIN
Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods.
Professors Joe Hellerstein and Joseph Gonzalez on LLMs
January 24, 2024 • 46 MIN
Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM, which they co-founded. If you consider yourself a hardcore engineer, this episode is for you.
Megan Lieu on powerful notebooks that enable collaboration
January 1, 2024 • 31 MIN
There are two types of data influencers on LinkedIn: (1) those who talk directly about the products and companies they work for, and (2) those who provide more general guidance, tips, and opinions. Can influencers actually be passionate about the products they’re developing and talk about them straightforwardly without sounding salesy? We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode. Megan is one of those influencers who combine the two approaches, and with almost 100K followers, her content seems to be resonating with many data folks. She talked to the bros about her approach to data advocacy as well as the power of notebooks, especially when they become broader and enable collaboration.
Transitioning from software engineering to data engineering
November 22, 2023 • 29 MIN
Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show: Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium. She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important. Without software engineering skills, you’ll be limited to the rigid capabilities of your stack. But without data engineering skills, you’ll find it hard to be cost-effective and see the bigger picture.
Vin Vashishta explains why we should stop using dashboards
October 4, 2023 • 35 MIN
Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. This time on The Data Engineering Show, he met the bros to talk about the difference between BI dashboards and analytics that actually introduce knowledge. It’s no longer just about the data volume; it’s about quality and relevance.
Joe Reis and Matt Housley on the fundamentals of data engineering
September 6, 2023 • 42 MIN
After co-writing the best-selling book ‘Fundamentals of Data Engineering’, Joe Reis and Matt Housley joined the bros for some much-needed ranting, priceless data advice, and good laughs. So why are we still talking about providing business value and dashboards, even though we don’t really have anything new to say? If there are so many great tools in the data stack, why are we still so troubled? How can we focus more on things like data governance and data quality that’ll actually push the industry forward?
Bill Inmon, the Godfather of Data Warehousing
August 8, 2023 • 30 MIN
As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse. In this Data Engineering Show episode, Bill Inmon talks about surviving rabbit holes throughout the evolution of data, the data modeling renaissance, and why ChatGPT is not Textual ETL.
Large-scale data engineering at Momentive.ai - Meenal Iyer
July 12, 2023 • 38 MIN
As companies scale, data gets messy. The data team says one thing, the business team says something completely different. Meenal Iyer, VP Data at Momentive.ai, met the Data Bros to talk about enforcing collaboration in large organizations to ensure what she considers the three most important data factors: Adoption, Trust, and Value.
Data engineering from the early 2000s till today - BlackRock
June 8, 2023 • 41 MIN
When it comes to data management, have we come a long way since the early 2000s? Or has it simply taken us 20 years to finally realize that you can’t scale properly without data modeling? With over 20 years of experience in the data space, leading engineering teams at Cisco, Oracle, Greenplum, and now as Sr. Director of Engineering at BlackRock, Krishnan Viswanathan talks about the data engineering challenges that existed two decades ago and still exist today.
Zach Wilson on what makes a great data engineer
April 27, 2023 • 34 MIN
How good you are at Spark or Flink ≠ how good you are at data engineering. After years of data engineering experience at Airbnb, Netflix, and Facebook, Zach Wilson is now focused on spreading the knowledge in EcZachly and all over social media. He met Benjamin Wagner to explain why data modeling and storytelling are more important than the actual tech, why data engineering is going to see more job growth than data science, and what brought him to start creating content, reaching over 250K followers on LinkedIn.
How ZipRecruiter and Yotpo power self-service data platforms that work
March 23, 2023 • 45 MIN
Data engineers are not paid to do support. Liran Yogev, Director of Engineering at ZipRecruiter, and Doron Porat, Director of Infrastructure at Yotpo, talk about building resilient self-service products that keep customers happy and engineers calm. They walked the bros through their data stacks and explained how ZipRecruiter is completely rebuilding its data layer from scratch.
Data Observability with Millions of Users - Barr Moses
February 8, 2023 • 38 MIN
Barr Moses, CEO of Monte Carlo, explains the difference between data quality and data observability, and how to make sure your data is accurate in a world where so many different teams are accessing it.
How Amplitude Engineers Process 5 Trillion Real-time Events
January 5, 2023 • 27 MIN
Weichen Wang, Senior Engineering Manager at Amplitude, came to meet the bros to talk about Amplitude's cutting-edge data stack and how it processes 5 Trillion real-time events while dealing with mutable data and massive scale.
Making Observability a Key Business Driver
November 29, 2022 • 48 MIN
80% of the code that you write doesn’t work on the first try. And that’s fine. But knowing which 80% is not working and which 20% is working is the actual challenge. After 10 years at Facebook, managing and scaling the Seattle site to over 6000 engineers(!) Vijaye Raji founded Statsig to make observability automated and real-time. How is the semantic layer managed? How was the Statsig team able to build an observability product that handles real-time ever-changing metadata? What are Vijaye’s main takeaways from engineering at Facebook? Tune in.
A ClickHouse Review from a Practitioner’s Point of View
September 1, 2022 • 34 MIN
Sudeep Kumar, Principal Engineer at Salesforce, is a ClickHouse fan. He considers the shift to ClickHouse one of his biggest accomplishments during his eBay days and walks Boaz through his experience with the platform: how it handled 2B events per minute on the one hand, but also required rollups that compromised granularity when extending time windows.
Besides a ClickHouse review from a practitioner’s point of view, Sudeep tells us about interesting use-cases he’s working on at Salesforce.
The Creator of Airflow About His Recipe for Smart Data-Driven Companies
August 3, 2022 • 45 MIN
According to Maxime Beauchemin, CEO & Founder at Preset and Creator of Apache Superset and Apache Airflow, building a thriving company is not so straightforward. So how did he do it?
Choosing the right system and services is key for a successful start, and can help you avoid the chaos of having too many tools spread across multiple teams.
Max walks the Bros through his recipe for a smart data-driven company, and the genesis of Airflow, Superset & Presto (with some great tidbits about Airflow's old school marketing approach and how the open source platform took on a life of its own).
How Similarweb Delivers Customer Facing Analytics Over 100s of TBs
July 13, 2022 • 37 MIN
According to Yoav Shmaria, VP R&D Platform at Similarweb, the best way to manage data warehouse costs is to tag every table, database, or running ETL job so that costs can be tracked at the granularity of every feature.
Besides handy cost management tips, Yoav walks the bros through the tech stack he implemented to analyze 100s of TBs of web data to serve fast customer-facing analytics.
Full disclosure, Similarweb is a Firebolt customer, but the bros kept it objective, and there’s no Firebolt talk in this episode.
How Klarna Designed a New Data Platform in the Cloud
June 9, 2022 • 40 MIN
Klarna is one of the leading fintech companies in the world, valued at $45B.
While many corporations are “stuck” on-prem, Klarna made the move and today is a cloud-only company. Gunnar Tangring, Klarna’s Lead Data Engineer tells Boaz what this new modernized stack looks like.
How Eventbrite is Modernizing its Data Stack
May 23, 2022 • 23 MIN
Archana Ganapathi, Head of Data & Analytics Engineering at Eventbrite, shares Eventbrite’s data stack modernization process, and how you get engineers to adopt new technologies like dbt, which may be outside their comfort zone.
A Deep Dive into Slack's Data Architecture
May 10, 2022 • 34 MIN
Growing from a startup to an IPOed and then an acquired company meant that Slack’s sales org was scaling rapidly.
Apun Hiran, Slack’s Director of Software Engineering explains how the data stack and architecture evolved to support this growth with more reliable and timely metrics.
Speaker: Apun Hiran, Director of Software Engineering (Data), Slack
Hosts: Eldad and Boaz Farkash, CEO and CPO, Firebolt
Transitioning Scopely’s 5.5 PB Data Platform to the Modern Data Stack
April 12, 2022 • 31 MIN
Should data engineering AND BI be handled by the same people? According to Jonathan Palmer, VP Data Platform at Scopely – YES. By Analytics Engineers.
His team of Analytics Engineers is in the final stages of transitioning 5.5 PBs of data, which include 15B events per day, to the modern data stack. Tune in to learn how they did it.
Getting rid of raw data with Jens Larsson
March 22, 2022 • 29 MIN
Why would you create ugly data? According to Jens Larsson, don’t even go near raw data. Jens started off at Google, continued to manage data science at Spotify, caught the startup bug at Tink, and recently joined an exciting new company called Ark Kapital, together with Spotify’s former VP Analytics. Jens explains how he and his team killed the notion of raw data at Tink and walks us through the Google, Spotify and Ark Kapital data stacks.
How Zendesk engineers manage customer-facing data applications
February 17, 2022 • 33 MIN
This time on The Data Engineering Show, Eldad abandoned his brother Boaz, but it’s OK because Boaz got the full 30 minutes to talk to one of the most interesting people in the data space.
Ananth Packkildurai is Principal Software Engineer at Zendesk and runs one of the strongest newsletters in data – Data Engineering Weekly.
He talked about data applications at Zendesk and how they’re built, technologies that excite him like data lineage and data catalog, and the best routes for software engineers to get their hands dirty in the data world.
How are those data intensive customer facing apps engineered at Gong?
January 20, 2022 • 26 MIN
Gong manages hundreds of thousands of videoconferences and millions of emails PER DAY, which add up to hundreds of TBs.
The Data Bros met Yarin Benado, Gong’s engineering manager to understand what is required to move to a modern data stack to support all this, what this stack looks like, and why it all comes down to data quality at the end of the day.
How Bolt Engineers Are Designing Its Next-Gen Data Platform
December 14, 2021 • 35 MIN
Bolt's ride-hailing app serves over 75M users in Europe and Africa and handles 500K queries every day.
Erik Heintare along with Bolt's engineering team is in the midst of designing a new next-gen data platform and is sharing how it's going to solve their biggest data challenges.
Guest: Erik Heintare - Senior Analytics Engineer at Bolt
Hosts: Eldad and Boaz Farkash, AKA The Data Bros
How did Agoda scale its data platform to support 1.5T events per day?
November 23, 2021 • 38 MIN
Scaling a data platform to support 1.5T events per day requires complicated technical migrations and alignment between hundreds of engineers. Want to see how Agoda did it?
Guests:
Amir Arad, Director of Machine Learning, Agoda
Shaun Sit, Senior Dev Manager, Agoda
Hosts:
The Data Bros - Eldad and Boaz Farkash
Diving Into GitHub's Data Stack
October 21, 2021 • 34 MIN
It’s the mother of all development projects. You use it daily. And so do 65M developers around the world. This time on the Data Engineering Show – A deep dive into GitHub’s data stack. Arfon Smith and KimYen (Truong) Ladia shared GitHub’s data engineering challenges and solutions and explained why every developer should know and adopt the ADR protocol.
How Vimeo Keeps Data Intact with 85B Events Per Month
August 18, 2021 • 40 MIN
How does the Vimeo data team deal with 2 PBs of data and 85B events per month? What made them recently build a data ops team? What data tool does the team love? And why (the hell) did they call their legacy platform Fatal Attraction? Guest: Lior Solomon, VP Data Engineering at Vimeo.
How Substack's Data Stack Supports 500K Paying Subscribers
August 3, 2021 • 24 MIN
Substack is an amazing — if not the most amazing — content publishing platform out there. Essentially, it allows anyone to become a journalist or to start their own newsletters and charge subscriptions for them. So how did they build a data stack that can support all of their 500K paying subscribers?
Guest: Mike Cohen, Data Engineer at Substack
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt
A Technical Deep Dive to Yelp's Data Infrastructure - With Steven Moy
May 11, 2021 • 50 MIN
As an expert in query engines and performance-related challenges, Steven Moy explains how Yelp handled its huge data growth in the past ten years.
Guest: Steven Moy, Software Engineer at Yelp
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt
How Canva's Data Engineers and Analysts Support 55M Active Users
May 11, 2021 • 43 MIN
Canva is one of the hottest, if not the hottest, graphic design platforms out there. Only a week ago it was announced that they reached a staggering 16 Billion dollar valuation, after having seen even stronger growth during the pandemic. With 55 million active users and around 500 million dollars in annual revenue, it seems that Canva is unstoppable.
So how do Canva analysts and engineers scale their data platforms to meet the company's insane growth?
Guest: Krishna Naidu, Data Engineer at Canva
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt
How AppsFlyer Delivers Sub-Second BI to 1000 Looker Users - With Alexandra Sudilovsky
May 11, 2021 • 31 MIN
AppsFlyer has exploded in size, growing from a small company of 200 people to 1000 people in just three years. Dealing not only with a huge amount of data on a daily basis but doing so while growing quickly as a company can come with many challenges.
Guest: Alexandra Sudilovsky, Senior BI Expert at AppsFlyer
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt
The Data Engineering Show - Coming Soon...
April 5, 2021 • 1 MIN
The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory, and learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.