The Data Engineering Show
The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft
December 16, 2025
What if your data platform could serve AI-native workloads while scaling reliably across your entire organization? In this episode, Benjamin sits down with Ritesh, Staff Engineer at Lyft, to explore how to build a unified data stack with Spark, Trino, and ClickHouse, why AI is reshaping infrastructure decisions, and the strategies powering one of the industry's most sophisticated data platforms. Whether you're architecting data systems at scale or integrating AI into your analytics workflow, this conversation delivers actionable insights into reliability, modernization, and the future of data engineering. Tune in to discover how Lyft is balancing open-source investments with cutting-edge AI capabilities to unlock better insights from data.
In this episode of the Data Engineering Show, host Benjamin Wagner sits down with Ritesh Varyani, Staff Software Engineer at Lyft, to explore how the company manages a sophisticated multi-engine data stack serving thousands of engineers, while simultaneously integrating AI across infrastructure and user-facing analytics.

What You'll Learn:

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts. Instructions on how to do this are here.


About the Guest(s)


Ritesh is a Staff Software Engineer at Lyft, bringing six years of experience architecting and scaling the company's data platform. With a background spanning Microsoft's data and cloud infrastructure, including work on Hadoop, Azure, and SaaS products, Ritesh leads Lyft's critical data systems, including Trino, Spark, and ClickHouse. In this episode, Ritesh shares insights on building scalable, AI-native data platforms that serve diverse organizational needs, from batch processing and analytics to real-time marketplace operations. His strategic approach to unifying complex data stacks while integrating AI-driven reliability and user experience improvements provides actionable guidance for data engineers and platform leaders navigating infrastructure modernization at scale.


Quotes


"The goal of our platform is to give our users access to the data as fast as possible so that they can drive the meaning from the data that they are getting and take better data driven decisions." - Ritesh

"We are a Hive format shop. We are going to be moving to other open table formats in the future, but at this point, we are a Hive table format." - Ritesh

"Our main goal at this point is primarily understanding how we see the data platform running five years from now, three years from now, and how we are able to future proof it." - Ritesh

"In this world of AI, we should not be falling behind in any way, and bringing AI in the right places within our platform." - Ritesh

"We want to make our semantic layer ready for the AI native side of things so that our teams are able to drive the best meaning possible from the data that they see." - Ritesh

"Big data systems are distributed systems by nature, and where AI can help you is very clearly understand how the patterns are changing and what is a good action to take." - Ritesh

"Rather than thinking of this as an AI versus an open source thing, it's about a question of what work is the most business critical and how do you go 100% behind it." - Ritesh

"Not everybody is working on AI initiatives at this point, but where it makes sense according to our business strategy, if it aligns with it, then obviously we go and invest." - Ritesh

"If you are the one who's going to take on the initiative, probably spend a few hours outside of what you're already working on, and that is how you will discover AI and the tooling for it." - Ritesh

"We are trying to consolidate into a single direction of providing different kinds of models so that you are easily able to integrate and focus on the value you want to provide to your customers." - Ritesh


Resources
 

Connect on LinkedIn:

Websites:

Tools & Platforms:


The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so

Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.

Check out our three most downloaded episodes: