The Data Engineering Show
Block Bad Data Before the Write with Nike’s Ashok Singamaneni
October 7, 2025
Nike’s Principal Data Engineer Ashok Singamaneni joins Benjamin and Eldad to discuss his open-source data quality framework, Spark Expectations. Ashok explains how the tool, which was inspired by Databricks DLT Expectations, shifts data quality checks to before the data is written to a final table. This proactive approach uses row-level, aggregation-level, and query-level data quality checks to fail jobs, drop bad records, or alert teams, which saves significant recompute cost and engineering effort in mission-critical data pipelines.
In this episode of The Data Engineering Show, Benjamin and Eldad are joined by Ashok Singamaneni, a Principal Data Engineer at Nike. Ashok dives deep into his work on the open-source projects BrickFlow and Spark Expectations. He shares his journey from mechanical engineering to data engineering and the lessons learned over a decade of tackling production data quality issues that lead to costly recomputes.

Ashok explains the philosophy behind Spark Expectations: treating the ingestion and transformation layers of a data pipeline (Bronze/Silver) as a software product rather than just a data engineering product. This means implementing rigorous checks like data quality, unit testing, and integration testing before the data is written to the final layer. He details the implementation using a Python decorator pattern within Spark jobs, allowing engineers to define rules that check for everything from basic column validation to complex referential integrity and aggregation consistency. The discussion also covers the trade-offs of using generative AI tools like Cursor for data engineering and the growing industry trend of prioritizing upfront data quality due to the rise of AI-powered analytics and direct leadership access to data.
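The decorator idea described above can be illustrated with a small, dependency-free sketch. This is not the Spark Expectations API: the names `with_expectations`, `DataQualityError`, `row_rules`, `agg_rules`, and `error_table` are hypothetical, and plain Python lists of dicts stand in for Spark DataFrames. It only shows the general shape of the pattern, where rules run on a job's output before anything is written, with a "fail" action that stops the job and a "drop" action that routes bad records to an error table.

```python
from functools import wraps


class DataQualityError(Exception):
    """Raised when a rule whose action is 'fail' is violated (hypothetical)."""


def with_expectations(row_rules, agg_rules=(), error_table=None):
    """Hypothetical decorator: validate rows returned by a job before the write."""
    # Rejected rows are collected here; a real job would write them to a table.
    error_table = error_table if error_table is not None else []

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            rows = func(*args, **kwargs)
            clean = []
            for row in rows:
                failed = [r for r in row_rules if not r["check"](row)]
                fatal = [r["name"] for r in failed if r["action"] == "fail"]
                if fatal:
                    # Mission-critical rule: stop the job, write nothing.
                    raise DataQualityError(f"row rule(s) failed: {fatal}")
                if failed:
                    # Non-fatal rule: drop the record and keep it for review.
                    error_table.append(
                        {"row": row, "failed_rules": [r["name"] for r in failed]}
                    )
                    continue
                clean.append(row)
            # Aggregation-level rules run on the surviving rows as a whole.
            for rule in agg_rules:
                if not rule["check"](clean):
                    raise DataQualityError(f"aggregate rule failed: {rule['name']}")
            return clean

        return wrapper

    return decorator


errors = []  # stands in for an error table


@with_expectations(
    row_rules=[
        {"name": "order_id_not_null",
         "check": lambda r: r["order_id"] is not None, "action": "fail"},
        {"name": "amount_positive",
         "check": lambda r: r["amount"] > 0, "action": "drop"},
    ],
    agg_rules=[{"name": "non_empty", "check": lambda rows: len(rows) > 0}],
    error_table=errors,
)
def build_orders():
    # Made-up sample data; a real job would return a transformed DataFrame.
    return [
        {"order_id": 1, "amount": 25.0},
        {"order_id": 2, "amount": -5.0},
    ]


clean_rows = build_orders()
```

In this sketch only the first order reaches `clean_rows`; the second lands in `errors` with the name of the rule it violated, and a null `order_id` would raise `DataQualityError` before any write could happen. The actual framework runs inside Spark jobs and has its own rule definitions and configuration, so its documentation is the reference for real usage.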


If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts.

About the Guest(s)

Ashok Singamaneni is a Principal Data Engineer at Nike, with over twelve years of experience in the data space across the banking, healthcare, and retail domains. He is the creator of the popular open-source frameworks Spark Expectations and BrickFlow, which focus on improving data quality and pipeline reliability. Ashok advocates for treating data ingestion and transformation as a software product, ensuring checks and balances are in place early in the pipeline. He holds a background in mechanical engineering.

Quotes

"DLT expectations gave an idea to the industry that you can do data quality before actually writing the data into your final tables." - Ashok

"I think over the time, in my experience, what I learned is this ingestion layer and the transformation layer, you should treat that as a software product, not like a data engineering product." - Ashok

"If it's mission critical, then you fail the job, not process the data, and don't put that data into the final table so that you don't need to recompute that again." - Ashok

"As the scale of the product increases, it becomes even more difficult for us to find exactly where the issue went wrong... it takes time for you to debug and see, like, lot of human effort also involved." - Ashok

"Data observability and quality is becoming prime because of AI integrations that are happening." - Ashok

"Ultimately, at the end of the day, you are responsible when you're checking in the code. It's not Claude or Cursor that will be blamed if something goes wrong." - Ashok

"The leadership is directly looking at the data and if there is something wrong in the data, then there can be some serious repercussions happening on the business decisions." - Ashok

"Rather than having bad data in the tables and then recomputing or reclarifying things, let's not put that data first in the first place." - Ashok

"You can drop the record and put that in an error table and give that alert to the engineering team that there is some error in the error table you can look at." - Ashok

"The row DQ checks that happens are very fast. It should happen as a pretty standard checks that happens on the scale." - Ashok


The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so

Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.
