The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

by Astronomer

Welcome to The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in...

Insights from recent episode analysis

Audience Interest

Estimated Reach: 500 to 3K

Listeners across platforms

Podcast Focus

Categories: technology

Publishing Consistency

Frequency: ~3-4 / Week

50+ episodes since 2018

Platform Reach

Insights are generated by CastFox AI using publicly available data, episode content, and proprietary models.

Most discussed topics

data engineering

apache airflow

airflow

data orchestration

data pipelines

Brands & references

Generic platforms filtered out.

Low Confidence

Total monthly reach

500 to 3K

Estimated from 1 chart position in 1 market.

By chart position

🇰🇪
KE · Technology
#141
500 to 3K

Per-Episode Audience
Est. listeners per new episode within ~30 days
150 to 900
🎙 Daily cadence · 96 episodes · Last published 2d ago
Monthly Reach
Unique listeners across all episodes (30 days)
500 to 3K
🇰🇪100%
Active Followers
Loyal subscribers who consistently listen
200 to 1.2K

On the show

From 17 eps

Hosts

Marc Lamberti

1 ep

Marc

1 ep

Astronomer

1 ep

Recent guests

18 across last 17 eps

Kaxil Naik

1 ep

Pavan Kumar Gopidesu

1 ep

Samantha Blaney Cuevas

1 ep

Kenten Danas

1 ep

Benjamin Rogojan

1 ep

Ashir Alam

1 ep

Christos Bisias

1 ep

Karan Alang

1 ep

Egor Tarasenko

1 ep

Ethan Shalev

1 ep

Carlos Daniel Puerto Niño

1 ep

Buğra Öztürk

1 ep

Filip Kunčar

1 ep

Najeeb Sulaiman

1 ep

Mateus Ferreira

1 ep

William Orgertrice III

1 ep

Shrividya Hegde

1 ep

Julian Larralde

1 ep

Recent episodes

What's New in Apache Airflow® 3.3

Jul 9, 2026

Unknown duration

Running Airflow 3 in a regulated environment at OTPP

Jun 25, 2026

Unknown duration

Managing a Customer Analytics Platform with Airflow at Skimlinks

Jun 11, 2026

22m 40s

Building a custom Tableau provider for Airflow at JLR

Jun 4, 2026

21m 18s

Orchestrating 2,000 Airflow pipelines at Luiza Labs with Mateus Ferreira

May 28, 2026

32m 37s

Episodes

106

~0.8 per week

Avg length

24m 11s

17m 43s – 32m 37s

Range

Sep 2020 – Apr 2026

Topics

data engineering, apache airflow +68

Guests

Julian Larralde +17 · last 17 eps

Date	Episode	Topics	Guests	Brands	Places	Keywords	Sponsor	Length
7/9/26	What's New in Apache Airflow® 3.3	Airflow 3.3 is here, with a set of features to help with the messy realities of production pipelines: persisting state across retries, reacting intelligently to different failure types, and partitioning assets by more than just time. In this episode, Marc Lamberti, Education Content Lead at [Astronomer](astronomer.io), joins Kenten Danas to walk through what's new in the release and where each feature actually pays off. Key Takeaways: 00:00 Introduction. 01:46 The new task state store (AIP-103) lets tasks persist state across retries, so a long-running Spark job can be reattached after a worker failure instead of being duplicated on retry. 03:46 The asset state store enables watermarking patterns: persist the last processed date or offset to an asset and resume from there on the next run. 05:33 Why this matters for agentic workflows: resume an agent from where it left off rather than replaying every action. 06:58 Why XComs don't solve this problem: they get reinitialized on every retry. 09:27 Pluggable retries let you attach a retry policy to a task that branches on the exception type. Retry on transient errors, stop immediately on a 403. 11:42 Subclassing the retry rule for more complex logic, including dynamic retry counts that used to require hacking the metadatabase. 15:52 Updates to asset partitions in 3.3: segment-based partitioning with fan-out and roll-up mappers for downstream DAGs. 21:42 Running tasks in Java and Go, moving Airflow toward a multi-language orchestrator. 23:33 DAG versioning improvement: choose whether a manual rerun uses the most recent DAG version or the original version from that run. 25:03 Advice for teams still on Airflow 2: use the upgrade ebook and Astro's AI migration tooling to handle the undifferentiated heavy lifting. Resources Mentioned: Astronomer; Airflow 3.3 Release Notes; The Task State Store; Updates to the asset partitions feature; Retry policies; Multi-language support; 3.3 Webinar; Upgrading from Airflow 2 to 3 ebook. Thanks for listening to "The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI." If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations. #AI #Automation #Airflow						—
6/25/26	Running Airflow 3 in a regulated environment at OTPP	Running Apache Airflow at a major pension fund means balancing strict compliance requirements with the need to move fast on new capabilities. On this episode, Kowsy Narayan, Cloud Data Platform Lead, Data Engineering at [Ontario Teachers' Pension Plan](otpp.com), joins host Kenten Danas to walk through OTPP's cloud migration, their move to Airflow 3, and going fully live on remote execution. Key Takeaways: 00:00 Introduction. 01:18 Inside the OTPP data platform team and what they're responsible for across cloud migration, standards, and enablement. 02:33 What's driving OTPP's multi-year move off on-prem to a cloud architecture built around scalability and resilience. 02:57 The new stack: Snowflake as the enterprise data platform, dbt for transformation, and Airflow as the orchestrator in the middle. 04:15 Why OTPP chose Astronomer: active contributions to the Airflow OSS project, fast runtime releases, and built-in monitoring, observability, and RBAC. 05:50 Evolving from dbt core with Bash operators to dbt Cosmos for model-level granularity, lineage, and precise failure recovery, plus a performance boost from watcher mode. 08:00 Upgrading from Airflow 2.9 to Airflow 3, using the Astro CLI and linters to catch deprecations quickly. 09:32 The drivers behind adopting remote execution: keeping data inside the security perimeter and scaling workloads on their own Kubernetes cluster. 11:35 How remote execution replaced a complex network architecture of VPN tunnels and firewall rules, removing latency along the way. 12:53 The POV process, success criteria, and a six week timebox to validate remote execution before going to production. 14:14 Going fully live: OTPP's last hosted deployment was sunset just before recording. 15:06 What Kowsy wants next from Airflow: AI orchestration capabilities and continued maturation of remote execution. Resources Mentioned: [Ontario Teachers' Pension Plan](otpp.com), [Apache Airflow](airflow.apache.org), [Astronomer](astronomer.io), [Cosmos](astronomer.io/cosmos)						—
6/11/26	Managing a Customer Analytics Platform with Airflow at Skimlinks	customer analytics, data engineering +4	Julian Larralde	Skimlinks, Apache Airflow +4	—	Skimlinks, Airflow +6	—	22m 40s
6/4/26	Building a custom Tableau provider for Airflow at JLR	data engineering, Apache Airflow +4	Najeeb Sulaiman	JLR, Tableau +1	UK, Range Rover +3	Airflow, Tableau +5	—	21m 18s
5/28/26	Orchestrating 2,000 Airflow pipelines at Luiza Labs with Mateus Ferreira	Airflow orchestration, data engineering +3	Mateus Ferreira	Luiza Labs, Magazine Luiza +4	Brazil	Airflow, pipelines +7	—	32m 37s
5/21/26	Enhancing DAGs for Data Processing with William Orgertrice III at Cargill	data engineering, Apache Airflow +4	William Orgertrice III	Cargill, Airflow +2	US	DAG, data pipeline +5	—	26m 15s
5/14/26	Getting Into Data Engineering with Shrividya Hegde, Data and AI Engineer	data engineering, AI +3	Shrividya Hegde	Airflow, Astronomer	—	data engineering, AI +3	—	27m 34s
5/7/26	Orchestrating DBT With Cosmos and Airflow with Filip Kunčar at ShipMonk Product Development	data orchestration, Airflow +4	Filip Kunčar	ShipMonk Product Development, Airflow +2	US	data orchestration, Airflow +5	—	24m 57s
4/30/26	Building Airflow CTL with Buğra Öztürk at Mollie	Airflow CTL, Apache Airflow +4	Buğra Öztürk	Mollie, Apache Airflow +1	—	Airflow CTL, Apache Airflow +5	—	19m 42s
4/23/26	Introducing Airflow’s Common AI Provider with Pavan Kumar Gopidesu and Kaxil Naik	Apache Airflow, AI orchestration +4	Kaxil Naik, Pavan Kumar Gopidesu	Apache Airflow, Apache DataFusion +2	—	AI provider, Airflow 3 +3	—	28m 36s
4/16/26	Building AI Debugging Agents Into Airflow DAGs at Jeppesen ForeFlight with Samantha Blaney Cuevas	AI, data pipelines +4	Samantha Blaney Cuevas	Airflow, Jeppesen ForeFlight	—	AI debugging, Airflow DAGs +4	—	22m 17s
4/9/26	Introducing Airflow 3.2	Airflow updates, data pipelines +3	Kenten Danas	Airflow 3.2, Astronomer +1	—	Airflow 3.2, data pipelines +6	—	26m 22s
4/3/26	Reflections on a Decade of Data Engineering at Seattle Data Guy	data engineering, Apache Airflow +3	Benjamin Rogojan	Apache Airflow, Seattle Data Guy	—	data engineering, Apache Airflow +5	—	26m 12s
3/26/26	Managing Data Quality and Governance With Airflow at Credit Karma with Ashir Alam	data quality, data governance +4	Ashir Alam	Airflow, DAG Factory +4	—	data quality, data governance +6	—	22m 04s
3/19/26	Open Source Airflow Contributions and Performance Improvements at G-Research with Christos Bisias	open source contributions, performance improvements +4	Christos Bisias	Apache Airflow, G-Research +1	—	open source, Airflow +5	—	17m 43s
3/12/26	Automating Threat Intelligence Using Airflow with Karan Alang	cybersecurity, threat intelligence +5	Karan Alang	Airflow, XDR +4	—	threat detection, cybersecurity automation +5	—	22m 14s
3/5/26	Using Plugins To Customize Airflow at Ponder Labs with Egor Tarasenko	Apache Airflow, data orchestration +5	Egor Tarasenko	Ponder Labs, Apache Airflow +2	—	Apache Airflow, plugins +5	—	27m 45s
2/26/26	Scaling Airflow at Wix for Analytics and AI with Ethan Shalev	data orchestration, Airflow migration +4	Ethan Shalev	Wix, Apache Airflow +4	—	Airflow 3, DAGs +5	—	18m 00s
2/19/26	Using Airflow To Orchestrate Billions of Events at Addi with Carlos Daniel Puerto Niño	data orchestration, Airflow +4	Carlos Daniel Puerto Niño	Addi, Apache Airflow +4	—	data orchestration, Airflow +5	—	24m 49s
2/12/26	Building Event-Driven Data Pipelines With Airflow 3 at Astrafy with Andrea Bombino	Real-time data expectations are reshaping how modern data teams think about orchestration and dependencies. As event-driven architectures become more common, teams need to rethink how pipelines react to data changes, rather than schedules. In this episode, Andrea Bombino, Co-Founder and Head of Analytics Engineering at Astrafy, joins us to discuss how event-driven scheduling in Airflow is evolving and how Astrafy applies it to deliver faster, more responsive data pipelines. Key Takeaways: 00:00 Introduction. 02:02 Astrafy’s role in guiding clients across the modern data stack. 03:15 Strong DAG dependencies create challenges for time-based scheduling. 04:48 Event-driven pipelines respond to increasing real-time data demands. 05:30 Airflow 3 introduces native support for event-driven orchestration. 06:27 Sensor-based workflows reveal scalability and efficiency limitations. 11:32 Event-driven assets improve efficiency and pipeline elegance. 14:45 Governance and cross-instance coordination emerge as ongoing challenges. Resources Mentioned: Andrea Bombino: https://www.linkedin.com/in/andrea-bombino/; Astrafy \| LinkedIn: https://www.linkedin.com/company/astrafy/; Astrafy \| Website: https://www.astrafy.io; Apache Airflow: https://airflow.apache.org/; Google Cloud: https://cloud.google.com/; Google Pub/Sub: https://cloud.google.com/pubsub; Google BigQuery: https://cloud.google.com/bigquery						—
2/5/26	Uphold’s Approach to Orchestrating Modern Data Workflows with Jaime Oliveira	A strong data-driven mindset underpins how fintech teams scale analytics, infrastructure and decision-making across the business. In this episode, Jaime Oliveira, Lead Data Engineer at Uphold, joins us to discuss how Uphold structures its data organization and orchestration strategy. Jaime shares how the team uses Airflow and dbt to support analytics, reporting and data activation while evolving their approach as the stack grows. Key Takeaways: 00:00 Introduction. 01:23 A data-driven mindset supports product development and business decisions. 02:55 Diverse ingestion pipelines enable scalable analytics. 04:18 A single orchestration platform simplifies analytics workflows. 05:17 Early experience with orchestration tools shapes engineering practices. 08:16 Analytics orchestration works best when aligned with transformation workflows. 09:25 Infrastructure choices involve tradeoffs in testing, visibility and overhead. 16:39 More collaborative workflow tools could improve accessibility and autonomy. Resources Mentioned: Jaime Oliveira: https://www.linkedin.com/in/jaime-oliveira-b075855a/; Uphold \| LinkedIn: https://www.linkedin.com/company/upholdinc/; Uphold \| Website: https://uphold.com; Apache Airflow: https://airflow.apache.org; dbt: https://www.getdbt.com; Snowflake: https://www.snowflake.com; Kubernetes: https://kubernetes.io; Astronomer Cosmos: https://astronomer.github.io/astronomer-cosmos; Cosmos e-book: https://www.astronomer.io/ebooks/orchestrating-dbt-with-airflow-using-cosmos/						—
1/29/26	Modern Airflow Best Practices for Scalable Data Pipelines with Bhavani Ravi	Building reliable data pipelines at scale requires more than writing code. It depends on thoughtful design, infrastructure trade-offs and an understanding of how orchestration platforms evolve over time. In this episode, Airflow best practices shaped by real-world implementation are examined. Bhavani Ravi, Independent Software Consultant and Apache Airflow Champion, shares lessons on pipeline design, architectural decisions and the evolution of the Airflow ecosystem in modern data environments. Key Takeaways: 00:00 Introduction. 01:30 Independent consulting supports effective Airflow adoption. 02:38 Early challenges shaped modern Airflow practices. 03:21 Airflow setup has become significantly simpler. 04:30 New features expanded workflow capabilities. 06:03 Frequent releases support long-term sustainability. 07:34 Community and providers strengthen the ecosystem. 10:03 Pipeline design should come before coding. 10:55 Decoupling logic requires careful trade-offs. 13:30 Plugins extend Airflow into new use cases. Resources Mentioned: Bhavani Ravi: https://www.linkedin.com/in/bhavanicodes/; Apache Airflow: https://airflow.apache.org/; Kubernetes: https://kubernetes.io/; Azure Fabric: https://learn.microsoft.com/en-us/fabric/						—
1/22/26	Inside Conviva’s Decision To Power Its Data Platform With Airflow with Han Zhang	Conviva operates at a massive scale, delivering outcome-based intelligence for digital businesses through real-time and batch data processing. As new use cases emerged, the team needed a way to extend a streaming-first architecture without rebuilding core systems. In this episode, Han Zhang joins us to explain how Conviva uses Apache Airflow as the orchestration backbone for its batch workloads, how the control plane is designed and what trade-offs shaped their platform decisions. Key Takeaways: 00:00 Introduction. 01:17 Large-scale data platforms require low-latency processing capabilities. 02:08 Batch workloads can complement streaming pipelines for additional use cases. 03:45 An orchestration framework can act as the core coordination layer. 06:12 Batch processing enables workloads that streaming alone cannot support. 08:50 Ecosystem maturity and observability are key orchestration considerations. 10:15 Built-in run history and logs make failures easier to diagnose. 14:20 Platform users can monitor workflows without managing orchestration logic. 17:08 Identity, secrets and scheduling present ongoing optimization challenges. 19:59 Configuration history and change visibility improve operational reliability. Resources Mentioned: Han Zhang: https://www.linkedin.com/in/zhanghan177; Conviva \| Website: http://www.conviva.com; Apache Airflow: https://airflow.apache.org/; Celery: https://docs.celeryq.dev/; Temporal: https://temporal.io/; Kubernetes: https://kubernetes.io/; LDAP: https://ldap.com/						—
1/15/26	Why Airflow Became the Scheduling Backbone at Condé Nast Technology Lab with Arun Karthik	Data platforms are moving from batch-first pipelines to near real-time systems where orchestration, observability, scalability and governance all have to work together. In this episode, Arun Karthik, Director, Data Solutions Engineering at Condé Nast Technology Lab, joins us to share how data engineering evolves from relational databases and ETL into distributed processing, modern orchestration with Apache Airflow and managed Airflow with Astronomer. Key Takeaways: 00:00 Introduction. 02:13 Early data systems rely heavily on relational databases and batch-oriented processing models. 07:01 Scheduling requirements evolve beyond fixed time windows as dependencies increase. 10:14 Ease of use and developer experience influence adoption of orchestration frameworks. 13:22 Operating open source orchestration tools requires ongoing engineering effort. 14:45 Managed services help teams reduce infrastructure and maintenance responsibilities. 17:27 Observability improves confidence in pipeline execution and system health. 19:12 Governance considerations grow in importance as data platforms mature. 20:46 Building data systems requires balancing speed, reliability and long-term sustainability. Resources Mentioned: Arun Karthik: https://www.linkedin.com/in/earunkarthik/; Condé Nast Technology Lab \| LinkedIn: https://www.linkedin.com/company/conde-nast-technology-lab/; Condé Nast Technology Lab \| Website: https://www.condenast.com/; Apache Airflow: https://airflow.apache.org/; Astronomer: https://www.astronomer.io/; Apache Spark: https://spark.apache.org/; Apache Hadoop: https://hadoop.apache.org/; Jenkins: https://www.jenkins.io/; dbt Labs: https://www.getdbt.com/product/what-is-dbt; Amazon Web Services: https://aws.amazon.com/free/						—
12/11/25	The Role of Airflow in Building Smarter ML Pipelines at Vivian Health with Max Calehuff	The integration of data orchestration and machine learning is critical to operational efficiency in healthcare tech. Vivian Health leverages Airflow to power both its ETL pipelines and ML workflows while maintaining strict compliance standards. Max Calehuff, Lead Data Engineer at Vivian Health, joins us to discuss how his team uses Airflow for ML ops, regulatory compliance and large-scale data orchestration. He also shares insights into upgrading to Airflow 3 and the importance of balancing flexibility with security in a healthcare environment. Key Takeaways: 00:00 Introduction. 04:21 The role of Airflow in managing ETL pipelines and ML retraining. 06:23 Using AWS SageMaker for ML training and deployment. 07:47 Why Airflow’s versatility makes it ideal for MLOps. 10:50 The importance of documentation and best practices for engineering teams. 13:44 Automating anonymization of user data for compliance. 15:30 The benefits of remote execution in Airflow 3 for regulated industries. 18:16 Quality-of-life improvements and desired features in future Airflow versions. Resources Mentioned: Max Calehuff: https://www.linkedin.com/in/maxwell-calehuff/; Vivian Health \| LinkedIn: https://www.linkedin.com/company/vivianhealth/; Vivian Health \| Website: https://www.vivian.com; Apache Airflow: https://airflow.apache.org/; Astronomer: https://www.astronomer.io/; AWS SageMaker; dbt Labs: https://www.getdbt.com/; Cosmos: https://github.com/astronomer/astronomer-cosmos; Split: https://www.split.io/; Snowflake: https://www.snowflake.com/en/						—