
Insights from recent episode analysis
Insights are generated by CastFox AI using publicly available data, episode content, and proprietary models.
Total monthly reach
Estimated from 7 chart positions in 7 markets.
By chart position
- 🇬🇧 GB · Technology · #134 · 5K to 30K
- 🇮🇳 IN · Technology · #177 · 1K to 10K
- 🇬🇷 GR · Technology · #50 · 10K to 30K
- 🇭🇺 HU · Technology · #95 · 3K to 10K
- 🇳🇿 NZ · Technology · #139 · 500 to 3K

- Per-Episode Audience: 10K to 45K. Est. listeners per new episode within ~30 days. 🎙 Weekly cadence · 51 episodes · Last published today
- Monthly Reach: 21K to 89K. Unique listeners across all episodes (30 days). 🇬🇧 34% · 🇬🇷 34% · 🇮🇳 11% · +4 more
- Active Followers: 8.2K to 36K. Loyal subscribers who consistently listen.
Market Insights
Platform Distribution
Reach across major podcast platforms, updated hourly
Total Followers
—
Total Plays
—
Total Reviews
—
* Data sourced directly from platform APIs and aggregated hourly across all major podcast directories.
On the show
Recent episodes
Matt Zelesko and the Future of SRE
May 26, 2026
Unknown duration
Handling Burnout with Sam Anderson
May 21, 2026
Unknown duration
The One with Crisis Engineering and Mikey Dickerson
May 15, 2026
Unknown duration
This is Fine! With Colette Alexander and Clint Byrum
May 12, 2026
Unknown duration
The One With Damion Yates and Building AI systems
Feb 26, 2026
Unknown duration
| Date | Episode | Description | Length |
|---|---|---|---|
| 5/26/26 | Matt Zelesko and the Future of SRE | We sit down with Matt Zelesko, VP of SRE at Google, for a candid talk about how AI is changing SRE, and how it's not. | — |
| 5/21/26 | Handling Burnout with Sam Anderson | Sam Anderson shares his experiences with burnout and how to support yourself as a reliable system. He offers guidance on dealing with burnout and suggestions for avoiding it by understanding yourself and finding the help and support you need. | — |
| 5/15/26 | The One with Crisis Engineering and Mikey Dickerson | Crisis Engineer Mikey Dickerson joins us to talk about what constitutes a crisis. Mikey draws on his broad experience across industry and the public sector, as well as on work with his team of systems fixers. | — |
| 5/12/26 | This is Fine! With Colette Alexander and Clint Byrum | What's happening in the world of SRE and resilience engineering? Join us as we catch up with fellow podcast hosts Colette Alexander and Clint Byrum of the This Is Fine! podcast at SREcon in Seattle. | — |
| 2/26/26 | The One With Damion Yates and Building AI systems | How do you introduce Site Reliability Engineering to an AI research lab, bringing concepts of scale to engineers who are at the leading edge of AI systems? In the latest episode of The Prodcast, hosts Steve McGhee and Florian Rathgeber chat with Damion Yates, who helped establish the reliability engineering culture at Google DeepMind. Damion shares his journey of bringing scalable infrastructure to DeepMind, supporting massive machine learning experiments. Discover the unique challenges of supporting AI research, such as managing highly expensive "lockstep" training models where a single machine failure halts the entire process. Damion also explains why he believes "luck is our enemy" in systems engineering, and why protecting a research scientist's time is the ultimate metric for success. | — |
| 2/11/26 | The One With Carla Geisser and Crisis Engineering | Join us for a discussion with Carla Geisser of Layer Aleph, a company focused on "crisis engineering". Carla distinguishes a crisis from a standard incident by noting that a crisis is novel and lacks a playbook. She outlines five criteria for a true crisis: fundamental surprise, broken critical functions, high visibility, a rigid deadline (unlike internal tech deadlines), and perception breakdown. Crises often arise in organizations that struggle to admit computers control core decisions, leading to complex, glued-together systems. Carla emphasizes that SRE-adjacent skills are essential for connecting the dots and exposing the full system. The key takeaway for SREs is to recognize when a true crisis is happening, as leadership will only be willing to "break rules" and enable substantive change once three of these criteria are met. | — |
| 2/5/26 | The One with Parker Barnes, Felipe Tiengo Ferreira, and AI | This episode of the Prodcast tackles the challenges of maintaining AI safety and alignment in production. Guests Felipe Tiengo Ferreira and Parker Barnes join hosts Matt Siegler and Steve McGhee to discuss AI model safety, from examining content to emerging security risks. The discussion emphasizes the vital role of SREs in managing safety at scale, detailing multi-layered defenses, including system instructions, LLM classifiers, and Automated Red Teaming (ART). Felipe and Parker dive into the evolving world of AI safety, from core product policies to the groundbreaking Frontier Safety Framework. The guests explore the need for SRE principles like drift detection and context observability. Finally, they raise concerns about the velocity of AI development compressing long-term research, urging the industry to collaborate and share vocabulary to address rapidly emerging risks. | — |
| 1/28/26 | The One With Shannon Brady and Operating Systems | In this episode of the Prodcast, guest Shannon Brady speaks with hosts Jordan Greenberg and Florian Rathgeber about managing Google's vast fleet of internal devices. Shannon explains how Google's Linux platform uses core SRE principles (specifically testing, canarying, and monitoring) for weekly stage rollouts of its Debian-based distribution. Configuration is efficiently managed using Puppet to ensure the right setup for a diverse user base. The conversation pivots to "the year of Linux everything," underscoring its widespread adoption. Discussing AI, Shannon identifies its greatest utility for SREs in rapidly analyzing signals and generating complex queries to resolve outages. This episode reinforces that practicing SRE fundamentals is paramount, demonstrating that you can be an SRE at heart, regardless of your official title. | — |
| 1/21/26 | The One With Denia Del Cid and AI | Curious about the real impact of AI on Site Reliability Engineering? In this episode of The Prodcast, Google SRE Denia del Cid breaks down how her team is leveraging AI to transform production workflows. Denia details practical applications like early outage detection, incident similarity analysis, and toil reduction. She explains the critical importance of validating against "golden data sets" and keeping humans in the loop to build trust. Discover how SREs are evolving from skepticism to strategic adoption with Gemini. Tune in for a pragmatic, measured look at the future of reliability. | — |
| 1/14/26 | The One With Heather Adkins and Security (and AI) | Join us on The Prodcast as we host Heather Adkins, leader of Google's Office of Cybersecurity Resilience, for a critical look at the future of digital defenses. We explore the intersection of SRE and security, unpacking the "Secure by Design" philosophy and the shared DNA of incident management. Heather candidly discusses the rise of "Agentic AI hackers" and polymorphic malware, revealing how defenders can use AI to stay ahead. From "castle" defense strategies to "nodal biology" theories, this episode is a must-listen for anyone navigating the new era of AI-driven threats. | — |
| 1/7/26 | The One With SLOs | In this episode, we welcome Alex Hidalgo and Brian Singer of Nobl9 to discuss Service Level Objectives (SLOs). Alex and Brian talk about how SLOs can establish a vernacular across industry verticals, leading to constructive conversations and a shared understanding of how to implement SRE practices. Join us for a lively discussion that ranges across SLO topics! | — |
| 12/16/25 | The One With Steph Hippo and Observability | In this episode, Steph Hippo, Platform Engineering Director at Honeycomb, joins The Prodcast to discuss AI and SRE. Steph explains how observability helps us understand complex systems from their outputs, and provides a foundation for SRE to respond to system problems. This episode explains how AI and observability build a self-reinforcing loop. We also discuss how AI can detect and respond to certain classes of incidents, leading to self-healing systems and allowing SREs to focus on novel and interesting problems. She advises small businesses adopting AI to learn from others' mistakes (post-mortems) and to commit time and budget to experimentation. | — |
| 7/30/25 | The One with Ben Good and Our Kubernetes Friends | In this special episode, hosts Steve McGhee from the Google SRE Prodcast and Kaslin Fields from the Google Kubernetes Podcast welcome Google Cloud Solutions Architect Ben Good to discuss platform engineering. Listeners can look forward to hearing about the role of Kubernetes as a tool for building platforms, how to create "golden paths" for developers, and the importance of observability and self-service in platform design. The conversation also touches on industry trends, the bespoke nature of platforms, and how DORA metrics can be applied to platform engineering practices. | — |
| 7/23/25 | The One With AI Agents, Ramón Llamas, and Swapnil Haria | Google Staff SRE Ramón Llamas and Google Software Engineer Swapnil Haria join our hosts to explore how AI agents are revolutionizing production management, from summarizing alerts and finding hidden errors to proactively preventing outages. Learn about the challenges of evaluating non-deterministic systems and the fascinating interplay between human expertise and emerging AI capabilities in ensuring robust and reliable infrastructure. | — |
| 7/16/25 | The One with Technical Program Managers and Karanveer Anand | This episode features Google Technical Program Manager (TPM) Karanveer Anand, who joins our hosts to discuss the unique role of TPMs in Site Reliability Engineering (SRE). The conversation highlights how SRE TPMs bridge the gap between technical details and business impact, managing complex projects with inter-team dependencies and ensuring system reliability, particularly in the rapidly evolving AI landscape. | — |
| 7/2/25 | The One with STPA, Jeffrey Snover, and Theo Klein | This episode discusses Systems Theoretic Process Analysis (STPA), a method for analyzing complex systems. Theo Klein, a Google SRE, and Jeffrey Snover, a Distinguished Engineer at Google, explain that STPA focuses on identifying how system accidents and losses occur due to a loss of control, rather than component failures. STPA helps identify design flaws early, even before code is written! The discussion highlights that STPA is a human-driven process, prompting critical questions about system goals and potential losses, and that Google is adapting the pure STPA approach for commercial software development to make it more practical and efficient. | — |
| 6/25/25 | The One with Startups and Adam Fletcher | In this episode, hosts Steve McGhee and Matt Siegler are joined by guest Adam Fletcher, CEO and Co-Founder of MarketStreet. They discuss the current state of web development with LLMs, managing technical debt in startups, the evolution of infrastructure and reliability engineering, the role of community in technology, and the future of software engineering with AI. | — |
| 6/18/25 | The One with SLOs and Sal Furino | In this episode, Sal Furino, Customer Reliability Engineer at Bloomberg, discusses all things Service Level Objectives (SLOs) with hosts Steve McGhee and Matt Siegler. Together, they dig into what successful SLOs look like, how they relate to users, and how SLOs provide an effective framework for joint decisions about system reliability across product, engineering, and leadership teams. | — |
| 6/11/25 | The One With the Future of SRE and Matt Zelesko | Matt Zelesko, the head of Site Reliability Engineering at Google, discusses the evolution of SRE, highlighting the shift from traditional operations to a model that balances velocity and reliability to better serve the rapid advancements in AI and ML. He emphasizes that SRE's core mission is to enable partners to move quickly while meeting reliability goals, and that the sheer scale of Google's infrastructure necessitates the SRE model for cross-system problem-solving. Zelesko envisions AI as a crucial assistant for SREs, improving incident detection, mitigation, and postmortem processes, and allowing SREs to focus on more complex engineering challenges and risk management earlier in the development cycle, while still valuing the hands-on experience of operating production infrastructure. | — |
| 6/4/25 | The One with AI and Todd Underwood | In this Google Prodcast episode, Todd Underwood, a reliability expert from Anthropic with experience at Google and OpenAI, joins the hosts to discuss the current state and future of AI and ML in production, particularly for SREs. Topics discussed include the challenges of AI-Ops, limitations of current anomaly detection, the potential for AI in config authoring and troubleshooting, trade-offs between product velocity and reliability, the evolving role of SREs in an AI-driven world, and the optimal timing of book publication. | — |
| 5/28/25 | The One With Data Centers and Peter Pellerzi | This episode features guest Peter Pellerzi (Distinguished Engineer, Google). Peter and the hosts, Matt Siegler and Steve McGhee, focus on the physical infrastructure side of SRE, discussing topics such as the scale of Google's data centers, handling incidents like power outages, testing and preparedness strategies, the use of AI for optimizing cooling plants, and more. Peter also emphasizes the importance of community support, proactive planning, and learning from real-world testing and incidents to ensure high availability and resilience in data center operations. | — |
| 5/21/25 | The One With Security and Jessica Theodat | Jessica Theodat (Senior SRE & Security Tech Lead, Google) joins hosts Jordan Greenberg and Steve McGhee to discuss the intersection of security and site reliability engineering at Google. Jessica touches on risk management, the unique nature of security incident responses, and the shared goals between security and SRE. The crew also delves into the balance between security and SRE, acknowledging the tension and the need for collaboration between teams to achieve business goals and user trust. | — |
| 4/16/25 | We're back with Season 4! | In this "bumpisode", hosts and producers of Prodcast (including our new co-host, Matt Siegler!) reflect on the previous season and introduce the new season's focus on upcoming trends in Site Reliability Engineering (SRE) and AI, and the friends we make along the way. They also introduce new elements we are bringing in with Season 4, such as a video format and a feedback form. | — |
| 1/29/25 | Special Episode: You Missed a Page from Telebot | This episode features Javi Beltran, a Google engineering lead who created the "Telebot" theme song. With our beloved hosts, Steve McGhee and Jordan Greenberg, Beltran discusses the origins of the song, created in 2012 for Google's paging system. The song was meant to add a touch of levity to what could be a stressful situation for engineers on-call. Beltran also unveils a new, more modern remix of "Telebot" (created in collaboration with our host, Jordan Greenberg!) which will be used as the intro theme for the podcast's next season. | — |
| 12/11/24 | Imperative vs. Declarative Change Workflows with Dominic Hutton & Niccolo' Cascarano | In this episode of the Prodcast, guests Dominic Hutton (Staff SRE, HashiCorp) and Niccolo' Cascarano (Senior Staff SRE at Google) join hosts Steve McGhee and Jordan Greenberg to dive into configurations. They discuss the differences between imperative and declarative configuration, explore the benefits and challenges of each approach, and the need for careful consideration when choosing between the two. Ultimately, the goal is to achieve reliable and maintainable systems through effective configuration management. | — |
Showing 25 of 55
Chart Positions
7 placements across 7 markets.