
Insights from recent episode analysis
Audience Interest
Podcast Focus
Publishing Consistency
Platform Reach
Insights are generated by CastFox AI using publicly available data, episode content, and proprietary models.
Most discussed topics
Brands & references
Total monthly reach
Estimated from 2 chart positions in 2 markets.
By chart position
- 🇦🇺 AU · Technology #185 · 5K to 30K
- 🇲🇾 MY · Technology #185 · 500 to 3K
- Per-Episode Audience
Est. listeners per new episode within ~30 days
1.6K to 9.9K · 🎙 Daily cadence · 74 episodes · Last published 2d ago
- Monthly Reach
Unique listeners across all episodes (30 days)
5.5K to 33K · 🇦🇺 91% · 🇲🇾 9%
- Active Followers
Loyal subscribers who consistently listen
2.2K to 13K
Market Insights
Platform Distribution
Reach across major podcast platforms, updated hourly
Total Followers
—
Total Plays
—
Total Reviews
—
* Data sourced directly from platform APIs and aggregated hourly across all major podcast directories.
On the show
From 10 eps
Host
Recent guests
Recent episodes
The Future of Agentic Data Science
May 25, 2026
1h 04m 37s
Agent-Harness.ipynb*
May 20, 2026
1h 19m 46s
Agentic Engineering and the Lost Art of Verification
May 12, 2026
1h 32m 26s
Next Level AI Evals for 2026
Apr 23, 2026
53m 34s
Privacy Theater Is Not Privacy Engineering: What It Actually Takes to Ship Safe AI
Apr 15, 2026
1h 06m 31s
Social Links & Contact
Official channels & resources
Official Website
RSS Feed
| Date | Episode | Topics | Guests | Brands | Places | Keywords | Sponsor | Length | |
|---|---|---|---|---|---|---|---|---|---|
| 5/25/26 | ![]() The Future of Agentic Data Science | So I think we’re really at a historical moment, and the opportunity is massive. Almost 15 years ago, we were promised that data science was going to be this incredible thing and create all this value for people. And I think nowadays it’s mostly viewed as a cost center in most companies. I think we can really now fulfill that original promise with agentic data science. Thomas Wiecki, Co-creator of PyMC and Founder at PyMC Labs, joins Hugo to talk about how agentic data science is finally fulfilling the promise of Decision Intelligence. We Discuss: * Decision Engines: Transform data science from a cost center providing cryptic answers into a real-time decision intelligence hub delivering actionable outcomes; * Tame the “Garden of Forking Paths”: Overcome human shortcuts by running parallel analyses to provide an honesty check, revealing the true robustness of business conclusions; * Multiplayer Data Science: Foster organizational learning by moving agents into team chats, democratizing “what-if” questions and reducing context-switching friction; * The Full Agentic Data Science Stack: Beyond harness and skills, the full stack includes orchestration for parallel analyses and a causal eval layer to measure actual outcome improvement; * Agentic Dashboards: Move beyond static BI; use chat interfaces to inquire into models and generate real-time, custom visualizations for specific follow-up questions; * Encode Professional Judgment as Skills: Elevate agent performance by encoding expert domain standards and high-fidelity workflows into specific Agent Skills, rather than relying on LLM pre-training; * Ground Decisions in Generative Processes: Prevent hallucinations by forcing agents to model underlying physical or behavioral processes, providing a programmatic guardrail aligned with market realities; * Scripted Causal-Bayesian Workflows: Their methodologically structured nature, from prior elicitation to posterior predictive checks, makes Causal-Bayesian workflows inherently automatable for agents; * Iterative Autonomy via Skills: Achieve autonomy iteratively: verify workflows with human oversight, then encode verifiable parts as skills to hand off trusted tasks. You can also find the full episode on Spotify, Apple Podcasts, and YouTube. You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments! 👉 Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands-on exercises, capstones, and reusable agent skills, OSS code, and notebooks that will 10x your data science projects. Sign up here and use the code ADSVG10 for 10% off. Hit reply to enquire about group discounts. 👈 LINKS: * Thomas Wiecki on LinkedIn * PyMC Labs * Open-Sourcing Decision Lab: Scaling AI Judgment in Data Science (PyMC Labs blog) * Decision AI Discord * Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results (Sage Journals) * The Agent Harness Reading List * Show Us Your Agent Skills (GitHub) * Agentic Data Science course with Hugo, Thomas, and Luca (10% off with code ADSVG10) * Upcoming Events on Luma * Vanishing Gradients on YouTube * Watch the podcast video on YouTube. Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 1h 04m 37s | ||||||
| 5/20/26 | ![]() Agent-Harness.ipynb* | One thing that I don’t like about Claude is that you get into this weird mental state: oh, I think I trust the model. Let’s do the slot machine. Hit click, which puts you in an inactive mode of thinking. Maybe it’s better to use a worse model….Vincent Warmerdam, senior data professional and prolific open-source maintainer (some packages with over a million downloads), now Engineer at marimo, joins Hugo to talk about how the Python notebook is evolving from a static scratchpad into a working agent harness, and what it takes to stay in the loop as a developer when agents are writing most of the code. This episode was originally a livestream Q&A with the Vanishing Gradients audience.We Discuss:* Shared Notebook Canvas: Notebooks act as a shared memory space where agents and humans co-exist, enabling real-time visual feedback by direct manipulation of global state and UI elements;* Speed-of-Thought Models: Faster, open-weight models like Kimi K2 enhance exploratory flow by keeping humans more alert to the code, unlike frontier models that can induce passive thinking;* Pi as a Harness: Vincent favors an agent harness where agents extend themselves rather than reach for MCP, and where hooks can rigidly constrain which files an agent is allowed to read or touch;* Why PRDs Don’t Fit Notebooks: Notebook work is fundamentally exploratory, so the discipline that works for shipping web apps does not transfer cleanly; the one exception is reproducing a paper;* Interactive Code Review: Interactive UIs (e.g., dragging integers) transform code into a physical object, incentivizing developers to actively review and understand agent logic;* Modular “Lego” Components: Provide agents with high-level, well-tested components (”Lego” code) instead of raw boilerplate, creating systems that are easier to debug and modulate;* Algorithm-Driven Visualization: Let the algorithm dictate the visualization needed, rather than choosing visualizations first, 
revealing the most interesting structures within the data;* Don’t Outsource the Thinking: Pen and paper architectural planning, walks away from the keyboard, and protecting calm remain the most effective ways to keep producing good ideas in the age of AI-generated software.* Agent Auto-Healing: A marimo-specific linter solved 60% of agent errors overnight by letting agents diagnose and fix their own “slop” without complex prompt engineering;* Incremental Generation: Avoid monolithic LLM outputs; generate code one to two cells at a time to prevent laziness and ensure human oversight and learning;Vincent closes on the idea that calm, not the latest frontier model, is the most underrated tool for building well, and that we should study LLM output the way chess players studied the engines that beat them.Vincent gives several live demos toward the end of the episode. He describes them well enough to follow on audio, but the visuals are worth seeing, so check out the YouTube version here.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!👉 Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands on exercises, capstones, and reusable agent skills, OSS code, and notebooks that will 10x your data science projects. Sign up here and use the code ADSVG10 for 10% off.👈Also join us for Ep. 3 of Show Us Your Agent Skills: with Vincent, Paul Iusztin (Decoding AI), Eleanor Berger (Elite AI-Assisted Coding), Alan Nichol (Rasa), Nico Gerold (amp), and Matthew Honnibal (spaCy, Explosion). 
Register on lu.ma to join live, or catch the recording afterwards.LINKS* Vincent Warmerdam on LinkedIn* Vincent’s website (koaning.io)* Wiggly Stuff — Vincent’s widget library* Marimo Gallery* skills.sh* Armin Ronacher on Pi (the minimal agent inside open claw)* Building Agents That Build Themselves — Hugo’s workshop write-up with Ivan Leo* Data Science Fiction: Winning at Metrics, Losing at AI Evals — Hugo’s blog post based on Vincent’s talk* Isaac Flath’s project (on X)* Braid (video game)* Hugo’s earlier podcast with Akshay (marimo)* Elite AI Assisted Coding — Eleanor Berger’s course (Vanishing Gradients community gets 25% off with code “HUGO”)* GameMakers Toolkit (YouTube)* Upcoming Events on Luma* Vanishing Gradients on YouTube* Come build the future of Agentic Data Science with us in our upcoming course (10% off) .How You Can Support Vanishing GradientsVanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.If you want to help keep it going:* Become a paid subscriber, from $8/month* Share this with a builder who’d find it useful* Subscribe to our YouTube channel. Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 1h 19m 46s | ||||||
| 5/12/26 | ![]() Agentic Engineering and the Lost Art of Verification | > I almost don’t read code now. My approach with Roborev is it’s like my code reader. The mantra is: Roborev reads every line of code that is generated. It gets read multiple times. And so, whenever I push up a pull request, the branch gets re-reviewed. And so by the time I’m merging a pull request into a repository, the code has all been read by agents four or five times minimum. I look at the code in terms of structural detail: does it look right?— Wes McKinney (creator of pandas, POSIT)Wes, Jeremiah Lowin (Prefect), and Randy Olson (Good Eye Labs) join Hugo and his cohost Thomas Wiecki (PyMC Labs) for the premiere of Show Us Your Agent Skills, a live session where guests walk us through the exact skills, workflows, and setups they use to work with agents every day.We Discuss:* Wes McKinney on why he barely writes, or even reads, code anymore, his “software factory” of parallel agents, and RoboRev, the background reviewer that reads every line four or five times before he merges;* The shift from “vibe coding” to agentic engineering, and why verification, not reading, is the part that actually matters;* Jeremiah Lowin on years of context engineering: trickling voice memos, recorded meetings, and morning briefs into his agent’s memory substrate as a true “second brain”;* Why Jeremiah picked OpenCode specifically for how deeply he can customize its memory, and what he’s building with FastMCP, Prefab, and Cardboard;* Randy Olson on encoding human judgment, like Tufte’s rules for data visualization, directly into agent skills, so the agents themselves perform the verification;* The “digital twin” Randy loads into his agents as a thought partner that pushes back instead of agreeing;* Skills as thin drivers, progressive disclosure, and managing context rot across extended sessions;* The rise of ephemeral, “just for me” software that agents finally make viable.Skills and workflows discussed and 
shown in the episode:* Wes’s RoboRev background code reviewer, his “software factory” dashboard, and his agentic engineering setup built on the Superpowers skills framework;* Jeremiah’s “explain” skill (which anchors every other skill he has), his voice memo memory pipeline, his FastMCP and Prefab projects, and Cardboard, his ephemeral presentation tool;* Randy’s data visualization verifier skills, his digital twin thought partner prompt, his cron job reports for colleagues, and his reflect and improve skill design pattern.Check out the GitHub repo where we’re starting to drop some of these skills and workflows for you to grab and try yourself.You can also find the full episode on Spotify, Apple Podcasts, and YouTube.You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!Up next on Show Us Your Agent Skills: Hilary Mason (CEO, HiddenDoor), Bryan Bischof (Theory Ventures), Eric Ma (Research DS lead, Moderna Therapeutics), and Tomasz Tunguz (Theory Ventures). Register on lu.ma to join live, or catch the recording afterwards.👉 Want to learn how to apply agentic engineering to the world of data science? Come build the future of Agentic Data Science with us in our upcoming course. It’s a live cohort with hands on exercises, capstones, and reusable agent skills, OSS code, and notebooks that will 10x your data science projects. 
Sign up here.👈LINKS* spicytakes.org, Wes McKinney’s website* RoboRev, Wes’s background code reviewer* Agents View, Wes’s agent session database* Middleman, Wes’s local GitHub dashboard* Superpowers, Jesse Vincent’s skills framework that Wes builds on* An Open Source Maintainer’s Guide to Saying No, by Jeremiah Lowin* FastMCP* Prefab, Jeremiah’s Python DSL for generative UIs* Beautiful Charts with AI, by Randy Olson* The Coding Agent is Dead, by Amp* Building Effective Agents, by the Anthropic team* Show Us Your Agent Skills, the GitHub repo where we are dropping skills and workflows from the show* Upcoming Events on Luma* Vanishing Gradients on YouTube* Watch the podcast video on YouTube* Come build the future of Agentic Data Science with us in our upcoming course.How You Can Support Vanishing GradientsVanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.If you want to help keep it going:* Become a paid subscriber, from $8/month* Share this with a builder who’d find it useful* Subscribe to our YouTube channel. Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 1h 32m 26s | ||||||
| 4/23/26 | ![]() Next Level AI Evals for 2026✨ | AI evaluation, product development, +4 more | Stella Wenxing Liu, Eddie Landesberg | ASU, Google | - | AI evals, product iteration, +5 more | - | 53m 34s | |
| 4/15/26 | ![]() Privacy Theater Is Not Privacy Engineering: What It Actually Takes to Ship Safe AI✨ | AI privacy, privacy engineering, +4 more | Katharine Jarmul | Practical Data Privacy | - | privacy theater, AI agents, +6 more | - | 1h 06m 31s | |
| 4/13/26 | ![]() LLM Architecture in 2026: What You Need to Know with Sebastian Raschka✨ | AI architecture, large language models, +4 more | Sebastian Raschka | Build a Large Language Model from Scratch, Build a Reasoning Model from Scratch, +2 more | - | AI architecture, large language models, +3 more | - | 1h 18m 02s | |
| 3/20/26 | ![]() Episode 72: Why Agents Solve the Wrong Problem (and What Data Scientists Do Instead)✨ | data science, AI agents, +4 more | Bryan Bischof | Theory Ventures | - | data science, AI agents, +5 more | - | 1h 33m 39s | |
| 2/18/26 | ![]() Episode 71: Durable Agents - How to Build AI Systems That Survive a Crash with Samuel Colvin✨ | AI engineering, durability in AI, +4 more | Samuel Colvin | Pydantic AI, Temporal | - | AI, durability, +5 more | - | 51m 27s | |
| 2/12/26 | ![]() Episode 70: 1,400 Production AI Deployments✨ | AI deployments, infinite loops, +5 more | - | DoorDash, ELIOS | - | AI, infinite loop, +7 more | - | 1h 09m 52s | |
| 2/3/26 | ![]() Episode 69: Python is Dead. Long Live Python! With the Creators of pandas & Parquet✨ | agent ergonomics, AI-generated code, +4 more | Wes McKinney, Marcel Kornacker, +1 more | pandas, Posit, +1 more | - | Python, Go, +7 more | - | 55m 27s | |
| 1/23/26 | ![]() Episode 68: A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull & John Berryman✨ | Agentic Search, information retrieval, +4 more | Doug Turnbull, John Berryman | Reddit, Shopify, +4 more | - | Agentic Search, information retrieval, +5 more | - | 1h 28m 42s | |
| 1/14/26 | ![]() Episode 67: Saving Hundreds of Hours of Dev Time with AI Agents That Learn✨ | AI-assisted coding, continual learning, +5 more | Eleanor Berger, Isaac Flath | Elite AI Assisted Coding | - | AI, coding, +6 more | - | 1h 18m 22s | |
| 1/8/26 | ![]() Episode 66: The Agent Paradox - Why Moderna's Most Productive AI Systems Aren't Agents✨ | AI systems, workflow, +4 more | Eric Ma | Moderna, Anthropic | - | agents, workflows, +5 more | - | 42m 58s | |
| 12/19/25 | ![]() Episode 65: The Rise of Agentic Search | We’re really moving from a world where humans are authoring search queries and humans are executing those queries and humans are digesting the results to a world where AI is doing that for us. Jeff Huber, CEO and co-founder of Chroma, joins Hugo to talk about how agentic search and retrieval are changing the very nature of search and software for builders and users alike. We Discuss: * “Context engineering”, the strategic design and engineering of what context gets fed to the LLM (data, tools, memory, and more), which is now essential for building reliable, agentic AI systems; * Why simply stuffing large context windows is no longer feasible due to “context rot” as AI applications become more goal-oriented and capable of multi-step tasks; * A framework for precisely curating and providing only the most relevant, high-precision information to ensure accurate and dependable AI systems; * The “agent harness”, the collection of tools and capabilities an agent can access, and how to construct these advanced systems; * Emerging best practices for builders, including hybrid search as a robust default, creating “golden datasets” for evaluation, and leveraging sub-agents to break down complex tasks; * The major unsolved challenge of agent evaluation, emphasizing a shift towards iterative, data-centric approaches. You can also find the full episode on Spotify, Apple Podcasts, and YouTube. You can also interact directly with the transcript here in NotebookLM. If you do so, let us know anything you find in the comments! 👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort is in Q1 2026. Here is a 35% discount code for readers. 👈 Oh!
One more thing: we’ve just announced a Vanishing Gradients livestream for January 21 that you may dig: * A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull and John Berryman (register to join live or get the recording afterwards). Show notes: * Jeff Huber on Twitter * Jeff Huber on LinkedIn * Try Chroma! * Context Rot: How Increasing Input Tokens Impacts LLM Performance by The Chroma Team * AI Agent Harness, 3 Principles for Context Engineering, and the Bitter Lesson Revisited * From Context Engineering to AI Agent Harnesses: The New Software Discipline * Generative Benchmarking by The Chroma Team * Effective context engineering for AI agents by The Anthropic Team * Making Sense of Millions of Conversations for AI Agents by Ivan Leo (Manus) and Hugo * How we built our multi-agent research system by The Anthropic Team * Upcoming Events on Luma * Watch the podcast video on YouTube 👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort is in Q1 2026. Here is a 35% discount code for readers. 👈 https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 51m 53s | ||||||
| 12/3/25 | ![]() Episode 64: Data Science Meets Agentic AI with Michael Kennedy (Talk Python) | We have been sold a story of complexity. Michael Kennedy (Talk Python) argues we can escape this by relentlessly focusing on the problem at hand, reducing costs by orders of magnitude in software, data, and AI.In this episode, Michael joins Hugo to dig into the practical side of running Python systems at scale. They connect these ideas to the data science workflow, exploring which software engineering practices allow AI teams to ship faster and with more confidence. They also detail how to deploy systems without unnecessary complexity and how Agentic AI is fundamentally reshaping development workflows.We talk through:- Escaping complexity hell to reduce costs and gain autonomy- The specific software practices, like the "Docker Barrier", that matter most for data scientists- How to replace complex cloud services with a simple, robust $30/month stack- The shift from writing code to "systems thinking" in the age of Agentic AI- How to manage the people-pleasing psychology of AI agents to prevent broken code- Why struggle is still essential for learning, even when AI can do the work for youLINKSTalk Python In Production, the Book! 
(https://talkpython.fm/books/python-in-production)Just Enough Python for Data Scientists Course (https://training.talkpython.fm/courses/just-enough-python-for-data-scientists)Agentic AI Programming for Python Course (https://training.talkpython.fm/courses/agentic-ai-programming-for-python)Talk Python To Me (https://talkpython.fm/) and a recent episode with Hugo as guest: Building Data Science with Foundation LLM Models (https://talkpython.fm/episodes/show/526/building-data-science-with-foundation-llm-models)Python Bytes podcast (https://pythonbytes.fm/)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtube.com/live/jfSRxxO3aRo?feature=share)Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (35% off for listeners) (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 1h 02m 56s | ||||||
| 11/22/25 | ![]() Episode 63: Why Gemini 3 Will Change How You Build AI Agents with Ravin Kumar (Google DeepMind) | Gemini 3 is a few days old and the massive leap in performance and model reasoning has big implications for builders: as models begin to self-heal, builders are literally tearing out the functionality they built just months ago... ripping out the defensive coding and reshipping their agent harnesses entirely. Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, the nuance of preventing context rot in massive windows, and why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops. They talk through: - The implications of models that can "self-heal" and fix their own code; - The two cultures of agents: LLM workflows with a few tools versus when you should unleash high-agency, autonomous systems; - Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews; - Why Needle in a Haystack benchmarks often fail to predict real-world performance; - How to build agent harnesses that turn model capabilities into product velocity; - The shift from measuring latency to managing time-to-compute for reasoning tasks. LINKS: From Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin, LangChain (https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline); Context Rot: How Increasing Input Tokens Impacts LLM Performance (https://research.trychroma.com/context-rot); Effective context engineering for AI agents by Anthropic (https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents); Upcoming Events on Luma
(https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtu.be/CloimQsQuJM)Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 1h 00m 13s | ||||||
| 10/31/25 | ![]() Episode 62: Practical AI at Work: How Execs and Developers Can Actually Use LLMs | Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution.In this episode, Randy and Hugo lay out how to find and solve what might be considered "boring but valuable" problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the "agentic spectrum" and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy.They talk through:How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription.The agentic spectrum: Why you should start by automating meeting summaries before attempting to build fully autonomous agents.The practical first step for any executive: Building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice.Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products.The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip.LINKSRandy on LinkedIn (https://www.zenml.io/llmops-database)Wyrd Studios (https://thewyrdstudios.com/)Stop Building AI Agents (https://www.decodingai.com/p/stop-building-ai-agents)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)🎓 Learn more:In Hugo's course: Building AI Applications for Data 
Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20) — https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20 Next cohort starts November 3: come build with us! Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 59m 04s | ||||||
| 10/16/25 | ![]() Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production | Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLM Ops Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI.Drawing from his work at Zen ML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable "Agent Reliability Cliff". He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products.We talk through:- Why "shoving a thousand agents" into an app is the fastest route to unmanageable chaos- The essential MLOps hygiene (tracing and continuous evals) that most teams skip- The optimal (and very low) limit for the number of tools an agent can reliably use- How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains- The principle of using simple Python/RegEx before resorting to costly LLM judgesLINKSThe LLMOps Database: 925 entries as of today....submit a use case to help it get to 1K! (https://www.zenml.io/llmops-database)Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)🎓 Learn more:-This was a guest Q&A from Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20) — https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20 Next cohort starts November 3: come build with us! 
Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 28m 04s | ||||||
| 9/30/25 | ![]() Episode 60: 10 Things I Hate About AI Evals with Hamel Husain | Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems. Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust. We talk through: - The 10(+1) critical mistakes that cause teams to waste time on evals; - Why "hallucination scores" are a waste of time (and what to measure instead); - The manual review process that finds major issues in hours, not weeks; - A step-by-step method for building LLM judges you can actually trust; - How to use domain experts without getting stuck in endless review committees; - Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agents. If you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap. LINKS: Hamel's website and blog (https://hamel.dev/); Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise (https://vanishinggradients.fireside.fm/51); Hamel Husain on Lenny's podcast, which includes a live demo of error analysis (https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill); The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era (https://vanishinggradients.fireside.fm/9); Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk); Watch the podcast video on YouTube (https://youtube.com/live/QEk-XwrkqhI?feature=share); Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this
link gives 35% off! (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 1h 13m 16s | ||||||
| 9/23/25 | ![]() Episode 59: Patterns and Anti-Patterns For Building with AI | John Berryman (Arcturus Labs; early GitHub Copilot engineer; co-author of Relevant Search and Prompt Engineering for LLMs) has spent years figuring out what makes AI applications actually work in production. In this episode, he shares the “seven deadly sins” of LLM development — and the practical fixes that keep projects from stalling. From context management to retrieval debugging, John explains the patterns he’s seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an “AI intern” rather than an all-knowing oracle. We talk through: - Why chasing perfect accuracy is a dead end - How to use agents without losing control - Context engineering: fitting the right information in the window - Starting simple instead of over-orchestrating - Separating retrieval from generation in RAG - Splitting complex extractions into smaller checks - Knowing when frameworks help — and when they slow you down A practical guide to avoiding the common traps of LLM development and building systems that actually hold up in production. LINKS: Context Engineering for AI Agents, a free, upcoming lightning lesson from John and Hugo (https://maven.com/p/4485aa/context-engineering-for-ai-agents) The Hidden Simplicity of GenAI Systems, a previous lightning lesson from John and Hugo (https://maven.com/p/a8195d/the-hidden-simplicity-of-gen-ai-systems) Roaming RAG – RAG without the Vector Database, by John (https://arcturus-labs.com/blog/2024/11/21/roaming-rag--rag-without-the-vector-database/) Cut the Chit-Chat with Artifacts, by John (https://arcturus-labs.com/blog/2024/11/11/cut-the-chit-chat-with-artifacts/) Prompt Engineering for LLMs by John and Albert Ziegler (https://amzn.to/4gChsFf) Relevant Search by John and Doug Turnbull (https://amzn.to/3TXmDHk) Arcturus Labs (https://arcturus-labs.com/) Watch the podcast on YouTube (https://youtu.be/mKTQGKIUq8M) Upcoming Events on Luma 
(https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) 🎓 Learn more: Hugo's course (this episode was a guest Q&A from the course): Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 47m 37s | ||||||
| 9/9/25 | ![]() Episode 58: Building GenAI Systems That Make Business Decisions with Thomas Wiecki (PyMC Labs) | While most conversations about generative AI focus on chatbots, Thomas Wiecki (PyMC Labs, PyMC) has been building systems that help companies make actual business decisions. In this episode, he shares how Bayesian modeling and synthetic consumers can be combined with LLMs to simulate customer reactions, guide marketing spend, and support strategy. Drawing from his work with Colgate and others, Thomas explains how to scale survey methods with AI, where agents fit into analytics workflows, and what it takes to make these systems reliable. We talk through: - Using LLMs as “synthetic consumers” to simulate surveys and test product ideas - How Bayesian modeling and causal graphs enable transparent, trustworthy decision-making - Building closed-loop systems where AI generates and critiques ideas - Guardrails for multi-agent workflows in marketing mix modeling - Where generative AI breaks (and how to detect failure modes) - The balance between useful models and “correct” models If you’ve ever wondered how to move from flashy prototypes to AI systems that actually inform business strategy, this episode shows what it takes. LINKS: The AI MMM Agent, An AI-Powered Shortcut to Bayesian Marketing Mix Insights (https://www.pymc-labs.com/blog-posts/the-ai-mmm-agent) AI-Powered Decision Making Under Uncertainty Workshop w/ Allen Downey & Chris Fonnesbeck (PyMC Labs) (https://youtube.com/live/2Auc57lxgeU) The Podcast livestream on YouTube (https://youtube.com/live/so4AzEbgSjw?feature=share) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 1h 00m 45s | ||||||
| 8/29/25 | ![]() Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank) | While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply. Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines. We talk through: - Treating LLM workflows as ETL pipelines for unstructured text - Error analysis: why you need humans reviewing the first 50–100 traces - Guardrails like retries, validators, and “gleaning” - How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs - Cheap vs. expensive models: when to swap for savings - Where agents fit in (and where they don’t) If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank. LINKS: Shreya's website (https://www.sh-reya.com/) DocETL, A system for LLM-powered data processing (https://www.docetl.org/) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/3r_Hsjy85nk) Shreya's AI evals course, which she teaches with Hamel "Evals" Husain (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 41m 28s | ||||||
| 8/14/25 | ![]() Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters | While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: Gemma 270M, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning. We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.” We talk through: - Where 270M fits into the Gemma 3 lineup — and why it exists - On-device use cases where latency, privacy, and efficiency matter - How smaller models open up rapid, targeted fine-tuning - Running multiple models in parallel without heavyweight hardware - Why “small” models might drive the next big wave of AI adoption If you’ve ever wondered what you’d do with a model this size (or how to squeeze the most out of it) this episode will show you how small can punch far above its weight. LINKS: Introducing Gemma 3 270M: The compact model for hyper-efficient AI (Google Developer Blog) (https://developers.googleblog.com/en/introducing-gemma-3-270m/) Full Model Fine-Tune Guide using Hugging Face Transformers (https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune) The Gemma 270M model on HuggingFace (https://huggingface.co/google/gemma-3-270m) The Gemma 270M model on Ollama (https://ollama.com/library/gemma3:270m) Building AI Agents with Gemma 3, a workshop with Ravin and Hugo (https://www.youtube.com/live/-IWstEStqok) (Code here (https://github.com/canyon289/ai_agent_basics)) From Images to Agents: Building and Evaluating Multimodal AI Workflows, a workshop with Ravin and Hugo (https://www.youtube.com/live/FNlM7lSt8Uk) (Code here (https://github.com/canyon289/ai_image_agent)) Evaluating AI Agents: 
From Demos to Dependability, an upcoming workshop with Ravin and Hugo (https://lu.ma/ezgny3dl) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/VZDw6C2A_8E) 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) ($600 off early bird discount for November cohort available until August 16) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 45m 41s | ||||||
| 8/12/25 | ![]() Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy | Traditional software expects 100% passing tests. In LLM-powered systems, that’s not just unrealistic — it’s a feature, not a bug. Eric Ma leads research data science in Moderna’s data science and AI group, and over breakfast at SciPy we explored why AI products break the old rules, what skills different personas bring (and miss), and how to keep systems alive after the launch hype fades. You’ll hear the clink of coffee cups, the murmur of SciPy in the background, and the occasional bite of frittata as we talk (hopefully also a feature, not a bug!) We talk through: • The three personas — and the blind spots each has when shipping AI systems • Why “perfect” tests can be a sign you’re testing the wrong thing • Development vs. production observability loops — and why you need both • How curiosity about failing data separates good builders from great ones • Ways large organizations can create space for experimentation without losing delivery focus If you want to build AI products that thrive in the messy real world, this episode will help you embrace the chaos — and make it work for you. LINKS: Eric's Website (https://ericmjl.github.io/) More about the workshops Eric and Hugo taught at SciPy (https://hugobowne.substack.com/p/stress-testing-llms-evaluation-frameworks) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) ($600 off early bird discount for November cohort available until August 16) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 38m 09s | ||||||
| 7/18/25 | ![]() Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference | Colab is cozy. But production won’t fit on a single GPU. Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software. We talk through: • From Colab to clusters: why scaling isn’t just about training massive models, but serving agents, handling load, and speeding up iteration • Zero-to-two GPUs: how to get started without Kubernetes, Slurm, or a PhD in networking • Scaling tradeoffs: when to care about interconnects, which infra bottlenecks actually matter, and how to avoid chasing performance ghosts • The GPU middle class: strategies for training and serving on a shoestring, with just a few cards or modest credits • Local experiments, global impact: why learning distributed systems—even just a little—can set you apart as an engineer If you’ve ever stared at a Hugging Face training script and wondered how to run it on something more than your laptop: this one’s for you. LINKS: Zach on LinkedIn (https://www.linkedin.com/in/zachary-mueller-135257118/) Hugo's blog post on Stop Building AI Agents (https://www.linkedin.com/posts/hugo-bowne-anderson-045939a5_yesterday-i-posted-about-stop-building-ai-activity-7346942036752613376-b8-t/) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Hugo's recent newsletter about upcoming events and more! 
(https://hugobowne.substack.com/p/stop-building-agents) 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) Zach's course (45% off for VG listeners!): Scratch to Scale: Large-Scale Training in the Modern World (https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39) 📺 Watch the video version on YouTube: YouTube link (https://youtube.com/live/76NAtzWZ25s?feature=share) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe | 41m 18s | ||||||
Chart Positions
2 placements across 2 markets.