Vanishing Gradients

by Hugo Bowne-Anderson

A podcast for people who build with AI. Long-format conversations with people shaping the field about agents, evals, multimodal systems, data infrastructure, and the tools behind them. Guests include Jeremy Howard (fast.ai), Hamel Husain (Parlance Labs), Shreya Shankar (UC Berkeley), Wes McKinney (c...

Insights from recent episode analysis

Audience Interest

Estimated Reach: 1.5K to 13K

Listeners across platforms

Podcast Focus

Categories: technology · science

Publishing Consistency

Frequency: ~3-4 / Week

50+ episodes since 2022

Platform Reach

Insights are generated by CastFox AI using publicly available data, episode content, and proprietary models.


Medium Confidence

Total monthly reach

1.5K to 13K

Estimated from 2 chart positions in 2 markets.

By chart position

🇮🇳 IN · Technology: #134 (1K to 10K)
🇮🇩 ID · Technology: #135 (500 to 3K)

Per-Episode Audience
Est. listeners per new episode within ~30 days
450 to 3.9K
🎙 Daily cadence · 74 episodes · Last published 3d ago
Monthly Reach
Unique listeners across all episodes (30 days)
1.5K to 13K
🇮🇳 77% · 🇮🇩 23%
Active Followers
Loyal subscribers who consistently listen
600 to 5.2K


On the show

From 13 eps

Hosts

Hugo Bowne-Anderson

13 eps

Thomas Wiecki

1 ep

Recent guests

18 across last 13 eps

Wes McKinney

2 eps

Stella Wenxing Liu

1 ep

Eddie Landesberg

1 ep

Katharine Jarmul

1 ep

Sebastian Raschka

1 ep

Bryan Bischof

1 ep

Samuel Colvin

1 ep

Marcel Kornacker

1 ep

Alison Hill

1 ep

Doug Turnbull

1 ep

John Berryman

1 ep

Eleanor Berger

1 ep

Isaac Flaath

1 ep

Eric Ma

1 ep

Thomas Wiecki

1 ep

Vincent Warmerdam

1 ep

Jeremiah Lowin

1 ep

Randy Olson

1 ep

Recent episodes

What Claude Fable Means for Coding Agents

Jul 8, 2026

1h 02m 25s

The Future of Agentic Data Science

May 25, 2026

1h 04m 37s

Agent-Harness.ipynb*

May 20, 2026

1h 19m 46s

Agentic Engineering and the Lost Art of Verification

May 12, 2026

1h 32m 26s

Next Level AI Evals for 2026

Apr 23, 2026

53m 34s


Episodes: monthly

Avg length: 1h 01m 53s (range 28m 04s – 1h 33m 39s)

Date range: Feb 2022 – Apr 2026

Topics: software development, ai agents +71

Guests: Wes McKinney +17 · last 13 eps

Date	Episode	Topics	Guests	Brands	Places	Keywords	Sponsor	Length
7/8/26	What Claude Fable Means for Coding Agents	Nicolay Gerold works all day and night on AMP, one of the most interesting coding-agent harnesses out there. If you’re building with coding agents, this conversation will help you understand: * when to trust the model, * when to build harnesses around it, * which model is worth paying for, * which programming languages give the agent better feedback, and * when to take the keyboard back. Coding-agent products are living inside a blender. The jump from Opus 4.8 to Fable changes what the model can be trusted with, eats a workflow, and suddenly the best product decision is to delete code. AMP had a handoff feature because long agent threads used to get messy. Compaction would lose the plot, the model would make worse decisions, and the product needed a way to move the work somewhere cleaner. Then compaction got better. The model ate the feature. AMP killed it. Builders inherit the annoying product test: does this harness code help inspect, verify, recover, or merge model work, or is it just babysitting yesterday’s model? Nico and Hugo riff on why loop engineering is overrated (and when to use it), why Fable is the first model with real engineering taste, and why you should stop writing Python code today and start writing TypeScript and Rust for all your AI Engineering workflows. You can also find the full episode on Spotify, Apple Podcasts, and YouTube. 👉 Want to build agents from the ground up? Registration is open for Build AI Agents from First Principles, a live workshop on the loops, tools, context, harnesses, and engineering decisions behind useful AI agents. You'll learn how to design agent systems from first principles, with enough structure to decide which harness patterns your product actually needs. Sign up today with vg-code for 10% off. 👈 In This Episode: * Coding-agent harnesses today: compaction, sandboxes, review flows, and the features frontier models are starting to absorb. * Why AMP keeps deleting its own features when models get better. * The test for every harness feature: does it make the agent’s work easier to inspect, verify, or recover from? * Local agents, cloud sandboxes, and where each fits when bugs, issues, logs, or customer feedback turn into code changes. * Background agents without auto-merge fantasy: how useful work comes back as branches, checkouts, or review candidates. * Loop engineering in practice: tight loops with clear objectives, broad loops that create review overload, and where builders should draw the line. * When deterministic code beats an AI step, and when a single agent with the right tools can replace brittle orchestration. * The TikTok problem for coding: hundreds of agent threads, fragmented attention, and why loop engineering can become a trap. Resources: * AMP * AMP Owner’s Manual * Nicolay Gerold’s Show Us Your Agent Skills dossier * Clio: Privacy-Preserving Insights into Real-World AI Use * TigerBeetle TigerStyle * How to Build A Coding Agent with Nico and Hugo * Build AI Agents From First Principles. 👉 Want to build agents from the ground up? Registration is open for Build AI Agents from First Principles, a live workshop on the loops, tools, context, harnesses, and engineering decisions behind useful AI agents. You’ll learn how to design agent systems from first principles, with enough structure to decide which harness patterns your product actually needs. Sign up today with vg-code for 10% off. 👈 How You Can Support Vanishing Gradients: Vanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free. If you want to help keep it going: * Become a paid subscriber, from $8/month * Share this with a builder who’d find it useful * Subscribe to our YouTube channel * Join one of our other workshops here. Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						1h 02m 25s
5/25/26	The Future of Agentic Data Science	agentic data science, decision intelligence +3	Thomas Wiecki	PyMC, PyMC Labs	—	data science, decision engines +3	—	1h 04m 37s
5/20/26	Agent-Harness.ipynb*	Python notebooks, agent harness +4	Vincent Warmerdam	marimo	—	Claude, slot machine +4	—	1h 19m 46s
5/12/26	Agentic Engineering and the Lost Art of Verification	agentic engineering, code verification +4	Wes McKinney, Jeremiah Lowin +1	pandas, Posit +6	—	Roborev, software factory +4	—	1h 32m 26s
4/23/26	Next Level AI Evals for 2026	AI evaluation, product development +4	Stella Wenxing Liu, Eddie Landesberg	ASU, Google	—	AI evals, product iteration +5	—	53m 34s
4/15/26	Privacy Theater Is Not Privacy Engineering: What It Actually Takes to Ship Safe AI	AI privacy, privacy engineering +4	Katharine Jarmul	Practical Data Privacy	—	privacy theater, AI agents +6	—	1h 06m 31s
4/13/26	LLM Architecture in 2026: What You Need to Know with Sebastian Raschka	AI architecture, large language models +4	Sebastian Raschka	Build a Large Language Model from Scratch, Build a Reasoning Model from Scratch +2	—	AI architecture, large language models +3	—	1h 18m 02s
3/20/26	Episode 72: Why Agents Solve the Wrong Problem (and What Data Scientists Do Instead)	data science, AI agents +4	Bryan Bischof	Theory Ventures	—	data science, AI agents +5	—	1h 33m 39s
2/18/26	Episode 71: Durable Agents - How to Build AI Systems That Survive a Crash with Samuel Colvin	AI engineering, durability in AI +4	Samuel Colvin	Pydantic AI, Temporal	—	AI, durability +5	—	51m 27s
2/12/26	Episode 70: 1,400 Production AI Deployments	AI deployments, infinite loops +5	—	DoorDash, ELIOS	—	AI, infinite loop +7	—	1h 09m 52s
2/3/26	Episode 69: Python is Dead. Long Live Python! With the Creators of pandas & Parquet	agent ergonomics, AI-generated code +4	Wes McKinney, Marcel Kornacker +1	pandas, Posit +1	—	Python, Go +7	—	55m 27s
1/23/26	Episode 68: A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull & John Berryman	Agentic Search, information retrieval +4	Doug Turnbull, John Berryman	Reddit, Shopify +4	—	Agentic Search, information retrieval +5	—	1h 28m 42s
1/14/26	Episode 67: Saving Hundreds of Hours of Dev Time with AI Agents That Learn	AI-assisted coding, continual learning +5	Eleanor Berger, Isaac Flaath	Elite AI Assisted Coding	—	AI, coding +6	—	1h 18m 22s
1/8/26	Episode 66: The Agent Paradox - Why Moderna's Most Productive AI Systems Aren't Agents	AI systems, workflow +4	Eric Ma	Moderna, Anthropic	—	agents, workflows +5	—	42m 58s
12/19/25	Episode 65: The Rise of Agentic Search	We’re really moving from a world where humans are authoring search queries and humans are executing those queries and humans are digesting the results to a world where AI is doing that for us. Jeff Huber, CEO and co-founder of Chroma, joins Hugo to talk about how agentic search and retrieval are changing the very nature of search and software for builders and users alike. We Discuss: * “Context engineering”, the strategic design and engineering of what context gets fed to the LLM (data, tools, memory, and more), which is now essential for building reliable, agentic AI systems; * Why simply stuffing large context windows is no longer feasible due to “context rot” as AI applications become more goal-oriented and capable of multi-step tasks; * A framework for precisely curating and providing only the most relevant, high-precision information to ensure accurate and dependable AI systems; * The “agent harness”, the collection of tools and capabilities an agent can access, and how to construct these advanced systems; * Emerging best practices for builders, including hybrid search as a robust default, creating “golden datasets” for evaluation, and leveraging sub-agents to break down complex tasks; * The major unsolved challenge of agent evaluation, emphasizing a shift towards iterative, data-centric approaches. You can also find the full episode on Spotify, Apple Podcasts, and YouTube. You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments! 👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort is in Q1 2026. Here is a 35% discount code for readers. 👈 Oh! One more thing: we’ve just announced a Vanishing Gradients livestream for January 21 that you may dig: * A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull and John Berryman (register to join live or get the recording afterwards). Show notes: * Jeff Huber on Twitter * Jeff Huber on LinkedIn * Try Chroma! * Context Rot: How Increasing Input Tokens Impacts LLM Performance by The Chroma Team * AI Agent Harness, 3 Principles for Context Engineering, and the Bitter Lesson Revisited * From Context Engineering to AI Agent Harnesses: The New Software Discipline * Generative Benchmarking by The Chroma Team * Effective context engineering for AI agents by The Anthropic Team * Making Sense of Millions of Conversations for AI Agents by Ivan Leo (Manus) and Hugo * How we built our multi-agent research system by The Anthropic Team * Upcoming Events on Luma * Watch the podcast video on YouTube 👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort is in Q1 2026. Here is a 35% discount code for readers. 👈 https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						51m 53s
12/3/25	Episode 64: Data Science Meets Agentic AI with Michael Kennedy (Talk Python)	We have been sold a story of complexity. Michael Kennedy (Talk Python) argues we can escape this by relentlessly focusing on the problem at hand, reducing costs by orders of magnitude in software, data, and AI. In this episode, Michael joins Hugo to dig into the practical side of running Python systems at scale. They connect these ideas to the data science workflow, exploring which software engineering practices allow AI teams to ship faster and with more confidence. They also detail how to deploy systems without unnecessary complexity and how Agentic AI is fundamentally reshaping development workflows. We talk through: - Escaping complexity hell to reduce costs and gain autonomy - The specific software practices, like the "Docker Barrier", that matter most for data scientists - How to replace complex cloud services with a simple, robust $30/month stack - The shift from writing code to "systems thinking" in the age of Agentic AI - How to manage the people-pleasing psychology of AI agents to prevent broken code - Why struggle is still essential for learning, even when AI can do the work for you. LINKS: Talk Python In Production, the Book! (https://talkpython.fm/books/python-in-production) Just Enough Python for Data Scientists Course (https://training.talkpython.fm/courses/just-enough-python-for-data-scientists) Agentic AI Programming for Python Course (https://training.talkpython.fm/courses/agentic-ai-programming-for-python) Talk Python To Me (https://talkpython.fm/) and a recent episode with Hugo as guest: Building Data Science with Foundation LLM Models (https://talkpython.fm/episodes/show/526/building-data-science-with-foundation-llm-models) Python Bytes podcast (https://pythonbytes.fm/) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtube.com/live/jfSRxxO3aRo?feature=share) Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (35% off for listeners) (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						1h 02m 56s
11/22/25	Episode 63: Why Gemini 3 Will Change How You Build AI Agents with Ravin Kumar (Google DeepMind)	Gemini 3 is a few days old, and the massive leap in performance and model reasoning has big implications for builders: as models begin to self-heal, builders are literally tearing out the functionality they built just months ago... ripping out the defensive coding and reshipping their agent harnesses entirely. Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, the nuance of preventing context rot in massive windows, and why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops. They talk through: - The implications of models that can "self-heal" and fix their own code - The two cultures of agents: LLM workflows with a few tools versus when you should unleash high-agency, autonomous systems - Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews - Why Needle in a Haystack benchmarks often fail to predict real-world performance - How to build agent harnesses that turn model capabilities into product velocity - The shift from measuring latency to managing time-to-compute for reasoning tasks. LINKS: From Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin, LangChain (https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline) Context Rot: How Increasing Input Tokens Impacts LLM Performance (https://research.trychroma.com/context-rot) Effective context engineering for AI agents by Anthropic (https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/CloimQsQuJM) Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						1h 00m 13s
10/31/25	Episode 62: Practical AI at Work: How Execs and Developers Can Actually Use LLMs	Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution. In this episode, Randy and Hugo lay out how to find and solve what might be considered "boring but valuable" problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the "agentic spectrum" and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy. They talk through: - How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription. - The agentic spectrum: Why you should start by automating meeting summaries before attempting to build fully autonomous agents. - The practical first step for any executive: Building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice. - Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products. - The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip. LINKS: Randy on LinkedIn (https://www.zenml.io/llmops-database) Wyrd Studios (https://thewyrdstudios.com/) Stop Building AI Agents (https://www.decodingai.com/p/stop-building-ai-agents) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc) 🎓 Learn more: In Hugo's course: Building AI Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20) — https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20 Next cohort starts November 3: come build with us! Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						59m 04s
10/16/25	Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production	Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLMOps Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI. Drawing from his work at ZenML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable "Agent Reliability Cliff". He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products. We talk through: - Why "shoving a thousand agents" into an app is the fastest route to unmanageable chaos - The essential MLOps hygiene (tracing and continuous evals) that most teams skip - The optimal (and very low) limit for the number of tools an agent can reliably use - How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains - The principle of using simple Python/RegEx before resorting to costly LLM judges. LINKS: The LLMOps Database: 925 entries as of today... submit a use case to help it get to 1K! (https://www.zenml.io/llmops-database) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc) 🎓 Learn more: This was a guest Q&A from Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20) — https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20 Next cohort starts November 3: come build with us! Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						28m 04s
9/30/25	Episode 60: 10 Things I Hate About AI Evals with Hamel Husain	Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems. Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust. We talk through: - The 10(+1) critical mistakes that cause teams to waste time on evals - Why "hallucination scores" are a waste of time (and what to measure instead) - The manual review process that finds major issues in hours, not weeks - A step-by-step method for building LLM judges you can actually trust - How to use domain experts without getting stuck in endless review committees - Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agents. If you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap. LINKS: Hamel's website and blog (https://hamel.dev/) Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise (https://vanishinggradients.fireside.fm/51) Hamel Husain on Lenny's podcast, which includes a live demo of error analysis (https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill) The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era (https://vanishinggradients.fireside.fm/9) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtube.com/live/QEk-XwrkqhI?feature=share) Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this link gives 35% off! (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						1h 13m 16s
9/23/25	Episode 59: Patterns and Anti-Patterns For Building with AI	John Berryman (Arcturus Labs; early GitHub Copilot engineer; co-author of Relevant Search and Prompt Engineering for LLMs) has spent years figuring out what makes AI applications actually work in production. In this episode, he shares the “seven deadly sins” of LLM development — and the practical fixes that keep projects from stalling. From context management to retrieval debugging, John explains the patterns he’s seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an “AI intern” rather than an all-knowing oracle. We talk through: - Why chasing perfect accuracy is a dead end - How to use agents without losing control - Context engineering: fitting the right information in the window - Starting simple instead of over-orchestrating - Separating retrieval from generation in RAG - Splitting complex extractions into smaller checks - Knowing when frameworks help — and when they slow you down. A practical guide to avoiding the common traps of LLM development and building systems that actually hold up in production. LINKS: Context Engineering for AI Agents, a free, upcoming lightning lesson from John and Hugo (https://maven.com/p/4485aa/context-engineering-for-ai-agents) The Hidden Simplicity of GenAI Systems, a previous lightning lesson from John and Hugo (https://maven.com/p/a8195d/the-hidden-simplicity-of-gen-ai-systems) Roaming RAG – RAG without the Vector Database, by John (https://arcturus-labs.com/blog/2024/11/21/roaming-rag--rag-without-the-vector-database/) Cut the Chit-Chat with Artifacts, by John (https://arcturus-labs.com/blog/2024/11/11/cut-the-chit-chat-with-artifacts/) Prompt Engineering for LLMs by John and Albert Ziegler (https://amzn.to/4gChsFf) Relevant Search by John and Doug Turnbull (https://amzn.to/3TXmDHk) Arcturus Labs (https://arcturus-labs.com/) Watch the podcast on YouTube (https://youtu.be/mKTQGKIUq8M) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) 🎓 Learn more: Hugo's course (this episode was a guest Q&A from the course): Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						47m 37s
9/9/25	Episode 58: Building GenAI Systems That Make Business Decisions with Thomas Wiecki (PyMC Labs)	While most conversations about generative AI focus on chatbots, Thomas Wiecki (PyMC Labs, PyMC) has been building systems that help companies make actual business decisions. In this episode, he shares how Bayesian modeling and synthetic consumers can be combined with LLMs to simulate customer reactions, guide marketing spend, and support strategy. Drawing from his work with Colgate and others, Thomas explains how to scale survey methods with AI, where agents fit into analytics workflows, and what it takes to make these systems reliable. We talk through: - Using LLMs as “synthetic consumers” to simulate surveys and test product ideas - How Bayesian modeling and causal graphs enable transparent, trustworthy decision-making - Building closed-loop systems where AI generates and critiques ideas - Guardrails for multi-agent workflows in marketing mix modeling - Where generative AI breaks (and how to detect failure modes) - The balance between useful models and “correct” models. If you’ve ever wondered how to move from flashy prototypes to AI systems that actually inform business strategy, this episode shows what it takes. LINKS: The AI MMM Agent, An AI-Powered Shortcut to Bayesian Marketing Mix Insights (https://www.pymc-labs.com/blog-posts/the-ai-mmm-agent) AI-Powered Decision Making Under Uncertainty Workshop w/ Allen Downey & Chris Fonnesbeck (PyMC Labs) (https://youtube.com/live/2Auc57lxgeU) The Podcast livestream on YouTube (https://youtube.com/live/so4AzEbgSjw?feature=share) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						1h 00m 45s
8/29/25	Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)	While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply. Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines. We talk through: - Treating LLM workflows as ETL pipelines for unstructured text - Error analysis: why you need humans reviewing the first 50–100 traces - Guardrails like retries, validators, and “gleaning” - How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs - Cheap vs. expensive models: when to swap for savings - Where agents fit in (and where they don’t). If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank. LINKS: Shreya's website (https://www.sh-reya.com/) DocETL, A system for LLM-powered data processing (https://www.docetl.org/) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/3r_Hsjy85nk) Shreya's AI evals course, which she teaches with Hamel "Evals" Husain (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						41m 28s
8/14/25	Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters	While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: Gemma 270M, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning. We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.” We talk through: - Where 270M fits into the Gemma 3 lineup — and why it exists - On-device use cases where latency, privacy, and efficiency matter - How smaller models open up rapid, targeted fine-tuning - Running multiple models in parallel without heavyweight hardware - Why “small” models might drive the next big wave of AI adoption. If you’ve ever wondered what you’d do with a model this size (or how to squeeze the most out of it), this episode will show you how small can punch far above its weight. LINKS: Introducing Gemma 3 270M: The compact model for hyper-efficient AI (Google Developer Blog) (https://developers.googleblog.com/en/introducing-gemma-3-270m/) Full Model Fine-Tune Guide using Hugging Face Transformers (https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune) The Gemma 270M model on Hugging Face (https://huggingface.co/google/gemma-3-270m) The Gemma 270M model on Ollama (https://ollama.com/library/gemma3:270m) Building AI Agents with Gemma 3, a workshop with Ravin and Hugo (https://www.youtube.com/live/-IWstEStqok) (Code here (https://github.com/canyon289/ai_agent_basics)) From Images to Agents: Building and Evaluating Multimodal AI Workflows, a workshop with Ravin and Hugo (https://www.youtube.com/live/FNlM7lSt8Uk) (Code here (https://github.com/canyon289/ai_image_agent)) Evaluating AI Agents: From Demos to Dependability, an upcoming workshop with Ravin and Hugo (https://lu.ma/ezgny3dl) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/VZDw6C2A_8E) 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 ($600 off early bird discount for November cohort available until August 16) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						45m 41s
8/12/25	Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy	Traditional software expects 100% passing tests. In LLM-powered systems, that’s not just unrealistic — it’s a feature, not a bug. Eric Ma leads research data science in Moderna’s data science and AI group, and over breakfast at SciPy we explored why AI products break the old rules, what skills different personas bring (and miss), and how to keep systems alive after the launch hype fades. You’ll hear the clink of coffee cups, the murmur of SciPy in the background, and the occasional bite of frittata as we talk (hopefully also a feature, not a bug!). We talk through: • The three personas — and the blind spots each has when shipping AI systems • Why “perfect” tests can be a sign you’re testing the wrong thing • Development vs. production observability loops — and why you need both • How curiosity about failing data separates good builders from great ones • Ways large organizations can create space for experimentation without losing delivery focus. If you want to build AI products that thrive in the messy real world, this episode will help you embrace the chaos — and make it work for you. LINKS: Eric's Website (https://ericmjl.github.io/) More about the workshops Eric and Hugo taught at SciPy (https://hugobowne.substack.com/p/stress-testing-llms-evaluation-frameworks) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 ($600 off early bird discount for November cohort available until August 16) Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe						38m 09s