
Total monthly reach
Estimated from 1 chart position in 1 market.
By chart position
- 🇦🇺 AU · Technology #171: 5K to 30K
- Per-Episode Audience: 2.5K to 15K (est. listeners per new episode within ~30 days). 🎙 Weekly cadence · 20 episodes · Last published 1w ago
- Monthly Reach: 5K to 30K (unique listeners across all episodes, 30 days). 🇦🇺 100%
- Active Followers: 2K to 12K (loyal subscribers who consistently listen)
| Date | Episode | Description | Length |
|---|---|---|---|
| 5/14/26 | Every Learner Gets Their Own Andrew | The video course had a good run. Ryan Keenan from DeepLearning.ai thinks it's over. In this episode, recorded live at AI Dev 26 in San Francisco, Yuval sits down with Ryan to talk about what replaces today's online courses, and why the answer isn't another format: it's a conversation. They dig into why Jupyter Notebooks are out, why React is in, how Andrew Ng's voice clone became a teaching assistant, and what it actually takes to make learning feel personal at scale. The one-size-fits-all era of online education is ending. What comes next is wilder than you'd think. | 22m 17s |
| 5/7/26 | Everything But the Model: Harness Engineering | Everything you need to know about harness engineering, in less than 30 minutes. In this episode, Yuval sits down with Mike Chambers from AWS to unpack harness engineering, the term that's quietly taken over the AI developer conversation. They dig into what a harness actually is (spoiler: it's everything outside the LLM), how it differs from context engineering and scaffolding, and why getting agents into production has been so hard. Along the way: MCP's rocky debut, OpenClaw's calendar-clearing chaos, and why multi-tenanted agent architecture is harder than it sounds, but doesn't have to be. | 26m 58s |
| 3/18/26 | Chunking Isn’t Dead. One Size Doesn’t Fit All | Chunking is still one of the least-discussed but most decisive parts of RAG. In this episode, we break down why no single chunk size works for all questions, how different queries benefit from different window sizes, and why fixed-window indexing quietly limits retrieval performance. We walk through a multi-window chunking approach, show how rank fusion ties it together, and explain why better agents can’t fix retrieval when the data is indexed the wrong way. | 10m 21s |
| 2/24/26 | Stop Shipping Agents With Chat UIs | Chat was a great prototype. It’s a terrible product. In this episode, Yuval sits down with CopilotKit co-founder Atai to unpack why most agentic apps stall at “chat + vibes”, and why the real bottleneck in production AI isn’t models or reasoning. It’s UI. They break down what actually changed in the last year, why agents fundamentally break the request-response paradigm, and how a new generation of protocols is emerging to connect agents to real users. The conversation covers AG-UI, Model Context Protocol (and MCP Apps), and Agent-to-Agent (A2A) protocols, as well as the messy (but inevitable) transition from text-only chat to component-rich, voice-enabled, agentic applications. If you’ve built an agent that works but users still bounce, this episode explains why, and what the new “glue layer” of AI UIs is starting to look like. | 35m 44s |
| 2/9/26 | MCP Was Built for Tools, Not for Agents That Write | MCP standardized tool calling for agents but breaks down once agents start mutating state. In this episode, Yuval sits down with Eran Gat from AI21 to dig into what happens when writing agents run in parallel, why shared environments fall apart, and how workspace isolation becomes a missing execution layer. Using real coding workloads and benchmarks, we walk through the architectural trade-offs behind making concurrent agents actually work. | 22m 16s |
| 1/15/26 | Why AI Leaderboards Miss the Point | Leaderboards reward “best average score.” Real users reward “answer fast, don’t hallucinate, don’t bankrupt me.” In this special deep dive episode, AI21’s CTO Barak Lenz walks through four gaps between what models can do and what real AI systems deliver: validation, contextualization (pick the right approach per input), latency (parallelize and stop early), and decomposition (making those choices continuously inside long workflows). Less “best model.” More “best execution.” | 56m 30s |
| 1/13/26 | The Agent Swarm Fallacy | Running multiple agents can improve quality. Doing it right is the hard part. This time we look at the Agent Swarm Fallacy: the idea that throwing more agents at a problem automatically makes systems better. Yuval sits down with Or Dagan, AI21 CPO, to explore why this breaks in practice, what happens when agents act instead of just think, and how test-time compute, structured execution, and smart decision points offer a solution. | 30m 07s |
| 1/1/26 | This Deep Research Agent Ignored the Benchmark and Still Won | Tavily built a Deep Research Agent with production in mind: something they could actually scale. So they did the unsexy work. They went through millions of agent logs, found where tokens were being wasted, and optimized each section of the system. The result surprised them: they cut token consumption by more than half (!), then tested quality and discovered they topped the DeepResearch Bench without even trying. In this YAAP episode, Yuval sits down with Dean from Tavily to break down how they built it, what they did differently from the usual top approaches, and which design choices made better results possible with far fewer tokens. What you’ll learn: how to reduce token burn without tanking quality; why reading millions of logs beats chasing the flashiest tech; and the design choices that pushed quality up while tokens dropped hard. | 29m 59s |
| 12/29/25 | Don’t Learn Distributed Systems. Just import ray | You wanted to build an agent. You ended up debugging GPUs, scaling workers, and chasing OOMs. In this episode of YAAP, Yuval sits down with Linda from Anyscale to unpack why Ray exists and how it helps AI teams scale without turning every developer into a distributed systems expert. We trace Ray’s roots in reinforcement learning research, then zoom out to how it’s used today across the AI pipeline: data processing, training, inference, and agents. Along the way, we cover why libraries like vLLM build on Ray, when Ray vs. SaaS makes sense, and why unstructured and multimodal data push traditional big-data tools to their limits. | 30m 56s |
| 12/23/25 | GenAI Meets Wall Street: Why Every Bank Thinks It’s a Snowflake | Banks love GenAI. They just don’t trust it. Yet. In this episode of YAAP, Yuval talks with Renee Lau from AWS, a financial services industry specialist who works hands-on with banks, insurers, and hedge funds as they try to move generative AI from pilots into production. Renee shares what she sees across the market, what actually works, and where teams get stuck. They explore the two sides of GenAI adoption in finance: cost-cutting back-office automation and revenue-driven use cases like hedge fund research. Along the way, they dig into compliance, pricing, human-in-the-loop workflows, and the crawl-walk-run path to deployment. You will also hear why every bank believes it is a special snowflake, why that instinct is understandable, and how builders can still create solutions that scale across financial services. | 35m 59s |
| 12/15/25 | Everyone’s got the same model. Now what? | Everyone’s building on the same foundation models. So how do you stand out? For Imagen AI, the answer isn’t bigger models, it’s smarter loops. CEO Yotam Gil joins Yuval to unpack how personalization, workflow integration, and continuous feedback turned Imagen’s photo-editing engine into a true moat. But that’s only half the story. The other half is speed: how a two-person Commando Squad at Imagen uses “vibe-coding in production” to prototype new ideas in one or two sprints, test them in the wild, and kill what doesn’t stick, all without hurting the core product. It’s a conversation about differentiation when models are commodities, and about building a culture that moves as fast as the tech it’s built on. | 31m 25s |
| 11/11/25 | The House That Builds Builders – The Origin Story of AGI House | Three years ago, it was just a house full of friends geeking out about AI. Today, it’s where researchers, founders, and engineers collide, and where hackathon demos turn into real startups. In this episode, Yuval sits down with Henry Yin, Co-founder & CTO of AGI House, to unpack how a pandemic project became the Bay Area’s builder epicenter. From fine-tuning meetups to venture funding, they trace the journey of turning one house into the heart of a movement. | 11m 17s |
| 10/28/25 | Scraping Without Getting Sued (Or Falling Asleep) | Everyone (and we do mean EVERYONE) needs data, and the web is the largest database humanity has ever built. But tapping into it at scale requires more than technical skills. If your product touches web data, scraping isn't just a backend task: it can be risky and have real consequences. In this episode, Yuval sits down with Rony Shalit, Chief Compliance and Ethics Officer at Bright Data, to talk about what can go wrong when you treat data collection as “just an implementation detail”. From lawsuits with Meta and X to wild edge cases and vendor breakdowns, they dive into what it takes to collect data responsibly and stay out of trouble. | 48m 34s |
| 8/26/25 | The Judge Model Diaries: Judging the Judges | Your LLM gave a great answer. But who decides what “great” means? In this episode, Yuval talks with Noam Gat about judge language models: reward models, critic models, and how LLMs can be trained to rate, rank, and critique each other. They dive into the difference between scoring and feedback, how to use judge models during inference, and why most evaluation benchmarks don’t tell the full story. Turns out, getting a good answer is easy. Knowing it’s good? That’s the hard part. | 30m 23s |
| 8/12/25 | RLVR Lets Models Fail Their Way to the Top | Think you know fine-tuning? If your answer is RLHF, you don’t. In this episode, Itay, who leads the Alignment group at AI21, gives a no-fluff crash course on RLVR (Reinforcement Learning with Verifiable Rewards), the method powering today’s smartest coding and reasoning models. He explains why RLVR beats RLHF at its own game, how “hard to solve, easy to verify” tasks unlock exploration without chaos, and the emergent behaviors you only get when models are allowed to screw up. If you want to actually understand RLVR (and use it), start here. Key topics: (1) how RLVR outsmarts RLHF in real-world training; (2) the “verified rewards” trick that kills reward hacking; (3) emergent skills you don’t get with hand-holding: self-verification, backtracking, multi-path reasoning; (4) why coding models took a giant leap forward; (5) practical steps to train (and actually benefit from) RLVR models. | 49m 10s |
| 7/29/25 | RAG Is Not Solved – Your Evaluation Just Sucks | Your RAG pipeline is passing benchmarks, but failing reality. In this episode, Yuval sits down with Niv from AI21 to expose why most RAG evaluation is fundamentally flawed. From overhyped retrieval scores to chunking strategies that collapse under real-world complexity, they break down why your system isn’t as good as you think, and how structured RAG solves problems that traditional pipelines simply can’t. Bonus: what do Seinfeld trivia, World Cup stats, and your enterprise SharePoint have in common? (Hint: your RAG pipeline chokes on all of them.) Key topics: (1) why most RAG benchmarks reward the wrong thing (and hide real failures); (2) the chunking trap: how bad segmentation sabotages good retrieval; (3) when LLMs ace the answer but your pipeline still fails; (4) structured RAG: a pipeline that solves RAG problems over aggregative data (such as financial reports); (5) evaluation tips, tricks, and traps for AI builders. | 43m 44s |
| 7/15/25 | The Call Is Coming From Inside the Agent (And It Has Your Credentials) | You’ve shipped your first agent. It works. It’s useful. It might also be a security liability you don’t even know about. In this episode, Yuval talks to Zenity CTO Michael Bargury about how easy it is to hijack popular agent systems like Copilot and Cursor, what “zero-click” attacks look like in the agent era, and how to monitor, constrain, and secure your AI Agent in production. From sneaky prompt injections to memory-based persistence and infected multi-agent workflows, this is the “oh no” moment every builder needs. Key topics: (1) why “ignore previous instructions” still works better than it should; (2) how one agent goes rogue… and infects the others; (3) real-world attacks: social media triggers, CRM leaks, and logic bombs; (4) observability 101 for AI: logs, reasoning traces, and root cause sanity; (5) the new rule: build like it *will* go rogue, because one day it will. | 49m 31s |
| 7/1/25 | Building Enterprise RAG: Lessons from 2+ Years of Production Deployments | Building production AI systems is hard, especially when you’re pioneering entirely new categories. In this episode, Yuval speaks with Guy Becker, Group Product Manager at AI21, to trace the evolution from task-specific models to Agent planning and orchestration systems. Guy shares hard-won lessons from building some of the first RAG-as-a-service offerings when there were literally zero handbooks to follow. Key topics: (1) task-specific models vs. general LLMs: why focused, smaller models with pre- and post-processing beat general-purpose LLMs for business use cases; (2) building RAG before it was cool: creating one of the first RAG-as-a-service platforms in early 2023 without any established patterns; (3) the one-size-fits-all problem: why chunking strategies, embedding models, and retrieval parameters need customization per use case; (4) from SaaS to on-prem: scaling deployment models for enterprise customers with sensitive data; (5) when RAG breaks down: multi-hop queries, metadata filtering, and why semantic search isn’t always enough; (6) multi-agent orchestration: how AI21 Maestro uses automated planning to break complex queries into parallelizable subtasks; (7) production lessons: evaluation strategies, quality guarantees, and building explainable AI systems for enterprise. | 37m 57s |
| 6/19/25 | Trailer | No description provided. | 0m 44s |
| 6/17/25 | You Can’t Have an Agent Without a Plan: What 90% of ’Agents’ Are Missing | Everyone’s talking about AI agents, but most of what we call "agents" are just workflows in disguise. Real autonomous agents require planning. And that changes everything. In this episode, Yuval speaks with AI21’s Algo Tech Lead, Nitzan Cohen, about why the popular ReAct framework isn’t enough and how planning architecture unlocks true agent capabilities. Key topics: (1) the difference between workflows/chains and real autonomous agents; (2) why ReAct agents fail at complex tasks, parallel execution, and user transparency; (3) free text vs. code-based planning approaches and their trade-offs; (4) how planning enables multi-agent systems and model delegation; (5) training planners with reinforcement learning and replanning mechanisms; (6) evaluation challenges: GAIA benchmark, AgentBench, and building custom datasets; (7) practical advice: when to upgrade from ReAct and which frameworks to use. From competitive analysis that runs in parallel to breaking down complex coding tasks, discover how planning transforms AI agents from simple tool-calling loops into sophisticated problem-solving systems. | 33m 18s |
| 6/10/25 | The Hard Truths About AI Agents: Why Benchmarks Lie and Frameworks Fail | Building AI agents that actually work is harder than the hype suggests, and most people are doing it wrong. In this special "YAAP: Unplugged" episode (a live panel from an AI Tinkerers meetup at the Hugging Face offices in Paris), Yuval sits down with Aymeric Roucher (Project Lead for Agents at Hugging Face) and Niv Granot (Algorithms Group Lead at AI21 Labs) for an unfiltered discussion about the uncomfortable realities of agent development. Key topics: (1) why current benchmarks are broken: from MMLU’s limitations to RAG leaderboards that don’t reflect real-world performance; (2) the tool use illusion: why 95% accuracy on tool calling benchmarks doesn’t mean your agent can actually plan; (3) LLM-as-a-judge problems: how evaluation bottlenecks are capping progress compared to verifiable domains like coding; (4) framework, friend or foe: when to ditch LangChain and LlamaIndex, and why minimal implementations often work better; (5) the real agent stack: MCP, sandbox environments, and the four essential components you actually need; (6) beyond the hype cycle: from embeddings that can’t distinguish positive from negative numbers to what comes after agents. From FIFA World Cup benchmarks that expose retrieval failures to the circular dependency problem with LLM judges, this conversation cuts through the marketing noise to reveal what it *really* takes to build agents that solve real problems, not just impressive demos. *Warning: Contains unpopular opinions about popular frameworks and uncomfortable truths about the current state of AI agent development.* | 39m 54s |
| 5/29/25 | Tool Calling 2.0: How MCP Is Standardizing AI Connections | MCP (Model Context Protocol) is changing how developers connect AI applications to external tools, but what exactly is it, and why should you care? In this episode, Yuval speaks with Etan Grundstein, Technical Product Manager (and formerly Director of Engineering) at AI21, to break down the protocol that’s standardizing AI integrations, moving beyond basic weather APIs and calculators to real-world productivity workflows. Key topics: (1) what MCP actually is and how it differs from traditional tool calling; (2) real-world examples: connecting AI to Jira, Notion, Git, and even Blender; (3) the evolution from local MCP servers to cloud integrations; (4) authentication challenges and how they’re being addressed; (5) why developers are building MCP servers to build other MCP servers; (6) looking ahead: Agent-to-Agent protocols and what comes next. | 29m 26s |
Chart Positions
1 placement across 1 market.