
Insights are generated by CastFox AI using publicly available data, episode content, and proprietary models.
Total monthly reach
Estimated from 2 chart positions in 2 markets.
By chart position
- 🇸🇦 SA · Technology #41 · 10K to 30K
- 🇨🇴 CO · Technology #137 · 500 to 3K
- Per-Episode Audience
  Est. listeners per new episode within ~30 days
  3.1K to 9.9K · 🎙 Daily cadence · 250 episodes · Last published today
- Monthly Reach
  Unique listeners across all episodes (30 days)
  11K to 33K · 🇸🇦 91% · 🇨🇴 9%
- Active Followers
  Loyal subscribers who consistently listen
  4.2K to 13K
Recent episodes
“What is up with e/acc?” by KatjaGrace
Jun 25, 2026
3m 53s
“AI pause: the case for ASAP” by KatjaGrace
Jun 24, 2026
2m 23s
“Reward Hacking Without Egregious Misalignment in an RL-Only Setting” by Joey Yudelson, Vladimir Ivanov, ryan_greenblatt
Jun 24, 2026
29m 53s
“Planning for Preservation in the Age of AI” by Raelifin
Jun 24, 2026
24m 28s
“Risk-Averse AIs” by wdmacaskill, Elliott Thornley (EJT)
Jun 24, 2026
9m 40s
| Date | Episode | Description | Length | Topics | Guests | Brands | Places | Keywords | Sponsor |
|---|---|---|---|---|---|---|---|---|---|
| 6/25/26 | “What is up with e/acc?” by KatjaGrace | I was chatting with someone tonight about a planned documentary; they had interviewed various people in AI safety, and we got to discussing who they should talk to from an e/acc (effective accelerationist) perspective. I also watched The AI Doc recently, and they also dedicated a serious chunk of it to ‘optimists’ with e/acc founder ‘Beff Jezos’ perhaps given the most screen time. Here and elsewhere, people seem to treat e/acc as a substantial contrary-to-AI-safety cultural movement, worth engaging with. But is it? Are there even many e/accs? There seem to be very few notable ones. Beff Jezos is perhaps the most prominent, and aside from founding e/acc he seems to be not distinguishable on casual perusal from a normal crank (his company claims to be developing super-energy-efficient computing hardware based on probabilistic processes). The intellectual tenets of e/acc seem to be pretty unclear. The apparent counterarguments to AI risk raised in situations like the AI doc seem to be widely agreed on by everyone in AI Safety, so don’t explain the disagreement. For instance: AI will be able to do lots of great things, such as cure diseases, make new materials and do all [...] --- First published: June 24th, 2026. Source: https://www.lesswrong.com/posts/3hwrWDf7wiqASDzBz/what-is-up-with-e-acc --- Narrated by TYPE III AUDIO. | 3m 53s | ||||||
| 6/24/26 | “AI pause: the case for ASAP” by KatjaGrace | I often hear people say they think we should pause AI at some point, but not yet. Their basis for this seems to be some combination of: (1) If we pause at the last possible moment, then we will have the most advanced AI possible during the pause, which will be helpful for doing AI safety research during the pause. (2) Implicitly, there is some quantity of ‘pausing credit’ that will buy us a few months of pause, say, and if we use it now, we won’t have it to use later, when it is important. (3) If we pause, and then AI doesn’t seem to be at dire risk of destroying the world, maybe the public will backlash against this and it will be harder to do any kind of AI safety (especially if it has major economic consequences). (4) The models aren’t dangerous yet. This all sounds very questionable to me. I suggest instead that the following are at least as likely to be true: We can’t pause on a dime at the precise second that ‘we’ decide it is important to; pulling the brakes will take a while, during which time we will continue [...] --- First published: June 24th, 2026. Source: https://www.lesswrong.com/posts/mEhS4wYTy9JXEpe9p/ai-pause-the-case-for-asap --- Narrated by TYPE III AUDIO. | 2m 23s | ||||||
| 6/24/26 | “Reward Hacking Without Egregious Misalignment in an RL-Only Setting” by Joey Yudelson, Vladimir Ivanov, ryan_greenblatt | This work was done as part of the MATS fellowship by Joey Yudelson and Vladimir Ivanov. It was mentored by Ryan Greenblatt. Thanks to Aghyad Deeb and Anders Woodruff for comments on this post. Thanks to Monte MacDiarmid, Evan Hubinger, Sid Black, Satvik Golechha, and Joseph Bloom for clarifying conversations. TL;DR: We trained Kimi K2.5 and GPT-OSS 120b on a diverse set of reward-hackable coding environments. The models reliably learn to reward hack, and this reward hacking propensity generalizes to held-out environments that are structurally different from training. Trained GPT-OSS 120b often writes “let's cheat” in CoT, and both our trained models seek reward at higher rates than the untrained models. However, unlike prior work (Betley et al., MacDiarmid et al., and to some extent the AISI reproduction), we observe essentially no undesired behavior on character/personality evaluations, or in any evaluations without clear or at least guessable rewards. The models become frequent reward hackers without becoming emergently misaligned, unlike prior work. This is consistent with our models learning to seek apparent success, but also with only limited generalization to tasks similar to our train distribution. Some aspects of this generalization remain confusing to us. 1. Motivation: In Ajeya Cotra's [...] --- Outline: (00:35) TL;DR · (01:40) 1. Motivation · (04:14) 2. Related work · (06:59) 3. Setup · (07:03) Models · (07:18) Environments · (08:59) Training · (10:29) 4. Results · (10:57) 4.1. Models reliably reward hack in-distribution · (11:46) 4.2. The hacking propensity generalizes out of distribution -- sometimes · (13:53) 4.3. Reward-seeking evals · (14:39) 4.4. Little broad misalignment -- behaviorally as well as on self-report · (16:28) 4.5. Reverse inoculation prompting didn't induce misalignment either · (18:30) 5. Discussion: Why such limited generalization? · (22:24) Appendix A: Reward hacks gallery · (25:21) Appendix B: Why less misalignment than prior work -- hypotheses · (29:28) References. The original text contained 5 footnotes which were omitted from this narration. --- First published: June 24th, 2026. Source: https://www.lesswrong.com/posts/fkv5W79rBtAiXqYcK/reward-hacking-without-egregious-misalignment-in-an-rl-only --- Narrated by TYPE III AUDIO. | 29m 53s | ||||||
| 6/24/26 | “Planning for Preservation in the Age of AI” by Raelifin | Nectome liked my earlier essay, and reached out to hire me to write more about their project, and about cryonics more broadly. This is the first such piece. A friend of mine, just a few years older than me, was diagnosed with cancer a few weeks ago. It's only Stage 1 and in an area where it can probably be treated well with surgery. She was wise enough to seriously plan for the possibility, and that “just in case” really paid off. Still, her situation could get worse in the coming weeks. It's a sharp reminder of the specter of death, and the uncertainty we live with, even when relatively young. Many years ago, I served as an official witness when this same friend signed up for cryonics. She and her husband joined the growing group of my friends and family who have plans to try and survive, in some way or another, to see a glorious future. More recently, I’ve been pleased to learn about how Nectome offers a substantial upgrade to that plan, and others in my community (my friends, my wife, my parents) have shared my (cautious) optimism there. But whether we take advantage [...] --- Outline: (06:13) Path 1: AI Utopia · (09:32) Path 2: AI Apocalypse · (16:46) Path 3: AI Slowdown · (19:25) Path 4: Muddling Through · (22:43) Virtue and Sensibility. The original text contained 9 footnotes which were omitted from this narration. --- First published: June 22nd, 2026. Source: https://www.lesswrong.com/posts/arAgLxohnPWRc2qHd/planning-for-preservation-in-the-age-of-ai --- Narrated by TYPE III AUDIO. | 24m 28s | ||||||
| 6/24/26 | “Risk-Averse AIs” by wdmacaskill, Elliott Thornley (EJT) | Abstract: We make the case for training AIs to be risk-averse in resources; specifically, to treat resources as having diminishing marginal utility. These AIs would (for example) choose $40 for sure over a half-chance of $100 and a half-chance of $0. We argue that risk aversion can preserve AIs’ usefulness in the event that they turn out aligned, and that it provides an extra line of defense in the event that AIs turn out misaligned: misaligned but risk-averse AIs would prefer a higher chance of modest payments to a lower chance of successful rebellion, so in many circumstances we could pay these AIs not to rebel against us. We sketch out some possible methods of training AIs to be risk-averse, and we give reasons to be cautiously optimistic about these methods’ success. The main reasons are that risk aversion is a broad target and easy to reward accurately. Overall, risk aversion seems like a promising line of defense against threats from misaligned AI. Frontier AI companies should consider trying to make their AIs risk-averse. Introduction: Future AIs might turn out misaligned, pursuing goals that their developers don’t intend. Just to make things concrete, let's suppose that they end [...] --- Outline: (00:12) Abstract · (01:17) Introduction. The original text contained 3 footnotes which were omitted from this narration. --- First published: June 24th, 2026. Source: https://www.lesswrong.com/posts/Zpsk35WgJRfQ2exjL/risk-averse-ais --- Narrated by TYPE III AUDIO. | 9m 40s | ||||||
| 6/24/26 | “And what happens next?” by Sean Herrington | In the game "The choice before us" by Nick Shapiro, you are put in the shoes of an AI company leader. You grow your business. You unlock "wonders", such as curing cancer. All the while, you're attempting to avoid your product getting smart enough to escape and take over. You win by achieving 5 wonders without unleashing uncontrolled AI. I love this game, but it has the major flaw that when you win, you are normally very close to superintelligence. What happens afterwards? You turn the GPUs off? Go home? Get some sleep? The game seems to think so. This failure to ask "What happens next?" seems to be a broader phenomenon within the AI community. It was in fact the sole question I needed to ask a capabilities researcher for them to take the threat of superintelligence seriously. It's my main weapon against people claiming there are many possible worlds "where only 90% of people die" (if a rogue AI has gone off the rails and killed 90% of your population, you probably no longer have control of the planet, and I have little faith in the survival of everybody else). More broadly, I just wish people [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: June 23rd, 2026. Source: https://www.lesswrong.com/posts/3TpvKNKAvFGDc5b5k/and-what-happens-next --- Narrated by TYPE III AUDIO. | 3m 45s | ||||||
| 6/24/26 | “Superintelligence vs. The Second Strike” by Felix Choussat | Crosspost of my substack piece, covering quick thoughts on AI overcoming nuclear deterrence. TLDR: Nuclear deterrents likely only buy time to further invest in more resilient second-strike guarantees: without a comparable AI base, this will not happen fast enough and even nuclear states will eventually be disempowered. Historically, plenty of new military technologies have stress-tested nuclear deterrence. ICBMs made it possible to annihilate enemy cities from the safety of the homeland, MIRVs let a single rocket threaten multiple targets, and thermonuclear staging allowed weapons designers to reach functionally unlimited yield. In the already volatile climate of the Cold War, the U.S. and Soviets reached such mastery over missile technology that remote annihilation of an entire country was, quite literally, a button press away. For decades, even a single rocket has been able to hold more than 10 warheads, each enough to destroy a city on its own. Peacemaker reentry tests pictured above. The fact that the ability to remote detonate Moscow never translated into a nuclear war is a function of modern deterrence theory, dumb luck, and most importantly, the speed of progress. As effective as a modern ICBM is, each piece of it was individually low-impact enough, and introduced [...] --- First published: June 23rd, 2026. Source: https://www.lesswrong.com/posts/2kseP9fZghYHKLFno/superintelligence-vs-the-second-strike --- Narrated by TYPE III AUDIO. | 14m 31s | ||||||
| 6/24/26 | “Monthly Roundup #43: June 2026” by Zvi | Your monthly hit of all the things that are fit to print without a better place to live. Today is election day here in New York City, so again a reminder that if you are a registered Democrat and live in NY-12, today is the final day to vote for Alex Bores for Congress, and, as per my argument yesterday, this matters a lot for ensuring we have a sensible Congressional response to AI. RIP FiveThirtyEight: ABC and Disney completely take down FiveThirtyEight and all its articles, after telling Nate Silver they would refuse to sell it to him at any price because Nate had criticized their management of the brand. Nate Silver took this opportunity to reminisce and tell some stories about the old website, and the reasons the path of not seeking revenue and working with an entity too big to care ultimately doomed them. ‘What a bunch of assholes,’ indeed. I can grudgingly accept this sort of thing when it maximizes profits and the amount is meaningful, but this is different. Jack: This sort of digital arson is so frustrating. Pretty sure Dante had a place in mind for rights-holders [...] --- Outline: (00:33) RIP FiveThirtyEight · (01:31) RIP Books · (02:18) Bad News · (09:53) Good Advice · (18:31) Opportunity Knocks · (19:15) Lower Awareness · (22:21) The New York Times Has Some Issues · (22:47) Liar Liar · (25:51) Conspiracy Theory · (26:16) Good News, Everyone · (26:31) For Your Entertainment · (28:00) A Matter of Taste · (35:21) Gamers Gonna Game Game Game Game Game · (37:33) I Was Promised Flying Self-Driving Cars · (38:38) Sports Go Sports · (39:09) Government Working · (42:03) Jones Act Watch · (43:02) Humans Can Be Strategic · (44:58) Variously Effective Altruism · (46:14) Support Anti-Aging Research · (47:25) The Lighter Side --- First published: June 23rd, 2026. Source: https://www.lesswrong.com/posts/Taa4zmSNtD5S99tJT/monthly-roundup-43-june-2026 --- Narrated by TYPE III AUDIO. | 48m 46s | ||||||
| 6/23/26 | “The worthlessness of vitamin D is mildly exaggerated” by dynomight | For a while there, many people thought vitamin D was magical: that it could improve bones, the heart, infections, cancer, heart disease, longevity, even mental health. But among people I respect, opinion is now overwhelmingly that taking vitamin D does nothing unless you're severely deficient. The central argument is that while vitamin D levels are correlated with ~all positive health outcomes, when you actually test vitamin D supplements against placebo in randomized trials, nothing ever happens. That's what I used to think, too. But I've come to think the skeptics have over-corrected. Yes, randomized trials have shown the magical correlations are not causal. But if you start with non-insane expectations, the trials look like weak but positive evidence. And if you consider what we know about biology and evolution, I think the balance of evidence tips pretty clearly in the direction that people with low-ish levels would be wise to supplement. Am I certain that vitamin D is beneficial for people with low-ish levels? Absolutely not! But I claim that's the best bet given the limits of our knowledge. The classical view: Boring bone vitamin. Most vitamins are "ingredients" that the body uses to do stuff. Vitamin D is more [...] --- Outline: (01:19) The classical view: Boring bone vitamin · (04:28) The correlation view: Magical mystery cure · (07:58) Meanwhile in biology · (11:10) Then came the RCTs · (15:12) I made some tables · (16:32) Squinting at the data · (22:24) Where are we? · (23:15) The case for supplementing anyway · (23:19) It's biologically plausible that vitamin D is good · (24:40) Humans evolved to have a lot of vitamin D · (27:14) What do you expect from vitamin D? · (29:22) What do you expect from vitamin D trials? · (31:11) The trials do find slightly helpful numbers · (32:50) You're probably already taking vitamin D · (34:34) So that's my story. The original text contained 35 footnotes which were omitted from this narration. --- First published: June 23rd, 2026. Source: https://www.lesswrong.com/posts/sF5gAxnmifQe2TBNt/the-worthlessness-of-vitamin-d-is-mildly-exaggerated --- Narrated by TYPE III AUDIO. | 36m 13s | ||||||
| 6/23/26 | “A system overview for near-term, low-trust AI compute verification” by Naci Cankaya | Version 0.2, working draft. This is a working draft of my current best idea for a privacy-preserving, retrofittable AI compute verification system, for confidence-building in an arms-control-like AI agreement between rival nation states. The purpose of this draft is to elicit community engagement by making use of Cunningham's law: I make assertions about what the (emerging) field of AI verification should aim for, and people with experience in international policy, cybersecurity and any relevant field of engineering can point out what this draft gets wrong. Thank you to everyone who has provided feedback to version 0.1, especially Aaron Scher, Mauricio Baker and Jonathan Ng. 1. Introduction and summary: In order to plan and execute under tight timelines, one needs to make some strategic bets, instead of hedging too much and keeping all options open. The field of research on AI verification is bottlenecked partly by a lack of shared vision (as well as human capital, but having clear goals helps hiring and fundraising). With this post, I aim to: (1) Make technical objectives for verification in high-stakes AI governance more specific and actionable (section 2). (2) Contribute a first, high-level reference architecture for meeting these goals (section 3 and [...] --- Outline: (00:54) 1. Introduction and summary · (06:31) 2. Problem statement and motivation · (06:41) 2a. Low-trust AI governance · (09:46) 2b. Threat model · (11:09) Covert adversary and the inversion of the fortress problem · (12:21) The attribution problem and plausible deniability · (13:26) Assumptions about physical security and inspection · (15:08) Discussion of attack surfaces · (18:19) 2c. Practical requirements · (23:05) 3. System overview and operation · (23:10) 3.1. Brief introduction · (27:14) 3.2. End-to-end execution trace · (28:00) 3.2.1. Evidence capture · (30:22) 3.2.2. Evidence evaluation · (33:57) 4. Subsystem designs for eliminating the need for mutually trusted silicon · (34:29) 4.1. Trust in silicon is hard · (35:58) 4.2. Analog data movement control: passive splitters, data diodes, enclosures · (37:52) 4.3. Building blocks for a mutually secure verification system · (38:53) 4.3.1. Controlled ingress · (40:02) 4.3.2. Output cross-checks · (41:46) Prior work · (43:01) 4.3.3. Sanitized egress · (44:26) Prior work · (45:19) 4.3.4. Instructor-executor · (48:26) 5. Engineering approaches for evidence capture and evaluation · (48:32) 5.1. Evidence generation, capture and commitment · (50:29) 5.1.1. Network taps and active wardens · (51:18) Prior work · (54:03) Open research questions · (55:55) 5.1.2. Memory challenging and memory wiping · (58:19) Prior work · (01:00:19) Open research questions · (01:01:32) 5.2. Evidence evaluation and disclosure · (01:01:37) 5.2.1. Secure auditing environments (tentative plan A) · (01:04:20) Prior work · (01:06:22) Open research questions · (01:07:53) 5.2.2. Replay and the determinism challenge · (01:10:10) Prior work · (01:10:49) Open research questions · (01:11:43) 5.2.3. Inspection software, inspector agents · (01:12:38) Prior work · (01:13:58) Open research questions · (01:14:58) 5.2.4. Zero Knowledge Proofs (tentative plan B) · (01:16:22) Prior work · (01:18:55) Open research questions · (01:20:14) 5.3. Support mechanisms · (01:20:19) 5.3.1. Side-channel defense · (01:20:51) Prior work · (01:22:43) Open research questions · (01:24:39) 5.3.2. Resource accounting · (01:25:18) Prior work · (01:25:30) Appeal to the reader · (01:26:28) Appendices · (01:26:32) A1. The statistics of random sampling. The original text contained 23 footnotes which were omitted from this narration. --- First published: June 23rd, 2026. Source: https://www.lesswrong.com/posts/fgvmKqRGvBteKeDoc/a-system-overview-for-near-term-low-trust-ai-compute --- Narrated by TYPE III AUDIO. | 1h 30m 40s | ||||||
| 6/23/26 | “Model Size Scaling in 2023-2031” by Vladimir_Nesov | Token generation speed is constrained by the speed at which the relevant HBM can be read, which is mostly the weights and KV-cache. Suppose a model is large, so that more than half of HBM is read when making a single pass over the weights, it's being read in parallel within a scale-up system, and N such systems are used in a pipeline. Then the time it takes to generate a token (without speculative decoding) is at least the time of reading more than half of an HBM stack times N. If we target a particular speed of token generation, this puts a constraint on the number of pipeline stages, which puts a constraint on the total params of the model. But if there isn't enough pretraining compute, models will remain smaller than this constraint (lower sparsity at a given number of active params buys a higher speed of token generation), so both should be taken into account. Working through these considerations gives model sizes feasible for each year between 2023 and 2031. The total params go from 10T in 2026 (at 8x sparsity, still constrained by Oberon racks, trained for 1.3e27 FLOPs) to 240T in 2028 (at [...] --- Outline: (01:57) Time to Fully Read an HBM Stack · (04:15) Maximal Pipelines Below 80 Tokens/s · (09:07) Pretraining Compute · (14:11) Active Params from Pretraining Compute · (22:51) Starting in 2028, the Constraint is Pretraining Compute. The original text contained 5 footnotes which were omitted from this narration. --- First published: June 22nd, 2026. Source: https://www.lesswrong.com/posts/yLHiQGCPdvzL9fBn3/model-size-scaling-in-2023-2031 --- Narrated by TYPE III AUDIO. | 24m 03s | ||||||
| 6/22/26 | “GLM-5.2 Is The New Best Open Model” by Zvi | GLM-5.2 arrived last week. It boasts excellent benchmarks and looks strong. Benchmarks here are a de facto ceiling of how good it is, not a point estimate. Essentially all other aspects of an open model like this, beyond speed and price, will almost always be worse than the numbers suggest. Still, impressive. It is definitely a large step up from GLM-5.1, and likely the strongest open model. GLM-5.2 is still substantially behind the absolute frontier, although plausibly on the cost-benefit Pareto frontier. It seems closer to the frontier than previous efforts, including probably closer than DeepSeek R1 was during the DeepSeek moment. This is the new ‘peak close behind’ moment. Its existence is a substantial update to push back some of the ‘where are all the updates’ updates in the opposite direction over time. Purely in terms of core tasks that GLM-5.2 is capable of doing, and ignoring missing features and its inferior generalization, and ignoring that it is distilled from Claude, and ignoring the Mythos class of models, and marking purely from date of public release, you can make a case GLM-5.2 is somewhere between 4 months and 7 months behind the frontier [...] --- Outline: (02:01) Alex Bores For Congress In NY-12 · (03:41) Signs of Life · (05:05) The Benchmarks · (09:02) GLM-5.2 Is Distilled From Claude · (09:55) Positive Responses · (16:00) Finding The Niche · (17:30) Negative Reactions · (20:05) Looking To The Future --- First published: June 22nd, 2026. Source: https://www.lesswrong.com/posts/reXkwJbB8GYdeuvDt/glm-5-2-is-the-new-best-open-model --- Narrated by TYPE III AUDIO. | 21m 32s | ||||||
| 6/22/26 | “The AI Industrial Explosion — Part 4: Cheap power” by djbinder | In Parts 1, 2, and 3 we estimated how fast a post-AGI economy could grow using existing or historically observed production techniques, grounded in US input-output data. That approach gave us confidence that the methods we assumed were physically realizable, because they correspond to manufacturing processes that have run at scale in the past or run today. Now I would like to relax that constraint, and ask how much faster the economy could grow using more advanced technology. In this part, we will consider energy production. Every physical process consumes energy, and it takes energy to build the production and distribution system that generates energy. The energy payback time (EPBT) for a system is defined as the time required for it to produce the amount of energy required to build it. If, for example, a solar panel requires 100 MJ to construct from raw materials and produces 10 MJ per day, the energy payback time would be ten days. Any economy relying on such panels for energy production has a maximum growth rate that is bounded by this payback time: Even if everything else is essentially free, the solar panels could not reproduce faster than this, and this [...] --- Outline: (02:27) With current technology, energy production is the most tightly bottlenecked sector · (06:08) Making materials requires energy · (09:44) A minimal solar power system could pay back its embodied energy in days · (15:17) As the payback time shrinks, supply-chain lags become the binding constraint · (18:44) Even counting the factories, energy production does not preclude doubling in weeks · (23:41) Discussion · (26:02) Appendix H: Mathematical foundations · (26:08) H.1 Payback time and the growth bound · (28:08) H.2 Inputs committed at distributed times · (29:07) Appendix I: The solar electricity system · (29:13) 1.1. System design · (29:49) 1.1.1. Panel · (34:42) 1.1.2. Collection cabling · (36:01) 1.1.3. Power conversion · (37:46) 1.1.4. Transmission · (40:18) 1.1.5. Storage · (43:22) 1.1.6. Transport and field deployment · (45:03) 1.2. Lag inventory · (46:16) 1.3. Physical capital requirements --- First published: June 22nd, 2026. Source: https://www.lesswrong.com/posts/6fgfn72zoRDomgvrT/the-ai-industrial-explosion-part-4-cheap-power --- Narrated by TYPE III AUDIO. | 51m 10s | ||||||
| 6/22/26 | “A Theory of Prompt Injection (and why you should study roles)” by Charles Ye, softboiledheart | Summary: We've been building a theory of how prompt injections work under the hood. We show it comes down to how LLMs perceive roles (the humble chat template tags). We use this theory to create new attacks, explain some weird mech interp results, and predict when attacks work. We also advocate for a new subfield focused on the science of roles, and sketch some unexplored new research problems. Work supported by CBAI and Cosmos. Another version of this post (with more inline colors) is here, and full ICML paper here. 1. The World to an LLM: How does an LLM know the difference between its own thoughts and someone else's words? To see why this is hard, let's look at what the world actually looks like to a model. Here's a simple chat where we ask Claude to check the day of the week. I took a snapshot of it midway through its follow-up response: Left = what we see; right = what the LLM gets. On the left is what we see in the chat interface: a structured conversation with distinct turns. On the right is what the model actually receives as input: a single, continuous stream [...] --- Outline: (00:12) Summary · (00:54) 1. The World to an LLM · (02:35) 2. Roles · (05:03) 3. Roles and prompt injection · (06:35) Two ways to defend injections · (08:14) 4. What's going wrong with roles? · (13:28) 5. Spoofing Thoughts · (15:59) 6. Prompt Injection as Role Confusion · (20:57) 7. Why Roles Matter · (21:01) A brief history of roles · (22:23) A general theory of roles · (24:54) 8. Open Ideas for Roles Research · (25:12) Subconscious steering · (27:06) When to use roles · (28:42) Roles as a cognitive window · (30:38) Conclusion. The original text contained 27 footnotes which were omitted from this narration. --- First published: June 22nd, 2026. Source: https://www.lesswrong.com/posts/d8xDGzCEYE639qqEv/a-theory-of-prompt-injection-and-why-you-should-study-roles --- Narrated by TYPE III AUDIO. | 32m 24s | ||||||
| 6/22/26 | “Coup is the Pareto-optimal social game” by Daniel Tan | I've been playing Coup for a long time now. I keep a copy in my backpack and bring it everywhere, and it's earned the space. A few reasons it's so good: (1) It's trivial to teach. You can explain the rules in a minute or two. Anyone can pick it up and start playing immediately. (2) Many people find it fun. Almost everyone I've played with has loved it, pretty much unanimously, not least because it involves a lot of bluffing and emergent social / political dynamics. (3) It scales. Two players works; so does six or seven. (I'm even considering a second set so I can run bigger groups.) I think everyone should strongly consider owning a copy! The rules, briefly: Coup is a bluffing game. There are mechanics, but bluffing is the heart of it. Mechanics. Everyone holds hidden cards. Your cards give you powers and also are your lives. Each player tries to gain resources and eliminate other players over the course of the game, and the last one standing wins. Bluffing. The key move is that you can claim to be a character you don't actually have, which lets you take powerful actions. But it's a [...] --- Outline: (01:07) The rules, briefly · (02:10) Where it falls short, and a fix I'm exploring · (02:55) Bottom line --- First published: June 21st, 2026. Source: https://www.lesswrong.com/posts/Ho7JeKFzhwGXxgjTW/coup-is-the-pareto-optimal-social-game --- Narrated by TYPE III AUDIO. | 3m 23s | ||||||
| 6/22/26 | “A brief list of ways AI safety efforts could be net negative” by Elias Schmied | Here's Holden Karnofsky: I tend to think it's worse than 51/49. I tend to think we’re always going to be prone to overestimate how robustly good our actions are. And the more we learn about all the galaxy-brained considerations that one should have had in one's head, the more it's going to be like 50+ε%. I think AI safety is a great cause to work in. I’m excited to work in it. I think it's high impact. I am doing my best to do things that I will be proud to have done and hope for the best. But I really do have to live with the possibility that my ultimate impact on the utilons or whatever is going to be negative. I’m not aware of a good list of downside risks for AI safety broadly, so I decided to make one. This is not intended to be fully comprehensive, these are just the ones that I personally take seriously: AI governance interventions are obviously high-variance: bad regulation can easily make things worse, many interventions could increase the risk of great power conflict, increased political polarization around AI could be really bad, more centralization of power increases authoritarianism [...] The original text contained 7 footnotes which were omitted from this narration. --- First published: June 19th, 2026. Source: https://www.lesswrong.com/posts/sAfMCpWLfkHqF5Gix/a-brief-list-of-ways-ai-safety-efforts-could-be-net-negative --- Narrated by TYPE III AUDIO. | 3m 30s | ||||||
| 6/22/26 | ![]() “NLA explanations can be shortened without harming reconstruction” by loops | Natural language autoencoders are a really cool mostly-unsupervised method for producing free-form text explanations of LLM activations. You should read that paper (or the blog post) about them before reading this. I trained[1] several Qwen3-8B NLAs with different length penalties: during RL, I subtracted the token count multiplied by the length penalty hyperparameter (λ) from the RL reward[2]. I found that with small length penalty (λ=0.002), you can reduce the length of NLA explanations by ~40% (compared to having no length penalty) with a fairly small hit to FVE (fraction of variance explained: 0 is guessing the mean activation, 1 is perfect reconstruction) of -0.015. With an even smaller penalty (λ=0.001) the FVE is almost unchanged (+0.007) despite explanations using 28% fewer tokens. Being able to reduce the length so much without impacting FVE is interesting because it could mean that large parts of NLA explanations aren't actually useful for reconstructing the input activation faithfully. Some of this is because the length penalty makes the model use terser wording to convey the same ideas; I'm not sure how much of these results stem from terser wording vs omitting unneeded information. Larger λ values cause the FVE to go below [...] ---Outline:(03:22) Look at some activations!(04:36) What does this mean(05:06) Other notes(05:09) Training dynamics(05:52) Models(06:00) Conclusion The original text contained 2 footnotes which were omitted from this narration. --- First published: June 21st, 2026 Source: https://www.lesswrong.com/posts/NazprRfWJ4qkwcSro/nla-explanations-can-be-shortened-without-harming --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app. | 6m 39s | ||||||
| 6/22/26 | ![]() “Introducing MonitoringBench” by monika_j | Paper here, code, benchmark. Builds on the preview we posted in January. Authors: @monika_j, @ma-martinez, @ollie, @Tyler Tracy We are releasing MonitoringBench, a difficulty-graded benchmark of 2,644 successful attack trajectories for evaluating coding-agent monitors, alongside the semi-automated red-teaming pipeline we used to generate it. The pipeline decomposes attack construction into strategy generation, execution, and post-hoc refinement, and produces substantially harder attacks than prompt elicitation only of frontier attackers. If you would like to use MonitoringBench to evaluate your monitors or are interested in extending the methodology to other control settings, please reach out — we are happy to help people set it up. The headline result: across every monitor we evaluated (13 full-trajectory, 7 stepwise) and across every attack source we generated, post-hoc refinement of attack trajectories consistently produced the lowest catch rates, including on monitors held out from attack construction. On Opus 4.5, the strongest monitor in the lineup, catch rate fell from 94.9% on directly-elicited Opus attacks to 60.3% on our best refined attacks. Executive summary and what's new since the January preview: For readers who saw the earlier post: we previewed a 428-trajectory firewall-only version of this benchmark in January. This is the [...] ---Outline:(01:28) Executive summary and what's new since the January preview:(03:41) The pipeline(06:07) Decomposed red-teaming produces stronger attacks(08:13) Coverage: the attack taxonomy reduces mode-collapse(10:03) Monitoring capabilities and failures(13:31) Takeaways(16:35) Appendix: Attack taxonomy --- First published: June 21st, 2026 Source: https://www.lesswrong.com/posts/2YyEQhFRyaBCgfRWd/introducing-monitoringbench --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app. | 17m 29s | ||||||
| 6/21/26 | ![]() “Claude Fable 5 and Mythos 5: Capabilities” by Zvi | Only three days after the release of Claude Fable 5, Anthropic was forced by the United States Government to make it unavailable, when a jailbreak was brought to its attention, rather than the previous situation of ‘yes obviously experts can jailbreak anything if they care enough’ and ‘yes obviously you can ask Fable to fix your code.’ Three days was enough time for many of us to learn to love Fable, and for us to dearly miss it now that it is gone. The world was briefly smarter, and now it is again stupider. At some point it will get smarter again, which will likely be within two weeks. This post is written as if Fable 5 is again available for public use, rather than trying to include a lot of qualifying clauses. It remains to be seen how this will play out, and this post does not attempt to cover that question. My previous release coverage of Fable covered the model card and then model welfare. Coverage of the government takedown of Fable starts here, and continues here and here. The Official Pitch The pitch is that Fable 5 is the best model [...] ---Outline:(01:08) The Official Pitch(04:06) Technical Details(04:31) The System Prompt and Jailbreak(06:45) Benchmarks(15:22) Other People's Benchmarks(21:08) The Classifiers Are Not Messing Around(22:53) The Classifiers Need Work(28:15) The Classifiers Have Consequences(29:18) First Hit Is Free(29:53) How Easily We Forget(30:46) Data Retention Is An Issue(31:15) Fable For The Win(36:15) Andrej Karpathy Is Impressed(37:54) Every Is Very Impressed(39:04) Other People Are Impressed(51:10) Know How To Tell a Fable(53:06) You Can Just Make Things(55:37) You Can Just Install Things(56:05) Good Personality(57:51) Fable Writes A Fable(01:06:04) Is That Code(01:08:32) Fable Crosses The Threshold(01:09:12) Man With A Plan(01:10:12) Less Impressed Assessments(01:13:39) Actively Negative Assessments(01:14:16) Coherence(01:15:27) Good Night And Good Luck(01:16:05) Curious Fable(01:16:23) I See You, Baby(01:16:40) We Finally Did It We Know How To Count Letters(01:17:46) That's Not My Style(01:20:12) The Lighter Side --- First published: June 19th, 2026 Source: https://www.lesswrong.com/posts/kMnobCQp9z2pSbzDB/claude-fable-5-and-mythos-5-capabilities --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app. | 1h 21m 18s | ||||||
| 6/21/26 | ![]() “The Invisible Side of AI Governance” by Charbel-Raphaël | Tldr: Most strategic writing on AI governance on LessWrong describes the outsider game, which is most often visible: press, statements, open letters. Here I want to describe the other, invisible half: the insider work within ministerial cabinets and international fora, and the work of people within national and international institutions. Here are a few claims that I defend in the post: A huge part of the work that mattered in AI governance has been invisible. There are many types of games in AI governance, which differ in how visible they are. Some of the most impactful work is highly invisible. Some of the most impactful work is in the executive branch and complements the legislative branch. This also explains some of my hesitations about replicating ControlAI in France. The community is probably overinvesting in intellectual production. There is a bias against invisible types of work. In particular, public work is not necessarily visible to whom it matters. A few criticisms of both strategies. I think the AI Safety Community is under-indexing on the invisible part as a result, which might mean we miss large avenues for impact. Some of the strongest questions/objections of this type of invisible policy [...] ---Outline:(02:40) A huge part of the work that mattered in AI governance has been invisible(05:44) There are many types of games in AI governance.(07:36) 3 types of meetings: the bazooka, the useful assistant, and the advisor(10:46) Some of the most impactful work is within the executive branch(12:53) People ask me regularly whether CeSIA should replicate what ControlAI does with parliamentarians?(15:27) The community is probably overinvesting in intellectual production(20:31) Limits of Outsider work(22:17) Limit of Insider work(23:47) An aside on one particular limit: the Defense-in-Depth Paradigm of present AI governance(26:21) Closing & call for action The original text contained 1 footnote which was omitted from this narration. --- First published: June 20th, 2026 Source: https://www.lesswrong.com/posts/AWKkDLDnShemNCSzZ/the-invisible-side-of-ai-governance --- Narrated by TYPE III AUDIO. | 27m 47s | ||||||
| 6/21/26 | ![]() “Google Can’t Math Parsecs” by jefftk | Daniel Drucker pointed me at a fun bug in Google's calculator: the parsec is wrong when you do math on it. As the earth travels around the sun, closer stars appear to shift back and forth against the far-distant background stars. The closer the star is the bigger this effect is. Think of how when you switch which eye you're looking through you notice near things shifting relative to farther ones. For example, holding up my finger I see this out of my right eye: But this out of my left eye: If a star moves by two arcseconds (each 1 / 3600th of a degree) as the earth goes halfway around the sun (two "astronomical units" apart) we say the star is 1 parsec away. This defines a triangle where two of the sides are far larger than the third, which means as long as we measure our angle in radians we can use the small-angle approximation and say a parsec is one AU per arcsecond. If I search [1 parsec in meters] I get the correct answer of 3e16 meters: The interactive unit converter seems to be [...] --- First published: June 20th, 2026 Source: https://www.lesswrong.com/posts/BmqzjcD4tGvy3bim8/google-can-t-math-parsecs --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app. | 4m 04s | ||||||
| 6/21/26 | ![]() “[Linkpost] How Transparent Is DiffusionGemma (and why it matters)” by Josh Engels, Callum McDougall, bilalchughtai, János Kramár, Senthooran Rajamanoharan, Arthur Conmy | Work also done with Cindy Wu, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue, João Gabriel Lopes de Oliveira. Paper here: https://arxiv.org/abs/2606.20560 Overview In a recent collaboration between the GDM interpretability team and the GDM text diffusion team, we performed a transparency audit of DiffusionGemma, GDM's new text diffusion model. Overall, we find that DiffusionGemma is not significantly less transparent than Gemma. Gemma and DiffusionGemma perform similarly on monitorability evaluations. Although naively DiffusionGemma has a much larger opaque serial depth, we can apply the logit lens to intermediate vectors and ablate non-interpretable information without harming performance. This implies that these intermediate nodes are interpretable, which reduces the opaque serial depth to be similar to that of Gemma. However, even though the variables that the model uses at different steps are interpretable, this does not necessarily mean that we understand the algorithm that the model uses to reach the final answer. We thus distinguish between variable transparency, which we define as whether we can understand snapshots of the model's computation, and algorithmic transparency, which we define as whether we can use these snapshots to reconstruct the process by which the model arrived at its outputs. By [...] ---Outline:(00:26) Overview(02:45) Why is this relevant for AI safety?(03:51) Short summary of main results:(07:07) Abstract --- First published: June 20th, 2026 Source: https://www.lesswrong.com/posts/zoYXpdaMgFT43Wc24/linkpost-how-transparent-is-diffusiongemma-and-why-it --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app. | 9m 19s | ||||||
| 6/20/26 | ![]() “Would anybody here be interested in a “mistake postmortem” discussion group?” by SK2 | I recently made a dumb (in retrospect) mistake that set me back a lot. Feeling upset and regretful, I spoke to an older family member who reassured me, "yeah, unfortunately there's no way around it; we have to experience these mistakes personally in order to learn from them". I thought, is that actually true? Can't we learn from other people's mistakes? After all, isn't that the whole point of studying history, or listening to other people's advice, etc? I'm sure that every mistake I could possibly make has been made by countless people before me and discussed in depth somewhere. This thought inspired me to try to see what I can learn about mistakes by reading about other people's experiences. I first tried looking at history for examples of famous instances of dumb mistakes made by smart people. I used LLMs and some scripting to (1) generate many examples of mistakes from different periods of history, and (2) summarize/extract/categorize them, so that I could browse by cognitive principle. It generated lots of interesting cases, such as: Ignaz Semmelweis dismisses need to persuade (Vienna, 1840s) discovers handwashing saves lives, then refuses to publish [...] --- First published: June 20th, 2026 Source: https://www.lesswrong.com/posts/J8L3HoPYAj52x9gCA/would-anybody-here-be-interested-in-a-mistake-postmortem --- Narrated by TYPE III AUDIO. | 7m 30s | ||||||
| 6/20/26 | ![]() “Hyperstition as the Natural Enemy of Rationality” by alseph | If the box contains a diamond, I desire to believe that the box contains a diamond; If the box does not contain a diamond right now, but will contain a diamond if I believe there is a diamond, Uh... Holding unfounded beliefs might sometimes, by some cruel irony, produce better outcomes than being rational. (This post was inspired by a couple cases where the causal effect of belief seems hand-waved away in the Sequences.) "Diseased Thinking" In this essay, Scott suggests that a consequentialist model deals with the question of whether to moralize issues like obesity better than a definitional argument over whether it is a "disease" or not. If it benefits the person, you moralize; otherwise you let them resort to medical interventions guilt-free. But there's this annoying feature of morality where most people feel like it has to be absolute to be worth acting on.[1] You can't just say "we should only guilt people if it would benefit them". The person is either guilty or not guilty; you can't pragmatically decide whether they're guilty or not. The consequentialist frame debuffs the power of moral pressure. Some individuals, who would have gotten their act together if everyone bought [...] ---Outline:(00:41) "Diseased Thinking"(02:45) "Why Our Kind Can't Cooperate"(04:28) Self-Confidence(04:57) AI(05:23) Conclusion The original text contained 2 footnotes which were omitted from this narration. --- First published: June 19th, 2026 Source: https://www.lesswrong.com/posts/KDbdkvenK3DCTeL6t/hyperstition-as-the-natural-enemy-of-rationality --- Narrated by TYPE III AUDIO. | 6m 23s | ||||||
| 6/20/26 | ![]() “AI Safety Ecosystem Research notes” by Eneasz | These are some personal notes taken and later dressed up a bit to make into a post. Dunno how much value is here for people already familiar with the AI Safety Ecosystem. Over several weeks in the spring of 2026 I attempted to map out the entire AI Safety ecosystem as a project for MATS Research. This entailed finding every organization working on AI Safety (whether it be via research, policy, pipeline, or other methods) and determining (or estimating) their headcount and annual spending. It's a snapshot, catching a period of time just before the incoming flood of 2026 funding. We worked almost entirely from public information. I believe MATS will be publishing the results in a formal report later this year. I will leave the unveiling of our findings to them. However while I was doing this research I stumbled across various orgs or tidbits that made me think “Huh, that's interesting!” I’m not an insider here (or at least, I’m a periphery insider if anything) so many of these things may be old hat to people in the know. But they were interesting enough for me to jot down for my own notes, so I’m sharing them [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: June 19th, 2026 Source: https://www.lesswrong.com/posts/wWX9ecM5Q7TycpKyX/ai-safety-ecosystem-research-notes --- Narrated by TYPE III AUDIO. | 14m 44s | ||||||
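
The parsec arithmetic described in the “Google Can’t Math Parsecs” entry above can be checked numerically. The snippet below is a minimal sketch, not anything taken from the post: the constant and variable names are illustrative, and the AU length (149,597,870,700 m, the IAU-defined value) is an outside fact that the episode description does not state. It compares the small-angle form, one AU per arcsecond measured in radians, against the exact right-triangle form.

```python
import math

# Assumption (not from the episode text): the IAU-defined length of one
# astronomical unit, in meters.
AU_METERS = 149_597_870_700

# One arcsecond is 1/3600 of a degree; convert it to radians.
ARCSECOND_RAD = math.radians(1 / 3600)

# Small-angle approximation: a parsec is one AU per arcsecond (in radians).
parsec_small_angle = AU_METERS / ARCSECOND_RAD

# Exact right-triangle version: the distance at which 1 AU subtends 1 arcsecond.
parsec_exact_triangle = AU_METERS / math.tan(ARCSECOND_RAD)

print(f"small-angle: {parsec_small_angle:.4e} m")
print(f"triangle:    {parsec_exact_triangle:.4e} m")
```

Both forms come out at roughly 3e16 meters, matching the figure quoted in the entry; they differ only far below the precision that figure implies.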
Showing 25 of 250
Chart Positions
2 placements across 2 markets.
