29 Sep 2025 - Turing Award winner Richard Sutton argues that LLMs mimic language without goals; he advocates continual, reward-driven OaK agents that learn from experience, highlighting the tension between autonomy and safety amid rapid compute scaling.
Richard Sutton, the Turing Award winner often called the “father of reinforcement learning,” told interviewer Dwarkesh Patel that today’s large language models (LLMs) like ChatGPT are fundamentally limited. Sutton’s critique: LLMs are trained to predict what humans would say, not to learn from real-world outcomes. They lack intrinsic goals, can’t be surprised by consequences, and therefore mimic intelligence instead of acquiring it through trial and error. Sutton proposes an alternative he calls OaK (Options and Knowledge): continual agents that learn from streams of sensation, action, and reward, adapting on the job rather than relying on massive offline training runs.
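To make the contrast concrete, here is a minimal sketch of the generic experience-stream loop Sutton is pointing at (this is not his OaK architecture; the toy environment and all names are illustrative assumptions): a tabular Q-learning agent that updates from every observation, action, and reward as it acts, with no separate offline training phase.

```python
import random

# Toy environment (illustrative only): two states, two actions;
# taking action 1 in state 0 yields reward and moves the agent.
class ToyEnv:
    def __init__(self):
        self.state = 0

    def step(self, action):
        if self.state == 0 and action == 1:
            self.state = 1
            return self.state, 1.0   # next observation, reward
        self.state = 0
        return self.state, 0.0

# Continual agent: it learns from the stream of experience as it
# acts -- there is no train/deploy split and no offline dataset.
class QAgent:
    def __init__(self, n_states=2, n_actions=2, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        if random.random() < self.eps:            # occasional exploration
            return random.randrange(len(self.q[state]))
        return max(range(len(self.q[state])), key=lambda a: self.q[state][a])

    def learn(self, s, a, r, s_next):
        # One-step Q-learning update from a single experienced transition.
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

env, agent = ToyEnv(), QAgent()
state = env.state
for t in range(10_000):                           # the stream never ends
    action = agent.act(state)
    next_state, reward = env.step(action)
    agent.learn(state, action, reward, next_state)  # update on the spot
    state = next_state

print(agent.q)  # value estimates shaped entirely by lived experience
```

The point of the sketch is the loop shape, not the algorithm: the agent's knowledge is revised after every interaction, so it can be "surprised" by an outcome and correct itself, which is exactly what Sutton says a frozen, pretrained predictor cannot do.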
The Neuron newsletter frames the debate against recent industry moves: OpenAI reportedly expanded its compute 9x this year and, per Alex Heath’s reporting on Sam Altman’s Slack messages, plans a further ~125x increase by 2033 (Peter Gostev notes that gains in chip efficiency mean the raw multiplier may understate the real growth in horsepower). The piece highlights a paradox: Sutton’s goal-driven agents could be the path to real autonomy, but that autonomy (AIs setting their own goals) raises safety questions, and Sutton even suggests the current “goal-less” LLMs might unintentionally be safer for exactly that reason.
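For a rough sense of scale, a back-of-the-envelope calculation (assuming the ~125x is measured from 2025 to 2033; the piece does not state the baseline year):

```python
# Back-of-the-envelope: what annual growth does ~125x by 2033 imply?
# Assumes an eight-year window (2025 to 2033), which the newsletter
# does not specify.
years = 2033 - 2025
multiplier = 125
annual = multiplier ** (1 / years)
print(f"~{annual:.2f}x per year")   # ~1.83x, i.e. nearly doubling yearly
```

Under that assumption, the plan implies compute nearly doubling every year, before any per-chip efficiency gains are counted.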