A case for the dreamers 💭✨🌙
The Karpathy interview, Qwen + DeepSeek launches, and the browser as a new OS
A lot of cool launches (Atlas, DeepSeek-OCR, Qwen3-VL-32B, and more) this week, which I’ll cover later in this blog. But if you’ll permit me some whimsy, I’d like to spend a good chunk of this week’s blog on the much-debated Dwarkesh x Andrej Karpathy interview. I know the interview was great fodder for the “we’re so over” and “progress is incremental” bear case for AI, but pfff, I don’t want to talk about that. Not yet, anyways. For a brief moment, I’d like to talk about dreams, instead! ✨✨
Early in the interview, Dwarkesh references a neuroscience paper on dreaming:
“Have you seen this super interesting paper that dreaming is a way of preventing this kind of overfitting and collapse? The reason dreaming is evolutionarily adaptive is to put you in weird situations that are very unlike your day-to-day reality, so as to prevent this kind of overfitting.”
They go on to have a fascinating discussion about how human memory works (bad), which means we are forced to reason – i.e., to keep our “algorithms of thought” – instead of just regurgitating everything we’ve ever learned or read or heard somewhere.
This was Karpathy’s response:
The LLMs, when they come off, they’re what we call “collapsed.” They have a collapsed data distribution. One easy way to see it is to go to ChatGPT and ask it, “Tell me a joke.” It only has like three jokes. It’s not giving you the whole breadth of possible jokes. It knows like three jokes. They’re silently collapsed.
You’re not getting the richness and the diversity and the entropy from these models as you would get from humans. Humans are a lot noisier, but at least they’re not biased, in a statistical sense. They’re not silently collapsed. They maintain a huge amount of entropy. So how do you get synthetic data generation to work despite the collapse and while maintaining the entropy? That’s a research problem.
The entropy problem
If you pull the thread on that part of their conversation, they’re making two claims:
1) LLMs collapse, particularly when they train on their own synthetic data.
2) Humans, in turn, prevent collapse with entropy – unique experiences and situations force unique responses. This goes back to the overfitted brain paper, and the claim that dreaming is one way we prevent overfitting/collapse.
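To make "collapse" and "entropy" a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the joke lists are made up, and "training on your own outputs" is approximated by simply resampling the previous generation's outputs, which is not how any real LLM is trained or what the interview describes. It only shows how Shannon entropy drops when a system keeps learning from its own samples.

```python
import math
import random
from collections import Counter


def shannon_entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def self_train(samples, generations, rng):
    """Toy stand-in for training on your own outputs: each generation is
    drawn, with replacement, only from the previous generation's samples."""
    history = [shannon_entropy(samples)]
    for _ in range(generations):
        samples = rng.choices(samples, k=len(samples))
        history.append(shannon_entropy(samples))
    return history


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "human" corpus: 200 distinct jokes, each told once.
human_jokes = [f"joke_{i}" for i in range(200)]
# Hypothetical "collapsed" model: the same three jokes over and over.
collapsed_jokes = [rng.choice(["joke_a", "joke_b", "joke_c"]) for _ in range(200)]

# Uniform over 200 distinct items gives log2(200), about 7.64 bits.
print(f"human entropy:     {shannon_entropy(human_jokes):.2f} bits")
# Three possible jokes can never exceed log2(3), about 1.58 bits.
print(f"collapsed entropy: {shannon_entropy(collapsed_jokes):.2f} bits")

history = self_train(human_jokes, generations=50, rng=rng)
print(f"after 50 self-training rounds: {history[-1]:.2f} bits")
```

The resampling loop can only lose jokes, never invent new ones, so the set of distinct jokes shrinks generation after generation. Dreaming, in this framing, would be whatever injects fresh samples from outside that loop.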
Research frontiers: teaching AI to dream
So, how do we make LLMs more intelligent, more human-like? The message I took away from their interview is that one potential research frontier for preventing collapse, and forcing more entropy into LLM systems, is encouraging them to, well, “dream” somehow.
Throughout the interview, Dwarkesh & Karpathy come up with various other out-of-the-box, entropy-forcing research ideas: pushing LLMs to create their own culture (why shouldn’t LLMs write books for other LLMs and learn from each other?), self-play outside of today’s RL environments, and an evolution-like competition between models – i.e., having them come up with more and more complex problems for each other, rather than 1) humans coming up with them, or 2) a single LLM coming up with benchmarks/evals for itself.
I found this whole thread extremely fascinating. We spend so much time discussing the last-mile reliability of agents – from computer use to RL – but I’ve never really stopped to think about how to train LLMs to sample weirdness and randomness and oddities as a means of making them smarter and capable of novel ideas.
My read of Karpathy’s statement about AI coding not being good enough yet (which sparked a lot of “we’re so over” takes online) is not that he thinks it’s horrible, just that it’s only good for problems it has seen before. That makes sense; it’s the perennial “stochastic parrot” point. So how do we build models or agents that “think” in a novel way? We need to somehow engender forms of continuous training that force new environments and thus new solutions.
Said differently – we need to teach AI to dream. :)
The cognitive core thesis
The interview sparked the predictable “bubble popped” vs “we’re so back” discourse this week, but beyond the narrative wars, Karpathy’s most compelling technical insight was about “cognitive cores.” He argues we over-train models with too much noisy internet data, making today’s models bloated with memory work. Refine the data and distill properly, and a ~1B-param core could converse well while looking things up for facts. This connects directly back to the entropy problem: smaller cores that reason, rather than memorize, might actually maintain more diversity in their outputs.
This week’s model releases validate exactly this thesis. Let’s dig in:
This week’s notable model releases
DeepSeek-OCR: context compression through images
DeepSeek released an open model that treats long text as images, compressing historical context roughly 7–20×. On their tests, OCR accuracy holds at ~97% when compression stays under about 10×, and falls off (to roughly 60%) as it approaches 20×. Think cheaper ingestion + much longer “recall” at a fixed token budget.
Karpathy called this a potential “JPEG moment” for context – tokenizers are awkward and expensive, while pixels may offer a cleaner path for long-context pipelines.
Another thoughtful take, from Jeffrey Emanuel, positioned this as a breakthrough because it could mean a future where frontier LLMs have a 10 or 20 million token context window.
Why this matters
If your product spends real money keeping timelines, logs, or documents inside context windows, OCR-style compression + re-decode could be the first practical way to hold far more state without melting your budget.
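For a feel of what that compression range means at a fixed budget, here is some back-of-the-envelope arithmetic in Python. The 100,000-token budget and the function name are hypothetical, picked only for illustration; the 7× and 20× ratios are the ends of the range quoted above. The sketch ignores decoding accuracy, so treat the results as upper bounds rather than measured behavior.

```python
def effective_text_tokens(vision_token_budget, compression_ratio):
    """Text tokens represented when each vision token stands in for
    `compression_ratio` text tokens (accuracy loss is not modeled)."""
    return vision_token_budget * compression_ratio


budget = 100_000  # hypothetical context budget, in tokens
for ratio in (7, 20):
    held = effective_text_tokens(budget, ratio)
    print(f"{ratio}x compression: {budget:,} vision tokens ~ {held:,} text tokens")
```

Under these assumptions, the same 100,000-token window would stand in for 700,000 to 2,000,000 tokens of original text, which is where the "hold far more state" argument comes from.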
Qwen3-VL-32B: small, fast, competitive
Alibaba added a 32B vision-language model that (per their reports) matches or beats much larger systems, including on GUI/OSWorld tasks, while staying nimble enough for real products. Even the 2B tier targets edge devices. This is consistent with the “smaller, specialized, cheaper” trend we’ve been calling out for months. The sticky wedges right now remain latency+toolability, open weights+license, and distribution into workflows – i.e. exactly where Qwen keeps winning mindshare!
I expect “smaller, specialized, cheaper” models to keep stealing workloads from frontier stacks; Qwen’s VL-32B is simply the latest proof point.
Also, apparently Brian Chesky (Airbnb) agrees:
This week’s features & launches
The browser is the new OS: ChatGPT Atlas
Just a few weeks back, I wrote about how interfaces were melting – new surfaces, new UIs, new UX paths all abound. We’re back at it this week. OpenAI’s new Atlas browser is very clearly built to make “chat + action” the default web workflow. Memory and a split-screen ChatGPT companion are on by design (prompting some privacy backlash), and there’s inline “cursor chat” for editing in web forms. In short: answers, links, and actions live in one place now.
Why it matters:
We’ve been tracking the shift from “interface as doorway” to “interface as workflow.” Atlas is exactly that move.
One lingering question: did Google drop the ball on being first to an AI-native browser?
It’s tough to change your default interface; it’s a classic innovator’s dilemma. But it’s worth remembering that teams inside Google invented some astonishing technology (Chromium for browsers; the seminal transformers paper, “Attention Is All You Need”).
And from those inventions we now have literally the entire AI industry: OpenAI, Anthropic, Perplexity’s Comet, and every AI agent company in the world. So much of that could have been just… Google!
Anthropic x2
Claude Code on the web
Anthropic’s coding agents have a proper web surface now: spin up multiple tasks, let runs proceed in Anthropic’s cloud, and manage from a browser. This continues the trend of “agents in the interface you already use,” not yet another IDE plugin.
Claude for Life Sciences
Really awesome launch. This follows a thread I’ve been pursuing: last week at Decibel’s annual AI Pioneers Summit, I spoke with Behnam Neyshabur, co-lead of the Discovery team, about Anthropic’s push towards building an AI co-scientist. Awesome to see more “AI x life sciences” features go live.
“Vibe coding,” updated (Google AI Studio)
Google refreshed their vibe coding flow in AI Studio. Funny how quickly narratives shift—wasn’t vibe coding declared dead in October? If only this had launched in September.
My stance hasn’t changed: vibe coding is fun for greenfield and internal tools; for production, keep the rails (linting, tests, review) close. Or, as Simon Willison put it last week, do some “vibe engineering” instead.
“Project Mercury”: agents go to Wall Street
This belongs more in the “future launch, current news” bucket: Bloomberg reports OpenAI has >100 ex-bankers training a system to automate junior-banker work (models, memos, comps) under the internal codename Mercury. RIP to the quarter I spent learning corporate financial modeling. I used to submit all my problem sets under the team codename “Models & Bottles,” which never ceased to make me laugh at myself. In hindsight, why did I ever learn how to build an LBO? (Although learning to build a VC proceeds waterfall actually WAS useful.) Anyways – I digress.
What’s interesting about this is less that we’ll have AI Excel wizards helping us build and debug financial models (about time) than that it shows yet again that OpenAI is intent on building not just generalized models, but also domain-specific agents tied to very specific workflows.
US-made Blackwell wafers: symbolic, but meaningful
Last launch I’ll cover, which isn’t really a launch per se, but certainly is cool news: TSMC’s Phoenix fab produced the first U.S.-made wafer that will become NVIDIA Blackwell chips – a big milestone for de-risking supply chains. It’s an early step, not full localization, but it points in the right direction given demand. I have a lot of heartburn that the entire AI industry is built around a single Taiwan point of failure. May we build many more chips in the USA! 🇺🇸
Parting thoughts
The pattern across this week’s launches is clear: compression + specialization + better interfaces. DeepSeek and Qwen prove you don’t need frontier-scale models for production work. Atlas and Claude Code prove the interface matters as much as the model. And somewhere in the background, the real research frontier – teaching AI to maintain entropy, to dream – waits for someone to crack it.
We’re neither in a bubble pop nor at an AGI finish line. We’re in the messy middle where transformational technology always lives: building, iterating, occasionally breaking through.
The dreamers might be onto something after all. ✨