Discussion about this post

Yuan Xue

Really enjoyed this! Timely and clear coverage of continual learning. Thanks very much for sharing!

One additional thought: the current wave (TTT, agentic context engineering, recursive loops) is mostly centered on the learning mechanism - how to capture knowledge after deployment via scoped weight updates, better context/memory, and closed-loop improvement. But I’m also curious about the other half of the equation: what the “learning data” looks like in production, and how we capture it.

In real systems, the learning signal rarely arrives as clean benchmarks or as raw text like Common Crawl. It's often extracted from interaction traces, human edits, preference feedback, tool outcomes, and implicit signals (latency, retries, abandonment, downstream success). That raises a bunch of open design questions: how to represent these signals (procedural vs. factual vs. preference knowledge), and how to preserve causality and credit assignment when mapping feedback and rewards back to the steps that produced them. This connects directly to your closing point: the promised land is less about the algorithm and more about the controls for knowing when you actually got better.
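To make the representation question concrete, here is a minimal sketch of what a production learning signal might look like as a typed record. All names here (`SignalKind`, `LearningSignal`, the field layout) are hypothetical, just one way to keep the procedural/factual/preference distinction and the trace linkage needed for credit assignment:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class SignalKind(Enum):
    PROCEDURAL = "procedural"   # how-to knowledge extracted from a tool trajectory
    FACTUAL = "factual"         # a fact asserted or corrected, e.g. by a human edit
    PREFERENCE = "preference"   # a pairwise or scalar preference judgment

@dataclass
class LearningSignal:
    kind: SignalKind
    content: str                # the extracted knowledge itself
    trace_id: str               # link back to the interaction that produced it
    step_index: Optional[int]   # which step in the trace, for credit assignment
    reward: Optional[float]     # explicit or implicit outcome (success, retries, latency)
    source: str                 # e.g. "human_edit", "tool_outcome", "implicit"

# Example: a human edit to an agent's answer becomes a factual signal,
# still tied to the specific trace and step it should be credited to.
sig = LearningSignal(
    kind=SignalKind.FACTUAL,
    content="The API's rate limit is 60 req/min, not 100.",
    trace_id="trace-0042",
    step_index=3,
    reward=None,
    source="human_edit",
)
```

The point of keeping `trace_id` and `step_index` on every record is that credit assignment becomes a join against the stored trace rather than a guess after the fact.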

Tarunyaa Sivakumar

This was such a clarifying read — the “where does learning live?” framing (weight space vs. text space) genuinely shifted how I think about self-improving agents. The way you tie it to the stability–plasticity tradeoff — that forgetting is the price you pay for learning — makes a lot of sense.

The evaluator being the bottleneck is interesting. Yeah, I guess we need an external evaluator, otherwise there will be mode collapse. But then it makes me wonder whether a decentralized, emergent agent swarm can self-improve without any signal from outside the system.

I'm curious to see how we solve the context issue. I've seen some papers on lossless context compression, and others on using a second LLM to organize an agent's memory.
