Claude Gets Self-Improving Agents and ChatGPT Cuts Hallucinations by 52%: What You Can Use Right Now
Two of the biggest AI platforms just shipped updates that actually change how you work — not in six months, today. Here's a technical breakdown of what landed, why it matters, and concrete ways to apply it.
Claude's Managed Agents Now "Dream" Between Sessions
Anthropic quietly shipped three new capabilities to Claude Managed Agents on May 7, 2026. The headline feature is called Dreaming — and it's more useful than the name suggests.
What Dreaming Actually Does
Dreaming is a scheduled background process that reviews your agent's past sessions, extracts recurring patterns, and updates its memory store automatically. Think of it as a nightly optimization pass. Your agent notices it keeps misidentifying a variable naming convention in your codebase? It self-corrects before the next run.
You control how aggressive it is:
- Automatic mode — memory updates without review
- Manual mode — you approve changes before they apply
This matters because current AI agents have no persistence between sessions by default: every run starts cold. Dreaming removes that limitation.
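To make the idea concrete, here is a minimal sketch of what a Dreaming-style consolidation pass could look like. It is not Anthropic's implementation or API. The names (`MemoryStore`, `find_recurring`, `dream`, `approve`), the "recurs in at least two sessions" threshold, and the representation of a session as a list of `(topic, lesson)` corrections are all hypothetical, chosen only to show the review, extract, and update loop and the automatic vs. manual modes.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical model of a Dreaming-style pass; not Anthropic's implementation or API.
Correction = tuple[str, str]  # (topic, lesson), e.g. ("naming", "use snake_case")


@dataclass
class MemoryStore:
    lessons: dict[str, str] = field(default_factory=dict)


def find_recurring(sessions: list[list[Correction]], min_sessions: int = 2) -> dict[str, str]:
    """Return corrections that appear in at least `min_sessions` distinct sessions."""
    # set() so a correction repeated inside one session counts only once
    counts = Counter(c for session in sessions for c in set(session))
    return {topic: lesson for (topic, lesson), n in counts.items() if n >= min_sessions}


def dream(
    store: MemoryStore,
    sessions: list[list[Correction]],
    mode: str = "automatic",
    approve: Optional[Callable[[str, str], bool]] = None,
) -> list[str]:
    """One consolidation pass; returns the topics that were written to memory."""
    if mode not in ("automatic", "manual"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "manual" and approve is None:
        raise ValueError("manual mode needs an approve callback")
    applied = []
    for topic, lesson in sorted(find_recurring(sessions).items()):
        if store.lessons.get(topic) == lesson:
            continue  # already in memory; nothing to update
        if mode == "manual" and not approve(topic, lesson):
            continue  # a human reviewer rejected this change
        store.lessons[topic] = lesson
        applied.append(topic)
    return applied
```

Running `dream` twice over the same sessions writes the recurring lesson once and then reports no changes, which is the "self-corrects before the next run" behavior in miniature. In manual mode, every proposed update passes through the `approve` callback first.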
Multiagent Orchestration Is Now Production-Ready
The second new feature: a lead agent can now delegate subtasks to specialist subagents, each with its own model, prompt, and tool access. The entire flow is traceable inside the Claude Console.
Practical example: a lead agent runs a bug investigation while subagents fan out simultaneously through deploy history, error logs, Prometheus metrics, and support tickets. What used to take a developer two hours of context-switching now runs in parallel.
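The fan-out pattern itself is easy to picture in code. The sketch below is not the Managed Agents API. `Subagent`, `lead_agent`, and `fake_specialist` are hypothetical names, and each "subagent" is a placeholder coroutine that sleeps instead of calling a model. It shows a lead agent dispatching one task to several specialists concurrently, each carrying its own (placeholder) model label and tool list.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical sketch of lead/subagent fan-out; not the Claude Managed Agents API.


@dataclass
class Subagent:
    name: str
    model: str               # placeholder label: each subagent may use its own model
    tools: tuple[str, ...]   # placeholder tool names, scoped per subagent
    run: Callable[[str], Awaitable[str]]


async def lead_agent(task: str, subagents: list[Subagent]) -> dict[str, str]:
    """Send the same task to every subagent concurrently and collect findings by name."""
    results = await asyncio.gather(*(sa.run(task) for sa in subagents))
    return {sa.name: finding for sa, finding in zip(subagents, results)}


def fake_specialist(source: str, delay: float = 0.2) -> Callable[[str], Awaitable[str]]:
    """Stand-in for a model call plus tool use against one data source."""
    async def run(task: str) -> str:
        await asyncio.sleep(delay)  # simulated latency of the subagent's work
        return f"{source}: checked for '{task}'"
    return run


if __name__ == "__main__":
    team = [
        Subagent("deploys", "model-a", ("git",), fake_specialist("deploy history")),
        Subagent("logs", "model-b", ("log_search",), fake_specialist("error logs")),
        Subagent("metrics", "model-b", ("prometheus_query",), fake_specialist("Prometheus metrics")),
        Subagent("tickets", "model-c", ("ticket_search",), fake_specialist("support tickets")),
    ]
    for name, finding in asyncio.run(lead_agent("500s on /checkout", team)).items():
        print(name, "->", finding)
```

Because `asyncio.gather` runs the four specialists concurrently, the whole investigation takes roughly as long as the slowest subagent rather than the sum of all four.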
Outcomes: AI That Grades Its Own Work
The third feature, Outcomes, lets you write a rubric in markdown describing what a successful result looks like. A separate grader evaluates the agent's output in its own context window, isolated from the agent's reasoning so there is no self-justification bias, and sends the output back for revision until it passes.
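The generate, grade, and revise loop can be sketched as follows. This is only an illustration of the pattern, not Anthropic's grader. `run_with_outcome`, `toy_agent`, `toy_grader`, and the rule "every `- item` line in the rubric must appear in the draft" are hypothetical. The `max_rounds` cap is also an assumption added so the loop always terminates. The key property is that the grader receives only the rubric and the draft, never the agent's reasoning.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a rubric-graded revision loop; not Anthropic's Outcomes implementation.


@dataclass
class Verdict:
    passed: bool
    feedback: str


def run_with_outcome(
    task: str,
    rubric: str,
    agent: Callable[[str, str], str],       # (task, feedback) -> draft
    grader: Callable[[str, str], Verdict],  # (rubric, draft) -> verdict
    max_rounds: int = 3,                    # assumed cap so the loop always ends
) -> tuple[str, int, bool]:
    """Return (final draft, rounds used, whether it passed)."""
    feedback, draft = "", ""
    for round_no in range(1, max_rounds + 1):
        draft = agent(task, feedback)
        verdict = grader(rubric, draft)  # grader sees rubric + draft only
        if verdict.passed:
            return draft, round_no, True
        feedback = verdict.feedback
    return draft, max_rounds, False


RUBRIC = """\
# Bug fix report
- root cause
- regression test
"""


def toy_grader(rubric: str, draft: str) -> Verdict:
    """Stand-in grader: each '- item' line of the markdown rubric must appear in the draft."""
    required = [line[2:].strip() for line in rubric.splitlines() if line.startswith("- ")]
    missing = [item for item in required if item not in draft.lower()]
    return Verdict(not missing, "missing: " + ", ".join(missing))


def toy_agent(task: str, feedback: str) -> str:
    """Stand-in agent: first draft omits the test; a revision addresses the feedback."""
    draft = f"{task}: root cause is a stale cache key."
    if feedback.startswith("missing: "):
        draft += " Also " + feedback.removeprefix("missing: ") + " added."
    return draft
```

With these stand-ins, the first draft fails on "regression test" and the revised draft passes in round two. A real grader would be a separate model call rather than a substring check.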
Anthropic's internal benchmarks show Outcomes lifted task success by up to 10 percentage points on the hardest problems in their test suite.
How to use it today: If you're building Claude agents via the Anthropic API, add an outcomes parameter pointing to your rubric file. If you're on Claude.ai, this is visible under the Managed Agents section of the platform console.
ChatGPT's Default Model Now Hallucinates 52.5% Less on High-Stakes Prompts
OpenAI replaced its default model on May 5, 2026. GPT-5.5 Instant is now what every ChatGPT user — including free tier — runs by default.
The number that matters: on high-stakes prompts in medicine, law, and finance, GPT-5.5 Instant produced 52.5% fewer hallucinated claims than its predecessor (GPT-5.3 Instant). On conversations users had already flagged for factual errors, inaccurate claims dropped 37.3%.
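Both figures are relative reductions in the count of inaccurate claims, not drops in percentage points. The helper below shows the arithmetic. The counts in it are hypothetical and chosen only for illustration; they are not OpenAI's data. For example, if the predecessor produced 200 hallucinated claims on some prompt set, a 52.5% reduction means about 95 on the same set.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percent drop from `before` to `after` (relative change, not percentage points)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(relative_reduction(200, 95))    # 52.5
    print(relative_reduction(1000, 627))  # 37.3
```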
Memory Sources: Finally Transparent Personalization
Alongside the model update, OpenAI rolled out Memory Sources across all consumer ChatGPT plans. This is a transparency layer that shows you exactly which past chats, saved memories, or connected files influenced a specific response.
Before this, ChatGPT's memory was a black box — it "remembered" things, but you couldn't audit what shaped any given answer. Now you can:
- See which past conversation a response pulled from
- Delete specific memories without wiping your entire history
- Use temporary chats that explicitly don't touch memory at all
This is immediately useful for anyone using ChatGPT for recurring professional tasks. If a response feels oddly skewed, you can trace exactly why.
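The three capabilities above map onto a simple data model: every memory carries its source, individual memories can be deleted, and temporary chats skip the write entirely. The sketch below is a hypothetical illustration of that model, not OpenAI's implementation. `MemoryBank`, its methods, the source labels, and the word-overlap "responder" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

# Hypothetical sketch of source-attributed memory; not OpenAI's implementation.


@dataclass
class Memory:
    id: int
    text: str
    source: str  # illustrative label such as "chat:2026-04-12" or "file:onboarding.md"


@dataclass
class MemoryBank:
    items: dict[int, Memory] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def remember(self, text: str, source: str, temporary: bool = False) -> Optional[int]:
        """Store a memory with its source; temporary chats never write to memory."""
        if temporary:
            return None
        memory_id = next(self._ids)
        self.items[memory_id] = Memory(memory_id, text, source)
        return memory_id

    def forget(self, memory_id: int) -> None:
        """Delete one memory and leave the rest of the history intact."""
        self.items.pop(memory_id, None)

    def answer(self, question: str) -> tuple[str, list[str]]:
        """Toy responder: use memories sharing a word with the question and report their sources."""
        words = set(question.lower().split())
        used = [m for m in self.items.values() if words & set(m.text.lower().split())]
        reply = " ".join(m.text for m in used) or "No relevant memory."
        return reply, [m.source for m in used]
```

Returning the sources alongside the reply is what makes a skewed answer traceable: you can see which stored item shaped it and `forget` that one item without touching the rest.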
Codex Goes Mobile
OpenAI also shipped Codex in the ChatGPT mobile app. More than 4 million users run Codex weekly. The mobile release lets you monitor active coding tasks from your phone (approve commands, steer execution, review diffs and test results) while Codex keeps running on your Mac or remote environment.
For engineers managing longer-running refactors or CI-adjacent workflows, this closes the loop without requiring a laptop in hand.
Side-by-Side: What to Use Each For Right Now
| Use Case | Best Tool (May 2026) |
|---|---|
| Autonomous multi-step coding agent | Claude Managed Agents + Outcomes |
| Everyday writing and research | ChatGPT GPT-5.5 Instant |
| Self-improving background automation | Claude Dreaming |
| Personalization audit / memory control | ChatGPT Memory Sources |
| Mobile coding oversight | ChatGPT Codex mobile |
| Long-context document analysis (500K+ tokens) | Claude |
The Practical Takeaway
Both platforms are moving away from "chat tool" territory into persistent, autonomous systems. Claude is betting on agent infrastructure — memory, self-improvement, and orchestration. OpenAI is betting on model reliability and personalization at scale.
Neither approach is wrong. The smarter move is treating them as complementary layers: use Claude's Managed Agents for structured, recurring engineering workflows where accuracy and iteration matter, and GPT-5.5 for high-volume, lower-stakes tasks where speed and personalization carry more weight.
The features above are live. None of them require a waitlist.
Explore more applied AI breakdowns at AI Engineering Labs.
