Claude Gets Self-Improving Agents and ChatGPT Cuts Hallucinations by 52%: What You Can Use Right Now

Two of the biggest AI platforms just shipped updates that actually change how you work — not in six months, today. Here's a technical breakdown of what landed, why it matters, and concrete ways to apply it.

Claude's Managed Agents Now "Dream" Between Sessions

Anthropic quietly shipped three new capabilities to Claude Managed Agents on May 7, 2026. The headline feature is called Dreaming — and it's more useful than the name suggests.

What Dreaming Actually Does

Dreaming is a scheduled background process that reviews your agent's past sessions, extracts recurring patterns, and updates its memory store automatically. Think of it as a nightly optimization pass. Your agent notices it keeps misidentifying a variable naming convention in your codebase? It self-corrects before the next run.

You control how aggressive it is:

Automatic mode — memory updates without review
Manual mode — you approve changes before they apply

This matters because current AI agents have zero persistence between sessions by default. Every run starts cold. Dreaming removes that limitation by carrying what the agent learned into the next run.
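To make the idea concrete, here is a minimal Python sketch of a review pass with both modes. It is an illustration only, not Anthropic's implementation or API: `MemoryStore`, `dream`, `approve`, and the "appears in at least two sessions" rule are all hypothetical names and assumptions chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical stand-in for an agent's persistent memory."""
    notes: set[str] = field(default_factory=set)
    pending: set[str] = field(default_factory=set)  # awaiting approval in manual mode


def dream(sessions: list[list[str]], memory: MemoryStore,
          mode: str = "manual", min_sessions: int = 2) -> set[str]:
    """Review past sessions and propose memory updates for recurring corrections.

    Each session is a list of correction notes logged during that run.
    A note counts as recurring if it appears in at least `min_sessions`
    distinct sessions (an assumed threshold, not a documented one).
    """
    if mode not in ("automatic", "manual"):
        raise ValueError(f"unknown mode: {mode}")
    # Count each note once per session so one noisy run cannot dominate.
    counts = Counter(note for session in sessions for note in set(session))
    proposals = {note for note, n in counts.items() if n >= min_sessions}
    proposals -= memory.notes  # skip what the agent already remembers

    if mode == "automatic":
        memory.notes |= proposals    # applied without review
    else:
        memory.pending |= proposals  # held until a human approves
    return proposals


def approve(memory: MemoryStore, note: str) -> None:
    """Manual mode: move one reviewed proposal into memory."""
    memory.pending.remove(note)
    memory.notes.add(note)


sessions = [
    ["use snake_case for variables", "tests live in tests/unit"],
    ["use snake_case for variables"],
    ["prefer pathlib over os.path"],
]
memory = MemoryStore()
proposed = dream(sessions, memory, mode="manual")
approve(memory, "use snake_case for variables")
```

In this sketch only the naming-convention note recurs across sessions, so it is the only proposal. In manual mode it sits in `pending` until `approve` is called; in automatic mode it would go straight into `notes`.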

Multiagent Orchestration Is Now Production-Ready

The second new feature: a lead agent can now delegate subtasks to specialist subagents, each with its own model, prompt, and tool access. The entire flow is traceable inside the Claude Console.

Practical example: a lead agent runs a bug investigation while subagents fan out simultaneously through deploy history, error logs, Prometheus metrics, and support tickets. What used to take a developer two hours of context-switching now runs in parallel.
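The fan-out pattern in that example can be sketched with Python's standard `asyncio` library. This is a toy model of the control flow, not the Managed Agents API: `run_subagent`, `lead_agent`, and the specialist names are hypothetical, and the subagent body is a placeholder where a real system would call a model with its own prompt and tools.

```python
import asyncio


async def run_subagent(name: str, task: str) -> dict[str, str]:
    """Hypothetical specialist subagent; a real one would call a model with tools."""
    await asyncio.sleep(0.01)  # stands in for model and tool latency
    return {"agent": name, "finding": f"{name}: no anomaly found for '{task}'"}


async def lead_agent(task: str, specialists: list[str]) -> list[dict[str, str]]:
    """Fan the task out to all specialists concurrently and gather their reports."""
    reports = await asyncio.gather(
        *(run_subagent(name, task) for name in specialists)
    )
    return list(reports)  # gather returns results in the order of `specialists`


specialists = ["deploy-history", "error-logs", "metrics", "support-tickets"]
reports = asyncio.run(lead_agent("checkout latency spike", specialists))
for report in reports:
    print(report["finding"])
```

Because the four subagents wait concurrently, total wall-clock time is roughly that of the slowest one rather than the sum of all four, which is the point of running the investigation in parallel.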

Outcomes: AI That Grades Its Own Work

The third feature, Outcomes, lets you write a rubric in markdown describing what a successful result looks like. A separate grader evaluates the agent's output in its own context window (isolated from the agent's reasoning, so there's no self-justification bias) and sends failing output back to the agent for revision until it passes.

Anthropic's internal benchmarks show Outcomes lifted task success by up to 10 points on the hardest problems in their test suite.
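The grade-and-revise loop is easy to picture in code. The sketch below is a simplified stand-in, not how Outcomes is implemented: the keyword-matching `grade` function, the `run_with_outcomes` helper, the `max_rounds` limit, and `toy_agent` are all assumptions made for illustration. A real grader would be a separate model call that sees only the rubric and the output.

```python
from typing import Callable

RUBRIC = """\
- includes a root cause
- includes a fix
"""


def grade(output: str, rubric: str) -> list[str]:
    """Toy grader: return the rubric items the output fails to mention.

    It never sees the agent's reasoning, only the rubric and the output.
    """
    failures = []
    for line in rubric.splitlines():
        item = line.removeprefix("- ").strip()
        keyword = item.removeprefix("includes a ")
        if item and keyword not in output.lower():
            failures.append(item)
    return failures


def run_with_outcomes(agent: Callable[[str], str], task: str,
                      rubric: str, max_rounds: int = 3) -> tuple[str, int]:
    """Ask the agent, grade the result, and send failures back until it passes."""
    prompt = task
    for round_number in range(1, max_rounds + 1):
        output = agent(prompt)
        failures = grade(output, rubric)
        if not failures:
            return output, round_number
        prompt = f"{task}\nRevise. Missing: {', '.join(failures)}"
    raise RuntimeError(f"rubric not satisfied after {max_rounds} rounds")


def toy_agent(prompt: str) -> str:
    """Hypothetical agent that only adds the fix once the grader asks for it."""
    answer = "Root cause: stale cache entry."
    if "Missing:" in prompt:
        answer += " Fix: invalidate the cache on deploy."
    return answer


final, rounds = run_with_outcomes(toy_agent, "Investigate the bug", RUBRIC)
```

Here the first draft omits a fix, the grader reports the missing item, and the second draft passes. Capping the number of rounds is a design choice in this sketch so that a rubric the agent cannot satisfy fails loudly instead of looping forever.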

How to use it today: If you're building Claude agents via the Anthropic API, add an outcomes parameter pointing to your rubric file. If you're on Claude.ai, the feature is visible under the Managed Agents section of the platform console.

ChatGPT's Default Model Now Hallucinates 52.5% Less

OpenAI replaced its default model on May 5, 2026. GPT-5.5 Instant is now what every ChatGPT user, including those on the free tier, runs by default.

The number that matters: on high-stakes prompts in medicine, law, and finance, GPT-5.5 Instant produced 52.5% fewer hallucinated claims than its predecessor (GPT-5.3 Instant). On conversations users had already flagged for factual errors, inaccurate claims dropped 37.3%.
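Both figures are relative reductions, so what they mean in absolute terms depends on the starting error rate. The snippet below works through the arithmetic with a made-up baseline of 200 inaccurate claims per 1,000 prompts. That baseline is purely hypothetical; neither the old nor the new absolute rate is given here.

```python
def after_reduction(baseline: float, percent_fewer: float) -> float:
    """Apply a relative reduction: `percent_fewer` percent fewer than `baseline`."""
    return baseline * (100 - percent_fewer) / 100


# Hypothetical baseline: 200 inaccurate claims per 1,000 prompts.
baseline = 200
high_stakes = after_reduction(baseline, 52.5)  # high-stakes prompts
flagged = after_reduction(baseline, 37.3)      # previously flagged conversations

print(f"52.5% fewer: {baseline} -> {high_stakes:.1f}")
print(f"37.3% fewer: {baseline} -> {flagged:.1f}")
```

With that assumed baseline, 52.5% fewer means roughly 95 claims instead of 200. The rate is a bit more than halved, not eliminated, so output on high-stakes topics still needs checking.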

Memory Sources: Finally Transparent Personalization

Alongside the model update, OpenAI rolled out Memory Sources across all consumer ChatGPT plans. This is a transparency layer that shows you exactly which past chats, saved memories, or connected files influenced a specific response.

Before this, ChatGPT's memory was a black box — it "remembered" things, but you couldn't audit what shaped any given answer. Now you can:

See which past conversation a response pulled from
Delete specific memories without wiping your entire history
Use temporary chats that explicitly don't touch memory at all

This is immediately useful for anyone using ChatGPT for recurring professional tasks. If a response feels oddly skewed, you can trace exactly why.

Codex Goes Mobile

OpenAI also shipped Codex in the ChatGPT mobile app. More than 4 million users run Codex weekly. The mobile release lets you monitor active coding tasks from your phone (approve commands, steer execution direction, review diffs and test results) while Codex keeps running on your Mac or remote environment.

For engineers managing longer-running refactors or CI-adjacent workflows, this closes the loop without requiring a laptop in hand.

Side-by-Side: What to Use Each For Right Now

Use Case	Best Tool (May 2026)
Autonomous multi-step coding agent	Claude Managed Agents + Outcomes
Everyday writing and research	ChatGPT GPT-5.5 Instant
Self-improving background automation	Claude Dreaming
Personalization audit / memory control	ChatGPT Memory Sources
Mobile coding oversight	ChatGPT Codex mobile
Long-context document analysis (500K+)	Claude

The Practical Takeaway

Both platforms are moving away from "chat tool" territory into persistent, autonomous systems. Claude is betting on agent infrastructure — memory, self-improvement, and orchestration. OpenAI is betting on model reliability and personalization at scale.

Neither approach is wrong. The smarter move is treating them as complementary layers: use Claude's Managed Agents for structured, recurring engineering workflows where accuracy and iteration matter, and GPT-5.5 for high-volume, lower-stakes tasks where speed and personalization carry more weight.

The features above are live. None of them require a waitlist.

Explore more applied AI breakdowns at AI Engineering Labs.

