Hello from The AI Night,
Today in AI:
DeepSeek Launches V4 With Default 1M Context
Anthropic Launches Persistent Memory for Claude Agents
xAI Launches Grok Voice Think Fast 1.0
Here's the deal: DeepSeek released the preview of V4 on April 24, an open-weight MoE series shipped in two variants. V4-Pro and V4-Flash both went live on the DeepSeek API and chat.deepseek.com, with weights posted to Hugging Face.
The Breakdown:
V4-Pro: 1.6T total parameters, 49B active per token. V4-Flash: 284B total, 13B active.
Both models support 1M context natively. DeepSeek states 1M is now the standard across all official services, not a premium tier.
The architecture pairs token-wise compression with DeepSeek Sparse Attention (DSA), which the team credits for cutting long-context compute and memory costs.
The API keeps the existing base_url, supports both OpenAI ChatCompletions and Anthropic-compatible request formats, and exposes thinking and non-thinking modes.
Both models are integrated with Claude Code and OpenCode, and DeepSeek says V4 already powers its in-house agentic coding workflows.
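Because the endpoint stays OpenAI-compatible, existing clients should work with a one-line base_url change. Here's a minimal sketch of the request shape; the `deepseek-chat` model alias and the `thinking` toggle are assumptions for illustration, so check DeepSeek's docs for the real V4 identifiers and parameter names:

```python
import json

def build_chat_request(prompt: str, thinking: bool = True) -> dict:
    """Build an OpenAI ChatCompletions-style payload for the DeepSeek API.

    The model name and thinking flag below are hypothetical; the
    announcement only says the existing base_url and both request
    formats are supported.
    """
    return {
        "model": "deepseek-chat",  # hypothetical V4 alias
        "messages": [{"role": "user", "content": prompt}],
        "thinking": thinking,      # assumed toggle for thinking mode
    }

payload = build_chat_request("Summarize this 800k-token repo.")
print(json.dumps(payload, indent=2))
```

Point any OpenAI SDK client at `https://api.deepseek.com` with this payload and the same code path should cover both V4-Pro and V4-Flash.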
The bigger picture: A year ago million-token context cost premium API pricing from Google or Anthropic. DeepSeek just made it free and open-weight. Forty-nine billion active parameters at 1.6 trillion total with full context is not a research preview anymore. It is an open-source baseline that closed labs now have to justify charging for.
Here's the deal: Anthropic launched Memory for Claude Managed Agents in public beta, letting agents retain information across sessions through a filesystem-based memory layer accessible via the Claude Console and a new CLI.
The Breakdown:
Memory mounts directly onto a filesystem, so Claude reads and writes using its existing bash and code execution tools instead of a separate retrieval system.
Stores support scoped permissions: org-wide read-only stores, per-user read/write stores, and concurrent multi-agent access without overwrites.
All memories are files that can be exported and managed via API, with audit logs tracking which agent and session created each entry, plus version rollback and redaction.
Rakuten reports 97% fewer first-pass errors, 27% lower cost, and 34% lower latency on long-running agents using memory.
Wisedocs cut document verification time by 30% by letting agents remember recurring issues across sessions.
Netflix and Ando are using it to replace manual prompt updates and in-house memory infrastructure.
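Since memories are just files on a mounted filesystem, the core pattern is plain file I/O rather than a retrieval stack. Here's a minimal sketch of scoped, append-only memory stores; the mount path and directory layout are illustrative assumptions, not Anthropic's actual API:

```python
from pathlib import Path

# Hypothetical mount point; in the real product the memory
# filesystem is mounted for the agent by the platform.
MEMORY_ROOT = Path("/tmp/agent_memory")

def remember(scope: str, key: str, note: str) -> Path:
    """Append a note under a scoped store (e.g. 'org' or 'user')."""
    store = MEMORY_ROOT / scope
    store.mkdir(parents=True, exist_ok=True)
    path = store / f"{key}.md"
    with path.open("a") as f:
        f.write(note.rstrip() + "\n")
    return path

def recall(scope: str, key: str) -> str:
    """Read back everything remembered under a key, if anything."""
    path = MEMORY_ROOT / scope / f"{key}.md"
    return path.read_text() if path.exists() else ""

remember("user", "billing-quirks", "Customer IDs starting with X are legacy.")
print(recall("user", "billing-quirks"))
```

Because everything is a file, export, audit, and rollback fall out of ordinary filesystem tooling, which is presumably why Anthropic exposes the stores through bash and code execution rather than a bespoke retrieval API.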
The bigger picture: Every team building production agents has invented its own memory hack. Vector databases, prompt caching, custom retrieval layers. All fragile, all expensive to maintain. Anthropic just made that entire stack disposable. Rakuten's numbers tell the story. Ninety-seven percent fewer first-pass errors means the agent gets smarter every session without anyone touching the code.
Here's the deal: xAI released grok-voice-think-fast-1.0, a flagship voice model built for complex, multi-step workflows in customer support, sales, and enterprise applications. It already powers Starlink's phone sales and support line at +1 (888) GO STARLINK.
The Breakdown:
Topped τ-voice Bench across retail (67.3%), airline (62.3%), and telecom (66%), ahead of Gemini 3.1 Flash Live, GPT Realtime 1.5, and the previous Grok Voice Fast 1.0.
Performs reasoning in the background, so it can think through edge cases without adding response latency.
Natively supports 25+ languages and is tested on telephony audio, accents, interruptions, and speech disfluencies including spoken corrections.
Handles structured data entry (emails, street addresses, phone numbers, account numbers) with read-back confirmation.
At Starlink: 20% sales conversion rate on inquiries, 70% of support cases resolved autonomously, one agent using 28 tools across hundreds of workflows.
The bigger picture: One in five callers bought Starlink after talking to an AI. Not a demo. A live phone line handling real money. Every call center operator just got a benchmark to measure against. The question is no longer whether voice agents work in production. It is how fast human teams get restructured around them.
The era of manual marketing ends this May!
Manual marketing had a good run.
But the teams winning right now aren't briefing, approving, and repeating. They're directing AI agents that execute the whole strategy for them.
The Agentic Marketing Summit (May 4–8) is a free, five-day event that shows you exactly how it works in practice. Not theory. Not a PDF checklist. Step-by-step guidance to help you become an expert in AI marketing agents.
Hosted by 3x Inc 5000 founder Manick Bhan alongside the sharpest minds in the marketing world today.
The era of doing it yourself is over!
What else you need to know:
Cursor launched /multitask in its new Cursor 3 interface, letting async subagents run requests in parallel rather than queue them, alongside improved worktrees and multi-root workspaces for cross-repo work.
OpenAI released GPT-5.5 and GPT-5.5 Pro in its API on April 24, 2026, following the April 23 ChatGPT and Codex launch, with updated safeguards documented in the system card.
Meta announced a deal with AWS to deploy tens of millions of Graviton cores for its agentic AI workloads, becoming one of the largest Graviton customers globally.
Cua released cua-driver, an open-source v0.1 macOS driver using Apple's private SkyLight framework to let any agent operate real Mac apps in the background without hijacking the user's cursor or focus.
OpenAI launched a Bio Bug Bounty for GPT-5.5 in Codex Desktop, offering $25,000 to the first researcher finding a universal jailbreak defeating its five-question bio safety challenge.
That’s it for today’s edition of The AI Night.
Our goal is to cut through the noise, surface what actually changed, and explain why it matters.
2 ways to support us:
Forward this to your AI-curious friend → https://www.theainight.com
Sponsor The AI Night and reach 500+ AI builders daily → passionfroot.me/theainight