Hello from The AI Night,
Today in AI:
Anthropic Launches Claude for Excel, Word, PowerPoint, and Outlook
OpenAI Launches Three Realtime Voice API Models
Google Gemma 4 MTP Drafters Boost Speed 3x
Here's the deal: Anthropic launched Claude for Excel, PowerPoint, and Word as generally available, with Outlook entering public beta. One conversation carries context across all four apps, letting users move between tasks without re-explaining their work.
The Breakdown:
Changes in one open file can automatically update linked content in other open documents, spreadsheets, and slides.
Claude for Outlook triages inboxes, drafts replies with pre-filled recipients and subjects, and creates calendar invites after checking availability.
Conversations persist with each file across sessions, accessible via keyboard or voice.
Enterprise admins get OpenTelemetry support to stream prompts and tool calls to their own security collectors.
Organizations can route traffic through Amazon Bedrock, Vertex AI, or Microsoft Foundry instead of direct API access.
Microsoft 365 Copilot customers can access Claude models directly in Excel and PowerPoint.
The bigger picture: The routing detail is the whole story. Enterprise IT teams don't block Claude because the model is bad. They block it because approving a new vendor takes months. Letting companies route through Bedrock, Vertex, or Foundry turns Claude from a procurement headache into a configuration change. Anthropic just made the IT department's objection disappear.
Here's the deal: OpenAI released three audio models in its Realtime API. GPT-Realtime-2 adds GPT-5-class reasoning to live voice. GPT-Realtime-Translate handles 70+ input languages, and GPT-Realtime-Whisper delivers streaming transcription.
The Breakdown:
GPT-Realtime-2 features a 128K context window, up from 32K, with five adjustable reasoning levels from minimal to xhigh.
On Big Bench Audio, GPT-Realtime-2 scored 96.6% accuracy versus 81.4% for the previous model.
Translate supports 70+ input languages and 13 output languages for live speech translation.
New features include parallel tool calls, spoken preambles, and stronger recovery behavior for production agents.
Pricing: Realtime-2 costs $32/1M input tokens and $64/1M output tokens. Translate runs $0.034/min. Whisper runs $0.017/min.
The bigger picture: Voice AI has been a three-vendor problem. One for transcription, one for translation, one for reasoning. OpenAI just collapsed that into a single API. Whisper at $0.017/min undercuts Deepgram and AssemblyAI on price alone. The play is not selling voice models. It is making OpenAI the only vendor call centers need.
Here's the deal: Google released Multi-Token Prediction drafters for Gemma 4. These lightweight companion models use speculative decoding to deliver up to 3x faster inference with zero quality loss.
The Breakdown:
MTP drafters predict several tokens at once; the target model verifies them in a single forward pass.
Covers the full Gemma 4 lineup: E2B, E4B, 26B MoE, and 31B Dense.
Draft models share the target model's KV cache, eliminating redundant context calculations.
The 26B MoE hits 2.2x speedups at batch sizes 4 to 8 on Apple Silicon, with similar gains on Nvidia A100.
Edge models use clustering in the embedder to further reduce latency.
Ships under Apache 2.0 on Hugging Face and Kaggle, with vLLM, MLX, SGLang, Ollama, and Transformers support.
The bigger picture: Sixty million downloads and the only knock on Gemma was speed. Not quality, not compatibility, not licensing. Just speed. Google fixed the one weakness that kept local inference behind paid APIs and gave it away under Apache 2.0. Every team still budgeting for cloud inference should rerun their cost projections this week. The math just broke in local's favor.
4x more context into every prompt. Zero extra effort.
You think faster than you type. Which means every typed prompt leaves out the constraints, examples, and edge cases that would have made the output actually useful.
Wispr Flow turns your voice into paste-ready text inside any AI tool. Speak naturally — include "um"s, tangents, half-finished thoughts — and Flow cleans everything up. You get detailed, structured prompts without touching a keyboard.
89% of messages sent with zero edits. Used by teams at OpenAI, Vercel, and Clay. Free on Mac, Windows, and iPhone.
What else you need to know:
Anthropic introduced Natural Language Autoencoders, a method that translates Claude's internal activations into plain text, used during pre-deployment safety audits of Opus 4.6 and Mythos Preview to surface unverbalized reasoning.
Perplexity launched Personal Computer in a new Mac app, letting its AI agent run tasks continuously across local files, native Mac apps, and the web from any device.
Cursor launched /orchestrate, a Cursor SDK feature that recursively spawns planner, worker, and verifier agents to handle complex tasks, cutting internal token use by 20% and cold start times by 80%.
Google made Gemini 3.1 Flash-Lite generally available, positioning it as its fastest and most cost-efficient model for high-volume agentic tasks like tool calling, orchestration, and automated pipelines at scale.
Mozilla shipped 423 security bug fixes in Firefox's April releases, with 271 found by Claude Mythos Preview through an agentic harness that generated reproducible test cases across the browser's codebase.
That’s it for today’s edition of The AI Night.
Our goal is to cut through the noise, surface what actually changed, and explain why it matters.
2 ways to support us:
Forward this to your AI-curious friend → https://www.theainight.com
Sponsor The AI Night and reach 500+ AI builders daily → passionfroot.me/theainight






