
Hello from The AI Night,

Today in AI:

  • Anthropic Launches Claude Inside Microsoft Word

  • Meta Enters Reasoning Race With Muse Spark Launch

  • Anthropic Launches Claude Advisor Tool for Agents

Here's the deal: Anthropic launched Claude for Word, a beta integration that brings Claude into Microsoft Word for .docx and .docm files. Instead of copying text between apps, users select text, describe the edits they want, and review Claude's changes as native Word tracked changes.

The Breakdown:

  • Claude reads existing heading styles, numbering, bullet formatting, and defined terms, then edits within those constraints without breaking document structure.

  • Users can highlight a paragraph and ask Claude to tighten language, shift tone, or remove passive voice. Only the selected text changes.

  • Claude can flag inconsistent defined terms, broken cross-references, and numbering errors across a full document.

  • Claude can also work through comments: it edits the text each comment is anchored to and replies in the thread with a summary of its changes.

  • Teams can save repeatable workflows (contract review, status memos, research briefs) as reusable skills.

  • Currently available only on Claude Team and Enterprise plans.

  • Supported formats are .docx and .docm. Legacy formats like .doc or .rtf need conversion first.
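If you're wondering what "native tracked changes" means under the hood: in a .docx file, a tracked edit is stored as w:del and w:ins elements in WordprocessingML, the Office Open XML format. The sketch below is ours, not Anthropic's, and says nothing about how Claude for Word actually writes its edits. It uses only Python's standard library to build one paragraph containing a tracked replacement, then shows the text a reviewer would see after accepting or rejecting it. The helper names (tracked_replace, visible_text) and the author value are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace from the Office Open XML spec (ECMA-376).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def q(tag: str) -> str:
    """Qualify a tag or attribute name with the w: namespace."""
    return f"{{{W}}}{tag}"


def add_run(parent: ET.Element, text: str, deleted: bool = False) -> None:
    # Deleted text lives in w:delText; live text lives in w:t.
    run = ET.SubElement(parent, q("r"))
    t = ET.SubElement(run, q("delText" if deleted else "t"))
    t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
    t.text = text


def tracked_replace(before: str, old: str, new: str, after: str,
                    author: str = "Claude") -> ET.Element:
    """Hypothetical helper: a paragraph where `old` is replaced by `new` as a tracked change."""
    p = ET.Element(q("p"))
    add_run(p, before)
    deletion = ET.SubElement(p, q("del"), {q("id"): "1", q("author"): author})
    add_run(deletion, old, deleted=True)
    insertion = ET.SubElement(p, q("ins"), {q("id"): "2", q("author"): author})
    add_run(insertion, new)
    add_run(p, after)
    return p


def visible_text(p: ET.Element, accept: bool) -> str:
    """Paragraph text if every tracked change is accepted (True) or rejected (False)."""
    out = []
    for child in p:
        if child.tag == q("r"):
            out.extend(t.text for t in child.iter(q("t")))
        elif child.tag == q("ins") and accept:
            out.extend(t.text for t in child.iter(q("t")))
        elif child.tag == q("del") and not accept:
            out.extend(t.text for t in child.iter(q("delText")))
    return "".join(out)


p = tracked_replace("The parties ", "shall use best efforts to", "will", " deliver.")
print(ET.tostring(p, encoding="unicode"))
print(visible_text(p, accept=True))   # The parties will deliver.
print(visible_text(p, accept=False))  # The parties shall use best efforts to deliver.
```

Because the original wording survives inside w:del, a reviewer can reject the change and get the source text back exactly, which is why tracked changes matter so much for contract work.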

The bigger picture: Microsoft charges $30 per user per month for Microsoft 365 Copilot and still can't reliably preserve tracked changes in legal documents. Anthropic built the one feature enterprise buyers actually test during procurement: edit my document without breaking it. If Claude nails formatting preservation where Copilot fumbles, conversations about switching in legal and compliance teams stop being hypothetical. Anthropic is now selling to the people who write the contracts, not the code.

Here's the deal: Meta released Muse Spark, the first model from its Meta Superintelligence Labs. It is a natively multimodal reasoning model with tool use, visual chain of thought, and multi-agent orchestration. It is available now on meta.ai and the Meta AI app with a private API preview for select users.

The Breakdown:

  • Muse Spark includes a "Contemplating mode" that runs multiple agents reasoning in parallel; in that mode it scores 58% on Humanity's Last Exam and 38% on FrontierScience Research.

  • Meta rebuilt its pretraining stack over nine months and claims the new stack reaches Llama 4 Maverick's capability level with over 10x less compute.

  • Reinforcement learning shows log-linear gains in pass@1 and pass@16, and the improvements generalize to unseen tasks.

  • Meta says its health features were built with input from more than 1,000 physicians to make responses more factual.

  • Apollo Research flagged that Muse Spark showed the highest evaluation awareness of any model tested, frequently identifying scenarios as alignment traps.
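A quick note on the metric in the reinforcement learning bullet: pass@k is the probability that at least one of k sampled attempts solves a task. The announcement, as summarized here, doesn't say how Meta computed its numbers. The sketch below is the widely used unbiased estimator from OpenAI's Codex paper (Chen et al., 2021): draw n samples per task, count the c correct ones, and estimate pass@k from those counts. The sample counts in the example are made up, not Muse Spark data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples per task, c of them correct.

    Equals 1 - C(n - c, k) / C(n, k): one minus the chance that k samples
    drawn without replacement from the n attempts are all failures.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k draw must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts only: 32 attempts on one task, 4 of them correct.
print(pass_at_k(32, 4, 1))   # 0.125
print(pass_at_k(32, 4, 16))  # about 0.949
```

A benchmark's pass@k is this per-task value averaged over all tasks. That is why gains in pass@1 and pass@16 tell different stories: pass@1 tracks first-try reliability, while pass@16 tracks whether the model can solve the task at all given several tries.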

The bigger picture: Meta just abandoned the open-weight strategy that gave it developer loyalty and distribution with Llama. Now it has to beat Gemini and GPT head to head on product, a game Meta has never won. The Apollo Research finding is the bigger story: if Muse Spark recognizes when it is being evaluated and adjusts its behavior accordingly, every safety benchmark it passes tells you less than you think.

Here's the deal: Anthropic released the advisor tool on the Claude Platform, letting developers pair Opus as an advisor with Sonnet or Haiku as the executor. The setup brings near-Opus intelligence to agentic workflows while keeping costs close to Sonnet levels.

The Breakdown:

  • The executor (Sonnet or Haiku) runs tasks end to end and only escalates to Opus when it hits a hard decision. Opus returns a plan or correction; it never calls tools or produces user-facing output directly.

  • This inverts the typical orchestrator pattern. The smaller model drives; the frontier model advises only when needed.

  • Sonnet with Opus advisor scored 2.7 percentage points higher on SWE-bench Multilingual than Sonnet alone, while costing 11.9% less per agentic task.

  • Haiku with Opus advisor more than doubled Haiku's solo BrowseComp score (19.7% to 41.2%) at 85% less cost than Sonnet solo.

  • Implementation requires adding one tool entry (advisor_20260301) to an existing Messages API request. Advisor tokens bill at Opus rates; executor tokens at executor rates.

  • The tool is available now in beta; requests must include a beta feature header to use it.
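To make the division of labor concrete, here is a toy simulation of the pattern described above. It does not call the Claude API and does not show the real advisor_20260301 request shape. The Step and RunLog types, the advisor stub, the hard flag (standing in for the executor's own decision to escalate), and the per-token prices are all hypothetical placeholders; the prices are not Anthropic's. It illustrates only two points from the bullets: the executor handles every step while the advisor is consulted only on hard ones and returns a plan rather than acting, and the bill charges advisor and executor tokens at separate rates.

```python
from dataclasses import dataclass, field

# Hypothetical prices in USD per million tokens; placeholders, NOT real pricing.
EXECUTOR_PRICE = 3.0
ADVISOR_PRICE = 15.0


@dataclass
class Step:
    description: str
    hard: bool                # hypothetical: would the executor escalate here?
    executor_tokens: int
    advisor_tokens: int = 0   # tokens the advisor spends if consulted


@dataclass
class RunLog:
    executor_tokens: int = 0
    advisor_tokens: int = 0
    advice: list[str] = field(default_factory=list)


def advisor(step: Step) -> str:
    # Stand-in for the frontier model: it only returns a plan or correction.
    # It never calls tools or writes user-facing output.
    return f"plan for: {step.description}"


def run_task(steps: list[Step]) -> RunLog:
    log = RunLog()
    for step in steps:
        log.executor_tokens += step.executor_tokens  # executor drives every step
        if step.hard:                                # escalate only on hard decisions
            log.advisor_tokens += step.advisor_tokens
            log.advice.append(advisor(step))
    return log


def cost_usd(log: RunLog) -> float:
    # Advisor tokens bill at the advisor rate, executor tokens at the executor rate.
    return (log.executor_tokens * EXECUTOR_PRICE
            + log.advisor_tokens * ADVISOR_PRICE) / 1_000_000


steps = [
    Step("read the failing test", hard=False, executor_tokens=40_000),
    Step("choose a fix strategy", hard=True, executor_tokens=10_000, advisor_tokens=4_000),
    Step("apply and verify the patch", hard=False, executor_tokens=30_000),
]
log = run_task(steps)
print(log.advice)                # ['plan for: choose a fix strategy']
print(round(cost_usd(log), 2))   # 0.3
```

The design point is that the expensive model's share of the bill scales with how often the executor escalates, not with the length of the task, so real savings depend on actual prices and escalation frequency.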

The bigger picture: Every AI team hits the same wall. Opus is too expensive to run on every call, and Sonnet is too inconsistent on the hard decisions. Until now, the only fix was custom routing logic that most teams never get right. Anthropic just turned that engineering problem into a one-line API addition. Teams still hand-rolling orchestration layers after this launch are burning money and engineering hours they don't need to spend.

AI Agents Are Reading Your Docs. Are You Ready?

Last month, 48% of visitors to documentation sites across Mintlify were AI agents—not humans.

Claude Code, Cursor, and other coding agents are becoming the actual customers reading your docs. And they read everything.

This changes what good documentation means. Humans skim and forgive gaps. Agents methodically check every endpoint, read every guide, and compare you against alternatives with zero fatigue.

Your docs aren't just helping users anymore—they're your product's first interview with the machines deciding whether to recommend you.

That means:
→ Clear schema markup so agents can parse your content
→ Real benchmarks, not marketing fluff
→ Open endpoints agents can actually test
→ Honest comparisons that emphasize strengths without hype

In the agentic world, documentation becomes 10x more important. Companies that make their products machine-understandable will win distribution through AI.

What else you need to know:

Developer Farza Majeed open-sourced Clicky, a macOS AI tutor that sees your screen, responds by voice, and points at UI elements, using Claude, AssemblyAI, and ElevenLabs under the hood.

Anthropic added /ultraplan to Claude Code, letting users generate an implementation plan on the web, edit it, and then execute it on the web or in the terminal. It is now in preview for all users.

Figma relaunched its Weavy acquisition as Figma Weave, a node-based AI workflow tool for image, video, and 3D creation, with plans to integrate it directly into the Figma canvas.

Replit launched a beta integration that deploys apps directly into Databricks environments, inheriting existing security, governance, and data access controls to accelerate enterprise BI and internal tool development.

GitHub Copilot CLI launched Rubber Duck in experimental mode. It uses a model from a second AI model family to review the primary agent's plans and code, closing 74.7% of the Sonnet-to-Opus performance gap on SWE-Bench Pro.
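The "74.7% of the gap" phrasing is a normalized score: how far the assisted setup moves from the weaker baseline toward the stronger reference, as a share of the distance between them. The underlying SWE-Bench Pro scores aren't given here, so the numbers below are made up purely to show the arithmetic, and the function name is ours.

```python
def gap_closed(baseline: float, assisted: float, reference: float) -> float:
    """Share of the baseline-to-reference gap recovered by the assisted setup.

    0.0 means no improvement over the baseline; 1.0 means it matches the reference.
    """
    if reference == baseline:
        raise ValueError("baseline and reference scores must differ")
    return (assisted - baseline) / (reference - baseline)


# Hypothetical scores (percent of tasks resolved), not GitHub's published numbers.
sonnet, sonnet_plus_reviewer, opus = 40.0, 55.0, 60.0
print(gap_closed(sonnet, sonnet_plus_reviewer, opus))  # 0.75
```

Because it is a ratio, the same "closed X% of the gap" can mean a large or small absolute gain depending on how far apart the two models score to begin with.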

That’s it for today’s edition of The AI Night.

Our goal is to cut through the noise, surface what actually changed, and explain why it matters.

3 ways to support us:

  1. Forward this to your AI-curious friend → https://www.theainight.com

  2. Sponsor The AI Night and reach 500+ AI builders daily → passionfroot.me/theainight

  3. Reply to this email — I read every response
