In partnership with

Hello from The AI Night,

Today in AI:

  • Anthropic's Most Powerful Model Revealed in Data Leak

  • Microsoft Launches Multi-Model Deep Research With Critique and Council

  • OpenAI Announces Sora Discontinuation, Video App and API Shutting Down

Here's the deal: A CMS misconfiguration at Anthropic left nearly 3,000 unpublished assets in a publicly searchable data store including draft blog posts detailing an unreleased model called Claude Mythos. Security researchers and Fortune discovered the exposed files before Anthropic restricted access.

The Breakdown:

  • Leaked drafts describe Mythos under two names: "Mythos" as the model generation and "Capybara" as a new tier above Opus, Haiku and Sonnet.

  • Internal benchmarks in the drafts claim it significantly outperforms Claude Opus 4.6 in coding, academic reasoning and cybersecurity.

  • Anthropic's own draft flags the model's cyber capabilities as a dual-use risk, warning it could exploit vulnerabilities faster than defenders can patch them.

  • Planned rollout starts with a restricted early access program for cyber defense organizations.

  • The draft notes the model is expensive to serve with no general availability date set.

  • Anthropic confirmed the model's existence after the leak, calling it "a step change" in capability.

The bigger picture: The leak reveals Anthropic is building beyond its current Opus ceiling. A defender-first rollout for a model with flagged cyber risks sets a new precedent for how frontier labs may gate their most capable systems.

Here's the deal: Microsoft introduced two new multi-model capabilities for Researcher, its deep research agent inside Microsoft 365 Copilot. Critique pairs a generation model with a separate review model, while Council runs Anthropic and OpenAI models side by side for comparative analysis.

The Breakdown:

  • Critique splits research into two roles: one model handles planning, retrieval and drafting, while a second model reviews for source reliability, completeness and evidence grounding before producing the final report.

  • On the DRACO benchmark (100 complex tasks across 10 domains), Critique scored 7.0 points higher than single model Researcher and 13.88% above Perplexity Deep Research with Claude Opus 4.6, previously the top performing system.

  • Statistically significant gains appeared across all four DRACO dimensions with the largest in breadth and depth of analysis (+3.33) and presentation quality (+3.04).

  • Council generates two independent reports from Anthropic and OpenAI models, then a judge model summarizes where they agree, diverge or contribute unique findings.

  • Both features are now available in Microsoft's Frontier program.

The bigger picture:  This signals a shift toward multi-model architectures as the default for enterprise research tools. Teams using Microsoft 365 Copilot get measurably better accuracy and depth without switching platforms.

Here's the deal: OpenAI announced it is fully discontinuing Sora, its AI video generation tool. The web and mobile app shut down on April 26, 2026. The Sora API shuts down on September 24, 2026. All user data will be permanently deleted after the shutdown window closes.

The Breakdown:

  • Users must download videos and images from their Sora library before April 26. OpenAI is still deciding whether a brief post-shutdown export window will be offered.

  • Unused ChatGPT/Sora credits can be redirected to Codex, OpenAI's coding agent. Refunds follow standard ChatGPT subscription policy.

  • Per a WSJ investigation, Sora's users peaked around one million then collapsed below 500,000 while the app burned roughly $1 million per day in compute.

  • The Disney partnership which included a $1 billion investment and licensing of 200+ characters is now dead. No money changed hands.

The bigger picture: OpenAI is trading an expensive consumer experiment for compute it needs to compete on coding and enterprise AI. For developers on the Sora API, the September deadline gives six months to migrate to alternatives like Google Veo or Kling.

88% resolved. 22% stayed loyal. What went wrong?

That's the AI paradox hiding in your CX stack. Tickets close. Customers leave. And most teams don't see it coming because they're measuring the wrong things.

Efficiency metrics look great on paper. Handle time down. Containment rate up. But customer loyalty? That's a different story — and it's one your current dashboards probably aren't telling you.

Gladly's 2026 Customer Expectations Report surveyed thousands of real consumers to find out exactly where AI-powered service breaks trust, and what separates the platforms that drive retention from the ones that quietly erode it.

If you're architecting the CX stack, this is the data you need to build it right. Not just fast. Not just cheap. Built to last.

What else you need to know:

Google expanded its Live Translate feature to iOS and added support for seven new countries across both iOS and Android, covering 70-plus languages via headphones.

Claude Code creator Boris Cherny shared a 17-tweet thread highlighting under-used features including mobile coding via iOS, cross-device session teleporting, and automated task scheduling through /loop and /schedule commands.

Anthropic CEO said Claude now writes most of its own code and helps design its next version, with engineers reviewing output rather than coding directly

OpenAI demoed a clinic concierge voice agent built with gpt-realtime-1.5 that converses with patients, gathers intake details, and books appointments in real time.

Eli Lilly and Insilico Medicine signed an AI drug discovery deal worth up to $2.75 billion, with $115 million upfront, expanding a collaboration that began in 2023

That’s it for today’s edition of The AI Night.

Our goal is to cut through the noise, surface what actually changed, and explain why it matters.

If this was useful, you’ll get the same signal here tomorrow.

Reply

Avatar

or to participate

Keep Reading