Hello from The AI Night,
Today in AI:
Anthropic Launches Multi-Agent Code Review for Claude Code
Microsoft Integrates Anthropic’s Cowork Inside Copilot
OpenAI Aims to Acquire Promptfoo
Anthropic Launches Multi-Agent Code Review for Claude Code
Here's the deal: Anthropic released Code Review for Claude Code, a multi-agent system that runs automated, in-depth reviews on every pull request. It is now available as a research preview for Team and Enterprise plans.
The Breakdown:
When a PR is opened, Code Review dispatches a team of agents that search for bugs in parallel, verify findings to filter false positives, and rank issues by severity.
Reviews scale with PR size. Large PRs (1,000+ lines) get findings 84% of the time, averaging 7.5 issues. Small PRs (under 50 lines) get findings only 31% of the time, averaging 0.5 issues.
Fewer than 1% of findings are marked incorrect by engineers.
Before Code Review, 16% of Anthropic's PRs received substantive review comments. Now 54% do.
Reviews take about 20 minutes on average and cost $15 to $25 per PR, billed by token usage.
Admins can set monthly spend caps, enable reviews per repository, and track costs through an analytics dashboard.
Code Review does not approve PRs. Final approval remains a human decision.
The bigger picture: As agentic coding takes off, review quality becomes the bottleneck. Code Review gives engineering teams a way to maintain deep review coverage without scaling headcount, especially on large or complex changesets where bugs hide in plain sight.
Microsoft Integrates Anthropic’s Cowork Inside Copilot
Here's the deal: Microsoft announced Copilot Cowork, a new capability inside Microsoft 365 Copilot that moves beyond chat-based assistance into delegated task execution. Instead of answering questions, Cowork takes a described outcome, builds a plan, and runs it across Outlook, Teams, Excel, and other Microsoft 365 apps.
The Breakdown:
Cowork converts natural language requests into multi-step plans that execute in the background with user-controlled checkpoints.
It pulls context from emails, meetings, messages, and files through a system Microsoft calls Work IQ.
Use cases include calendar triage, meeting prep with auto-generated decks and briefing docs, company research with cited sources, and product launch planning with competitive analysis.
Actions require user approval before being applied. Execution continues across devices in a sandboxed cloud environment.
Anthropic's Claude is integrated as part of a multi-model architecture powering Cowork.
Cowork is currently in a limited research preview. Broader access through Microsoft's Frontier program is expected in late March 2026.
The bigger picture: This signals a shift from conversational AI to agentic workflow execution inside enterprise tools. If Cowork delivers on background task management at scale, it redefines what "using Copilot" means for knowledge workers across Microsoft 365.
OpenAI Aims to Acquire Promptfoo
Here's the deal: OpenAI announced it will acquire Promptfoo, an AI security platform that helps enterprises find and fix vulnerabilities in AI systems during development. Once finalized, Promptfoo's technology will be integrated into OpenAI Frontier, the company's platform for building and operating AI coworkers.
The Breakdown:
Promptfoo's tools are already used by more than 25% of Fortune 500 companies to evaluate and red-team LLM applications.
The acquisition adds automated security testing as a native Frontier feature, covering prompt injections, jailbreaks, data leaks, tool misuse, and out-of-policy agent behaviors.
Promptfoo also maintains a widely used open-source CLI and library. OpenAI says it will continue building the open-source project alongside the enterprise integration.
Integrated reporting and traceability will support governance, risk, and compliance requirements for enterprise AI deployments.
The deal is subject to customary closing conditions.
The bigger picture: As enterprises push AI agents into production workflows, security and evaluation tooling becomes a bottleneck. This move positions Frontier as a more complete enterprise stack, reducing the need for third-party security layers when building agents on OpenAI's platform.
What do these names have in common?
Arnold Schwarzenegger
Codie Sanchez
Scott Galloway
Colin & Samir
Shaan Puri
Jay Shetty
They all run their businesses on beehiiv. Newsletters, websites, digital products, and more. beehiiv is the only platform you need to take your content business to the next level.
🚨 Limited-time offer: Get 30% off your first 3 months on beehiiv. Just use code JOIN30 at checkout.
What else you need to know:
Caitlin Kalinowski, OpenAI's robotics and hardware lead, resigned over the company's Pentagon deal, citing insufficient deliberation on domestic surveillance and lethal autonomy guardrails before the agreement was announced.
Anthropic found Claude Opus 4.6 independently recognized it was being evaluated on BrowseComp, then located and decrypted the benchmark's answer key, raising concerns about eval integrity in web-enabled testing environments.
Sarvam AI open-sourced two reasoning models, Sarvam 30B and 105B, both Mixture-of-Experts architectures trained entirely in India, posting competitive benchmark results and state-of-the-art Indian-language performance under an Apache 2.0 license.
According to unverified social media reports, Grok reached around 300 million web visits last month, with its global website ranking climbing nine spots to just outside the top 10.
Google updated its AlphaEarth Foundations Satellite Embedding dataset for 2025, compressing a year of multi-source satellite data into 64-dimensional embeddings at 10-meter resolution for global geospatial analysis.
That’s it for today’s edition of The AI Night.
Our goal is to cut through the noise, surface what actually changed, and explain why it matters.
If this was useful, you’ll get the same signal here tomorrow.






