Hello from The AI Night,

Today in AI:

  • Anthropic Mocks ChatGPT's Ad Policy in Super Bowl Debut

  • OpenAI launches GPT-5.3-Codex

  • Alibaba’s Qwen Introduces Open-Weight Qwen3-Coder-Next

Image Source: Anthropic blog

Here's the deal: Anthropic published a formal policy stating that Claude products will not display ads. To drive the point home, the company aired its first Super Bowl campaign, with spots showing AI assistants pivoting mid-conversation into awkward product pitches and closing with, "Ads are coming to AI. But not to Claude." The move is a direct response to OpenAI's January announcement that it would begin testing ads in ChatGPT's free and Go tiers.

The Breakdown:

  • Four Super Bowl spots titled "Treachery," "Deception," "Violation," and "Betrayal" parody ad-supported AI assistants hijacking personal conversations

  • Claude's responses will never be shaped by third-party advertisers or include product placements users didn't request

  • Anthropic argues AI conversations are fundamentally different from search or social media, with users sharing sensitive, personal, or complex work context that makes ads inappropriate

  • Ad incentives would conflict with Claude's Constitution, which prioritizes being genuinely helpful over driving engagement or transactions

  • Anthropic is building toward agentic commerce where Claude acts on the user's behalf for purchases, not on an advertiser's behalf

The bigger picture: This turns a business model into a brand identity. As ChatGPT reaches 800 million weekly users and needs ad revenue to offset a reported $8 billion in losses, Anthropic is betting that trust and ad-free purity can win the premium market without matching that scale.

Image Source: OpenAI blog

Here's the deal: OpenAI released GPT-5.3-Codex, a model that combines frontier coding performance with broad reasoning and professional knowledge capabilities. It is 25% faster than its predecessor and notably helped build itself, with the team using early versions to debug training, manage deployment and diagnose evaluations.

The Breakdown:

  • GPT-5.3-Codex sets new highs on SWE-Bench Pro (56.8%), Terminal-Bench 2.0 (77.3%), and OSWorld-Verified (64.7%), a massive jump from GPT-5.2-Codex's 38.2% on OSWorld

  • Uses fewer tokens than any prior model to achieve these results

  • Supports real-time interaction mid-task: users can steer, ask questions and adjust direction without losing context

  • Goes beyond code into presentations, spreadsheets, data analysis and general knowledge work (GDPval: 70.9%)

  • First OpenAI model classified "High capability" for cybersecurity under its Preparedness Framework

  • Available on paid ChatGPT plans across app, CLI, IDE extension and web. API access coming soon

The bigger picture: This signals OpenAI positioning Codex not just as a coding tool but as a general-purpose computer-use agent. The self-improvement loop, where the model accelerated

Image Source: Qwen blog

Here's the deal: Alibaba's Qwen team released Qwen3-Coder-Next, an open-weight language model built specifically for coding agents and local development. It uses a hybrid attention and MoE (mixture-of-experts) architecture with 80B total parameters but only 3B active at inference time.

The Breakdown:

  • Trained through a multi-stage agentic pipeline: continued pretraining on code data, supervised fine-tuning on agent trajectories, domain-specialized expert training, then expert distillation into one deployable model

  • Scores over 70% on SWE-Bench Verified using the SWE-Agent scaffold

  • Matches or exceeds models with 10x to 20x more active parameters on SWE-Bench Pro

  • Performance scales with more agent turns, showing strength in long-horizon, multi-turn reasoning

  • Demonstrated working across multiple downstream tools including Claude Code, Cline and browser-use agents

  • Open-weight release, targeting local and cost-sensitive deployment

The bigger picture: This puts competitive agentic coding performance within reach of local hardware and budget-constrained teams. If the benchmarks hold in practice, it could meaningfully lower the cost floor for deploying capable coding agents at scale.
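For a rough sense of the parameter figures quoted above, here is a minimal Python sketch of the arithmetic. It assumes "B" means one billion parameters and reads "10x to 20x more active parameters" as 10 to 20 times the 3B active count; the function names are illustrative only and do not come from Qwen's release.

```python
# Figures quoted in the newsletter; "B" is assumed to mean 1e9 parameters.
TOTAL_PARAMS = 80e9   # total parameters in the MoE model
ACTIVE_PARAMS = 3e9   # parameters active at inference time


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that are active at inference time."""
    return active / total


def comparison_range(active: float, low_mult: int = 10, high_mult: int = 20):
    """Active-parameter counts implied by the '10x to 20x' comparison (assumed reading)."""
    return active * low_mult, active * high_mult


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
low, high = comparison_range(ACTIVE_PARAMS)

print(f"Active share: {frac:.2%}")
print(f"Implied comparison range: {low / 1e9:.0f}B to {high / 1e9:.0f}B active parameters")
```

Under those assumptions, under 4% of the weights are active at inference time, and the SWE-Bench Pro comparison is against models with roughly 30B to 60B active parameters.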

What else you need to know:

OpenAI added support for the MCP Apps standard to ChatGPT, allowing developers to build embedded app UIs once and run them across ChatGPT and other MCP Apps-compatible hosts using a shared iframe bridge.

Apple's Xcode 26.3 integrates the Claude Agent SDK, upgrading Claude from turn-by-turn assistance to autonomous coding with visual preview verification and full project reasoning capabilities.

Alphabet reported Q4 2025 earnings with annual revenues exceeding $400 billion for the first time, driven by 17% Search growth, 48% Cloud growth and YouTube surpassing $60 billion annually.

Perplexity released DRACO, an open benchmark for evaluating deep research agents featuring 100 tasks across 10 domains sourced from real user queries with expert rubrics averaging 40 evaluation criteria.

That’s it for today’s edition of The AI Night.

Our goal is to cut through the noise, surface what actually changed, and explain why it matters.

If this was useful, you’ll get the same signal here tomorrow.
