Hello from The AI Night,
Today in AI:
OpenAI Launches GPT-5.5, Outperforming Claude Opus 4.7
Anthropic Expands Claude to 15 Consumer Apps Including Uber and Spotify
Alibaba Open-Sources Qwen3.6-27B, Outperforms 397B Model on Coding
Here's the deal: OpenAI launched GPT-5.5 and GPT-5.5 Pro in ChatGPT and Codex for Plus, Pro, Business, and Enterprise users. The models target agentic coding, computer use, and research workflows, matching GPT-5.4's latency while using fewer tokens per task. API access is coming soon.
The Breakdown:
Matches GPT-5.4 per-token latency while using fewer tokens to complete the same Codex tasks.
Hits 82.7% on Terminal-Bench 2.0, ahead of Claude Opus 4.7 (69.4%) and Gemini 3.1 Pro (68.5%).
Scores 84.9% on GDPval and 78.7% on OSWorld-Verified for knowledge work and computer use.
API priced at $5 per 1M input and $30 per 1M output tokens, with a 1M context window; Pro lands at $30 / $180.
Co-designed, trained, and served on NVIDIA GB200 and GB300 NVL72 systems.
Rated "High" in biology and cybersecurity under OpenAI's Preparedness Framework, below Critical.
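To put the listed rates in concrete terms, here is a quick back-of-envelope cost sketch. Only the per-million prices ($5 in / $30 out standard, $30 / $180 Pro) come from the announcement; the token counts in the example are hypothetical.

```python
# Per-1M-token API rates from the announcement (USD).
RATES = {
    "gpt-5.5":     {"input": 5.00,  "output": 30.00},
    "gpt-5.5-pro": {"input": 30.00, "output": 180.00},
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the published per-1M-token rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Hypothetical coding task: 50k tokens of context, 8k-token reply.
print(f"standard: ${task_cost('gpt-5.5', 50_000, 8_000):.2f}")      # $0.49
print(f"pro:      ${task_cost('gpt-5.5-pro', 50_000, 8_000):.2f}")  # $2.94
```

At these rates the Pro tier costs 6x the standard model per token, so the "fewer tokens per task" claim matters directly for the bill.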
The bigger picture: OpenAI just did something expensive models rarely do. It got faster while getting smarter. Fewer tokens per task at the same latency means enterprise teams pay less for better results. The Terminal-Bench gap over Claude and Gemini is wide enough that engineering leads will benchmark this before the week ends.
Here's the deal: Anthropic expanded Claude's connector directory to include consumer apps for travel, food, entertainment, and finance. The update adds 15 new integrations and introduces dynamic in-conversation app suggestions based on user context.
The Breakdown:
New connectors include AllTrails, Audible, Booking.com, Instacart, Credit Karma, TurboTax, Resy, Spotify, StubHub, Taskrabbit, Thumbtack, Tripadvisor, Uber, Uber Eats, and Viator.
The directory has grown to more than 200 connectors since its July 2025 launch.
Claude surfaces relevant apps based on conversation context, and shows multiple options when more than one applies, ranked by usefulness.
Connected app data is not used to train Anthropic's models, and no app can see your other Claude conversations.
Claude requires user confirmation before bookings or purchases.
Available across all plans, with mobile in beta, and Anthropic states Claude remains ad-free with no paid placements.
The bigger picture: Anthropic kept saying Claude would never show ads. This is the alternative. Instead of selling attention to advertisers, Claude becomes the layer between users and services, taking a cut when someone books an Uber or orders through Instacart. Two hundred connectors is not a feature list. It is a revenue model taking shape.
Here's the deal: Alibaba open-sourced Qwen 3.6-27B, a dense 27-billion-parameter multimodal model that supports both thinking and non-thinking modes. It is positioned as an agentic coding model at a widely deployable scale, available via open weights and API.
The Breakdown:
Outperforms Qwen3.5-397B-A17B (397B total, 17B active MoE) on every major coding benchmark despite having roughly 15x fewer total parameters.
SWE-bench Verified 77.2 vs 76.2, SWE-bench Pro 53.5 vs 50.9, Terminal-Bench 2.0 59.3 vs 52.5, and SkillsBench 48.2 vs 30.0.
Scores 87.8 on GPQA Diamond and 94.1 on AIME26, competitive with significantly larger models.
Dense architecture avoids MoE routing complexity; natively handles text, images, and video in one checkpoint.
Available on Qwen Studio, Hugging Face, ModelScope, and Alibaba Cloud Model Studio, with support for Claude Code, Qwen Code, and OpenClaw.
The bigger picture: Alibaba's own 397B model just got beaten by its own 27B model. That is not incremental progress. That is a generation of scaling assumptions collapsing in a single release. If 27 billion dense parameters can do what 397 billion couldn't, every team paying for massive model inference should be asking what they are actually paying for.
Talk to your AI tools the way you'd talk to a colleague.
You don't send a colleague a three-word brief. You explain the context, the constraints, what you've already tried. But typing all that into ChatGPT takes forever — so you don't.
Wispr Flow lets you speak your prompts instead. Talk through your thinking naturally and get clean, paste-ready text. No filler words. No cleanup. Just detailed prompts that actually get you useful answers on the first try.
Millions of users worldwide. Works system-wide on Mac, Windows, and iPhone.
What else you need to know:
Moonshot released Kimi K2.6 Agent Swarm, raising parallel sub-agents from 100 to 300 and steps per run from 1,500 to 4,000, with outputs delivered as files rather than chat.
OpenAI launched ChatGPT for Excel and Google Sheets in beta, letting users build spreadsheets, edit formulas, and query data in plain language across Business, Enterprise, Edu, Pro, and Plus tiers.
Luca Ronin released Tolaria, a free open-source macOS app for managing markdown knowledge bases, designed as a collaboration surface where AI agents can create, connect, and edit notes alongside humans.
OpenAI updated Codex with GPT-5.5, extending its reach across the browser, files, docs, and the user's computer, and expanding browser use to interact with web apps, test flows, and iterate via screenshots.
Developer Zain Shah introduced Flipbook, a prototype that streams every on-screen pixel directly from an AI model, eliminating HTML, layout engines, and conventional code rendering.
That’s it for today’s edition of The AI Night.
Our goal is to cut through the noise, surface what actually changed, and explain why it matters.
3 ways to support us:
Forward this to your AI-curious friend → https://www.theainight.com
Sponsor The AI Night and reach 500+ AI builders daily → passionfroot.me/theainight
Reply to this email — I read every response