Hello from The AI Night,
Today in AI:
OpenAI's New Flagship Model, GPT-5.4, Is Here
Cursor Introduces Automations for Always-On AI Coding Agents
Stripe Goes Live on Vercel's Marketplace
Here's the deal: OpenAI released GPT-5.4, its new flagship reasoning model, across ChatGPT, the API and Codex. It merges GPT-5.3-Codex's coding capabilities with general reasoning and native computer use into a single model.
The Breakdown:
First general-purpose OpenAI model with native computer use: it scores 75.0% on OSWorld-Verified, surpassing the 72.4% human baseline and GPT-5.2's 47.3%.
Matches industry professionals on 83.0% of knowledge work tasks (GDPval, 44 occupations), up from 70.9% for GPT-5.2.
Tool search cuts token usage by 47% on MCP-heavy workflows with no accuracy loss.
33% fewer false claims and 18% fewer error-containing responses than GPT-5.2.
Supports a 1M-token context window in Codex (experimental).
Priced at $2.50/M input tokens vs. $1.75 for GPT-5.2; Pro tier at $30/M input.
Available now to ChatGPT Plus, Team and Pro users; the API model string is gpt-5.4.
The bigger picture: Developers building agentic systems now have a single model that can reason, write code, operate software and handle large tool ecosystems without stitching together specialized models. The token efficiency gains make production deployment meaningfully cheaper despite the higher per-token price.
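A quick back-of-the-envelope check on that cost claim, using only the input prices and the 47% figure quoted above. This is a sketch under loud assumptions: the 47% reduction was reported for MCP-heavy workflows with tool search, and applying it uniformly to billed input tokens is our simplification, not something OpenAI has stated. Output-token pricing is not covered here, and the 1M-token workload is a hypothetical round number.

```python
# Input prices per million tokens, as quoted in the breakdown above.
GPT_5_2_PRICE = 1.75
GPT_5_4_PRICE = 2.50

# Assumption: the reported 47% token reduction applies to all input tokens.
TOKEN_REDUCTION = 0.47


def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a given number of input tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical workload: 1M input tokens on GPT-5.2.
baseline_tokens = 1_000_000
old_cost = input_cost_usd(baseline_tokens, GPT_5_2_PRICE)

# Same workload on GPT-5.4 with 47% fewer tokens.
new_tokens = round(baseline_tokens * (1 - TOKEN_REDUCTION))
new_cost = input_cost_usd(new_tokens, GPT_5_4_PRICE)

# Token reduction needed for GPT-5.4 to merely match GPT-5.2 on input cost.
break_even_reduction = 1 - GPT_5_2_PRICE / GPT_5_4_PRICE

print(f"GPT-5.2 input cost: ${old_cost:.3f}")
print(f"GPT-5.4 input cost: ${new_cost:.3f}")
print(f"Break-even token reduction: {break_even_reduction:.0%}")
```

Under these assumptions the input bill drops from $1.75 to about $1.33. The break-even point is a 30% token reduction, so workloads that see smaller savings than that would pay more on GPT-5.4, not less.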
Here's the deal: Cursor has released Cursor Automations, a system that lets engineers deploy always-on agents triggered by scheduled events or external signals like merged PRs, Linear issues, Slack messages or PagerDuty incidents. Agents run in cloud sandboxes, use configured MCPs and models and verify their own output.
The Breakdown:
Agents spin up in cloud sandboxes per invocation and include a memory tool that lets them improve from past runs.
Built-in integrations cover Slack, GitHub, Linear and PagerDuty; custom events supported via webhooks.
Cursor uses three internal review automations: security audits on every push to main, risk-based PR assignment logged to Notion and incident response that files a proposed fix PR before engineers are paged.
Chore automations handle weekly repo digests, test coverage gaps and bug triage to Linear.
The bigger picture: Code generation has scaled faster than review and maintenance. Cursor Automations directly targets that gap, letting teams apply agent capacity to the parts of the dev lifecycle that still run at human speed.
Here's the deal: Vercel has made Stripe generally available on its Marketplace and in v0, moving beyond the previous Sandbox-only beta. Developers can now connect live Stripe accounts directly to Vercel projects with automated credential provisioning.
The Breakdown:
Single-command setup: running "vercel integration add stripe" creates or connects a Stripe account and auto-provisions environment variables.
A new cryptographic key exchange API, built jointly by Vercel and Stripe, eliminates manual copy-paste of API credentials.
Secret keys are scoped server-side only; publishable keys are separated per environment (dev, preview, production) automatically.
Supported use cases at GA: live ecommerce, SaaS subscriptions, usage-based billing and invoicing.
Previously, going from Sandbox to live payments required manual key retrieval and environment reconfiguration across multiple systems.
The bigger picture: Payment setup has historically been a post-build configuration step requiring manual coordination. This integration collapses that into the same deploy workflow developers already use, reducing both setup friction and credential exposure risk for teams shipping commerce products.
What else you need to know:
The Pentagon formally designated Anthropic a supply chain risk, the first such label for a U.S. company, after Anthropic refused to remove guardrails blocking autonomous weapons and mass surveillance use.
VS Code 1.110 adds agent plugins, an integrated browser agents can drive, persistent session memory, context compaction, and chat forking to support longer, more complex agentic coding workflows.
Google's AI design tool, Stitch, now supports Gemini 3.1 Pro, offering improved spatial reasoning and better handling of complex design systems, dashboards, and detailed UI generation.
Google researchers trained LLMs to mimic an optimal Bayesian model's predictions, improving probabilistic reasoning and enabling generalization to unseen domains like hotel recommendations and web shopping.
xAI's Grok iOS app reportedly reached 1 million ratings with a 4.9-star average, making it the highest-rated major AI app, ahead of ChatGPT, Gemini, and Claude.
That’s it for today’s edition of The AI Night.
Our goal is to cut through the noise, surface what actually changed, and explain why it matters.
If this was useful, you’ll get the same signal here tomorrow.






