Dec 18, 2025

Heroku AI: Accelerating AI Development With New Models, Performance Improvements, and Messages API


This month marks a significant expansion for Heroku Managed Inference and Agents, directly accelerating our AI PaaS framework. We’re announcing a substantial addition to our model catalog, providing access to leading proprietary AI models such as Claude Opus 4.5 and Nova 2, and open-weight models such as Kimi K2 Thinking, MiniMax M2, and Qwen3. These resources are fully managed, secure, and accessible via a single CLI command. We have also refreshed aistudio.heroku.com; navigate there from your Managed Inference and Agents add-on to access the models you have provisioned.
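As a sketch of that single CLI command, provisioning attaches a model resource to an app (the app and model names below are illustrative, not from this post):

```shell
# Attach a managed model to an app; the model and app names are examples.
heroku ai:models:create claude-opus-4-5 -a my-app

# The add-on sets INFERENCE_URL, INFERENCE_KEY, and INFERENCE_MODEL
# config vars on the app, which the quickstart below reads.
heroku config -a my-app
```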

Whether you are building complex reasoning agents or high-performance consumer applications, here’s what’s new in our platform. All of the open-weight models you access on Heroku run on secure compute on AWS servers; neither Heroku nor the model provider has access to your data, and it is never used for training.

Expanding Heroku’s AI catalog with new state of the art models

Claude 4.5 models

We now support the full Claude 4.5 family in both US and EU regions, replacing the prior Claude 3 models, which are scheduled for deprecation in January 2026.

Open-weight models

We have added several open-weight models to Heroku Managed Inference and Agents.

Nova models

Anthropic’s Messages API (Heroku preview)

Heroku now offers preview support for the Messages API format for all Anthropic models on Heroku. This format is an alternative to the standard Chat Completions API and aligns with the Claude SDKs, enabling direct integration with Claude Code and the Claude Agents SDK.

Technical implementation and authentication

For the v1/messages endpoint, the authentication structure mirrors Anthropic’s standard practice: set the value of your Heroku add-on’s INFERENCE_KEY as the value of the x-api-key HTTP header in your request.
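As a minimal sketch of the raw request shape, the helper below builds the URL, headers, and JSON body for a v1/messages call without sending it. The helper name and the fallback default values are ours for illustration; on Heroku the real values come from the add-on’s config vars:

```python
import json
import os

# Fallback defaults here are hypothetical; Heroku sets the real values
# as config vars when the add-on is provisioned.
inference_url = os.getenv("INFERENCE_URL", "https://us.inference.heroku.com")
inference_key = os.getenv("INFERENCE_KEY", "inf-example-key")
inference_model = os.getenv("INFERENCE_MODEL", "example-model")

def build_messages_request(prompt: str):
    """Build the URL, headers, and JSON body for a v1/messages call."""
    url = f"{inference_url}/v1/messages"
    headers = {
        # Mirrors Anthropic's practice: the add-on key goes in x-api-key.
        "x-api-key": inference_key,
        "content-type": "application/json",
    }
    body = json.dumps({
        "model": inference_model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, headers, body

url, headers, body = build_messages_request("Hello!")
```

Any HTTP client can then POST `body` to `url` with those headers; the SDK quickstart below does the same thing through the Anthropic client.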

Quickstart with Anthropic Python SDK

import os
from anthropic import Anthropic

inference_url = os.getenv("INFERENCE_URL")
inference_key = os.getenv("INFERENCE_KEY")
inference_model = os.getenv("INFERENCE_MODEL")

client = Anthropic(
    api_key=inference_key,
    base_url=inference_url,
)

message = client.messages.create(
    model=inference_model,
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, what should I build today?"}
    ],
)

Key Constraints for Developers

Performance boost: automatic prompt caching

Heroku now caches system prompts and tool definitions to reduce latency on repeated requests. Prompt caching is enabled by default with no code changes required. Only system prompts and tool definitions are cached; user messages and conversation history are excluded and automatically expire to ensure privacy and security. You can disable caching for any request by adding a single HTTP header: X-Heroku-Prompt-Caching: false.
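Opting a request out of caching is just a matter of attaching that header. A small sketch, where the helper function is ours (not part of any SDK):

```python
def caching_headers(disable: bool = False) -> dict:
    """Return extra HTTP headers controlling Heroku prompt caching.

    Caching is on by default, so the header is only sent to opt out.
    """
    return {"X-Heroku-Prompt-Caching": "false"} if disable else {}

# With the Anthropic Python SDK, extra headers can be attached per request,
# e.g. client.messages.create(..., extra_headers=caching_headers(disable=True))
```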

Lifecycle updates

Deprecations

Heroku AI PaaS: Accelerating AI Development

This release brings state-of-the-art reasoning and efficient open-weight models to the Heroku platform. With the addition of prompt caching, you can now optimize latency with minimal configuration. We recommend validating your applications with the Claude 4.5 and Nova 2 families ahead of the upcoming deprecation cycle. We would love to hear your feedback and feature requests; please reach out to heroku-ai-feedback@salesforce.com.

