Dec 18, 2025

Heroku AI: Accelerating AI Development With New Models, Performance Improvements, and Messages API


This month marks a significant expansion for Heroku Managed Inference and Agents, directly accelerating our AI PaaS framework. We’re announcing a substantial addition to our model catalog, providing access to leading proprietary AI models such as Claude Opus 4.5 and Nova 2, and open-weight models such as Kimi K2 Thinking, MiniMax M2, and Qwen3. These resources are fully managed, secure, and accessible via a single CLI command. We have also refreshed aistudio.heroku.com; navigate there from your Managed Inference and Agents add-on to access the models you have provisioned.
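As a sketch of that single CLI command, provisioning attaches a model resource to an app (the app and model names below are illustrative, not from this post):

```shell
# Attach a managed model to an app; the model and app names are examples.
heroku ai:models:create claude-opus-4-5 -a my-app

# The add-on sets INFERENCE_URL, INFERENCE_KEY, and INFERENCE_MODEL
# config vars on the app, which the quickstart below reads.
heroku config -a my-app
```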

Whether you are building complex reasoning agents or high-performance consumer applications, here’s what’s new in our platform. All of the open-weight models you access on Heroku run on secure compute on AWS servers; neither Heroku nor the model provider has access to your data, and it is never used for training.

Expanding Heroku’s AI catalog with new state of the art models

Claude 4.5 models

We now support the full Claude 4.5 family in both US and EU regions, replacing the prior Claude 3 models, which are scheduled for deprecation in January 2026.

Open-weight models

We have added several open-weight models to Heroku Managed Inference and Agents.

Nova models

Anthropic’s Messages API (Heroku preview)

Heroku now offers preview support for the Messages API format for all Anthropic models on Heroku. This format is an alternative to the standard Chat Completions API and aligns with the Claude SDKs, enabling direct integration with Claude Code and the Claude Agents SDK.

Technical implementation and authentication

For the v1/messages endpoint, the authentication structure mirrors Anthropic’s standard practice: set the value of your Heroku add-on’s INFERENCE_KEY as the value of the x-api-key HTTP header in your request.
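As a minimal sketch of the raw request shape, the helper below builds the URL, headers, and JSON body for a v1/messages call without sending it. The helper name and the fallback default values are ours for illustration; on Heroku the real values come from the add-on’s config vars:

```python
import json
import os

# Fallback defaults here are hypothetical; Heroku sets the real values
# as config vars when the add-on is provisioned.
inference_url = os.getenv("INFERENCE_URL", "https://us.inference.heroku.com")
inference_key = os.getenv("INFERENCE_KEY", "inf-example-key")
inference_model = os.getenv("INFERENCE_MODEL", "example-model")

def build_messages_request(prompt: str):
    """Build the URL, headers, and JSON body for a v1/messages call."""
    url = f"{inference_url}/v1/messages"
    headers = {
        # Mirrors Anthropic's practice: the add-on key goes in x-api-key.
        "x-api-key": inference_key,
        "content-type": "application/json",
    }
    body = json.dumps({
        "model": inference_model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, headers, body

url, headers, body = build_messages_request("Hello!")
```

Any HTTP client can then POST `body` to `url` with those headers; the SDK quickstart below does the same thing through the Anthropic client.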

Quickstart with Anthropic Python SDK

import os
from anthropic import Anthropic

inference_url = os.getenv("INFERENCE_URL")
inference_key = os.getenv("INFERENCE_KEY")
inference_model = os.getenv("INFERENCE_MODEL")

client = Anthropic(
    api_key=inference_key,
    base_url=inference_url,
)

message = client.messages.create(
    model=inference_model,
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, what should I build today?"}
    ],
)

Key Constraints for Developers

Performance boost: automatic prompt caching

Heroku now caches system prompts and tool definitions to reduce latency on repeated requests. Prompt caching is enabled by default with no code changes required. Only system prompts and tool definitions are cached; user messages and conversation history are excluded and automatically expire to ensure privacy and security. You can disable caching for any request by adding a single HTTP header: X-Heroku-Prompt-Caching: false.
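Opting a request out of caching is just a matter of attaching that header. A small sketch, where the helper function is ours (not part of any SDK):

```python
def caching_headers(disable: bool = False) -> dict:
    """Return extra HTTP headers controlling Heroku prompt caching.

    Caching is on by default, so the header is only sent to opt out.
    """
    return {"X-Heroku-Prompt-Caching": "false"} if disable else {}

# With the Anthropic Python SDK, extra headers can be attached per request,
# e.g. client.messages.create(..., extra_headers=caching_headers(disable=True))
```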

Lifecycle updates

Deprecations

Heroku AI PaaS: Accelerating AI Development

This release brings state-of-the-art reasoning and efficient open-weight models to the Heroku platform. With the addition of prompt caching, you can now optimize latency with minimal configuration. We recommend validating your applications with the Claude 4.5 and Nova 2 families ahead of the upcoming deprecation cycle. We would love to hear your feedback and feature requests; please reach out to heroku-ai-feedback@salesforce.com.

