DigitalOcean Blog

Jun 4, 2026
Model Evaluations: Prove Your Routing Policy Actually Works

Most teams running inference at scale do not fail because they cannot find a “good” model. They fail because they ship a routing policy that looks fine in a playground, but drifts the moment it se...

Jun 3, 2026
The Team Behind Deploy: Shipping AI, the DigitalOcean Way

Deploy 2026 came and went, and we’re still buzzing. For one day at Convene 100 Stockton in San Francisco, developers, startup founders, customers, and partners filled the room to talk about a shared c...

Jun 3, 2026
Powering the Inference Era: Inside the DigitalOcean Data & Learning Layer

Building an AI-native application requires a data layer that can do two things at once: handle the structured, transactional queries your application runs on, and understand meaning well enough to po...

Jun 2, 2026
Open by Design: How NVIDIA and DigitalOcean Are Building the Stack for the Always-On Agentic Era

The growth of generative AI isn’t driven solely by AI companies with proprietary models. Open-source AI is reshaping the developer ecosystem, fueled by a growing community of builders. But what do...

Jun 1, 2026
The Inference Tax: How Prefix-Aware Routing Eliminates the Hidden Cost of LLMs at Scale

Inference demand is growing fast, and it’s only accelerating. By 2030, inference is expected to account for the majority of AI compute globally. But scaling inference i...

Jun 1, 2026
DigitalOcean Serverless Inference: A Deep Dive

The Problem: Inference Gets Hard at Scale. If you’ve shipped an AI feature to production, you already know: the hard part isn’t making a model resp...

May 29, 2026
AI Disruptors: How the Next Generation of Business is Being Built

Getting your hands on a capable AI model is the easy part now. Every team can reach the same frontier models through an API, so a strong model is not what sets a product apart. What separates a wo...

May 28, 2026
OpenCode Now Supports DigitalOcean Inference Router for Intelligent Model Routing

Coding agents today have a massive spending problem. Every request, whether you’re designing system architecture or writing a single-line docstring, often gets routed to the same expensive frontier mo...

May 27, 2026
Scalable, Cost-Efficient AI: Introducing Unified Batch Inference on DigitalOcean

At Deploy 2026, we introduced the DigitalOcean AI-Native Cloud, built for the inference era. Batch Inference on the DigitalOcean Inference Engine enables high-volume asynchronous workloads. As develop...

May 22, 2026
Request-Based Autoscaling Is Now Generally Available on App Platform

Traffic doesn’t spike on a schedule. A product launch, a viral moment, or a flash sale can send request volume through the roof in seconds, long before your CPU metrics catch up. That gap is where pe...

May 20, 2026
How We Built DigitalOcean Inference Router

Most teams building on LLMs today make a single model decision and apply it uniformly across every request. They reach for a frontier model not because every task demands it, but because building th...

May 13, 2026
Your Model Doesn't Matter. Your Infrastructure Does.

Everyone calling an LLM API has access to the same models. So what actually sets technical teams apart? It’s everything around the model, such as the routing logic, the live data pipelines, and the abili...

May 4, 2026
Powering the Inference Era: Inside the DigitalOcean AI-Native Cloud

I’ve spent the last fifteen years building cloud services: early days of AWS building S3 and EBS, helping launch Oracle Cloud Infrastructure from inception, and now building the agentic cloud at Di...

Apr 28, 2026
Introducing DigitalOcean AI-Native Cloud for Production AI Workloads

The AI industry has a compounding bottleneck, and it isn’t the models. It’s inference. What used to be a single model call has become a system of continuous interaction. Applications now orchestra...

Apr 28, 2026
How we built the most performant DeepSeek V3.2, MiniMax-M2.5 and Qwen 3.5 397B on DigitalOcean Serverless Inference

Today at Deploy, we are announcing the general availability of DeepSeek V3.2, MiniMax-M2.5, and Qwen 3.5 397B on DigitalOcean Serverless Inference. On DeepSeek V3.2 and Qwen 3.5 397B, we deliver #1 ou...

Apr 25, 2026
DigitalOcean Dedicated Inference: A Technical Deep Dive

Getting a model to answer 10 inference requests concurrently is tricky but simple enough; getting it to handle 2,000 engineers hitting a coding assistant with long contexts, all day, without ru...

Apr 23, 2026
Beyond the Abyss: Project Poseidon’s Quest for Zero-Downtime Reliability

In large-scale cloud environments, unpredictable hypervisor crashes carry real operational cost. While traditional reactive monitoring that relies on static thresholds and post-hoc alerts was on...

Apr 23, 2026
From Incident Counting to SLIs: How DigitalOcean Rethought Availability

Our journey to truly understand our customer experience began with a hard look at our internal availability numbers at the start of 2025. We saw something uncomfortable: the numbers didn’t match ou...

Apr 22, 2026
The LLM Inference Trilemma: Throughput, Latency, Cost

We know how to scale traditional web services: throw a load balancer in front of stateless microservices and horizontally scale your CPU instances as traffic grows. Large Language Models break this pl...

Apr 21, 2026
Mastering the 600B+ Frontier: Optimizing Large Model Deployments on the Inference Cloud

We have moved past the point where a 70GB model was considered “heavy.” With the rise of models like DeepSeek-V3, the GLM series, and other massive Mixture-of-Experts (MoE) architectur...

Apr 17, 2026
The Inference Cloud Memory Layer: A Technical Dive into DigitalOcean Managed Databases

As AI moves from experimental chat interfaces to production-grade agents, the need for a foundational memory layer to transform these AI-powered tasks into stateful models is apparent. The absence o...

Apr 15, 2026
Load Balancing and Scaling LLM Serving

Load balancing for LLMs is fundamentally different from load balancing for traditional services like web servers, APIs, or databases. Prompt caching is the reason. Prompt caching typically cuts inp...

Apr 13, 2026
Building a Robust Documentation Agent with DigitalOcean Gradient AI Platform

At DigitalOcean, documentation has always been a priority. Developers come to our docs to get unstuck, and the faster they find what they need, the better. Traditional docs pages work, but they re...

Apr 7, 2026
Advanced Prompt Caching at Scale

Prompt caching is the process of reusing already-computed KV states across inference requests to save money and reduce latency. Within a single replica, m...

Apr 3, 2026
The Hidden Cost of Complex AI Platforms: Why Developer Experience Matters

The cloud AI platform ecosystem today looks more powerful than ever, with access to powerful GPUs like NVIDIA H100 and H200, massive libraries of pre-trained models, and full pipelines for fine-tu...

Apr 2, 2026
The Glue Problem in Modern AI Development

AI is now central to modern software development. Teams across industries are turning to AI to solve product and workflow problems in software. But building production systems is still complex. The ha...

Apr 2, 2026
The Agentic Era Demands a New Class of Infrastructure: DigitalOcean Acquires Katanemo Labs

At DigitalOcean, we have been vocal about our strategic shift: we are building the world’s premier Agentic Inference Cloud. Our mission is to provide the foundation where AI-native enterprises bu...

Apr 1, 2026
Run Advanced Reasoning on DigitalOcean with Arcee AI's Trinity Large-Thinking

Today, we’re announcing that Arcee AI’s Trinity Large-Thinking is now available in Public Preview on DigitalOcean’s Agentic Inference Cloud, giving developers the ability to run frontier-class reaso...

Apr 1, 2026
Now Available: DigitalOcean Cloud Security Posture Management (CSPM)

Keeping cloud infrastructure secure at scale is challenging. Infrastructure drift, exposed services, and sprawling identities create risk, and teams don’t always have the time or expertise to ma...

Mar 27, 2026
NVIDIA GTC 2026 Confirmed It: The Inference Era Is Here

Last week at NVIDIA GTC 2026, one message was clear: AI has moved beyond the training era and into the era of production inference. The conversation was no longer just about building faster chips an...

Mar 24, 2026
DigitalOcean India: Inside Our Growing Hub for AI and Cloud Innovation

At DigitalOcean, our philosophy, ‘We ship in hours and days, not months or quarters,’ delivers real results for our customers. This is the exact velocity that has defined DigitalOcean’s India re...

Mar 23, 2026
Enhancing Security with User-Specific Access Keys for DigitalOcean Functions

As teams grow and scale their serverless workloads, managing security postures becomes just as critical as managing code. Our goal at DigitalOcean is to support your growth at every stage. One way we...

Mar 19, 2026
Meet the New Standard for High-Performance, Low-Cost Inference: NVIDIA Dynamo 1.0 is now available to DigitalOcean Customers

NVIDIA Dynamo 1.0, which was released on Monday at NVIDIA GTC, is now available to DigitalOcean customers to help drive performance enhancements and cost efficiency. NVIDIA Dynamo 1.0 offers a 7x in...

Mar 17, 2026
Prompt Caching for Anthropic and OpenAI Models: Building Cost-Efficient AI Systems

Large Language Models (LLMs) have become a foundational component for modern AI applications, from developer copilots and documentation assistants to advanced troubleshooting tools. As these ap...

Mar 16, 2026
DigitalOcean at NVIDIA GTC 2026: Building the AI Factory for the Agentic Era

A seamless path for builders: start building on build.nvidia.com, deploy to DigitalOcean. The landscape of artificial intelligence has shifted from static models to dyna...

Mar 16, 2026
Deploy Smarter with AI: Introducing App Platform Skills on DigitalOcean

AI coding assistants have fundamentally changed how developers write software. Tools like Claude Code, Codex, GitHub Copilot, Gemini, and Cursor can scaffold an entire application in minutes. But as...

Mar 13, 2026
Scaling Autonomous Site Reliability Engineering: Architecture, Orchestration, and Validation for a 90,000+ Server Fleet

As Cloudways scaled from a bootstrapped startup to a leading managed PHP hosting service, one of the biggest challenges we encountered was the growing support load. Managing a fleet of over 90,000 se...

Mar 5, 2026
Native .NET Buildpack Support is Now Available on App Platform

The .NET ecosystem continues to power a significant share of enterprise and cloud-native applications, from web APIs and microservices to full-stack applications built with ASP.NET Core. Developers bu...

Mar 3, 2026
How DigitalOcean’s Agentic Inference Cloud powered by NVIDIA GPUs Achieved 67% Lower Inference Costs for Workato

Workato’s AI Research Lab is focused on helping customers extend their production automation with agentic AI capabilities, systems that can reason, act, and orchestrate work across the business. At Wo...

Feb 26, 2026
Supabase Template is Now Available on DigitalOcean App Platform

Modern applications need more than just a database. They need authentication, auto-generated APIs, file storage, and real-time subscriptions. Supabase is a powerful open-source Firebase alternative th...

Feb 25, 2026
Zero to Deploy: Launching Your Career at DigitalOcean

Diving into the professional world is a big moment for recent graduates. At DigitalOcean, we believe the best way to become a world-class builder is to build. Our entry-level roles are designed to tr...

Feb 19, 2026
DigitalOcean Gradient™ AI GPU Droplets Optimized for Inference: Increasing Throughput at a Lower Cost

Production-grade LLM inference demands more than just access to GPUs; it requires deep optimization across the entire serving stack, from quantization and attention kernels to memory management and pa...

Feb 19, 2026
Expanding our Agentic Inference Cloud: Introducing GPU Droplets Powered by AMD Instinct™ MI350X GPUs

As our Agentic Inference Cloud continues to grow, we’re excited to announce the availability of new, high-performance GPU D...

Feb 18, 2026
DigitalOcean Gradient™ AI Platform Now Integrates with LlamaIndex

We’re excited to announce that DigitalOcean Gradient™ AI Platform now integrates natively with LlamaIndex, one of the most popular frameworks for building RAG applications. This means you can now c...

Feb 10, 2026
The Container Paradox: Why the Inference Cloud Demands a “Decoupled” Database

Kubernetes has won the cloud-native war for a reason: it’s one of the most powerful tools we have, if not the most powerful, for scaling applications and ensuring they stay up when unexpected things happen. But as we mo...

Feb 9, 2026
Heroku’s Next Chapter Is Maintenance. Yours Shouldn’t Be

Heroku’s move to a “sustaining engineering” model was carefully worded. It avoids the term end-of-life. It reassures existing customers that nothing changes immediately. It emphasizes stability and su...
