Announcing Fluid Compute

We just shipped a new compute model at Vercel: Fluid Compute is now live, bringing a more efficient way to scale Vercel Functions while keeping the simplicity of serverless. If you’re building AI applications that spend long stretches waiting on responses, this is for you.

With optimized concurrency and extended function lifecycles, Fluid Compute allows functions to handle multiple requests per instance while efficiently managing the load, helping you reduce costs and improve performance.

Why should I care?

  • Optimized concurrency – Functions handle multiple requests per instance, reducing idle time and cutting compute costs by up to 85% for high-concurrency workloads
  • Extended function lifecycle – Run background tasks after responding with waitUntil, perfect for AI workflows that process results asynchronously (see the sketch after this list)
  • Cold start protection – Smarter scaling and pre-warmed instances reduce cold starts
  • More efficient scaling – No more 1:1 invocation-to-instance model
  • Runaway cost protection – Detects and stops infinite loops or excessive invocations
  • Multi-region execution – Routes requests to the nearest selected compute region
  • Node.js and Python support – No restrictions on native modules or standard libraries

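To make the extended lifecycle concrete, here’s a minimal sketch of the waitUntil pattern, assuming a Next.js App Router route handler and the waitUntil helper from @vercel/functions; generateReply and logUsage are hypothetical placeholders:

```ts
// app/api/chat/route.ts — a hypothetical AI route handler.
// Assumes the waitUntil helper from the @vercel/functions package;
// generateReply and logUsage are placeholders for illustration.
import { waitUntil } from '@vercel/functions';

export async function POST(request: Request) {
  const { prompt } = await request.json();
  const reply = await generateReply(prompt);

  // waitUntil keeps the function instance alive until the promise settles,
  // without delaying the response returned below.
  waitUntil(logUsage(prompt, reply));

  return Response.json({ reply });
}

// Placeholder: call your model provider here.
async function generateReply(prompt: string): Promise<string> {
  return `Echo: ${prompt}`;
}

// Placeholder: analytics work that runs after the response is sent.
async function logUsage(prompt: string, reply: string): Promise<void> {
  await fetch('https://example.com/usage', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ prompt, reply }),
  });
}
```

The response goes out as soon as it’s ready; the logging promise finishes on the same warm instance instead of blocking the client.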
Try it out and let us know what you think! Turn it on here: Enable Fluid

Read more here: Introducing Fluid compute: The power of servers, in serverless form - Vercel


Congrats on the launch! :ocean: