Fermyon Serverless AI

Powered by Serverless WebAssembly, Fermyon now offers Fermyon Serverless AI, with 51ms cold start times — over 100x faster than other on-demand AI infrastructure services.

Run inferencing on Llama2 and CodeLlama

With no extra setup, Fermyon Serverless AI runs inferencing for the Llama2 and CodeLlama large language models on state-of-the-art GPUs.

Fermyon Serverless AI Overview

Build, run, and deploy your Serverless App fast

Simplify the developer experience: run inferencing against similar language models on your own machine, then run your Serverless apps anywhere Spin runs!

Explore more use cases on Spin Up Hub

Not sure which examples and sample apps to explore as you try out Fermyon Serverless AI?

Check out the Spin Up Hub, the central repository for examples, samples, plugins, and more!


“Enterprises wishing to build AI applications that go beyond simple chat services face a largely insurmountable dilemma – it’s either cost prohibitive or it’s abysmally slow and, therefore, often abandon plans to build AI apps. Fermyon has used its core WebAssembly-based cloud compute platform to run fast AI inferencing workloads.”


— Roy Illsley, Analyst at Omdia


Fermyon Serverless AI

Join our private preview and get started deploying LLM workloads and AI apps.