Introducing

Fermyon Serverless AI

Powered by Serverless WebAssembly, Fermyon now offers Fermyon Serverless AI, with 51ms ~~cold~~ start times — over 100x faster than other on-demand AI infrastructure services.


Run inferencing on Llama2 and CodeLlama LLMs

With no extra setup, use Fermyon Serverless AI to run inferencing against the Llama2 and CodeLlama large language models on state-of-the-art GPUs.

Build, run, and deploy your Serverless App fast

Simplify the developer experience: run inferencing against similar language models on your own machine, then run your serverless apps anywhere Spin runs!

Explore more use cases on Spin Up Hub

Not sure where to find examples and sample apps for exploring and playing around with Fermyon Serverless AI?

Check out the Spin Up Hub, the central repository for examples, samples, plugins, and more!

Visit the Spin Up Hub for example apps and code templates for working with AI in your serverless apps.

“Enterprises wishing to build AI applications that go beyond simple chat services face a largely insurmountable dilemma: it’s either cost prohibitive or it’s abysmally slow, and they therefore often abandon plans to build AI apps. Fermyon has used its core WebAssembly-based cloud compute platform to run fast AI inferencing workloads.”