August 29, 2023

Introducing Componentize-Py

Joel Dice

python spin component wasm webassembly

Earlier this year, we announced an experimental SDK for developing Spin apps using the Python programming language. Since then, we’ve made a lot of incremental improvements, including adding SQLite support and refining the API to make it more idiomatic. However, it still has a few limitations:

  • No support for libraries which rely on native extensions (e.g. NumPy, Pandas, etc.)
  • No type hints
  • Hard-coded APIs: adding support for new Spin features is a manual process

Today, we’re pleased to announce a tool which addresses the above limitations: componentize-py. In this post, we’ll demonstrate how to use it to build a simple component using a mix of Python and native code, type-check it using MyPy, and run it using Spin. Then, we’ll update the app to use the wasi-http proposal, a vendor-agnostic API for sending and receiving HTTP requests. Finally, we’ll look at how componentize-py works, along with the next steps needed to make it suitable for everyday development.

Caveat Lector

Before we dive into the demo, note that we’ll be using tools and interfaces which are, as of this writing, very much under development. That means the developer experience hasn’t been refined yet, and we’ll have to do a few extra manual steps (like installing a temporary fork of Spin) to make everything work. Once things have stabilized, the process will be much simpler.

Prerequisites

Before we get started with the demo, we’ll need to install a few things. First, you’ll need a recent version of Python (e.g. 3.10 or 3.11) and pip if you don’t already have them. Once you have those, you can grab the latest componentize-py, plus MyPy for type checking:

pip install --upgrade componentize-py mypy

For this demo, we’ll be using a fork of Spin which includes wasi-http support. Both wasi-http and wasi-cli are still under development, so we’ve created a temporary fork to experiment with them in the meantime. To install it, download the appropriate release asset for your platform, unpack it, and add it to PATH. On an ARM-based Mac, that would be:

curl -OL https://github.com/dicej/spin/releases/download/canary/spin-canary-macos-aarch64.tar.gz
tar xf spin-canary-macos-aarch64.tar.gz
PATH=$(pwd):$PATH

Note that the asset contains a wit directory, which we’ll reference later.

Writing the App

Now we’ll create a simple Spin app that does matrix multiplication. It will accept as input a pair of matrices (encoded as a JSON array of arrays) and return the result of multiplying those matrices together. First, we’ll create the app using spin new.

spin new http-empty --accept-defaults matrices
cd matrices

Next, we’ll open the application’s manifest (the spin.toml file) and add [[component]], [component.trigger] and [component.build] configuration lines as shown below:

[[component]]
id = "matrices"
source = "matrices.wasm"
[component.trigger]
route = "/..."
[component.build]
command = "componentize-py -d ../wit/preview2 -w http-trigger componentize app -p . -p deps -o matrices.wasm"
watch = ["app.py"]

We can now write our app by creating a new file called app.py (under the matrices directory) and copying in the following Python code:

# This goes in app.py under the `matrices` directory
import http_trigger
from http_trigger.imports.http_types import Request, Response, Method
from http_trigger import exports
import numpy
import json

class InboundHttp(exports.InboundHttp):
    # !!! WARNING: There's a subtle bug on the following line:
    def handle_request(req: Request) -> Response:
        try:
            if req.method == Method.POST and req.uri == "/multiply" and req.body is not None:
                [a, b] = json.loads(req.body)
                return Response(
                    200,
                    [("content-type", "application/json")],
                    bytes(json.dumps(numpy.matmul(a, b).tolist()), "utf-8")
                )
            else:
                return Response(400, None, None)
        except Exception as e:
            return Response(
                500,
                [("content-type", "text/plain")],
                bytes(f"{type(e).__name__}: {str(e)}", "utf-8")
            )

At this point, we can use componentize-py and MyPy to type-check our program. We’ll use the componentize-py bindings subcommand to generate Python code from the Spin WIT files, writing them to the current directory where MyPy can find them.

componentize-py -d ../wit/preview2 -w http-trigger bindings .
mypy app.py

Oops! We forgot something, as MyPy is quick to tell us:

app.py:9: error: Self argument missing for a non-static method (or an invalid type for self)  [misc]
app.py:9: error: Signature of "handle_request" incompatible with supertype "InboundHttp"  [override]
app.py:9: note:      Superclass:
app.py:9: note:          def handle_request(self, req: Request) -> Response
app.py:9: note:      Subclass:
app.py:9: note:          def handle_request(req) -> Response
Found 2 errors in 1 file (checked 1 source file)

Let’s fix that by adding a self parameter to the offending line in app.py:

    def handle_request(self, req: Request) -> Response:

Now, rerunning mypy app.py should result in a Success: no issues found in 1 source file message.
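To see why the missing self matters at run time and not only to the type checker, here is a minimal plain-Python sketch. The Broken and Fixed classes and the call_handler helper are illustrative stand-ins we made up for this example; they are not part of the generated bindings.

```python
class Broken:
    # `req` silently takes the place of `self`, so there is no
    # parameter left to receive the actual request.
    def handle_request(req):
        return "handled"


class Fixed:
    def handle_request(self, req):
        return f"handled {req}"


def call_handler(handler, req):
    """Call handler.handle_request(req), reporting a TypeError as text."""
    try:
        return handler.handle_request(req)
    except TypeError as e:
        return f"TypeError: {e}"


broken_result = call_handler(Broken(), "request")
fixed_result = call_handler(Fixed(), "request")
print(broken_result)  # a TypeError message about the argument count
print(fixed_result)
```

Without the type check, this mistake would only surface once the first request arrived.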

One more thing before we build and run the app: we’re using NumPy, but if we pip install numpy, the version downloaded will contain native extension libraries built for the host OS (e.g. Mac, Linux, or Windows), not WASI. Eventually, we hope projects like NumPy will begin publishing official WASI builds to PyPI, which would allow tools like componentize-py to retrieve them automatically. For the time being, though, we’ll download a custom build and use that:

curl -OL https://github.com/dicej/wasi-wheels/releases/download/canary/numpy-wasi.tar.gz
mkdir deps
tar -C deps -xf numpy-wasi.tar.gz

Finally, we can build and run our app:

spin build --up

Then, in another terminal, we can test it:

curl -i -H 'content-type: application/json' \
    -d '[[[1, 2], [4, 5], [6, 7]], [[1, 2, 3], [4, 5, 6]]]' \
    http://127.0.0.1:3000/multiply

The result should look something like [[9, 12, 15], [24, 33, 42], [34, 47, 60]]. Yay, math!
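If you’d like to double-check that result without NumPy, here is a minimal pure-Python sketch that parses the same JSON payload and multiplies the two matrices by hand. The matmul helper below is our own illustration, not part of the app or of NumPy.

```python
import json


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


payload = "[[[1, 2], [4, 5], [6, 7]], [[1, 2, 3], [4, 5, 6]]]"
a, b = json.loads(payload)
result = matmul(a, b)
print(json.dumps(result))  # [[9, 12, 15], [24, 33, 42], [34, 47, 60]]
```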

A Look at wasi-http

One of the exciting parts about the upcoming WASI Preview 2 release is wasi-http, a proposed standard for issuing and handling HTTP requests. It consists of a WIT world and associated interfaces designed to support asynchronous, concurrent I/O. It’s vendor-agnostic, so apps targeting it can be expected to work on any compatible host (and even incompatible ones, via a virtualizing adapter).

Due to the low-level nature of wasi-http, it’s a bit more challenging to develop for than the existing Spin http-trigger world. However, it also enables concurrent outbound requests and unbuffered request and response streaming – critical features for apps which need to handle a lot of data. Our matrices app doesn’t really fit that category, but we’ll convert it to use wasi-http anyway, just to get a sense of what it looks like. This also demonstrates that there’s nothing Spin- or even WASI-specific about componentize-py: it can be used to target any WIT world, standard or proprietary.
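As a rough picture of what "concurrent outbound requests" buys you, here is a small asyncio sketch. It makes no wasi-http calls at all: the fetch coroutine is a hypothetical stand-in that sleeps instead of doing I/O, and the URLs are placeholders.

```python
import asyncio


async def fetch(url, delay):
    # Hypothetical stand-in for an outbound request: sleeps instead of doing I/O.
    await asyncio.sleep(delay)
    return f"response from {url}"


async def main():
    # Both "requests" are in flight at the same time; gather returns the
    # results in argument order, regardless of which one finishes first.
    return await asyncio.gather(
        fetch("https://example.com/a", 0.02),
        fetch("https://example.com/b", 0.01),
    )


responses = asyncio.run(main())
print(responses)
```

The total wait is roughly that of the slowest request rather than the sum of both.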

First, we’ll edit the [component.trigger] and [component.build] sections of spin.toml to indicate we’re using the wasi executor and targeting the proxy world defined by wasi-http:

[component.trigger]
route = "/..."
executor =  { type = "wasi" }
[component.build]
command = "componentize-py -d ../wit/wasi-http -w proxy componentize app -p . -p deps -o matrices.wasm"
watch = ["app.py"]

Now, we update our app.py to use the new interfaces:

# This goes in app.py under the `matrices` directory
import asyncio
import hashlib
import json
import numpy
import poll_loop

from proxy import exports
from proxy.types import Ok
from proxy.imports import types2 as types
from proxy.imports.types2 import MethodPost
from poll_loop import Stream, Sink, PollLoop

class IncomingHandler2(exports.IncomingHandler2):
    def handle(self, request: int, response_out: int):
        loop = PollLoop()
        asyncio.set_event_loop(loop)
        loop.run_until_complete(handle_async(request, response_out))

async def handle_async(request: int, response_out: int):
    try:
        method = types.incoming_request_method(request)
        path = types.incoming_request_path_with_query(request)

        if isinstance(method, MethodPost) and path == "/multiply":
            stream = Stream(types.incoming_request_consume(request))

            body = bytearray()

            while True:
                chunk = await stream.next()
                if chunk is None:
                    break
                else:
                    body.extend(chunk)

            [a, b] = json.loads(body)

            response = types.new_outgoing_response(
                200,
                types.new_fields([("content-type", b"application/json")])
            )
            types.set_response_outparam(response_out, Ok(response))

            sink = Sink(types.outgoing_response_write(response))
            await sink.send(bytes(json.dumps(numpy.matmul(a, b).tolist()), "utf-8"))
            sink.close()
        else:
            response = types.new_outgoing_response(400, types.new_fields([]))
            types.set_response_outparam(response_out, Ok(response))
            Sink(types.outgoing_response_write(response)).close()

    except Exception as e:
        response = types.new_outgoing_response(
            500,
            types.new_fields([("content-type", b"text/plain")])
        )
        types.set_response_outparam(response_out, Ok(response))

        sink = Sink(types.outgoing_response_write(response))
        await sink.send(bytes(f"{type(e).__name__}: {str(e)}", "utf-8"))
        sink.close()

Whew, that’s a lot more code! wasi-http is fairly low level, so we have to manage request and response streams explicitly. Simple apps like this one would probably be better served by a high-level wrapper library that uses wasi-http under the hood. Also, the proposal currently uses integer-based “pseudo-resources” to represent host resources, which makes the generated bindings awkward to use. However, work on proper WIT resources is nearing completion, which will make this much more idiomatic.
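To give a sense of what such a wrapper might hide, here is a minimal sketch of a read_all helper that collects a request body chunk by chunk, as the loop in the app above does. Both read_all and FakeStream are hypothetical names for this example. FakeStream only imitates the one behavior the app relies on (an awaitable next() that returns None at end of stream), so the snippet runs without a Wasm host.

```python
import asyncio


class FakeStream:
    """Stand-in for the Stream class used above: next() yields chunks, then None."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def next(self):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return self._chunks.pop(0) if self._chunks else None


async def read_all(stream):
    """Accumulate every chunk from the stream into a single bytes object."""
    body = bytearray()
    while (chunk := await stream.next()) is not None:
        body.extend(chunk)
    return bytes(body)


body = asyncio.run(read_all(FakeStream([b"[[1, 2]", b", [3, 4]]"])))
print(body)  # b'[[1, 2], [3, 4]]'
```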

You might wonder where the poll_loop module comes from. It’s a helper module which provides utility classes for using WASI Preview 2 streams with Python’s asyncio feature. Since it’s based on a proposal which hasn’t yet stabilized, we haven’t published it as part of an official package, but you can get a copy from the componentize-py repository in the meantime:

curl -OL https://raw.githubusercontent.com/dicej/componentize-py/1a7d068d/examples/http/poll_loop.py

Now we can generate bindings for the proxy world, run MyPy on the updated app, and finally run it in Spin:

componentize-py -d ../wit/wasi-http -w proxy bindings .
mypy app.py
spin build --up

Then, in another terminal, we can test the new version with the same command we used earlier:

curl -i -H 'content-type: application/json' \
    -d '[[[1,2],[4,5],[6,7]], [[1,2,3],[4,5,6]]]' \
    http://127.0.0.1:3000/multiply

If you’d like to see a more sophisticated example which makes full use of wasi-http’s concurrent and streaming I/O features, check out this app in the componentize-py repo.

Under the Hood

Python componentize image

In order to understand how componentize-py works, it’s helpful to start by comparing it to spin-python-sdk. The latter operates by injecting an app’s Python code (and any dependencies) into a Wasm module which uses the Spin Rust SDK, bridging the Rust API to Python equivalents using PyO3. The bridge code is all written by hand and must be updated each time new features are added to the Rust SDK. There are advantages to this approach, and it allowed us to add Python support to Spin quickly, but it is awkward to integrate with typecheckers and IDEs, inherently Spin-specific, and has no way of dealing with Python native extensions.

In contrast, componentize-py can accept an arbitrary WIT world and an app that targets it, plus any dependencies (including native extensions), producing a component that will run on any host supporting that world. In order to do that, it generates a Wasm module automatically to fulfill the Component Model canonical ABI requirements of that specific world, then links that module together with wasi-libc, CPython, and any native extensions and their dependencies. This linking process uses shared-everything linking to combine an arbitrary number of core Wasm modules into a single component, including a step to emulate dlopen and dlsym, allowing CPython to load any native extensions at runtime. Finally, componentize-py generates type-annotated Python code corresponding to the world and injects it – plus the application code and its dependencies – as part of a pre-initialization step.
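One way to picture the dlopen and dlsym emulation: because every library is linked into the component ahead of time, "loading" one at run time can amount to a table lookup. The sketch below is purely conceptual and is not how componentize-py is actually implemented. The table, the function names, and libexample.so are all hypothetical.

```python
# Hypothetical table built at link time: library name -> {symbol name -> callable}.
LINKED_LIBRARIES = {
    "libexample.so": {"add": lambda x, y: x + y},
}


def emulated_dlopen(name):
    """Return a handle for a library that was linked in ahead of time."""
    try:
        return LINKED_LIBRARIES[name]
    except KeyError:
        raise OSError(f"library not found: {name}") from None


def emulated_dlsym(handle, symbol):
    """Look up a symbol in a handle returned by emulated_dlopen."""
    try:
        return handle[symbol]
    except KeyError:
        raise AttributeError(f"symbol not found: {symbol}") from None


handle = emulated_dlopen("libexample.so")
add = emulated_dlsym(handle, "add")
print(add(2, 3))  # 5
```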

In order to make all that work, we had to extend several third-party projects, including LLVM, Rust, wasi-sdk, CPython, and wasm-tools. Some of those features have landed upstream, while others are still under review. We expect that these same features will be useful for other high-level languages with foreign function capabilities (e.g. Ruby C extensions, Java JNI, and .NET DllImport).

To summarize:

  • The equivalent bridge code we wrote and maintained for spin-python-sdk is now generated automatically – complete with docstrings and type annotations – from WIT files.
  • Any native extensions are bundled as part of the component, available for CPython to load as needed at runtime.
  • The whole package is pre-initialized for sub-millisecond startup times – ideal for running serverless apps at scale.

Making a Good Thing Better

Although you can start experimenting with componentize-py now, there’s still more work needed to make it suitable for everyday use. To start with, WASI Preview 2 and its Component Model underpinnings are still under active development. We expect them to stabilize within the next couple of months, but until that happens compatibility among tools and host runtimes will vary. In addition, here are some more ways we’d like to improve support for Python on WASI:

Publish WASI ports of popular Python packages

There’s no standard way to cross-compile Python native extensions, which makes Wasm support challenging. Many popular packages do not support it at all, requiring various creative workarounds. In addition, many essential data science libraries such as SciPy and OpenBLAS include Fortran code from various eras, and compiling it to Wasm requires a great deal of patching and tweaking. We chose NumPy in the example above because it happens to be one of the easy ones.

The Pyodide project has tackled these issues to an impressive degree and boasts a large library of ports. It is currently focused exclusively on browser runtimes via the emscripten SDK, but we could theoretically adapt the pyodide-build tool to use wasi-sdk as an option, making those ports available in a non-browser environment as well.

Polish the developer experience

Stabilizing WASI Preview 2 will be a big step in this direction, but there are others as well, such as:

  • Port popular libraries such as requests to use WASI interfaces such as wasi-http natively
  • Implement WIT resource support in componentize-py and wasmtime-py
  • Make the generated Python code more idiomatic
  • Publish high-level wrapper packages to make low-level host features easier to use

Integrate support for warg, the Wasm package registry protocol

This will eliminate the need to copy WIT files around and make it easier to integrate components written in arbitrary languages into Python-based apps.

The reference implementation is already usable, and we expect the Bytecode Alliance will begin using it to host WASI proposals, implementations, and virtualizers in the near future.

Reduce binary sizes

As mentioned above, componentize-py currently bundles everything (e.g. wasi-libc, CPython, and any native extensions) into a single component. This is great for portability, but also potentially redundant if you’re deploying multiple Python apps to the same host.

Fortunately, the Component Model specification supports importing modules as well as bundling them inline. Ideally, you would be able to tell componentize-py to import CPython instead of bundling it if you knew the host you were deploying to already had a compatible version. Alternatively, a tool like spin cloud deploy could deconstruct a component and upload only the parts not yet present on the server, similar to how docker push only uploads image layers as needed.
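The "upload only what’s missing" idea can be sketched with content hashes, much like image layers. This is a hypothetical illustration, not an existing Spin or componentize-py feature: the part names, the placeholder byte strings, and the parts_to_upload helper are all assumptions.

```python
import hashlib


def digest(blob: bytes) -> str:
    """Content address for a part: the SHA-256 hex digest of its bytes."""
    return hashlib.sha256(blob).hexdigest()


def parts_to_upload(parts: dict[str, bytes], on_server: set[str]) -> list[str]:
    """Return the names of parts whose digest the server does not have yet."""
    return [name for name, blob in parts.items() if digest(blob) not in on_server]


# Placeholder bytes standing in for the real modules inside a component.
component_parts = {
    "cpython": b"placeholder cpython module",
    "numpy": b"placeholder numpy extension",
    "app": b"placeholder application code",
}

# Pretend an earlier deployment already uploaded the same CPython build.
on_server = {digest(component_parts["cpython"])}

missing = parts_to_upload(component_parts, on_server)
print(missing)  # ['numpy', 'app']
```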

Run the same component at the command line, in the browser, and in the cloud

This is theoretically possible today, thanks to jco and WASI-Virt. Once Preview 2 stabilizes, we’ll start to see practical examples of it.

Getting Involved

If you’d like to help with any of the above, or just learn more about Python on Wasm, please join us in the Python SIG Guest Languages group.

Conclusion

With WASI Preview 2 just around the corner, tools for various languages have begun to support it:

We’re excited to add Python to that list, and hope to make componentize-py the easiest, most efficient way to build WebAssembly components using that language and ecosystem.

Acknowledgments

Many people provided advice and assistance in developing componentize-py and related tools. The following is an incomplete, randomized list:

  • Brett Cannon, for explaining the inner workings of Python modules and packaging, among other things
  • Kevin Smith, for sharing lessons learned trying to port various packages to WASI
  • Dan Gohman and Sam Clegg, for helping me navigate wasi-libc, LLVM, and related tooling conventions
  • Alex Crichton, for sharing deep Rust and Component Model knowledge
  • Jamey Sharp, for early design discussion and for prototyping shared-everything linking, saving me a ton of time
  • Luke Wagner, for design advice and for gently correcting my many misunderstandings of the Component Model
  • Hood Chatham, for explaining how pyodide-build works and why Fortran and libffi are so challenging for Wasm
