Serving Example with Furiosa Serving
Furiosa Serving is a lightweight library built on FastAPI that lets you run a model server on a Furiosa NPU.
For more information about Furiosa Serving, see the furiosa-serving package page.
Getting Started
To get started with Furiosa Serving, install the furiosa-serving library (typically with pip install furiosa-serving), create a ServeAPI instance (a FastAPI wrapper), and set up your model for serving.
In this example, we'll walk through the steps to build a simple ResNet50 server.
First, import the necessary modules and initialize a FastAPI app:
from tempfile import NamedTemporaryFile
from typing import Dict, List
from fastapi import FastAPI, File, UploadFile
import numpy as np
import uvicorn
from furiosa.common.thread import synchronous
from furiosa.models import vision
from furiosa.serving import ServeAPI, ServeModel
serve = ServeAPI()
app: FastAPI = serve.app
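Since ServeAPI wraps a plain FastAPI application, you can also register ordinary FastAPI routes on the app object. As a minimal sketch, the /health route below is our own illustration, not part of Furiosa Serving:
@app.get("/health")
async def health() -> Dict[str, str]:
    # Hypothetical liveness endpoint served alongside the model endpoints
    return {"status": "ok"}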
Model Initialization
Next, you can initialize a vision model, such as ResNet50, for serving:
resnet50 = vision.ResNet50()
model_file = NamedTemporaryFile()
model_file.write(resnet50.model_source())
model_file.flush()  # ensure the buffered bytes reach the file before it is loaded by path
model_file_path = model_file.name
model: ServeModel = synchronous(serve.model("furiosart"))(
    'ResNet50', location=model_file_path
)
Note
ServeModel does not support in-memory model binaries for now. Instead, write the model into a temporary file and pass its path, as shown in the example above. Keep the NamedTemporaryFile object alive while the server uses the file, since the file is deleted as soon as the object is closed.
Model Inference
Now that you have your FastAPI app and model set up, you can define an endpoint for model inference. In this example, we create an endpoint that accepts an image file and performs inference using ResNet50:
@model.post("/infer")
async def infer(image: UploadFile = File(...)) -> Dict[str, str]:
# Model Zoo's preprocesses do not consider in-memory image file for now,
# so we write in-memory image into a temporary file and pass its path
image_file_path = NamedTemporaryFile()
image_file_path.write(await image.read())
tensors, _ctx = resnet50.preprocess(image_file_path.name)
# Infer from ServeModel
result: List[np.ndarray] = await model.predict(tensors)
response: str = resnet50.postprocess(result)
return {"result": response}
Running the Server
Finally, you can run the FastAPI server using uvicorn:
# Run the server if the current Python script is called directly
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
Alternatively, you can point the uvicorn server at the internal app variable of the ServeAPI instance, just as you would with a normal FastAPI application (for example, uvicorn main:app --host 0.0.0.0 --port 8000, assuming the script is saved as main.py).
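If you want uvicorn to import the application itself, which is required for options such as auto-reload, pass the app as a module path string instead of an object. A minimal sketch, again assuming the script is saved as main.py:
# Let uvicorn import the app by module path; reload=True requires this string form
uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)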
This example demonstrates the basic setup of a FastAPI server with Furiosa Serving for model inference. You can extend this example to add more functionality to your server as needed.
For more information and advanced usage of Furiosa Serving, please refer to the Furiosa Serving documentation.
You can find the full code example below.
from tempfile import NamedTemporaryFile
from typing import Dict, List
from fastapi import FastAPI, File, UploadFile
import numpy as np
import uvicorn
from furiosa.common.thread import synchronous
from furiosa.models import vision
from furiosa.serving import ServeAPI, ServeModel
serve = ServeAPI()
app: FastAPI = serve.app
resnet50 = vision.ResNet50()
# ServeModel does not support in-memory model binaries for now,
# so we write the model into a temp file and pass its path
model_file = NamedTemporaryFile()
model_file.write(resnet50.model_source())
model_file.flush()  # ensure the buffered bytes reach the file before it is loaded by path
model_file_path = model_file.name
model: ServeModel = synchronous(serve.model("furiosart"))('ResNet50', location=model_file_path)
@model.post("/infer")
async def infer(image: UploadFile = File(...)) -> Dict[str, str]:
# Model Zoo's preprocesses do not consider in-memory image file for now
# (note that it's different from in-memory tensor)
# so we write in-memory image into temp file and pass its path
image_file_path = NamedTemporaryFile()
image_file_path.write(await image.read())
tensors, _ctx = resnet50.preprocess(image_file_path.name)
# Infer from ServeModel
result: List[np.ndarray] = await model.predict(tensors)
response: str = resnet50.postprocess(result)
return {"result": response}
# Run the server if the current Python script is called directly
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)