Model object

In the furiosa-models project, Model is the first-class object, and it represents a neural network model. This document explains what the Model object offers and how to use it.

Loading a pre-trained model

To load a pre-trained neural network model, call the load() method. Because pre-trained model weights range from tens to hundreds of megabytes, the model images are not included in the Python package. When load() is called, the pre-trained model is fetched over the network. This takes some time (usually a few seconds), depending on the model and the environment. Once the model images are fetched, they are cached on the local disk.
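The fetch-then-cache behaviour described above can be pictured with a small, self-contained sketch. It is not the furiosa-models implementation: `fetch_remote`, `load_cached`, and the cache layout are hypothetical names chosen for illustration, and the "download" is simulated in memory.

```python
import hashlib
import tempfile
from pathlib import Path

# Records every simulated network fetch so the caching effect is visible.
fetch_log = []


def fetch_remote(name: str) -> bytes:
    """Hypothetical stand-in for downloading a model image over the network."""
    fetch_log.append(name)
    return f"weights-of-{name}".encode()


def load_cached(name: str, cache_dir: Path) -> bytes:
    """Return the model image, fetching it only when it is not cached on disk."""
    key = hashlib.sha256(name.encode()).hexdigest()
    path = cache_dir / key
    if path.exists():
        return path.read_bytes()  # cache hit: no network access
    data = fetch_remote(name)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data


cache_dir = Path(tempfile.mkdtemp())
first = load_cached("resnet50", cache_dir)   # fetches and stores the image
second = load_cached("resnet50", cache_dir)  # served from the local cache
```

In this sketch only the first call reaches the simulated network, which mirrors why a second load() of the same model is expected to be faster than the first.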

A non-blocking API, load_async(), is also available. Use it if your application runs on an asynchronous executor (e.g., asyncio).

Blocking API

from furiosa.models.types import Model
from furiosa.models.vision import ResNet50

model: Model = ResNet50.load()

Non-blocking API

import asyncio

from furiosa.models.types import Model
from furiosa.models.vision import ResNet50

model: Model = asyncio.run(ResNet50.load_async())
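One case where a non-blocking loader helps is fetching several models at once. The sketch below shows that pattern with `asyncio.gather`. It assumes nothing about furiosa-models internals: `StubModel` is a hypothetical stand-in whose `load_async()` only sleeps briefly to simulate the network fetch, and the model names are illustrative strings.

```python
import asyncio


class StubModel:
    """Hypothetical stand-in for a model class exposing load_async()."""

    def __init__(self, name: str) -> None:
        self.name = name

    @classmethod
    async def load_async(cls, name: str) -> "StubModel":
        await asyncio.sleep(0.01)  # simulates the network fetch
        return cls(name)


async def load_all(names):
    # Start every load at once and wait until all of them have finished.
    return await asyncio.gather(*(StubModel.load_async(n) for n in names))


models = asyncio.run(load_all(["resnet50", "ssd_mobilenet"]))
loaded_names = [m.name for m in models]
```

`asyncio.gather` returns results in the order the awaitables were passed, so the loaded models line up with the requested names.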

Accessing artifacts and metadata

A Model object includes model artifacts, such as ONNX, tflite, DFG, and ENF.

DFG and ENF are formats specific to the FuriosaAI compiler. Both are pre-compiled binary formats, and using them skips compilation, which can take minutes. In addition, a Model object carries various metadata. The following are all the attributes of a single Model object.

`furiosa.registry.Model`

Represent the artifacts and metadata of a neural network model

Attributes:

Name	Type	Description
`name`	`str`	a name of this model
`format`	`Format`	the binary format type of model source; e.g., ONNX, tflite
`source`	`bytes`	a source binary in ONNX or tflite. It can be used for compiling this model with a custom compiler configuration.
`dfg`	`Optional[bytes]`	an intermediate representation of furiosa-compiler. Native post processor implementation uses dfg binary. Users don't need to use `dfg` directly.
`enf`	`Optional[bytes]`	the executable binary for furiosa runtime and NPU
`version`	`Optional[str]`	model version
`inputs`	`Optional[List[ModelTensor]]`	data type and shape of input tensors
`outputs`	`Optional[List[ModelTensor]]`	data type and shape of output tensors
`compiler_config`	`Optional[Dict]`	a pre-defined compiler option

Source code in furiosa/registry/model.py

class Model(BaseModel):
    """Represent the artifacts and metadata of a neural network model

    Attributes:
        name: a name of this model
        format: the binary format type of model source; e.g., ONNX, tflite
        source: a source binary in ONNX or tflite. It can be used for compiling this model
            with a custom compiler configuration.
        dfg: an intermediate representation of furiosa-compiler. Native post processor implementation uses dfg binary.
            Users don't need to use `dfg` directly.
        enf: the executable binary for furiosa runtime and NPU
        version: model version
        inputs: data type and shape of input tensors
        outputs: data type and shape of output tensors
        compiler_config: a pre-defined compiler option
    """

    # class Config(BaseConfig):
    #     # Non pydantic attribute allowed
    #     # https://pydantic-docs.helpmanual.io/usage/types/#arbitrary-types-allowed
    #     arbitrary_types_allowed = True

    name: str
    source: bytes = Field(repr=False)
    format: Format
    dfg: Optional[bytes] = Field(repr=False)
    enf: Optional[bytes] = Field(repr=False)

    family: Optional[str] = None
    version: Optional[str] = None

    metadata: Optional[Metadata] = None

    inputs: Optional[List[ModelTensor]] = []
    outputs: Optional[List[ModelTensor]] = []

    compiler_config: Optional[Dict] = None
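Since `dfg` and `enf` are optional, it can be handy to check which artifacts an object actually carries. The following is a minimal sketch under that assumption: `summarize_artifacts` is a hypothetical helper, not part of furiosa-models, and it is exercised here with a stand-in object holding made-up bytes rather than a real Model.

```python
from types import SimpleNamespace


def summarize_artifacts(model) -> dict:
    """Report the size in bytes of each binary artifact, or None if absent."""
    summary = {}
    for attr in ("source", "dfg", "enf"):
        blob = getattr(model, attr, None)
        summary[attr] = len(blob) if blob is not None else None
    return summary


# Stand-in object with made-up contents; a real Model would come from load().
fake = SimpleNamespace(name="demo", source=b"\x00" * 8, dfg=None, enf=b"\x01" * 4)
report = summarize_artifacts(fake)
```

The helper only reads attributes by name, so it should work with any object exposing the `source`, `dfg`, and `enf` fields listed above.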

Inferencing with Session API

To load a model onto a FuriosaAI NPU, you need to create a session instance from a Model object through the Furiosa SDK. As mentioned above, a single Model object holds multiple model artifacts, such as an ONNX model and an ENF (FuriosaAI's compiled program binary).

If a Model object is passed to session.create(), the Session API chooses the ENF (FuriosaAI's Executable NPU Format) by default. In this case, session.create() does not involve any compilation because it uses the pre-compiled ENF binary.

Info

If you want to learn more about installing furiosa-sdk and how to use it, please refer to the furiosa-sdk documentation.

Users can still compile source models such as ONNX or tflite by passing Model.source to session.create(). Compiling a model can take up to minutes, but it lets you specify the batch size and compiler configuration, which can enable further optimization depending on the use case. To learn more about Model.source, please refer to Accessing artifacts and metadata.

Example

Using ENF binary

from furiosa.models.vision import SSDMobileNet
from furiosa.runtime import session

image = ["tests/assets/cat.jpg"]

mobilenet = SSDMobileNet.load()
with session.create(mobilenet) as sess:
    inputs, contexts = mobilenet.preprocess(image)
    outputs = sess.run(inputs).numpy()
    mobilenet.postprocess(outputs, contexts)

Using ONNX model

from furiosa.models.vision import SSDMobileNet
from furiosa.runtime import session

images = ["tests/assets/cat.jpg", "tests/assets/cat.jpg"]

mobilenet = SSDMobileNet.load()
with session.create(mobilenet.source, batch_size=2) as sess:
    inputs, context = mobilenet.preprocess(images)
    outputs = sess.run(inputs).numpy()
    mobilenet.postprocess(outputs, context=context)

Pre/Postprocessing

There are gaps between a model's inputs and outputs and the input and output data that user applications want. In general, the inputs and outputs of a neural network model are tensors. In applications, however, user sample data are typically images in standard formats such as PNG or JPEG, and users also need to convert the output tensors into structured data for their applications.

A Model object also provides preprocess() and postprocess() methods. They are utilities that convert user inputs into the model's input tensors, and output tensors into structured data that applications can access easily. With the pre-built pre/postprocessing methods, users can start using furiosa-models quickly.

In sum, the typical steps of a single inference are as follows, as also shown in the examples above:

1. Call preprocess() with user inputs (e.g., image files)
2. Pass the output of preprocess() to Session.run()
3. Pass the output of the model to postprocess()
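The three steps can be sketched as one small function. Everything below is a stand-in so that the snippet runs without an NPU or the SDK: `StubModel`, `StubSession`, and `infer` are hypothetical names, the "tensors" are plain floats, and the `.numpy()` conversion used in the examples above is omitted.

```python
class StubModel:
    """Hypothetical stand-in exposing preprocess() and postprocess()."""

    def preprocess(self, paths):
        # Pretend each file becomes one numeric "tensor"; contexts keep the path.
        inputs = [float(len(p)) for p in paths]
        contexts = list(paths)
        return inputs, contexts

    def postprocess(self, outputs, contexts):
        # Turn raw outputs into structured records the application can use.
        return [{"file": c, "score": o} for o, c in zip(outputs, contexts)]


class StubSession:
    """Hypothetical stand-in for a runtime session."""

    def run(self, inputs):
        return [x * 2 for x in inputs]


def infer(model, sess, user_inputs):
    inputs, contexts = model.preprocess(user_inputs)  # step 1
    outputs = sess.run(inputs)                        # step 2
    return model.postprocess(outputs, contexts)       # step 3


results = infer(StubModel(), StubSession(), ["cat.jpg"])
```

Keeping the three calls in one function makes it easy to swap the stand-ins for a loaded model and a real session while leaving the application code unchanged.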

Info

Default postprocessing implementations are written in Python. However, some models have native postprocessing implemented in Rust and C++ and optimized for FuriosaAI Warboy and Intel/AMD CPUs. The Python implementations can run on CPU and GPU as well, whereas the native postprocessor implementations work only with FuriosaAI NPU. The native implementations are designed to leverage the characteristics of FuriosaAI NPU even for postprocessing, and to minimize latency and maximize throughput by exploiting modern CPU architecture features such as CPU caches, SIMD instructions, and CPU pipelining. According to our benchmark, the native implementations show up to 70% lower latency.

To use the native postprocessor, pass use_native=True to Model.load() or Model.load_async(). The following is an example of using the native postprocessor for SSDMobileNet. You can find more details on each model page.