Compiler
The Furiosa compiler takes a model in TFLite or ONNX format as input and compiles it into a program binary that is executable on the Furiosa NPU. During compilation, the input model is analyzed at the operator level, and state-of-the-art optimizations are applied for efficient inference that exploits the Furiosa NPU and the resources (CPU and memory) of the host machine. If a model contains only operators on the List of Supported Operators for NPU Acceleration, it is most likely to be accelerated effectively on the Furiosa NPU.
furiosa compile
The compiler is mainly invoked by the inference API when a Session is initialized, in order to prepare inference with a model and an NPU.
The furiosa compile command lets users compile input models directly into executable binaries.
Please refer to Python SDK Installation & User Guide for how to install the furiosa compile command.
MODEL_PATH is the file path of a TFLite or ONNX format file.
furiosa compile MODEL_PATH [-o OUTPUT] [--target-npu TARGET_NPU] [--batch-size BATCH_SIZE]
The -o OUTPUT option is optional and sets the name of the output file. If OUTPUT is not given, the default output name is output.enf. Here, ENF stands for Executable NPU Format.
The following command generates the executable binary file output.enf.
furiosa compile foo.onnx
The following command generates the executable binary file foo.enf.
furiosa compile foo.onnx -o foo.enf
The --target-npu option selects the target NPU for the generated executable binary. The accepted values are:
NPU Family | Number of PEs | Value |
---|---|---|
Warboy | 1 | warboy |
Warboy | 2 | warboy-2pe |
If the generated executable binary will run on a single PE of Warboy, use the following command.
furiosa compile foo.onnx --target-npu warboy
If the generated executable binary will run on two fused PEs of Warboy, use the following command.
furiosa compile foo.onnx --target-npu warboy-2pe
The --batch-size option sets the batch size of the input data.
In general, a larger batch size yields higher inference throughput.
However, when the required memory exceeds the size of the NPU DRAM, more memory I/O operations are needed, and these additional I/O operations degrade overall performance. The default value of --batch-size is 1, and the optimal batch size can be found by experiment.
For reference, the optimal batch sizes for the MLPerf™ Inference Edge v1.1 models are as follows:
Model | Optimal Batch |
---|---|
SSD-MobileNets-v1 | 2 |
Resnet50-v1.5 | 1 |
SSD-ResNet34 | 1 |
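Finding the optimal batch size by experiment can be sketched as follows. This is only an illustration under stated assumptions: pick_best_batch and its seconds_per_batch callback are hypothetical names that are not part of the Furiosa SDK, and the timings are made-up numbers rather than NPU measurements. In practice, each candidate batch size would need its own compiled binary, since the batch size is a compile-time option.

```python
from typing import Callable, Iterable, Optional


def pick_best_batch(
    candidates: Iterable[int],
    seconds_per_batch: Callable[[int], float],
) -> int:
    """Return the candidate batch size with the highest throughput.

    `seconds_per_batch` is a hypothetical callback that returns the measured
    time (in seconds) to run one inference with the given batch size.
    Throughput is computed as samples per second: batch / seconds.
    """
    best_batch: Optional[int] = None
    best_throughput = 0.0
    for batch in candidates:
        elapsed = seconds_per_batch(batch)
        if elapsed <= 0:
            raise ValueError(f"non-positive timing for batch size {batch}")
        throughput = batch / elapsed
        if throughput > best_throughput:
            best_batch, best_throughput = batch, throughput
    if best_batch is None:
        raise ValueError("no candidate batch sizes given")
    return best_batch


# Made-up timings for illustration only; these are NOT real measurements.
fake_timings = {1: 0.010, 2: 0.016, 4: 0.040}
best = pick_best_batch(fake_timings, fake_timings.__getitem__)
print(best)
```

Comparing samples per second rather than time per batch matters here: a larger batch always takes longer per run, but it is only worse if the time grows faster than the batch size does.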
A batch size of 2 can be specified as follows:
furiosa compile foo.onnx --batch-size 2
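When compilation is scripted, the options described above can be assembled programmatically. The following is a minimal sketch, not part of the Furiosa SDK: build_compile_command is a hypothetical helper, and it assumes the batch option is spelled --batch-size, as in the example commands in this section. It only builds the argument list; running it requires the furiosa command to be installed.

```python
import shlex
from typing import List, Optional


def build_compile_command(
    model_path: str,
    output: Optional[str] = None,
    target_npu: Optional[str] = None,
    batch_size: Optional[int] = None,
) -> List[str]:
    """Build the argument list for `furiosa compile` (hypothetical helper).

    Options left as None are omitted, so the compiler's own defaults apply.
    """
    cmd = ["furiosa", "compile", model_path]
    if output is not None:
        cmd += ["-o", output]
    if target_npu is not None:
        cmd += ["--target-npu", target_npu]
    if batch_size is not None:
        # Assumption: the flag is --batch-size, as used in the examples above.
        cmd += ["--batch-size", str(batch_size)]
    return cmd


cmd = build_compile_command(
    "foo.onnx", output="foo.enf", target_npu="warboy-2pe", batch_size=2
)
print(shlex.join(cmd))

# To actually run the compiler (requires the furiosa CLI to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when a model path contains spaces.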
Usage of ENF files
An ENF (Executable NPU Format) file is the executable binary that the compiler produces as its final result. Compilation typically takes from several seconds to several minutes. This step can be skipped by reusing a previously generated ENF file.
For example, if an ENF file is passed to the function session.create(), compilation is skipped and the Session object is created immediately.
from furiosa.runtime import session
sess = session.create("foo.enf")
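One way to benefit from this in practice is to reuse an ENF file only while it is still up to date with respect to its source model. The sketch below is an assumption-laden illustration using only the Python standard library: enf_is_fresh and path_for_session are hypothetical helpers, not Furiosa SDK functions, and freshness is judged by file modification time alone.

```python
import os


def enf_is_fresh(model_path: str, enf_path: str) -> bool:
    """Return True if the ENF file exists and is at least as new as the model."""
    if not os.path.exists(enf_path):
        return False
    return os.path.getmtime(enf_path) >= os.path.getmtime(model_path)


def path_for_session(model_path: str, enf_path: str) -> str:
    """Choose which file to hand to session.create() (hypothetical helper).

    Returns the ENF path when it can be reused, so compilation is skipped;
    otherwise returns the original model path, which is compiled on creation.
    """
    return enf_path if enf_is_fresh(model_path, enf_path) else model_path


# Usage sketch (requires the Furiosa SDK and an NPU, so it is left commented):
#   from furiosa.runtime import session
#   sess = session.create(path_for_session("foo.onnx", "foo.enf"))
```

A modification-time check does not detect an ENF file that was compiled with a different --target-npu or --batch-size value, so a real cache would also need to record those options.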