Command Line Tools
Through the command line tools, Furiosa SDK provides functions such as monitoring NPU device information, compiling models, and checking compatibility between models and SDKs. This section explains how to install and use each command line tool.
furiosa-toolkit
furiosa-toolkit provides a command line tool that enables users to manage and check the information of NPU devices.
furiosa-toolkit installation
To use this command line tool, you first need to install the kernel driver as shown in Driver, Firmware, and Runtime Installation. Subsequently, follow the instructions below to install furiosa-toolkit.
sudo apt-get install -y furiosa-toolkit
furiosactl
The furiosactl command provides a variety of subcommands for obtaining information about NPU devices or controlling them.
furiosactl <subcommand> [options] ...
furiosactl info
After installing the kernel driver, you can use the furiosactl command to check whether the NPU device is recognized.
Currently, this command provides the furiosactl info subcommand, which prints the temperature, power consumption, and PCI information of the NPU device.
If the device is not visible with this command after mounting it on the machine, refer to Driver, Firmware, and Runtime Installation to install the driver.
If you add the --full option to the info command, you can also see the device's UUID and serial number.
$ furiosactl info
+------+--------+----------------+-------+--------+--------------+
| NPU | Name | Firmware | Temp. | Power | PCI-BDF |
+------+--------+----------------+-------+--------+--------------+
| npu1 | warboy | 1.6.0, 3c10fd3 | 54°C | 0.99 W | 0000:44:00.0 |
+------+--------+----------------+-------+--------+--------------+
$ furiosactl info --full
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
| NPU | Name | UUID | S/N | Firmware | Temp. | Power | PCI-BDF | PCI-DEV |
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
| npu1 | warboy | 00000000-0000-0000-0000-000000000000 | WBYB0000000000000 | 1.6.0, 3c10fd3 | 54°C | 0.99 W | 0000:44:00.0 | 511:0 |
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
furiosactl list
The list subcommand provides information about the device files available on the NPU device.
You can also check whether each core present in the NPU is in use or idle.
furiosactl list
+------+------------------------------+-----------------------------------+
| NPU | Cores | DEVFILES |
+------+------------------------------+-----------------------------------+
| npu1 | 0 (available), 1 (available) | npu1, npu1pe0, npu1pe1, npu1pe0-1 |
+------+------------------------------+-----------------------------------+
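The device-file names in the listing encode which PEs (cores) they expose: npu1 addresses the whole device, npu1pe0 and npu1pe1 each address a single PE, and npu1pe0-1 appears to address both PEs together. The following is a minimal Python sketch of that naming convention; the parse_devfile helper is illustrative only and is not part of the SDK:

```python
import re

def parse_devfile(name):
    """Split an NPU device-file name such as 'npu1pe0-1' into the
    device index and the list of PE (core) indices it exposes."""
    m = re.fullmatch(r"npu(\d+)(?:pe(\d+)(?:-(\d+))?)?", name)
    if m is None:
        raise ValueError(f"unrecognized device file: {name}")
    dev = int(m.group(1))
    if m.group(2) is None:
        return dev, None  # whole device, no specific PE
    start = int(m.group(2))
    end = int(m.group(3)) if m.group(3) else start
    return dev, list(range(start, end + 1))

# Device files from the listing above:
for name in ["npu1", "npu1pe0", "npu1pe1", "npu1pe0-1"]:
    print(name, parse_devfile(name))
```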
furiosactl ps
The ps subcommand prints information about the OS processes currently occupying the NPU device.
$ furiosactl ps
+-----------+--------+------------------------------------------------------------+
| NPU | PID | CMD |
+-----------+--------+------------------------------------------------------------+
| npu0pe0-1 | 132529 | /usr/bin/python3 /usr/local/bin/uvicorn image_classify:app |
+-----------+--------+------------------------------------------------------------+
furiosactl top (experimental)
The top subcommand is used to view utilization by NPU unit over time.
By default, utilization is calculated every 1 second, but you can set the calculation interval yourself with the --interval option (unit: ms).
The output columns have the following meanings:
| Item | Description |
|----------|-------------|
| Datetime | Observation time |
| PID | ID of the process using the NPU |
| Device | NPU device in use |
| NPU(%) | Percentage of time the NPU was used during the observation period |
| Comp(%) | Percentage of the time the NPU was used that was spent on computation |
| I/O(%) | Percentage of the time the NPU was used that was spent on I/O |
| Command | Command line of the executed process |
$ furiosactl top --interval 200
NOTE: furiosa top is under development. Usage and output formats may change.
Please enter Ctrl+C to stop.
Datetime PID Device NPU(%) Comp(%) I/O(%) Command
2023-03-21T09:45:56.699483936Z 152616 npu1pe0-1 19.06 100.00 0.00 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:56.906443888Z 152616 npu1pe0-1 51.09 93.05 6.95 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.110489333Z 152616 npu1pe0-1 46.40 97.98 2.02 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.316060982Z 152616 npu1pe0-1 51.43 100.00 0.00 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.521140588Z 152616 npu1pe0-1 54.28 94.10 5.90 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.725910558Z 152616 npu1pe0-1 48.93 98.93 1.07 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.935041998Z 152616 npu1pe0-1 47.91 100.00 0.00 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:58.13929122Z 152616 npu1pe0-1 49.06 94.94 5.06 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
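The three percentage columns are related: NPU(%) measures busy time against the observation interval, while Comp(%) and I/O(%) split the busy time between computation and I/O and sum to 100, as the rows above show. A small Python sketch of that arithmetic, with invented timings:

```python
def utilization(interval_ms, busy_ms, comp_ms):
    """Derive the three `furiosactl top` columns from raw timings.
    `busy_ms` is the time the NPU was in use during the interval;
    `comp_ms` is the part of that busy time spent on computation,
    the remainder being I/O. Values here are made up for illustration."""
    npu_pct = 100.0 * busy_ms / interval_ms
    comp_pct = 100.0 * comp_ms / busy_ms if busy_ms else 0.0
    io_pct = 100.0 - comp_pct if busy_ms else 0.0
    return round(npu_pct, 2), round(comp_pct, 2), round(io_pct, 2)

# A 200 ms interval in which the NPU was busy for 102 ms,
# 95 ms of it computing and the rest doing I/O:
print(utilization(200, 102, 95))  # NPU(%), Comp(%), I/O(%)
```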
furiosa-bench (Benchmark Tool)
The furiosa-bench command runs a benchmark workload against an ONNX or TFLite model using furiosa-runtime. The benchmark result includes tail latency and QPS.
The arguments of the command are as follows:
$ furiosa-bench --help
USAGE:
furiosa-bench [OPTIONS] <model-path>
OPTIONS:
-b, --batch <number> Sets the number of batch size, which should be exponents of two [default: 1]
-o, --output <bench-result-path> Create json file that has information about the benchmark
-C, --compiler-config <compiler-config> Sets a file path for compiler configuration (YAML format)
-d, --devices <devices> Designates NPU devices to be used (e.g., "warboy(2)*1" or "npu0pe0-1")
-h, --help Prints help information
-t, --io-threads <number> Sets the number of I/O Threads [default: 1]
--duration <min-duration> Sets the minimum test time in seconds. Both min_query_count and min_duration should be met to finish the test
[default: 0]
-n, --queries <min-query-count> Sets the minimum number of test queries. Both min_query_count and min_duration_ms should be met to finish the
test [default: 1]
-T, --trace-output <trace-output> Sets a file path for profiling result (Chrome Trace JSON format)
-V, --version Prints version information
-v, --verbose Print verbose log
-w, --workers <number> Sets the number of workers [default: 1]
--workload <workload> Sets the bench workload which can be either latency-oriented (L) or throughput-oriented (T) [default: L]
ARGS:
<model-path>
MODEL_PATH is the file path of ONNX, TFLite or ENF (format produced by furiosa-compiler).
The following is an example usage of furiosa-bench without the output path option (i.e., --output):
$ furiosa-bench mnist-8.onnx --workload L -n 1000 -w 8 -t 2
======================================================================
This benchmark was executed with latency-workload which prioritizes latency of individual queries over throughput.
1000 queries executed with batch size 1
Latency stats are as follows
QPS(Throughput): 34.40/s
Per-query latency:
Min latency (us) : 8399
Max latency (us) : 307568
Mean latency (us) : 29040
50th percentile (us): 19329
95th percentile (us): 62797
99th percentile (us): 79874
99.9th percentile (us): 307568
If an output path is specified, furiosa-bench will save a JSON document as follows:
$ furiosa-bench mnist-8.onnx --workload L -n 1000 -w 8 -t 2 -o mnist.json
$ cat mnist.json
{
"model_data": {
"path": "./mnist-8.onnx",
"md5": "d7cd24a0a76cd492f31065301d468c3d ./mnist-8.onnx"
},
"compiler_version": "0.10.0-dev (rev: 2d862de8a built_at: 2023-07-13T20:05:04Z)",
"hal_version": "Version: 0.12.0-2+nightly-230716",
"git_revision": "fe6f77a",
"result": {
"mode": "Latency",
"total run time": "30025 us",
"total num queries": 1000,
"batch size": 1,
"qps": "33.31/s",
"latency stats": {
"min": "8840 us",
"max": "113254 us",
"mean": "29989 us",
"50th percentile": "18861 us",
"95th percentile": "64927 us",
"99th percentile": "87052 us",
"99.9th percentile": "113254 us"
}
}
}
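The saved report is plain JSON, so it can be consumed programmatically. A short Python sketch that extracts the throughput and the 99th-percentile latency from a trimmed copy of the document above; note that the numeric fields are strings carrying units (e.g., "33.31/s", "87052 us") and need the units stripped:

```python
import json

# A trimmed copy of the furiosa-bench JSON report shown above.
report = json.loads("""
{
  "result": {
    "mode": "Latency",
    "qps": "33.31/s",
    "latency stats": {
      "50th percentile": "18861 us",
      "99th percentile": "87052 us"
    }
  }
}
""")

def strip_unit(value):
    """Drop the trailing unit from values like '33.31/s' or '87052 us'."""
    return float(value.split("/")[0].split()[0])

result = report["result"]
qps = strip_unit(result["qps"])
p99_us = strip_unit(result["latency stats"]["99th percentile"])
print(f"qps={qps}, p99={p99_us / 1000:.1f} ms")
```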
furiosa
The furiosa command is a meta-command line tool that becomes available when you install the Python SDK.
Additional subcommands are also added when the extension package is installed.
If the Python execution environment is not prepared, refer to Python execution environment setup.
Installing the command line tool:
$ pip install furiosa-sdk
Verifying the installation:
$ furiosa compile --version
libnpu.so --- v2.0, built @ fe1fca3
0.5.0 (rev: 49b97492a built at 2021-12-07 04:07:08) (wrapper: None)
furiosa compile
The compile command compiles models such as ONNX and TFLite, generating programs that utilize the FuriosaAI NPU.
Detailed explanations and options can be found on the furiosa-compiler page.
furiosa litmus (Model Compatibility Checker)
The litmus command is a tool to quickly check whether an ONNX model works normally with the Furiosa SDK on an NPU.
litmus goes through all usage steps of the Furiosa SDK, including quantization, compilation, and inference on the FuriosaAI NPU.
litmus is also a useful bug reporting tool. If you specify the --dump option, litmus collects logs and environment information and dumps them into an archive file. The archive file can be used to report issues.
The steps executed by the litmus command are as follows.
Step 1: Load the input model and check that it is a valid model.
Step 2: Quantize the model with random calibration.
Step 3: Compile the quantized model.
Step 4: Run inference with the compiled model using furiosa-bench. This step is skipped if furiosa-bench is not installed.
Usage:
furiosa-litmus [-h] [--dump OUTPUT_PREFIX] [--skip-quantization] [--target-npu TARGET_NPU] [-v] model_path
A simple example using litmus
command is as follows.
$ furiosa litmus model.onnx
libfuriosa_hal.so --- v0.11.0, built @ 43c901f
INFO:furiosa.common.native:loaded native library libfuriosa_compiler.so.0.10.0 (0.10.0-dev d7548b7f6)
furiosa-quantizer 0.10.0 (rev. 9ecebb6) furiosa-litmus 0.10.0 (rev. 9ecebb6)
[Step 1] Checking if the model can be loaded and optimized ...
[Step 1] Passed
[Step 2] Checking if the model can be quantized ...
[Step 2] Passed
[Step 3] Checking if the model can be compiled for the NPU family [warboy-2pe] ...
[1/6] 🔍 Compiling from onnx to dfg
Done in 0.09272794s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph(LAS)...Done in 9.034934s
▪▪▪▪▪ [2/3] Lowering graph(LAS)...Done in 20.140083s
▪▪▪▪▪ [3/3] Optimizing graph...Done in 0.019548794s
Done in 29.196825s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.001701888s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.015205072s
[5/6] 🔍 Compiling from gir to lir
Done in 0.0038304s
[6/6] 🔍 Compiling from lir to enf
Done in 0.020943863s
✨ Finished in 29.331545s
[Step 3] Passed
[Step 4] Perform inference once for data collection... (Optional)
✨ Finished in 0.000001198s
======================================================================
This benchmark was executed with latency-workload which prioritizes latency of individual queries over throughput.
1 queries executed with batch size 1
Latency stats are as follows
QPS(Throughput): 125.00/s
Per-query latency:
Min latency (us) : 7448
Max latency (us) : 7448
Mean latency (us) : 7448
50th percentile (us): 7448
95th percentile (us): 7448
99th percentile (us): 7448
99.9th percentile (us): 7448
[Step 4] Finished
If you already have a quantized model, you can skip Step 1 and Step 2 with the --skip-quantization option.
$ furiosa litmus --skip-quantization quantized-model.onnx
libfuriosa_hal.so --- v0.11.0, built @ 43c901f
INFO:furiosa.common.native:loaded native library libfuriosa_compiler.so.0.10.0 (0.10.0-dev d7548b7f6)
furiosa-quantizer 0.10.0 (rev. 9ecebb6) furiosa-litmus 0.10.0 (rev. 9ecebb6)
[Step 1] Skip model loading and optimization
[Step 2] Skip model quantization
[Step 1 & Step 2] Load quantized model ...
[Step 3] Checking if the model can be compiled for the NPU family [warboy-2pe] ...
...
You can use the --dump <path> option to create a <path>-<unix_epoch>.zip file that contains metadata needed for analysis, such as compilation logs, runtime logs, software versions, and the execution environment.
If you run into any problems, you can get support from the FuriosaAI customer service center by submitting this zip file.
$ furiosa litmus --dump archive model.onnx
libfuriosa_hal.so --- v0.11.0, built @ 43c901f
INFO:furiosa.common.native:loaded native library libfuriosa_compiler.so.0.10.0 (0.10.0-dev d7548b7f6)
furiosa-quantizer 0.10.0 (rev. 9ecebb6) furiosa-litmus 0.10.0 (rev. 9ecebb6)
[Step 1] Checking if the model can be loaded and optimized ...
[Step 1] Passed
...
$ zipinfo -1 archive-1690438803.zip
archive-16904388032l4hoi3h/meta.yaml
archive-16904388032l4hoi3h/compiler/compiler.log
archive-16904388032l4hoi3h/compiler/memory-analysis.html
archive-16904388032l4hoi3h/compiler/model.dot
archive-16904388032l4hoi3h/runtime/trace.json