*********************************************************
Command Line Tools
*********************************************************

Through its command line tools, Furiosa SDK provides functions such as monitoring NPU device information, compiling models, and checking compatibility between models and the SDK.
This section explains how to install and use each command line tool.

.. _Toolkit:

furiosa-toolkit
############################################################

``furiosa-toolkit`` provides a command line tool that enables users to manage and check the information of NPU devices.


furiosa-toolkit installation
========================================

To use this command line tool, you first need to install the kernel driver as described in :ref:`RequiredPackages`.
Then, follow the instructions below to install furiosa-toolkit.

.. tabs::

   .. tab:: Installation using APT server

      .. code-block:: sh

         sudo apt-get install -y furiosa-toolkit


furiosactl
========================================

The ``furiosactl`` command provides a variety of subcommands for obtaining information about the NPU device or controlling it.

.. code-block:: sh

   furiosactl <subcommand> [options]

``furiosactl info``
---------------------------------------------

After installing the kernel driver, you can use the ``furiosactl`` command to check whether the NPU device is recognized.
Currently, the ``furiosactl info`` subcommand outputs the temperature, power consumption, and PCI information of the NPU device.
If the device is not visible with this command after mounting it on the machine, refer to :ref:`RequiredPackages` to install the driver.
If you add the ``--full`` option to the ``info`` command, you can also see the device's UUID and serial number.

.. code-block:: sh

   $ furiosactl info
   +------+--------+----------------+-------+--------+--------------+
   | NPU  | Name   | Firmware       | Temp. | Power  | PCI-BDF      |
   +------+--------+----------------+-------+--------+--------------+
   | npu1 | warboy | 1.6.0, 3c10fd3 |  54°C | 0.99 W | 0000:44:00.0 |
   +------+--------+----------------+-------+--------+--------------+

   $ furiosactl info --full
   +------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
   | NPU  | Name   | UUID                                 | S/N               | Firmware       | Temp. | Power  | PCI-BDF      | PCI-DEV |
   +------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
   | npu1 | warboy | 00000000-0000-0000-0000-000000000000 | WBYB0000000000000 | 1.6.0, 3c10fd3 |  54°C | 0.99 W | 0000:44:00.0 | 511:0   |
   +------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+

``furiosactl list``
---------------------------------------------

The ``list`` subcommand provides information about the device files available on the NPU device.
You can also check whether each core present in the NPU is in use or idle.

.. code-block:: sh

   furiosactl list
   +------+------------------------------+-----------------------------------+
   | NPU  | Cores                        | DEVFILES                          |
   +------+------------------------------+-----------------------------------+
   | npu1 | 0 (available), 1 (available) | npu1, npu1pe0, npu1pe1, npu1pe0-1 |
   +------+------------------------------+-----------------------------------+

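Both ``furiosactl info`` and ``furiosactl list`` print a single snapshot. For lightweight continuous monitoring, they can be wrapped with the standard ``watch`` utility; the following is a minimal sketch, assuming ``watch`` is available on your system.

.. code-block:: sh

   # Refresh the device summary every 2 seconds; press Ctrl+C to stop.
   watch -n 2 furiosactl info

   # Likewise, watch core availability while workloads start and finish.
   watch -n 2 furiosactl list
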
``furiosactl ps``
---------------------------------------------

The ``ps`` subcommand prints information about the OS processes currently occupying the NPU devices.

.. code-block:: sh

   $ furiosactl ps
   +-----------+--------+------------------------------------------------------------+
   | NPU       | PID    | CMD                                                        |
   +-----------+--------+------------------------------------------------------------+
   | npu0pe0-1 | 132529 | /usr/bin/python3 /usr/local/bin/uvicorn image_classify:app |
   +-----------+--------+------------------------------------------------------------+

``furiosactl top`` (experimental)
---------------------------------------------

The ``top`` subcommand is used to view utilization by NPU unit over time.
By default, utilization is calculated every 1 second, but you can set the calculation interval yourself with the ``--interval`` option (unit: ms).
The fields of the output have the following meanings.

.. list-table:: furiosa top fields
   :widths: 100 400
   :header-rows: 1

   * - Item
     - Description
   * - Datetime
     - Observation time
   * - PID
     - Process ID that is using the NPU
   * - Device
     - NPU device in use
   * - NPU(%)
     - Percentage of time the NPU was used during the observation period
   * - Comp(%)
     - Percentage of the time the NPU was used that was spent on computation
   * - I/O(%)
     - Percentage of the time the NPU was used that was spent on I/O
   * - Command
     - Executed command line of the process

.. code-block:: sh

   $ furiosactl top --interval 200
   NOTE: furiosa top is under development. Usage and output formats may change.
   Please enter Ctrl+C to stop.
   Datetime                        PID     Device     NPU(%)  Comp(%)  I/O(%)  Command
   2023-03-21T09:45:56.699483936Z  152616  npu1pe0-1   19.06   100.00    0.00  ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
   2023-03-21T09:45:56.906443888Z  152616  npu1pe0-1   51.09    93.05    6.95  ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
   2023-03-21T09:45:57.110489333Z  152616  npu1pe0-1   46.40    97.98    2.02  ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
   2023-03-21T09:45:57.316060982Z  152616  npu1pe0-1   51.43   100.00    0.00  ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
   2023-03-21T09:45:57.521140588Z  152616  npu1pe0-1   54.28    94.10    5.90  ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
   2023-03-21T09:45:57.725910558Z  152616  npu1pe0-1   48.93    98.93    1.07  ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
   2023-03-21T09:45:57.935041998Z  152616  npu1pe0-1   47.91   100.00    0.00  ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
   2023-03-21T09:45:58.13929122Z   152616  npu1pe0-1   49.06    94.94    5.06  ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf

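Because ``furiosactl top`` streams one line per observation, its output can be captured for later analysis. The following is a minimal sketch, assuming you want to keep the raw stream in a file named ``npu_top.log`` (a hypothetical name).

.. code-block:: sh

   # Sample utilization every 500 ms and keep a copy of the stream on disk.
   # Runs until interrupted with Ctrl+C.
   furiosactl top --interval 500 | tee npu_top.log
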
.. _FuriosaBench:

furiosa-bench (Benchmark Tool)
#############################################

The ``furiosa-bench`` command benchmarks an ONNX or TFLite model on furiosa-runtime with a configurable workload.
The benchmark result includes tail latencies and QPS (throughput).
The arguments of the command are as follows:

.. code-block:: sh

   $ furiosa-bench --help

   USAGE:
       furiosa-bench [OPTIONS] <MODEL_PATH>

   OPTIONS:
       -b, --batch              Sets the number of batch size, which should be exponents of two [default: 1]
       -o, --output             Create json file that has information about the benchmark
       -C, --compiler-config    Sets a file path for compiler configuration (YAML format)
       -d, --devices            Designates NPU devices to be used (e.g., "warboy(2)*1" or "npu0pe0-1")
       -h, --help               Prints help information
       -t, --io-threads         Sets the number of I/O Threads [default: 1]
           --duration           Sets the minimum test time in seconds.
                                Both min_query_count and min_duration should be met to finish the test [default: 0]
       -n, --queries            Sets the minimum number of test queries.
                                Both min_query_count and min_duration_ms should be met to finish the test [default: 1]
       -T, --trace-output       Sets a file path for profiling result (Chrome Trace JSON format)
       -V, --version            Prints version information
       -v, --verbose            Print verbose log
       -w, --workers            Sets the number of workers [default: 1]
           --workload           Sets the bench workload which can be either latency-oriented (L) or throughput-oriented (T) [default: L]

   ARGS:
       <MODEL_PATH>

``MODEL_PATH`` is the file path of an ONNX, TFLite, or ENF model (ENF is the format produced by :ref:`CompilerCli`).

The following is an example usage of ``furiosa-bench`` without an output path option (i.e., without ``--output``):

.. code-block:: sh

   $ furiosa-bench mnist-8.onnx --workload L -n 1000 -w 8 -t 2

   ======================================================================
   This benchmark was executed with latency-workload which prioritizes latency of individual queries over throughput.
   1000 queries executed with batch size 1
   Latency stats are as follows
   QPS(Throughput): 34.40/s

   Per-query latency:
   Min latency (us)      : 8399
   Max latency (us)      : 307568
   Mean latency (us)     : 29040
   50th percentile (us)  : 19329
   95th percentile (us)  : 62797
   99th percentile (us)  : 79874
   99.9th percentile (us): 307568

If an output path is specified, ``furiosa-bench`` will save a JSON document as follows:

.. code-block:: sh

   $ furiosa-bench mnist-8.onnx --workload L -n 1000 -w 8 -t 2 -o mnist.json
   $ cat mnist.json
   {
     "model_data": {
       "path": "./mnist-8.onnx",
       "md5": "d7cd24a0a76cd492f31065301d468c3d ./mnist-8.onnx"
     },
     "compiler_version": "0.10.0-dev (rev: 2d862de8a built_at: 2023-07-13T20:05:04Z)",
     "hal_version": "Version: 0.12.0-2+nightly-230716",
     "git_revision": "fe6f77a",
     "result": {
       "mode": "Latency",
       "total run time": "30025 us",
       "total num queries": 1000,
       "batch size": 1,
       "qps": "33.31/s",
       "latency stats": {
         "min": "8840 us",
         "max": "113254 us",
         "mean": "29989 us",
         "50th percentile": "18861 us",
         "95th percentile": "64927 us",
         "99th percentile": "87052 us",
         "99.9th percentile": "113254 us"
       }
     }
   }

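The JSON report is convenient for scripting. The following is a minimal sketch that extracts the throughput and the 99th-percentile latency from the ``mnist.json`` file produced above, assuming the ``jq`` utility is installed.

.. code-block:: sh

   # Print throughput and tail latency from the saved benchmark report.
   # Keys containing spaces must be quoted in jq.
   jq -r '.result.qps, .result."latency stats"."99th percentile"' mnist.json
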
furiosa
#############################################

The ``furiosa`` command is a meta command line tool that becomes available when the Python SDK (``furiosa-sdk``) is installed.
Additional subcommands are added as extension packages are installed.

If the Python execution environment is not prepared, refer to :any:`SetupPython`.

Installing the command line tool:

.. code-block:: sh

   $ pip install furiosa-sdk

Verifying the installation:

.. code-block:: sh

   $ furiosa compile --version
   libnpu.so --- v2.0, built @ fe1fca3
   0.5.0 (rev: 49b97492a built at 2021-12-07 04:07:08) (wrapper: None)


furiosa compile
=======================================

The ``compile`` command compiles ONNX and TFLite models, generating programs that utilize the FuriosaAI NPU.
Detailed explanations and options can be found on the :ref:`CompilerCli` page.

.. _Litmus:

furiosa litmus (Model Compatibility Checker)
========================================================

The ``litmus`` command quickly checks whether an ONNX model can work normally with the Furiosa SDK and the NPU.
``litmus`` goes through all usage steps of the Furiosa SDK, including quantization, compilation, and inference on the FuriosaAI NPU.

``litmus`` is also a useful bug reporting tool.
If you specify the ``--dump`` option, ``litmus`` collects logs and environment information and writes them to an archive file, which can be attached when reporting issues.

The steps executed by the ``litmus`` command are as follows.

- Step 1: Load the input model and check that it is a valid model.
- Step 2: Quantize the model with random calibration.
- Step 3: Compile the quantized model.
- Step 4: Run inference on the compiled model using ``furiosa-bench``. This step is skipped if ``furiosa-bench`` is not installed.

Usage:

.. code-block:: sh

   furiosa-litmus [-h] [--dump OUTPUT_PREFIX] [--skip-quantization] [--target-npu TARGET_NPU] [-v] model_path

A simple example using the ``litmus`` command is as follows.

.. code-block:: sh

   $ furiosa litmus model.onnx
   libfuriosa_hal.so --- v0.11.0, built @ 43c901f
   INFO:furiosa.common.native:loaded native library libfuriosa_compiler.so.0.10.0 (0.10.0-dev d7548b7f6)
   furiosa-quantizer 0.10.0 (rev. 9ecebb6)
   furiosa-litmus 0.10.0 (rev. 9ecebb6)
   [Step 1] Checking if the model can be loaded and optimized ...
   [Step 1] Passed
   [Step 2] Checking if the model can be quantized ...
   [Step 2] Passed
   [Step 3] Checking if the model can be compiled for the NPU family [warboy-2pe] ...
   [1/6] 🔍   Compiling from onnx to dfg
   Done in 0.09272794s
   [2/6] 🔍   Compiling from dfg to ldfg
   ▪▪▪▪▪ [1/3] Splitting graph(LAS)...Done in 9.034934s
   ▪▪▪▪▪ [2/3] Lowering graph(LAS)...Done in 20.140083s
   ▪▪▪▪▪ [3/3] Optimizing graph...Done in 0.019548794s
   Done in 29.196825s
   [3/6] 🔍   Compiling from ldfg to cdfg
   Done in 0.001701888s
   [4/6] 🔍   Compiling from cdfg to gir
   Done in 0.015205072s
   [5/6] 🔍   Compiling from gir to lir
   Done in 0.0038304s
   [6/6] 🔍   Compiling from lir to enf
   Done in 0.020943863s
   ✨  Finished in 29.331545s
   [Step 3] Passed
   [Step 4] Perform inference once for data collection... (Optional)
   ✨  Finished in 0.000001198s
   ======================================================================
   This benchmark was executed with latency-workload which prioritizes latency of individual queries over throughput.
   1 queries executed with batch size 1
   Latency stats are as follows
   QPS(Throughput): 125.00/s

   Per-query latency:
   Min latency (us)      : 7448
   Max latency (us)      : 7448
   Mean latency (us)     : 7448
   50th percentile (us)  : 7448
   95th percentile (us)  : 7448
   99th percentile (us)  : 7448
   99.9th percentile (us): 7448
   [Step 4] Finished

If you already have a quantized model, you can skip Step 1 and Step 2 with the ``--skip-quantization`` option.

.. code-block:: sh

   $ furiosa litmus --skip-quantization quantized-model.onnx
   libfuriosa_hal.so --- v0.11.0, built @ 43c901f
   INFO:furiosa.common.native:loaded native library libfuriosa_compiler.so.0.10.0 (0.10.0-dev d7548b7f6)
   furiosa-quantizer 0.10.0 (rev. 9ecebb6)
   furiosa-litmus 0.10.0 (rev. 9ecebb6)
   [Step 1] Skip model loading and optimization
   [Step 2] Skip model quantization
   [Step 1 & Step 2] Load quantized model ...
   [Step 3] Checking if the model can be compiled for the NPU family [warboy-2pe] ...
   ...

You can use the ``--dump OUTPUT_PREFIX`` option to create a zip archive (named with ``OUTPUT_PREFIX`` followed by a timestamp, e.g. ``archive-1690438803.zip`` below) that contains metadata necessary for analysis, such as compilation logs, runtime logs, software versions, and the execution environment.
If you have any problems, you can get support through the FuriosaAI customer service center by attaching this zip file.

.. code-block:: sh

   $ furiosa litmus --dump archive model.onnx
   libfuriosa_hal.so --- v0.11.0, built @ 43c901f
   INFO:furiosa.common.native:loaded native library libfuriosa_compiler.so.0.10.0 (0.10.0-dev d7548b7f6)
   furiosa-quantizer 0.10.0 (rev. 9ecebb6)
   furiosa-litmus 0.10.0 (rev. 9ecebb6)
   [Step 1] Checking if the model can be loaded and optimized ...
   [Step 1] Passed
   ...

   $ zipinfo -1 archive-1690438803.zip
   archive-16904388032l4hoi3h/meta.yaml
   archive-16904388032l4hoi3h/compiler/compiler.log
   archive-16904388032l4hoi3h/compiler/memory-analysis.html
   archive-16904388032l4hoi3h/compiler/model.dot
   archive-16904388032l4hoi3h/runtime/trace.json

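The dump is a regular zip archive, so you can inspect its contents locally before attaching it to a support request. The following is a minimal sketch, assuming the archive name from the example above and an extraction directory named ``litmus-dump`` (a hypothetical name).

.. code-block:: sh

   # Unpack the litmus dump and skim the collected metadata.
   unzip archive-1690438803.zip -d litmus-dump
   cat litmus-dump/*/meta.yaml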