Release Notes - 0.9.0

Furiosa SDK 0.9.0 is a major release that includes many performance enhancements, new features, and bug fixes. In particular, the 0.9.0 release significantly improves the quantization tools.

Component Version Information

Package Name                                                                        Version
NPU Driver                                                                          1.7.0
NPU Firmware Tools                                                                  1.4.0
NPU Firmware Image                                                                  1.7.0
HAL (Hardware Abstraction Layer)                                                    0.11.0
Furiosa Compiler                                                                    0.9.0
Python SDK (furiosa-runtime, furiosa-server, furiosa-serving, furiosa-quantizer, ..)  0.9.0
NPU Management CLI (furiosactl)                                                     0.11.0
NPU Device Plugin                                                                   0.10.1
NPU Feature Discovery                                                               0.2.0

Installing the latest SDK

If you are using the APT repository, upgrading is simple:

apt-get update && apt-get upgrade

If you wish to upgrade specific packages only, run the commands below. You can find more details about APT repository setup at Driver, Firmware, and Runtime Installation.

apt-get update && \
apt-get install -y furiosa-driver-pdma furiosa-libhal-warboy furiosa-libnux furiosactl

You can upgrade the firmware as follows:

apt-get update && \
apt-get install -y furiosa-firmware-tools furiosa-firmware-image

You can upgrade the Python packages as follows:

pip install --upgrade pip setuptools wheel
pip install --upgrade furiosa-sdk
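After upgrading, it can be useful to confirm which versions were actually installed. The following is a minimal sketch using Python's standard importlib.metadata module. The helper names installed_version and matches_release are hypothetical, and the sketch assumes the distribution names are furiosa-sdk and furiosa-quantizer (as they appear in the pip commands and messages on this page) and that 0.9 release versions start with "0.9.".

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_release(version, prefix="0.9."):
    """Check whether a version string belongs to the expected release series."""
    return version is not None and version.startswith(prefix)


# Distribution names are assumptions taken from the pip commands on this page.
for name in ("furiosa-sdk", "furiosa-quantizer"):
    version = installed_version(name)
    status = "OK" if matches_release(version) else "not a 0.9.x install"
    print(f"{name}: {version} ({status})")
```

A package reported as None is not installed in the current Python environment, which usually means pip ran against a different interpreter or virtual environment.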

Warning

If you install or upgrade furiosa-sdk without first updating pip to the latest version, you may encounter the following errors.

ERROR: Could not find a version that satisfies the requirement furiosa-quantizer-impl==0.9.* (from furiosa-quantizer==0.9.*->furiosa-sdk) (from versions: none)
ERROR: No matching distribution found for furiosa-quantizer-impl==0.9.* (from furiosa-quantizer==0.9.*->furiosa-sdk)

Major changes

Quantization tool

The quantization tool is a library that converts a pre-trained model to a quantized model. You can find more details at Model Quantization. The 0.9.0 release includes API improvements and new calibration methods, which may lead to better accuracy.

  • Added new quantization-related APIs that are more flexible and robust (furiosa.quantizer, furiosa.optimizer).

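import tqdm
from furiosa.optimizer import optimize_model
from furiosa.quantizer import CalibrationMethod, Calibrator, quantize

# source_onnx_model (an ONNX model) and calibration_dataloader are assumed to be defined elsewhere.
optimized_onnx_model = optimize_model(source_onnx_model)
calibrator = Calibrator(optimized_onnx_model, CalibrationMethod.MIN_MAX_ASYM)
for calibration_data, _ in tqdm.tqdm(calibration_dataloader, desc="Calibration", unit="images", mininterval=0.5):
    calibrator.collect_data([[calibration_data.numpy()]])
ranges = calibrator.compute_range()
quantized_graph = quantize(optimized_onnx_model, ranges)
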
  • Added an option to control whether quantization is performed at the beginning of the model.

    • The without_quantize compiler option has been removed; use the with_quantize argument of the quantize function instead.

  • The normalized_pixel_outputs argument of the quantize function can be set to convert the model output to uint8 instead of dequantizing it to fp32.

    • An output tensor whose elements are in the range (0., 1.) can be converted efficiently to uint8 pixel data.

  • Added more calibration methods.

Supported Calibration Methods

Calibration Method                    Asymmetric        QuasiSymmetric
Min-Max                               MIN_MAX_ASYM      MIN_MAX_SYM
Entropy                               ENTROPY_ASYM      ENTROPY_SYM
Percentile                            PERCENTILE_ASYM   PERCENTILE_SYM
Mean squared error                    MSE_ASYM          MSE_SYM
Signal-to-quantization-noise ratio    SQNR_ASYM         SQNR_SYM

To verify the effectiveness of the new calibration methods, we measured the accuracy of 10 popular models with them. Of these, 8 models showed better accuracy than with the existing calibration methods. For example, the accuracy of EfficientNet-B0 increased by 57.452 percentage points: with the min-max calibration method its accuracy was 16.104%, whereas with the percentile calibration method it was 73.556%. The detailed experiment results can be found at Quantization Accuracy.
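One intuition for why a percentile-based range can beat a min-max range is that a few outliers stretch the min-max range and make each 8-bit step coarser. The sketch below illustrates this with plain Python on synthetic data. It is a conceptual illustration only, not the algorithm used by furiosa.quantizer; the function names, the two-sided tail clipping, and the 99th percentile setting are all assumptions made for this example.

```python
def min_max_range(values):
    """Range covering every observed value, outliers included."""
    return min(values), max(values)


def percentile_range(values, percentile=99.0):
    """Range that discards the most extreme samples at each end (illustrative rule)."""
    ordered = sorted(values)
    n = len(ordered)
    # Number of samples dropped from each tail; an assumption of this sketch.
    k = int(n * (100.0 - percentile) / 100.0)
    return ordered[k], ordered[n - 1 - k]


def step_size(value_range, levels=256):
    """Width of one quantization step when the range is split into 8-bit levels."""
    low, high = value_range
    return (high - low) / (levels - 1)


# Synthetic activations: 1001 values spread over [-1, 1] plus two outliers.
activations = [i / 500.0 - 1.0 for i in range(1001)] + [40.0, -35.0]

mm = min_max_range(activations)
pc = percentile_range(activations)
print("min-max range:", mm, "step:", step_size(mm))
print("percentile range:", pc, "step:", step_size(pc))
```

With the percentile range the two outliers are clipped, so the remaining values are represented with a much finer step. Which method works best still depends on the model, as the measurements above show.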

For more information on installing and using the new quantizer, you can refer to the following examples.
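Separately from those examples, the effect of the normalized_pixel_outputs option described above can be pictured as mapping values in the range (0., 1.) onto uint8 pixel values. The following plain-Python sketch shows one such mapping. The function name to_uint8_pixels, the scale factor of 255, and the rounding and clamping behavior are assumptions for illustration; the exact conversion performed by the SDK is not specified here.

```python
def to_uint8_pixels(values):
    """Map floats nominally in [0, 1] to integers in [0, 255] (illustrative only)."""
    pixels = []
    for value in values:
        scaled = int(round(value * 255.0))
        # Clamp so that slightly out-of-range values still fit in uint8.
        pixels.append(max(0, min(255, scaled)))
    return pixels


normalized_output = [0.0, 0.2, 1.0, 1.2, -0.1]
print(to_uint8_pixels(normalized_output))
```

Producing uint8 directly avoids an intermediate fp32 tensor when the application only needs pixel data, for example when writing an image.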

Compiler

  • Added acceleration support for the Lower and Unlower operators

  • Added acceleration support for the Dequantize operator

  • Added support for executing binaries that are larger than the hardware’s instruction memory

  • Improved the scheduler and memory allocator to eliminate unnecessary I/O

  • Various compilation optimizations for better execution performance

furiosa-toolkit

The furiosactl command-line tool included in the furiosa-toolkit 0.11.0 release includes the following major improvements.

The newly added furiosactl top command is used to view utilization by NPU device over time.

$ furiosactl top --interval 200
NOTE: furiosa top is under development. Usage and output formats may change.
Please enter Ctrl+C to stop.
Datetime                        PID       Device        NPU(%)   Comp(%)   I/O(%)   Command
2023-03-21T09:45:56.699483936Z  152616    npu1pe0-1      19.06    100.00     0.00   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:56.906443888Z  152616    npu1pe0-1      51.09     93.05     6.95   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.110489333Z  152616    npu1pe0-1      46.40     97.98     2.02   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.316060982Z  152616    npu1pe0-1      51.43    100.00     0.00   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.521140588Z  152616    npu1pe0-1      54.28     94.10     5.90   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.725910558Z  152616    npu1pe0-1      48.93     98.93     1.07   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.935041998Z  152616    npu1pe0-1      47.91    100.00     0.00   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:58.13929122Z   152616    npu1pe0-1      49.06     94.94     5.06   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
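If you want to summarize such output, for example to compute average utilization, the lines can be parsed with a few lines of Python. The sketch below works on text copied from the sample above rather than invoking furiosactl. The function names are hypothetical, and the column layout is an assumption based on this sample; as the tool itself notes, the output format may change.

```python
# Text copied from the sample `furiosactl top` output shown above.
SAMPLE_TOP_OUTPUT = """\
NOTE: furiosa top is under development. Usage and output formats may change.
Please enter Ctrl+C to stop.
Datetime                        PID       Device        NPU(%)   Comp(%)   I/O(%)   Command
2023-03-21T09:45:56.699483936Z  152616    npu1pe0-1      19.06    100.00     0.00   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:56.906443888Z  152616    npu1pe0-1      51.09     93.05     6.95   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.110489333Z  152616    npu1pe0-1      46.40     97.98     2.02   ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
"""


def parse_top_output(text):
    """Parse data rows into dicts; header and notice lines are skipped."""
    samples = []
    for line in text.splitlines():
        # Six whitespace-separated columns, then the command (which may contain spaces).
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue
        try:
            pid = int(fields[1])
            npu, comp, io = (float(field) for field in fields[3:6])
        except ValueError:
            continue  # not a data row (for example the column header)
        samples.append({
            "datetime": fields[0], "pid": pid, "device": fields[2],
            "npu": npu, "comp": comp, "io": io, "command": fields[6],
        })
    return samples


def mean_npu_utilization(samples):
    """Average of the NPU(%) column, or 0.0 when there are no samples."""
    if not samples:
        return 0.0
    return sum(sample["npu"] for sample in samples) / len(samples)


samples = parse_top_output(SAMPLE_TOP_OUTPUT)
print(f"{len(samples)} samples, mean NPU utilization {mean_npu_utilization(samples):.2f}%")
```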

The furiosactl info command has been improved to display concise information about each device. As before, you can pass the --full option if you want to see more information about a device.

$ furiosactl info
+------+--------+----------------+-------+--------+--------------+
| NPU  | Name   | Firmware       | Temp. | Power  | PCI-BDF      |
+------+--------+----------------+-------+--------+--------------+
| npu1 | warboy | 1.6.0, 3c10fd3 |  54°C | 0.99 W | 0000:44:00.0 |
+------+--------+----------------+-------+--------+--------------+

$ furiosactl info --full
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
| NPU  | Name   | UUID                                 | S/N               | Firmware       | Temp. | Power  | PCI-BDF      | PCI-DEV |
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
| npu1 | warboy | 00000000-0000-0000-0000-000000000000 | WBYB0000000000000 | 1.6.0, 3c10fd3 |  54°C | 0.99 W | 0000:44:00.0 | 511:0   |
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
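The table printed by furiosactl info can also be turned into structured data for scripting. The following is a minimal sketch that parses text copied from the sample output above. The function name parse_info_table is hypothetical, and the parsing assumes the "|"-delimited layout shown in this sample, which is not guaranteed to remain stable across releases.

```python
# Text copied from the sample `furiosactl info` output shown above.
SAMPLE_INFO_OUTPUT = """\
+------+--------+----------------+-------+--------+--------------+
| NPU  | Name   | Firmware       | Temp. | Power  | PCI-BDF      |
+------+--------+----------------+-------+--------+--------------+
| npu1 | warboy | 1.6.0, 3c10fd3 |  54°C | 0.99 W | 0000:44:00.0 |
+------+--------+----------------+-------+--------+--------------+
"""


def parse_info_table(text):
    """Return one dict per device row, keyed by the column headers."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the +----+ border lines
    ]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


devices = parse_info_table(SAMPLE_INFO_OUTPUT)
for device in devices:
    print(device["NPU"], device["Name"], device["Firmware"])
```

The same function also handles the wider --full table, since it relies only on the header row and the column separators.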

More information about installing and using furiosactl can be found in furiosa-toolkit.