Release Notes - 0.9.0
Furiosa SDK 0.9.0 is a major release that includes many performance enhancements, additional functions, and bug fixes. In particular, the 0.9.0 release includes significant improvements to the quantization tools.
| Package Name | Version |
|---|---|
| NPU Driver | 1.7.0 |
| NPU Firmware Tools | 1.4.0 |
| NPU Firmware Image | 1.7.0 |
| HAL (Hardware Abstraction Layer) | 0.11.0 |
| Furiosa Compiler | 0.9.0 |
| Python SDK (furiosa-runtime, furiosa-server, furiosa-serving, furiosa-quantizer, ..) | 0.9.0 |
| NPU Management CLI (furiosactl) | 0.11.0 |
| NPU Device Plugin | 0.10.1 |
| NPU Feature Discovery | 0.2.0 |
Installing the latest SDK
If you are using the APT repository, the upgrade process is simple:
apt-get update && apt-get upgrade
If you wish to upgrade only specific packages, run the commands below. You can find more details about APT repository setup at Driver, Firmware, and Runtime Installation.
apt-get update && \
apt-get install -y furiosa-driver-pdma furiosa-libhal-warboy furiosa-libnux furiosactl
You can upgrade the firmware as follows:
apt-get update && \
apt-get install -y furiosa-firmware-tools furiosa-firmware-image
You can upgrade the Python package as follows:
pip install --upgrade pip setuptools wheel
pip install --upgrade furiosa-sdk
Warning
If you install or upgrade furiosa-sdk without first updating pip to the latest version, you may encounter the following errors.
ERROR: Could not find a version that satisfies the requirement furiosa-quantizer-impl==0.9.* (from furiosa-quantizer==0.9.*->furiosa-sdk) (from versions: none)
ERROR: No matching distribution found for furiosa-quantizer-impl==0.9.* (from furiosa-quantizer==0.9.*->furiosa-sdk)
Major changes
Quantization tool
The quantization tool is a library that converts a pre-trained model to a quantized model. You can find more details at Model Quantization. The 0.9.0 release includes API improvements and new calibration methods, which may lead to better accuracy.
Added new quantization-related APIs that are more flexible and robust (furiosa.quantizer, furiosa.optimizer).
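import tqdm
from furiosa.optimizer import optimize_model
from furiosa.quantizer import CalibrationMethod, Calibrator, quantize

# source_onnx_model and calibration_dataloader are assumed to be prepared beforehand.
optimized_onnx_model = optimize_model(source_onnx_model)
calibrator = Calibrator(optimized_onnx_model, CalibrationMethod.MIN_MAX_ASYM)
for calibration_data, _ in tqdm.tqdm(calibration_dataloader, desc="Calibration", unit="images", mininterval=0.5):
    calibrator.collect_data([[calibration_data.numpy()]])
ranges = calibrator.compute_range()
quantizated_graph = quantize(optimized_onnx_model, ranges)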
Added an option to decide whether to perform quantization at the beginning of the model.
The without_quantize compiler option has been removed. Instead, the same behavior can be specified via the with_quantize argument to the quantize function.
The normalized_pixel_outputs argument to the quantize function can be set to convert the model output to uint8 instead of dequantizing it to fp32. A tensor with an element range of (0., 1.) can be optimized to convert to pixel data in uint8.
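As a conceptual illustration of what converting a (0., 1.) tensor to uint8 pixel data means, here is a minimal pure-Python sketch. It is not the furiosa.quantizer implementation; the function name to_uint8_pixels and the round-half-up rule are assumptions made for this example.

```python
def to_uint8_pixels(values):
    """Map floats in [0.0, 1.0] to integer pixel values in [0, 255]."""
    pixels = []
    for v in values:
        # Clamp first so slightly out-of-range values still fit in uint8.
        clamped = min(1.0, max(0.0, v))
        # Scale to the pixel range and round half up (assumed rounding rule).
        pixels.append(int(clamped * 255.0 + 0.5))
    return pixels


print(to_uint8_pixels([0.0, 0.5, 1.0]))
```

Producing uint8 directly avoids a dequantize-to-fp32 step followed by a separate float-to-pixel conversion on the host.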
Provides more calibration methods.
| Calibration Method | Asymmetric | QuasiSymmetric |
|---|---|---|
| Min-Max | MIN_MAX_ASYM | MIN_MAX_SYM |
| Entropy | ENTROPY_ASYM | ENTROPY_SYM |
| Percentile | PERCENTILE_ASYM | PERCENTILE_SYM |
| Mean squared error | MSE_ASYM | MSE_SYM |
| Signal-to-quantization-noise ratio | SQNR_ASYM | SQNR_SYM |
To verify the effectiveness of the new calibration methods, we measured the accuracy of 10 popular models with them. Among these, 8 models showed better accuracy than with the existing calibration methods. For example, the accuracy of EfficientNet-B0 increased by 57.452 percentage points: with the min-max calibration method, EfficientNet-B0 had an accuracy of 16.104%, whereas with the percentile calibration method the accuracy was 73.556%. The details of the experiment results can be found at Quantization Accuracy.
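To give some intuition for why a percentile method can outperform min-max, here is a minimal pure-Python sketch. It is not how furiosa.quantizer computes ranges; the function names, the nearest-rank percentile, and the 99% cutoff are assumptions chosen for illustration. A single outlier stretches the min-max range, which makes each of the 255 uint8 steps much coarser.

```python
import math


def min_max_range(values):
    """Range that covers every observed value, outliers included."""
    return min(values), max(values)


def percentile_range(values, percentile=99.0):
    """Range that clips both tails beyond the given percentile (nearest-rank)."""
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank index of the upper percentile; the lower index mirrors it.
    hi_idx = min(n - 1, max(0, math.ceil(percentile / 100.0 * n) - 1))
    lo_idx = n - 1 - hi_idx
    return ordered[lo_idx], ordered[hi_idx]


def uint8_step(value_range):
    """Width of one quantization step when the range is spread over 255 steps."""
    lo, hi = value_range
    return (hi - lo) / 255.0


# Hypothetical activations: 200 values in [-1.0, 0.99] plus one large outlier.
activations = [i / 100.0 for i in range(-100, 100)] + [50.0]

mm = min_max_range(activations)
pc = percentile_range(activations)
print("min-max range:", mm, "step:", uint8_step(mm))
print("percentile range:", pc, "step:", uint8_step(pc))
```

In this toy data the percentile range ignores the outlier, so typical values are represented with a much finer step, at the cost of clipping the outlier itself.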
For more information on installing and using the new quantizer, you can refer to the following examples.
Compiler
Added acceleration support for the Lower and Unlower operators
Added acceleration support for the Dequantize operator
Added support for executing binaries that are larger than the hardware’s instruction memory
Improved the scheduler and memory allocator to eliminate unnecessary I/O
Various compilation optimizations for better execution performance
furiosa-toolkit
The furiosactl command-line tool included in the furiosa-toolkit 0.11.0 release includes the following major improvements.
The newly added furiosactl top command shows the utilization of each NPU device over time.
$ furiosactl top --interval 200
NOTE: furiosa top is under development. Usage and output formats may change.
Please enter Ctrl+C to stop.
Datetime PID Device NPU(%) Comp(%) I/O(%) Command
2023-03-21T09:45:56.699483936Z 152616 npu1pe0-1 19.06 100.00 0.00 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:56.906443888Z 152616 npu1pe0-1 51.09 93.05 6.95 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.110489333Z 152616 npu1pe0-1 46.40 97.98 2.02 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.316060982Z 152616 npu1pe0-1 51.43 100.00 0.00 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.521140588Z 152616 npu1pe0-1 54.28 94.10 5.90 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.725910558Z 152616 npu1pe0-1 48.93 98.93 1.07 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.935041998Z 152616 npu1pe0-1 47.91 100.00 0.00 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:58.13929122Z 152616 npu1pe0-1 49.06 94.94 5.06 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
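If you want to post-process this output, the following minimal sketch parses the rows and averages the NPU(%) column per device. It assumes the whitespace-separated column layout shown in the sample above; as the tool itself notes, the output format may change, so treat the parsing rules and the helper names (parse_top_line, mean_npu_utilization) as assumptions.

```python
from collections import defaultdict

# Sample lines copied from the furiosactl top output shown above.
SAMPLE_OUTPUT = """\
NOTE: furiosa top is under development. Usage and output formats may change.
Please enter Ctrl+C to stop.
Datetime PID Device NPU(%) Comp(%) I/O(%) Command
2023-03-21T09:45:56.699483936Z 152616 npu1pe0-1 19.06 100.00 0.00 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:56.906443888Z 152616 npu1pe0-1 51.09 93.05 6.95 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
2023-03-21T09:45:57.110489333Z 152616 npu1pe0-1 46.40 97.98 2.02 ./npu_runtime_test -n 10000 results/ResNet-CTC_kor1_200_nightly3_128dpes_8batches.enf
"""


def parse_top_line(line):
    """Return a dict for a data row, or None for banner and header lines."""
    # The command may contain spaces, so split into at most 7 fields.
    fields = line.split(None, 6)
    if len(fields) < 7:
        return None
    try:
        pid = int(fields[1])
        npu, comp, io = (float(x) for x in fields[3:6])
    except ValueError:
        return None  # header or note line, not a measurement
    return {
        "datetime": fields[0],
        "pid": pid,
        "device": fields[2],
        "npu": npu,
        "comp": comp,
        "io": io,
        "command": fields[6],
    }


def mean_npu_utilization(text):
    """Average the NPU(%) column for each device found in the text."""
    samples = defaultdict(list)
    for line in text.splitlines():
        row = parse_top_line(line)
        if row is not None:
            samples[row["device"]].append(row["npu"])
    return {device: sum(v) / len(v) for device, v in samples.items()}


print(mean_npu_utilization(SAMPLE_OUTPUT))
```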
The furiosactl info command has been improved to display concise information about each device. As before, you can use the --full option if you want to see more information about a device.
$ furiosactl info
+------+--------+----------------+-------+--------+--------------+
| NPU | Name | Firmware | Temp. | Power | PCI-BDF |
+------+--------+----------------+-------+--------+--------------+
| npu1 | warboy | 1.6.0, 3c10fd3 | 54°C | 0.99 W | 0000:44:00.0 |
+------+--------+----------------+-------+--------+--------------+
$ furiosactl info --full
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
| NPU | Name | UUID | S/N | Firmware | Temp. | Power | PCI-BDF | PCI-DEV |
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
| npu1 | warboy | 00000000-0000-0000-0000-000000000000 | WBYB0000000000000 | 1.6.0, 3c10fd3 | 54°C | 0.99 W | 0000:44:00.0 | 511:0 |
+------+--------+--------------------------------------+-------------------+----------------+-------+--------+--------------+---------+
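For scripting, the table printed by furiosactl info can be turned into dictionaries with a few lines of Python. This is a minimal sketch that assumes the ASCII table layout shown above stays as is; parse_info_table is a hypothetical helper, not part of furiosa-toolkit.

```python
# Sample table copied from the furiosactl info output shown above.
SAMPLE_TABLE = """\
+------+--------+----------------+-------+--------+--------------+
| NPU | Name | Firmware | Temp. | Power | PCI-BDF |
+------+--------+----------------+-------+--------+--------------+
| npu1 | warboy | 1.6.0, 3c10fd3 | 54°C | 0.99 W | 0000:44:00.0 |
+------+--------+----------------+-------+--------+--------------+
"""


def parse_info_table(text):
    """Parse an ASCII table into a list of dicts keyed by the header cells."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and anything else
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe-delimited line is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


for device in parse_info_table(SAMPLE_TABLE):
    print(device["NPU"], device["Name"], device["Firmware"])
```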
More information about installing and using furiosactl can be found in furiosa-toolkit.