furiosa.quantizer package

class furiosa.quantizer.calibrator.Calibrator(model: ModelProto | bytes, calibration_method: CalibrationMethod, *, percentage: float = 99.99)

Bases: object

Calibrator.

This collects the values of tensors in an ONNX model and computes their ranges.

collect_data(calibration_dataset: Iterable[Sequence[ndarray]]) → None

Collect the values of tensors that will be used for range computation.

This can be called multiple times.

Parameters: calibration_dataset (Iterable[Sequence[numpy.ndarray]]) – An iterable that provides input data for the model, one item at a time.

compute_range(verbose: bool = False) → Dict[str, Tuple[float, float]]

Estimate the ranges of the tensors on the basis of the collected data.

Parameters:

verbose (bool) – Whether to show a progress bar. Defaults to False.

Returns:

A dictionary that maps a tensor name to a tuple of the tensor’s min and max.

Return type:

Dict[str, Tuple[float, float]]
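To make the collect-then-compute flow and the shape of the returned dictionary concrete, here is a toy sketch in pure Python. It is not the library's implementation: the real Calibrator observes tensors of an ONNX model and supports several calibration methods, whereas the hypothetical `RunningRange` class below only tracks plain min/max over lists of floats, and the tensor names are made up.

```python
from typing import Dict, Iterable, Tuple


class RunningRange:
    """Toy stand-in (hypothetical) that tracks per-tensor min and max."""

    def __init__(self) -> None:
        self._ranges: Dict[str, Tuple[float, float]] = {}

    def collect(self, name: str, values: Iterable[float]) -> None:
        # May be called many times; each call widens the stored range.
        values = list(values)
        if not values:
            return
        lo, hi = min(values), max(values)
        if name in self._ranges:
            old_lo, old_hi = self._ranges[name]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        self._ranges[name] = (lo, hi)

    def compute_range(self) -> Dict[str, Tuple[float, float]]:
        # Same shape as the documented return type: name -> (min, max).
        return dict(self._ranges)


tracker = RunningRange()
tracker.collect("conv1_out", [0.5, -1.25, 2.0])
tracker.collect("conv1_out", [3.5, 0.0])  # second batch widens the range
tracker.collect("relu1_out", [0.0, 2.0])
ranges = tracker.compute_range()
```

After both batches, the range stored for "conv1_out" covers every value seen so far, which is why calling the collection step multiple times is safe.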

furiosa.quantizer.editor module

class furiosa.quantizer.editor.ModelEditor(model: ModelProto)

Bases: object

A utility class for manipulating ONNX models.

convert_input_type(tensor_name: str, tensor_type: TensorType) → None

Convert the element type of an input tensor named tensor_name to tensor_type.

Parameters:

tensor_name (str) – The name of an input tensor to convert.
tensor_type (TensorType) – The desired element type.

convert_output_type(tensor_name: str, tensor_type: TensorType, tensor_range: Tuple[float, float] | None = None) → None

Convert the element type of an output tensor named tensor_name to tensor_type.

Parameters:

tensor_name (str) – The name of an output tensor to convert.
tensor_type (TensorType) – The desired element type.
tensor_range (Optional[Tuple[float, float]]) – A new min/max range of the output tensor. If it is None, the original range will be retained. Defaults to None.

class furiosa.quantizer.editor.TensorType(value)

Bases: IntEnum

An enumeration class representing the element type of a tensor.

This class is used with the ModelEditor.convert_{input,output}_type method to specify the desired element type.

INT8 = 2

UINT8 = 1
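A possible way to combine ModelEditor and TensorType is sketched below. The helper name `convert_inputs_to_uint8` is hypothetical, and the sketch assumes that ModelEditor modifies the given model in place (the methods return None). The imports are placed inside the function so that defining the helper does not require the package to be installed.

```python
def convert_inputs_to_uint8(model):
    """Convert every pure input of `model` to UINT8 (hypothetical helper)."""
    # Lazy import: only needed when the helper is actually called.
    from furiosa.quantizer.editor import (
        ModelEditor,
        TensorType,
        get_pure_input_names,
    )

    editor = ModelEditor(model)
    for name in get_pure_input_names(model):
        # Uses only the documented signature: (tensor_name, tensor_type).
        editor.convert_input_type(name, TensorType.UINT8)
    # Assumption: the editor changed `model` in place, so return it as is.
    return model
```

Output tensors could be handled the same way with convert_output_type, optionally passing a new (min, max) tuple as tensor_range.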

furiosa.quantizer.editor.get_output_names(model: ModelProto) → List[str]

Return the names of outputs in an ONNX model.

Parameters: model (onnx.ModelProto) – An ONNX model.
Returns: A list of the names of outputs in the model.
Return type: List[str]

furiosa.quantizer.editor.get_pure_input_names(model: ModelProto) → List[str]

Return the names of inputs in an ONNX model that have no associated initializers.

Parameters: model (onnx.ModelProto) – An ONNX model.
Returns: A list of the names of inputs in the model that have no associated initializers.
Return type: List[str]
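The idea of an input "with no associated initializer" can be illustrated without the library. The sketch below is an assumption about the semantics, not the library's code: it filters `graph.input` names against `graph.initializer` names, which are standard ONNX ModelProto fields. The function name `pure_input_names` and the toy model built from SimpleNamespace objects are hypothetical stand-ins.

```python
from types import SimpleNamespace
from typing import List


def pure_input_names(model) -> List[str]:
    """Names of graph inputs that are not backed by an initializer."""
    initializer_names = {init.name for init in model.graph.initializer}
    return [
        inp.name
        for inp in model.graph.input
        if inp.name not in initializer_names
    ]


# Toy object mimicking the few ModelProto fields used above.
toy_model = SimpleNamespace(
    graph=SimpleNamespace(
        input=[
            SimpleNamespace(name="image"),
            SimpleNamespace(name="conv1.weight"),
        ],
        initializer=[SimpleNamespace(name="conv1.weight")],
    )
)
names = pure_input_names(toy_model)
```

Here "conv1.weight" is listed as a graph input but has an initializer, so only "image" remains as a pure input.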

Module contents

A FuriosaAI quantizer.

class furiosa.quantizer.CalibrationMethod(value)

Bases: IntEnum

Calibration method.

MIN_MAX_ASYM

Min-max calibration (Asymmetric).

Type: CalibrationMethod

MIN_MAX_SYM

Min-max calibration (Symmetric).

Type: CalibrationMethod

ENTROPY_ASYM

Entropy calibration (Asymmetric).

Type: CalibrationMethod

ENTROPY_SYM

Entropy calibration (Symmetric).

Type: CalibrationMethod

PERCENTILE_ASYM

Percentile calibration (Asymmetric).

Type: CalibrationMethod

PERCENTILE_SYM

Percentile calibration (Symmetric).

Type: CalibrationMethod

MSE_ASYM

Mean squared error (MSE) calibration (Asymmetric).

Type: CalibrationMethod

MSE_SYM

Mean squared error (MSE) calibration (Symmetric).

Type: CalibrationMethod

SQNR_ASYM

Signal-to-quantization-noise ratio (SQNR) calibration (Asymmetric).

Type: CalibrationMethod

SQNR_SYM

Signal-to-quantization-noise ratio (SQNR) calibration (Symmetric).

Type: CalibrationMethod

ENTROPY_ASYM = 2

ENTROPY_SYM = 3

MIN_MAX_ASYM = 0

MIN_MAX_SYM = 1

MSE_ASYM = 6

MSE_SYM = 7

PERCENTILE_ASYM = 4

PERCENTILE_SYM = 5

SQNR_ASYM = 8

SQNR_SYM = 9
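The difference between the asymmetric and symmetric variants can be sketched for the min-max case. This is an illustration under the usual convention that a symmetric range is centered at zero; the exact rule the library applies is an assumption here, and the function names `min_max_asym` and `min_max_sym` are hypothetical.

```python
from typing import Sequence, Tuple


def min_max_asym(values: Sequence[float]) -> Tuple[float, float]:
    """Asymmetric range: the observed minimum and maximum as they are."""
    return (min(values), max(values))


def min_max_sym(values: Sequence[float]) -> Tuple[float, float]:
    """Symmetric range: centered at zero, wide enough for every value."""
    bound = max(abs(min(values)), abs(max(values)))
    return (-bound, bound)


observed = [-0.5, 0.25, 2.0]
asym_range = min_max_asym(observed)
sym_range = min_max_sym(observed)
```

For skewed data such as `observed`, the symmetric range spends part of its width on negative values that never occur, which is the usual trade-off between the two families.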

class furiosa.quantizer.Calibrator(model: ModelProto | bytes, calibration_method: CalibrationMethod, *, percentage: float = 99.99)

Bases: object

Calibrator.

This collects the values of tensors in an ONNX model and computes their ranges.

collect_data(calibration_dataset: Iterable[Sequence[ndarray]]) → None

Collect the values of tensors that will be used for range computation.

This can be called multiple times.

Parameters: calibration_dataset (Iterable[Sequence[numpy.ndarray]]) – An iterable that provides input data for the model, one item at a time.

compute_range(verbose: bool = False) → Dict[str, Tuple[float, float]]

Estimate the ranges of the tensors on the basis of the collected data.

Parameters:

verbose (bool) – Whether to show a progress bar. Defaults to False.

Returns:

A dictionary that maps a tensor name to a tuple of the tensor’s min and max.

Return type:

Dict[str, Tuple[float, float]]

class furiosa.quantizer.ModelEditor(model: ModelProto)

Bases: object

A utility class for manipulating ONNX models.

convert_input_type(tensor_name: str, tensor_type: TensorType) → None

Convert the element type of an input tensor named tensor_name to tensor_type.

Parameters:

tensor_name (str) – The name of an input tensor to convert.
tensor_type (TensorType) – The desired element type.

convert_output_type(tensor_name: str, tensor_type: TensorType, tensor_range: Tuple[float, float] | None = None) → None

Convert the element type of an output tensor named tensor_name to tensor_type.

Parameters:

tensor_name (str) – The name of an output tensor to convert.
tensor_type (TensorType) – The desired element type.
tensor_range (Optional[Tuple[float, float]]) – A new min/max range of the output tensor. If it is None, the original range will be retained. Defaults to None.

class furiosa.quantizer.TensorType(value)

Bases: IntEnum

An enumeration class representing the element type of a tensor.

This class is used with the ModelEditor.convert_{input,output}_type method to specify the desired element type.

INT8 = 2

UINT8 = 1

furiosa.quantizer.get_output_names(model: ModelProto) → List[str]

Return the names of outputs in an ONNX model.

Parameters: model (onnx.ModelProto) – An ONNX model.
Returns: A list of the names of outputs in the model.
Return type: List[str]

furiosa.quantizer.get_pure_input_names(model: ModelProto) → List[str]

Return the names of inputs in an ONNX model that have no associated initializers.

Parameters: model (onnx.ModelProto) – An ONNX model.
Returns: A list of the names of inputs in the model that have no associated initializers.
Return type: List[str]

furiosa.quantizer.quantize(model: ModelProto | bytes, tensor_name_to_range: Mapping[str, Sequence[float]]) → bytes

Quantize an ONNX model on the basis of the range of its tensors.

Parameters:

model (onnx.ModelProto or bytes) – An ONNX model to quantize.
tensor_name_to_range (Mapping[str, Sequence[float]]) – A mapping from a tensor name to a 2-tuple (or list) of the tensor’s min and max.

Returns:

A serialized ONNX model that incorporates quantization information.

Return type:

bytes
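Putting the documented pieces together, a calibrate-then-quantize flow might look like the sketch below. It uses only the signatures listed on this page plus `onnx.load`; the helper name `calibrate_and_quantize`, its parameters, and the choice of MIN_MAX_ASYM are illustrative assumptions. The imports live inside the function so that defining it does not require the packages to be installed.

```python
def calibrate_and_quantize(model_path, calibration_dataset, output_path):
    """Calibrate an ONNX model and save its quantized form (hypothetical)."""
    # Lazy imports: only needed when the helper is actually called.
    import onnx
    from furiosa.quantizer import CalibrationMethod, Calibrator, quantize

    model = onnx.load(model_path)

    # Collect tensor values; collect_data may be called more than once.
    calibrator = Calibrator(model, CalibrationMethod.MIN_MAX_ASYM)
    calibrator.collect_data(calibration_dataset)

    # Dict[str, Tuple[float, float]]: tensor name -> (min, max).
    ranges = calibrator.compute_range()

    # quantize returns the serialized model as bytes.
    quantized = quantize(model, ranges)
    with open(output_path, "wb") as f:
        f.write(quantized)
    return ranges
```

The `calibration_dataset` argument is expected to match the documented type, an iterable in which each item is a sequence of NumPy arrays for the model's inputs.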