furiosa.quantizer.frontend.onnx.quantizer package
Submodules
furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper module
- class furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.ClipperElimination(model, name_nodes=True)
Bases:
furiosa.quantizer.frontend.onnx.transformer.ONNXTransformer, abc.ABC
Base class holding the methods shared by the clipper-elimination patterns (Pattern_1 to Pattern_6).
- Clippers are eliminated only if
the given pattern is matched, and
the output quantization parameters of all consumers of the clipper's preceding node are mutually identical.
- check_condition_1(clip)
- check_condition_2(qlinear)
- We assume that multiple dqlinears (DQs) may follow a qlinear (Q), as in
…-op-Q-DQ-clip-Q1-…
     +-DQ_1-clip-Q1_1-…
     +-…
This check verifies that the quantization parameters of all qlinear1s (Q1s) are equal.
- check_runnable = False
- pattern_condition_checker(matched_nodes)
- pattern_matching(base_node)
- abstract property pattern_to_match: List[str]
- remove_clip_qdq(clip)
- replace_quant_params(qlinear, qlinear1)
- class furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.EliminateClipper
Bases:
furiosa.quantizer.interfaces.transformer.Transformer
- transform(model: onnx.onnx_ml_pb2.ModelProto) → onnx.onnx_ml_pb2.ModelProto
- class furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.Pattern_1(model, name_nodes=True)
Bases:
furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.ClipperElimination
- transform
prev -> Conv -> QuantizeLinear -> DequantizeLinear -> Relu -> QuantizeLinear -> next
- to
prev -> Conv -> QuantizeLinear -> next
- pattern_to_match = ['Conv', 'QuantizeLinear', 'DequantizeLinear', 'Relu', 'QuantizeLinear']
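Why the Relu can be dropped: assuming the trailing QuantizeLinear (Q1) is uint8 with zero point 0, as it typically would be when calibrated on non-negative Relu outputs, quantization already saturates negative inputs to 0. The numpy sketch below checks this for the Relu alone. quantize_uint8 is a hypothetical helper that mirrors ONNX QuantizeLinear rounding and saturation, and the sketch ignores the extra rounding of the intermediate Q-DQ pair.

```python
import numpy as np


def quantize_uint8(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """ONNX QuantizeLinear for uint8 output: saturate(round_half_even(x / scale) + zero_point)."""
    q = np.round(x / scale) + zero_point  # np.round rounds half to even, like ONNX
    return np.clip(q, 0, 255).astype(np.uint8)


# Assumed Q1 parameters: uint8 with zero point 0 (illustrative values).
scale_q1, zp_q1 = 3.0 / 255, 0

x = np.linspace(-3.0, 3.0, 13)  # stand-in for the dequantized Conv output
with_relu = quantize_uint8(np.maximum(x, 0.0), scale_q1, zp_q1)
without_relu = quantize_uint8(x, scale_q1, zp_q1)
print(np.array_equal(with_relu, without_relu))  # True
```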
- class furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.Pattern_2(model, name_nodes=True)
Bases:
furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.ClipperElimination
- transform
prev -> Conv -> QuantizeLinear -> DequantizeLinear -> Clip -> QuantizeLinear -> next
- to
prev -> Conv -> QuantizeLinear -> next
if the inputs feeding Clip.input[1] (min) and Clip.input[2] (max) are initializers before fake quantization
- pattern_to_match = ['Conv', 'QuantizeLinear', 'DequantizeLinear', 'Clip', 'QuantizeLinear']
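The Clip case works the same way when Q1's representable range coincides with [min, max]; presumably that is why the bounds must be constants (initializers) known at transform time. A sketch for a Relu6-style Clip(0, 6), assuming Q1 was calibrated to exactly that range (quantize_uint8 is again a hypothetical helper):

```python
import numpy as np


def quantize_uint8(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """uint8 QuantizeLinear: round half to even, add zero point, saturate to [0, 255]."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)


clip_min, clip_max = 0.0, 6.0             # Clip.input[1], Clip.input[2] as constants
scale_q1 = (clip_max - clip_min) / 255    # assumed: Q1 range calibrated to [min, max]
zp_q1 = int(round(-clip_min / scale_q1))  # 0 for this range

x = np.linspace(-4.0, 10.0, 29)
with_clip = quantize_uint8(np.clip(x, clip_min, clip_max), scale_q1, zp_q1)
without_clip = quantize_uint8(x, scale_q1, zp_q1)
print(np.array_equal(with_clip, without_clip))  # True
```

Values above 6 exceed 255 after scaling and saturate, which is exactly what the Clip would have produced.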
- class furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.Pattern_3(model, name_nodes=True)
Bases:
furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.ClipperElimination
- transform
prev -> Add -> QuantizeLinear -> DequantizeLinear -> Relu -> QuantizeLinear -> next
- to
prev -> Add -> QuantizeLinear -> next
- pattern_to_match = ['Add', 'QuantizeLinear', 'DequantizeLinear', 'Relu', 'QuantizeLinear']
- class furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.Pattern_4(model, name_nodes=True)
Bases:
furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.ClipperElimination
- transform
prev -> Add -> QuantizeLinear -> DequantizeLinear -> Clip -> QuantizeLinear -> next
- to
prev -> Add -> QuantizeLinear -> next
if the inputs feeding Clip.input[1] (min) and Clip.input[2] (max) are initializers before fake quantization
- pattern_to_match = ['Add', 'QuantizeLinear', 'DequantizeLinear', 'Clip', 'QuantizeLinear']
- class furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.Pattern_5(model, name_nodes=True)
Bases:
furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.ClipperElimination
- transform
prev -> Conv -> QuantizeLinear -> DequantizeLinear -> Squeeze -> QuantizeLinear -> DequantizeLinear -> Relu -> QuantizeLinear -> next
- to
prev -> Conv -> QuantizeLinear -> DequantizeLinear -> Squeeze -> QuantizeLinear -> next
- pattern_to_match = ['Conv', 'QuantizeLinear', 'DequantizeLinear', 'Squeeze', 'QuantizeLinear', 'DequantizeLinear', 'Relu', 'QuantizeLinear']
- class furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.Pattern_6(model, name_nodes=True)
Bases:
furiosa.quantizer.frontend.onnx.quantizer.eliminate_clipper.ClipperElimination
- transform
prev -> Conv -> QuantizeLinear -> DequantizeLinear -> Squeeze -> QuantizeLinear -> DequantizeLinear -> Clip -> QuantizeLinear -> next
- to
prev -> Conv -> QuantizeLinear -> DequantizeLinear -> Squeeze -> QuantizeLinear -> next
if the inputs feeding Clip.input[1] (min) and Clip.input[2] (max) are initializers before fake quantization
- pattern_to_match = ['Conv', 'QuantizeLinear', 'DequantizeLinear', 'Squeeze', 'QuantizeLinear', 'DequantizeLinear', 'Clip', 'QuantizeLinear']
furiosa.quantizer.frontend.onnx.quantizer.quantizer module
- class furiosa.quantizer.frontend.onnx.quantizer.quantizer.FuriosaONNXQuantizer(model: onnx.onnx_ml_pb2.ModelProto, per_channel: bool, static: bool, mode: furiosa.quantizer.frontend.onnx.quantizer.utils.QuantizationMode, dynamic_ranges: Dict[str, Tuple[float, float]], raw_data=True)
Bases:
object
- build_quantized_model()
- check_model()
- make_intermediate_representation()
- make_quant_dequant_node(node_input: str, axis: Optional[int] = None, idx: Optional[int] = None) → None
- quantize() → onnx.onnx_ml_pb2.ModelProto
- quantize_model()
furiosa.quantizer.frontend.onnx.quantizer.quantizer_mode module
- class furiosa.quantizer.frontend.onnx.quantizer.quantizer_mode.DFGImportable(model, raw_data)
Bases:
object
- remove_quantizelinear_operator_with_initializer()
- transform
prev -> QuantizeLinear -> DequantizeLinear -> next
- into
prev’ -> DequantizeLinear -> next
where prev’ is the precomputed output(s) of QuantizeLinear,
provided that prev is defined in graph.initializer
- transform()
- transform_to_integer_arithmetic_operator()
- class furiosa.quantizer.frontend.onnx.quantizer.quantizer_mode.ONNXRuntimeExecutable(model, raw_data)
Bases:
furiosa.quantizer.frontend.onnx.quantizer.quantizer_mode.DFGImportable
- transform()
furiosa.quantizer.frontend.onnx.quantizer.utils module
- class furiosa.quantizer.frontend.onnx.quantizer.utils.QuantizationMode(value)
Bases:
enum.Enum
An enumeration.
- DFG = 0
- FAKE = 1
- furiosa.quantizer.frontend.onnx.quantizer.utils.activation_scale_zeropoint(rmin, rmax, activation_qtype)
- furiosa.quantizer.frontend.onnx.quantizer.utils.append_suffix(name: str, suffix: List[str]) → List[str]
Helper function to append suffixes to the given name.
- furiosa.quantizer.frontend.onnx.quantizer.utils.asymmetric_scale_zeropoint(rmin, rmax, activation_qtype)
source: onnxruntime quantization tools
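For reference, here is a common asymmetric formulation in the spirit of the onnxruntime quantization tools. It is an illustrative sketch (asymmetric_scale_zeropoint_sketch is hypothetical), not necessarily the exact code behind this function.

```python
def asymmetric_scale_zeropoint_sketch(rmin: float, rmax: float, qmin: int = 0, qmax: int = 255):
    """Asymmetric (scale, zero_point) for a real range, uint8 target range by default."""
    # Widen the range to include 0 so that real zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point


print(asymmetric_scale_zeropoint_sketch(1.0, 2.55))   # rmin widened to 0, so zero point 0
print(asymmetric_scale_zeropoint_sketch(-2.55, 0.0))  # zero point 255
```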
- furiosa.quantizer.frontend.onnx.quantizer.utils.calculate_activation_quant_params(dynamic_ranges: Dict, node_list: List[onnx.onnx_ml_pb2.NodeProto], activation_qtype: onnx.onnx_ml_pb2.TensorProto, value_info: Dict) → Dict
- furiosa.quantizer.frontend.onnx.quantizer.utils.calculate_weight_quant_params(data: numpy.ndarray, weight_qtype: onnx.onnx_ml_pb2.TensorProto, name: str) → Tuple[int, float]
- Parameters
data – data to quantize
weight_qtype – quantization data type of weight
name – name of tensor to quantize
- Returns
quantized weights, zero point, scale
- To pack weights, we compute a linear transformation
- when data type == uint8, from [rmin, rmax] -> [0, 2^{b}-1], and
- when data type == int8, from [-m, m] -> [-(2^{b-1}-1), 2^{b-1}-1], where
m = max(abs(rmin), abs(rmax)),
and add the necessary intermediate nodes to transform the quantized weight back to the full-precision weight using the equation r = S(q - z), where
r: real original value, q: quantized value, S: scale, z: zero point
source: onnxruntime quantization tools
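The int8 branch described above can be sketched in numpy, together with the reconstruction r = S(q - z). The function names are hypothetical, the return order follows the Returns description (quantized weights, zero point, scale), and only the symmetric int8 case is shown.

```python
import numpy as np


def quantize_weight_int8_symmetric(data: np.ndarray, bits: int = 8):
    """Map [-m, m] -> [-(2^{b-1}-1), 2^{b-1}-1] with m = max(|rmin|, |rmax|)."""
    m = float(max(abs(data.min()), abs(data.max())))
    qmax = 2 ** (bits - 1) - 1           # 127 for b = 8
    scale = m / qmax if m > 0 else 1.0   # S
    zero_point = 0                       # z is 0 in the symmetric case
    q = np.clip(np.round(data / scale), -qmax, qmax).astype(np.int8)
    return q, zero_point, scale


def dequantize(q: np.ndarray, zero_point: int, scale: float) -> np.ndarray:
    """r = S (q - z)"""
    return scale * (q.astype(np.float64) - zero_point)


w = np.array([-1.0, 0.3, 0.25, 0.0])
q, z, s = quantize_weight_int8_symmetric(w)
print(q[0], z)  # -127 0
# Reconstruction error is bounded by half a quantization step.
print(np.abs(dequantize(q, z, s) - w).max() <= s / 2 + 1e-12)
```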
- furiosa.quantizer.frontend.onnx.quantizer.utils.get_input_tensors(model: onnx.onnx_ml_pb2.ModelProto) → List[Tuple[str, List[int], str]]
- furiosa.quantizer.frontend.onnx.quantizer.utils.get_qrange(qtype)
source: onnxruntime quantization tools
- furiosa.quantizer.frontend.onnx.quantizer.utils.get_vi_dtype(vi)
Returns the data type of a value_info entry.
- Parameters
vi – an entry of graph.value_info
- Returns
vi.type.tensor_type.elem_type
- furiosa.quantizer.frontend.onnx.quantizer.utils.is_float_tensor(vi)