$$ \begin{align*}
re&al\_value_{Single} \\
&= roundToNearestFloat((affine\_value_{uint8 \, or \, uint16} - zero\_point_{uint8 \, or \, uint16})_{sint32})_{Single} * scale_{Single}
\end{align*} $$
In the above, we assume that the result of the subtraction is in 32-bit signed integer format, and that $$roundToNearestFloat$$ returns a Single.
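As a concrete illustration, here is a minimal Python sketch of this dequantization formula. The function and parameter names (`dequantize`, `affine_value`, and so on) are chosen for this example only and are not part of any MLIR or TensorFlow API:

```python
import numpy as np

def dequantize(affine_value, zero_point, scale):
    """Recover approximate real values from uint8/uint16 affine values.

    Mirrors the formula above: the subtraction is performed on 32-bit
    signed integers, the result is rounded to the nearest float32
    (roundToNearestFloat), then multiplied by the float32 scale.
    """
    # Subtract in sint32 so that e.g. 0 - 128 does not wrap around.
    diff_sint32 = affine_value.astype(np.int32) - np.int32(zero_point)
    # Conversion to float32 rounds to the nearest representable float.
    return diff_sint32.astype(np.float32) * np.float32(scale)

# Example: uint8 storage with zero_point=128, scale=0.1
q = np.array([0, 128, 255], dtype=np.uint8)
print(dequantize(q, zero_point=128, scale=0.1))  # -> [-12.8  0.  12.7]
```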
#### Affine to fixed point
When the affine and fixed point scales are the same, subtracting the zero point from the affine value produces the equivalent scaled value.
$$ scaled\_value = affine\_value_{non\mbox{-}negative} - zero\_point_{non\mbox{-}negative} $$
#### Fixed point to affine
When the affine and fixed point scales are the same, adding the zero point to the scaled value produces the equivalent affine value.
$$ affine\_value_{non\mbox{-}negative} = scaled\_value + zero\_point_{non\mbox{-}negative} $$
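Under the stated assumption that both representations share the same scale, the two conversions are simple integer shifts by the zero point. A minimal sketch (function names are illustrative only):

```python
import numpy as np

def affine_to_fixed_point(affine_value, zero_point):
    # Valid only when the affine and fixed point scales are identical.
    return affine_value.astype(np.int32) - np.int32(zero_point)

def fixed_point_to_affine(scaled_value, zero_point):
    # The inverse operation, again assuming matching scales.
    return scaled_value + np.int32(zero_point)

q = np.array([0, 128, 255], dtype=np.uint8)
s = affine_to_fixed_point(q, zero_point=128)   # [-128, 0, 127]
assert np.array_equal(fixed_point_to_affine(s, zero_point=128).astype(np.uint8), q)
```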
## Usage within MLIR
There are several components to the quantization system being developed within MLIR:
*   Quantization dialect containing:

    *   A family of QuantizedTypes which represent the mapping between expressed values (typically of a floating point computer type) and storage values (typically of an integral computer type).
    *   Type conversion operations for converting between types based on a QuantizedType and its expressed and storage sub-types.
    *   Instrumentation operations for assigning instrumentation points within the computation where runtime statistics may help guide the quantization process.

*   Integration with simulated quantization at training time.

*   TFLite native quantization:

    *   The TFLite op-set natively supports uniform-quantized variants.
    *   Passes and tools exist to convert directly from the TensorFlow dialect to the TFLite quantized operation set.
Not every application of quantization will use all of these facilities. Specifically, the TensorFlow to TensorFlow Lite conversion uses QuantizedTypes but has its own operations for type conversion and expression of the supporting math.
### Quantization Dialect
#### Quantized type
TODO: Flesh this section out.
*   QuantizedType base class
*   UniformQuantizedType
#### Quantized type conversion operations
*   `qcast` : Convert from an expressed type to QuantizedType
*   `dcast` : Convert from a QuantizedType to its expressed type
*   `scast` : Convert between a QuantizedType and its storage type
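To make the intended semantics of these conversions concrete, here is a hedged Python sketch modeling a uniform quantized type and the three casts. The class and function names are invented for this example and do not correspond to any MLIR API:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class UniformQuantizedType:
    """Toy stand-in for a uniform QuantizedType: uint8 storage, float32 expressed."""
    scale: float
    zero_point: int
    storage_min: int = 0
    storage_max: int = 255

def qcast(real, t):
    """Expressed type -> QuantizedType: scale, round, clamp to the storage range."""
    q = np.round(real / t.scale) + t.zero_point
    return np.clip(q, t.storage_min, t.storage_max).astype(np.uint8)

def dcast(q, t):
    """QuantizedType -> expressed type: the affine-to-real formula from above."""
    return (q.astype(np.int32) - t.zero_point).astype(np.float32) * np.float32(t.scale)

def scast(q):
    """QuantizedType <-> storage type: a pure reinterpretation, no arithmetic."""
    return q  # same bits; only the static type changes in the IR

t = UniformQuantizedType(scale=0.1, zero_point=128)
x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
print(dcast(qcast(x, t), t))  # -> [-1.  0.  1.] (up to quantization error)
```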
#### Instrumentation and constraint operations
*   `const_fake_quant` : Emulates the logic of the historic TensorFlow fake_quant_with_min_max_args operation.
*   `stats_ref` : Declares that statistics should be gathered at this point with a unique key and made available to future passes of the solver.
*   `stats` : Declares inline statistics (per layer and per axis) for the point in the computation. stats_ref ops are generally converted to statistical operations once trial runs have been performed.
*   `coupled_ref` : Declares points in the computation to be coupled from a type inference perspective based on a unique key.
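As an illustration of the kind of data these instrumentation points are meant to capture, the following sketch computes per-layer and per-axis min/max ranges. This is only a plausible model of such statistics, not the actual representation used by the dialect:

```python
import numpy as np

def layer_stats(activations, axis=None):
    """Collect min/max statistics for one instrumentation point.

    axis=None models per-layer statistics; an integer axis models per-axis
    statistics along that dimension (e.g. per output channel).
    """
    if axis is None:
        return float(activations.min()), float(activations.max())
    reduce_dims = tuple(d for d in range(activations.ndim) if d != axis)
    return activations.min(axis=reduce_dims), activations.max(axis=reduce_dims)

acts = np.random.randn(8, 4).astype(np.float32)  # e.g. a batch of activations
print(layer_stats(acts))          # per-layer (min, max)
print(layer_stats(acts, axis=1))  # per-axis min/max arrays, one entry per channel
```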
### Integration with simulated quantization at training time
TensorFlow has historically used the tf.quantization.fake_quant_* family of operations to simulate the effect of quantization at training time.
As originally implemented, TensorFlow Lite was the primary consumer of such operations at inference time. When quantized inference was enabled, if every eligible tensor passed through an appropriate fake_quant node (the rules of which tensors can have fake_quant applied are somewhat involved), then TensorFlow Lite would use the attributes of the fake_quant operations to make a judgment about how to convert to use kernels from its quantized operations subset.
In MLIR-based quantization, fake_quant_* operations are handled by converting them to a sequence of *qcast* (quantize) followed by *dcast* (dequantize) with an appropriate *UniformQuantizedType* as the target of the *qcast* operation.
This allows subsequent compiler passes to preserve the knowledge that quantization was simulated in a certain way, while giving the compiler flexibility to move the casts as it simplifies the computation and converts it to a form based on integral arithmetic. This scheme also naturally allows computations that are partially quantized, where the parts that could not be reduced to integral operations are still carried out in floating point with appropriate conversions at the boundaries.
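The numeric equivalence being exploited here is that a fake_quant node computes the same value as a quantize immediately followed by a dequantize. A minimal sketch, reusing the illustrative `qcast`/`dcast` helpers and `UniformQuantizedType` from the conversion-operations section above (none of these names are real APIs):

```python
import numpy as np
# Assumes qcast, dcast and UniformQuantizedType from the earlier sketch.

def fake_quant(real, t):
    """Models fake_quant as dcast(qcast(x)): quantize then immediately
    dequantize, so values stay floating point but carry quantization error."""
    return dcast(qcast(real, t), t)

t = UniformQuantizedType(scale=0.1, zero_point=128)
x = np.array([0.03, 0.07, 1.0], dtype=np.float32)
print(fake_quant(x, t))  # -> [0.  0.1 1. ] : inputs snapped to the quantized grid
```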
### TFLite native quantization
TODO: Flesh this out
#### General algorithm
1.  Take input min/max information and set the ArrayInfo (which really is InputOrOutputArrayInfo).
2.  In LegalizeTF, convert ArrayInfo min/max to tf.Quantize and tf.Dequantize nodes (or tf.FakeQuant). Convert all constant FakeQuants to (tf.FQ -> tfl.Q -> tfl.DQ).
3.  Hardcode logic/propagation needs to happen here.
4.  Run TF constant folding.
5.  In PrepareTFL, convert all tf.FQ to (tfl.Q -> tfl.DQ).
6.  Run the quantization pass that takes (tfl.DQ (for both input and weights) -> op -> tfl.Q) and replaces it with (op). Also replace (constant_float -> tfl.Q) with (constant_quant).
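A rough sketch of the core rewrite in the final step, over a toy graph of nested (op, operands) dicts. This is purely illustrative pseudocode made runnable, not the actual TFLite pass, and the op name `conv2d` is a hypothetical placeholder:

```python
def fuse_dq_op_q(node):
    """Toy rewrite of (tfl.DQ -> op -> tfl.Q) into a single quantized op.

    Nodes are dicts such as {"op": "tfl.Q", "input": {...}}. When a tfl.Q
    consumes an op whose operands all come through tfl.DQ, drop the casts
    and mark the op itself as operating on quantized tensors.
    """
    if node.get("op") != "tfl.Q":
        return node
    inner = node["input"]
    if inner.get("args") and all(a["op"] == "tfl.DQ" for a in inner["args"]):
        return {"op": inner["op"],
                "args": [a["input"] for a in inner["args"]],
                "quantized": True}
    return node

# Hypothetical graph: quantized input and weights are dequantized, fed to a
# float op, and the result re-quantized; the pass collapses this to one op.
graph = {"op": "tfl.Q",
         "input": {"op": "conv2d",
                   "args": [{"op": "tfl.DQ", "input": {"op": "input_q"}},
                            {"op": "tfl.DQ", "input": {"op": "weights_q"}}]}}
print(fuse_dq_op_q(graph))
```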