ONNX and QONNX
Parsing of ONNX and QONNX models is done in conjunction with the qonnx package, even if no quantization is used. This is a common initial parser shared with the AMD/Xilinx FINN project. The first step is to perform constant folding, shape inference, etc., on the ONNX graph, commonly known as cleaning. If a model has convolution layers, it also needs to be converted to a channels-last format, since that is what hls4ml mainly supports. The qonnx package also provides a number of additional transforms that may need to be used. For example, Gemm nodes need to be converted to MatMul and Add nodes.
There are command-line versions of the cleaning and channels-last conversion steps:
$ qonnx_clean filename.onnx
$ qonnx_to_channels_last filename_clean.onnx
$ qonnx_clean filename_clean_channels_last.onnx # good to do a clean again as a last step
The same can be done in Python, which is usually easier if you additionally need to call other transforms. An example is given below that also calls the GemmToMatMul converter:
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.channels_last import ConvertToChannelsLastAndClean
from qonnx.transformation.gemm_to_matmul import GemmToMatMul
import qonnx.util.cleanup

model = ModelWrapper('filename.onnx')
model = qonnx.util.cleanup.cleanup_model(model)
model = model.transform(ConvertToChannelsLastAndClean())
model = model.transform(GemmToMatMul())
model = qonnx.util.cleanup.cleanup_model(model)
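If you want to reuse the prepared model later, ModelWrapper can write it back to disk (a small sketch; the output filename is arbitrary):
model.save('filename_clean_channels_last.onnx')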
ModelWrapper is defined in qonnx.core.modelwrapper. More information on the qonnx package can be found at the QONNX documentation page.
The next steps are very similar to those for a Keras model:
config = hls4ml.utils.config.config_from_onnx_model(
model, granularity='name', backend='Vitis', default_precision='fixed<16,6>'
)
# modify the config as desired
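# For example, you could override the precision of a single layer here.
# (This is a sketch: 'Dense_0' is a hypothetical layer name; print the
# config dictionary to see the actual names in your model.)
# config['LayerName']['Dense_0']['Precision']['result'] = 'fixed<8,3>'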
hls_model = hls4ml.converters.convert_from_onnx_model(
model,
output_dir='my-hls-test',
io_type='io_stream',
backend='Vitis',
hls_config=config,
)
hls_model.compile()
Note that, unlike the Keras version, “name” granularity is the default for config_from_onnx_model, and it must be used for QONNX models. Unquantized ONNX models can use “model” granularity if desired, but generally there is no benefit.
One can subsequently call the predict function to check the performance, or build the project. Note that execute_onnx in qonnx.core.onnx_exec can be used to run the QONNX graphs directly; it also provides the values at intermediate layers for validating the model (tracing).
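As a quick sanity check, the two execution paths can be compared on the same input. The sketch below assumes the model and hls_model objects from the examples above; the random input shape and the tolerances are placeholders to adapt to your model:
import numpy as np
from qonnx.core.onnx_exec import execute_onnx

# Random test input; replace (1, 16) with your model's input shape
X = np.random.rand(1, 16).astype(np.float32)

# Run the compiled hls4ml model
y_hls = hls_model.predict(X)

# Run the QONNX graph directly; tensor names are taken from the graph itself
idict = {model.graph.input[0].name: X}
odict = execute_onnx(model, idict)
y_qonnx = odict[model.graph.output[0].name]

# For intermediate (tracing) values:
# full_ctx = execute_onnx(model, idict, return_full_exec_context=True)

# Compare the outputs (tolerances are placeholders)
np.testing.assert_allclose(y_hls.reshape(y_qonnx.shape), y_qonnx, rtol=1e-2, atol=1e-2)
To build the HLS project itself, one would then call hls_model.build() (the available options are backend specific).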
Quant nodes
Documentation for Quant nodes is provided in the qonnx package. Note that currently hls4ml only supports the Quant operator. Also, not all legal Quant configurations are parsable by hls4ml or synthesizable. The scale, zeropt, and bitwidth values must be constant (though not necessarily scalar for the scale and zeropt).
Generally, if the zeropt is 0 and the scale is a scalar power of 2, hls4ml uses ap_fixed or ac_fixed types (depending on the backend) to represent the quantizations. In other cases, the scale and zeropt need to be explicitly handled by hls4ml, and there is a greater chance that hls4ml cannot process the input. (Please report any issues that you find.)
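As a worked example (an illustration, assuming a signed quantizer on a backend that uses ap_fixed): a Quant node with bitwidth = 8, scale = 0.25 = 2^-2, and zeropt = 0 has 2 fractional bits, so it maps directly onto ap_fixed<8,6>, i.e. 8 total bits of which 6 are integer bits.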