VivadoAccelerator
The VivadoAccelerator backend of hls4ml leverages the PYNQ software stack to easily deploy models on supported devices.
Currently hls4ml supports the following boards:
pynq-z2 (part: xc7z020clg400-1)
zcu102 (part: xczu9eg-ffvb1156-2-e)
alveo-u50 (part: xcu50-fsvh2104-2-e)
alveo-u250 (part: xcu250-figd2104-2L-e)
alveo-u200 (part: xcu200-fsgd2104-2-e)
alveo-u280 (part: xcu280-fsvh2892-2L-e)
In principle, however, support can be extended to any board supported by PYNQ. For the Zynq-based boards, there are two components: an ARM-based processing system (PS) and FPGA-based programmable logic (PL), with various interfaces between the two.
Neural Network Overlay
In the PYNQ project, programmable logic circuits are presented as hardware libraries called overlays.
The overlay can be accessed through a Python API.
In hls4ml, we create a custom neural network overlay, which sends and receives data via AXI stream.
The target device is programmed using a bitfile that is generated by the VivadoAccelerator backend.
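For reference, a generic PYNQ overlay is loaded on the PS roughly as sketched below; the custom driver generated by hls4ml (used in the example that follows) builds on this mechanism, adding buffer allocation and an AXI-stream predict method. The bitfile name hls4ml_nn.bit is an assumption matching the packaging step later in this section, and the corresponding .hwh file must sit next to the bitfile.
from pynq import Overlay

# Program the PL with the bitfile; pynq also parses the accompanying .hwh file
overlay = Overlay('hls4ml_nn.bit')
# List the IP blocks in the design, e.g. the neural network core and the DMA
print(overlay.ip_dict.keys())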
Example
This example is taken from part 7 of the hls4ml tutorial.
Specifically, we’ll deploy a model on a pynq-z2 board.
First, we generate the bitfile from a Keras model (model) and its configuration (config).
import hls4ml

# Generate a per-layer configuration from the Keras model
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Convert the model with the VivadoAccelerator backend, targeting the pynq-z2 board
hls_model = hls4ml.converters.convert_from_keras_model(model,
                                                       hls_config=config,
                                                       output_dir='hls4ml_prj_pynq',
                                                       backend='VivadoAccelerator',
                                                       board='pynq-z2')

# Run synthesis and generate the bitfile
hls_model.build(bitfile=True)
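Once the build finishes, it can be useful to look at the synthesis reports for latency and resource usage. The line below is a sketch using hls4ml's report reader on the project directory from above; it assumes the reports are laid out as in a standard Vivado HLS project.
# Print the C synthesis and Vivado reports for the generated project (sketch)
hls4ml.report.read_vivado_report('hls4ml_prj_pynq')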
After this command completes, we will need to package up the bitfile, hardware handoff, and Python driver to copy to the PS of the board.
mkdir -p package
cp hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.runs/impl_1/design_1_wrapper.bit package/hls4ml_nn.bit
cp hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh package/hls4ml_nn.hwh
cp hls4ml_prj_pynq/axi_stream_driver.py package/
tar -czvf package.tar.gz -C package/ .
Then we can copy this package to the PS of the board and untar it.
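For example, assuming the PYNQ image's default xilinx user and the default static IP address 192.168.2.99 (both assumptions; substitute your board's credentials and address), the transfer and extraction could look like this:
scp package.tar.gz xilinx@192.168.2.99:~/
ssh xilinx@192.168.2.99
tar -xzvf package.tar.gz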
Finally, on the PS in Python we can create a NeuralNetworkOverlay object, which will download the bitfile onto the PL of the board.
We must also provide the shapes of our input and output data, X_test.shape and y_test.shape respectively, so that the buffers for the data transfers can be allocated.
The predict method will send the input data to the PL and return the output data y_hw.
from axi_stream_driver import NeuralNetworkOverlay
nn = NeuralNetworkOverlay('hls4ml_nn.bit', X_test.shape, y_test.shape)
y_hw, latency, throughput = nn.predict(X_test, profile=True)
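As a quick sanity check, the hardware predictions can be compared with reference labels on the PS. The snippet below is a sketch that assumes y_test is available on the board (for example copied over as a NumPy array) and that the model is a classifier producing one score per class; the printed latency and throughput come straight from the profiling output of predict.
import numpy as np

# Compare the on-board predictions with the expected labels (assumes a classifier)
accuracy = np.mean(np.argmax(y_hw, axis=1) == np.argmax(y_test, axis=1))
print(f'On-board accuracy: {accuracy:.4f}')
print(f'Latency: {latency}, throughput: {throughput}')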