VivadoAccelerator Backend

The VivadoAccelerator backend of hls4ml leverages the PYNQ software stack to easily deploy models on supported devices. Currently, hls4ml supports a number of PYNQ-enabled boards (the pynq-z2 is used in the example below), but, in principle, support can be extended to any board supported by PYNQ. For Zynq-based boards, there are two components: an ARM-based processing system (PS) and FPGA-based programmable logic (PL), with various interfaces between the two.

Figure: Zynq PL/PS interfaces

Neural Network Overlay

In the PYNQ project, programmable logic circuits are presented as hardware libraries called overlays. The overlay can be accessed through a Python API. In hls4ml, we create a custom neural network overlay, which sends and receives data via AXI stream. The target device is programmed using a bitfile that is generated by the VivadoAccelerator backend.

Figure: PYNQ software stack


This example is taken from part 7 of the hls4ml tutorial. Specifically, we deploy a model on a pynq-z2 board.

First, we generate the bitfile from a Keras model `model` and a configuration object `config`.

import hls4ml

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(model,
                                                       hls_config=config,
                                                       output_dir='hls4ml_prj_pynq',
                                                       backend='VivadoAccelerator',
                                                       board='pynq-z2')
hls_model.build(csim=False, export=True, bitfile=True)

After this command completes, we will need to package up the bitfile, hardware handoff, and Python driver to copy to the PS of the board.

mkdir -p package
cp hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.runs/impl_1/design_1_wrapper.bit package/hls4ml_nn.bit
cp hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh package/hls4ml_nn.hwh
cp hls4ml_prj_pynq/axi_stream_driver.py package/
tar -czvf package.tar.gz -C package/ .

Then we can copy this package to the PS of the board and untar it.
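For example, the transfer can be done over the network with scp. The board address is a placeholder here, and `xilinx` is the default PYNQ username (adjust both for your setup):

```shell
# Copy the package to the board's home directory (address is illustrative)
scp package.tar.gz xilinx@<board-ip>:~
# Log in to the PS and unpack it
ssh xilinx@<board-ip>
tar -xzvf package.tar.gz
```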

Finally, on the PS in Python we can create a NeuralNetworkOverlay object, which will download the bitfile onto the PL of the board. We also must provide the shapes of our input and output data, X_test.shape and y_test.shape, respectively, to allocate the buffers for the data transfer. The predict method will send the input data to the PL and return the output data y_hw.

from axi_stream_driver import NeuralNetworkOverlay

nn = NeuralNetworkOverlay('hls4ml_nn.bit', X_test.shape, y_test.shape)
y_hw, latency, throughput = nn.predict(X_test, profile=True)
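To sanity-check the deployment, the hardware predictions can be compared against the Keras model's CPU predictions. This comparison is not part of the driver API; the arrays below are placeholders, and on the board `y_hw` would come from `nn.predict(X_test)` and `y_sw` from `model.predict(X_test)`:

```python
import numpy as np

# Placeholder outputs standing in for the real arrays: on the board,
# y_hw is returned by nn.predict(X_test) and y_sw by model.predict(X_test).
y_hw = np.array([[0.1, 0.9], [0.8, 0.2]])
y_sw = np.array([[0.2, 0.8], [0.7, 0.3]])

# Fraction of samples where hardware and software agree on the top class;
# quantization in the PL means probabilities differ slightly, but the
# predicted classes should match for a well-tuned model.
agreement = np.mean(np.argmax(y_hw, axis=1) == np.argmax(y_sw, axis=1))
print(f"hardware/software top-1 agreement: {agreement:.2%}")
```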