VivadoAccelerator
The VivadoAccelerator backend of hls4ml leverages the PYNQ software stack to easily deploy models on supported devices.
Currently hls4ml supports the following boards:
pynq-z2 (part: xc7z020clg400-1)
zcu102 (part: xczu9eg-ffvb1156-2-e)
alveo-u50 (part: xcu50-fsvh2104-2-e)
alveo-u250 (part: xcu250-figd2104-2L-e)
alveo-u200 (part: xcu200-fsgd2104-2-e)
alveo-u280 (part: xcu280-fsvh2892-2L-e)
In principle, however, support can be extended to any board supported by PYNQ. For the Zynq-based boards, there are two components: an ARM-based processing system (PS) and FPGA-based programmable logic (PL), with various interfaces between the two.
Neural Network Overlay
In the PYNQ project, programmable logic circuits are presented as hardware libraries called overlays.
The overlay can be accessed through a Python API.
In hls4ml, we create a custom neural network overlay, which sends and receives data via AXI stream.
The target device is programmed using a bitfile that is generated by the VivadoAccelerator backend.
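For reference, a generic PYNQ overlay is loaded on the PS roughly as sketched below; the custom driver generated by hls4ml (used in the example that follows) builds on this mechanism, adding buffer allocation and an AXI-stream predict method. The bitfile name hls4ml_nn.bit is an assumption matching the packaging step later in this section, and the corresponding .hwh file must sit next to the bitfile.
from pynq import Overlay

# Program the PL with the bitfile; pynq also parses the accompanying .hwh file
overlay = Overlay('hls4ml_nn.bit')
# List the IP blocks in the design, e.g. the neural network core and the DMA
print(overlay.ip_dict.keys())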
Example
This example is taken from part 7 of the hls4ml tutorial.
Specifically, we’ll deploy a model on a pynq-z2 board.
First, we generate the bitfile from a Keras model (model) and its configuration (config).
import hls4ml

# Generate a per-layer configuration from the Keras model
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Convert the model with the VivadoAccelerator backend, targeting the pynq-z2 board
hls_model = hls4ml.converters.convert_from_keras_model(model,
                                                       hls_config=config,
                                                       output_dir='hls4ml_prj_pynq',
                                                       backend='VivadoAccelerator',
                                                       board='pynq-z2')

# Run synthesis and generate the bitfile
hls_model.build(bitfile=True)
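Once the build finishes, it can be useful to look at the synthesis reports for latency and resource usage. The line below is a sketch using hls4ml's report reader on the project directory from above; it assumes the reports are laid out as in a standard Vivado HLS project.
# Print the C synthesis and Vivado reports for the generated project (sketch)
hls4ml.report.read_vivado_report('hls4ml_prj_pynq')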
After this command completes, we will need to package up the bitfile, hardware handoff, and Python driver to copy to the PS of the board.
mkdir -p package
cp hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.runs/impl_1/design_1_wrapper.bit package/hls4ml_nn.bit
cp hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh package/hls4ml_nn.hwh
cp hls4ml_prj_pynq/axi_stream_driver.py package/
tar -czvf package.tar.gz -C package/ .
Then we can copy this package to the PS of the board and untar it.
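For example, assuming the PYNQ image's default xilinx user and the default static IP address 192.168.2.99 (both assumptions; substitute your board's credentials and address), the transfer and extraction could look like this:
scp package.tar.gz xilinx@192.168.2.99:~/
ssh xilinx@192.168.2.99
tar -xzvf package.tar.gz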
Finally, on the PS in Python we can create a NeuralNetworkOverlay object, which will download the bitfile onto the PL of the board.
We must also provide the shapes of our input and output data, X_test.shape and y_test.shape respectively, so that the buffers for the data transfers can be allocated.
The predict method will send the input data to the PL and return the output data y_hw.
from axi_stream_driver import NeuralNetworkOverlay
nn = NeuralNetworkOverlay('hls4ml_nn.bit', X_test.shape, y_test.shape)
y_hw, latency, throughput = nn.predict(X_test, profile=True)
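As a quick sanity check, the hardware predictions can be compared with reference labels on the PS. The snippet below is a sketch that assumes y_test is available on the board (for example copied over as a NumPy array) and that the model is a classifier producing one score per class; the printed latency and throughput come straight from the profiling output of predict.
import numpy as np

# Compare the on-board predictions with the expected labels (assumes a classifier)
accuracy = np.mean(np.argmax(y_hw, axis=1) == np.argmax(y_test, axis=1))
print(f'On-board accuracy: {accuracy:.4f}')
print(f'Latency: {latency}, throughput: {throughput}')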