Configuration

We currently support two ways of setting hls4ml’s model configuration. This page documents both methods’ usage.

The Python API approach is recommended for most users as there are more utilities to help create the configuration dictionaries.

NOTE:

  • One important part of hls4ml to remember is that the user is responsible for the format of the inputs. There is no automatic formatting or normalization, so this must be done in the training and applied consistently at inference.
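
For example, if the inputs were standardized during training, the same transformation must be applied to anything fed to the HLS model. A minimal sketch using numpy (the constants are illustrative, not part of hls4ml):

import numpy as np

# Standardization constants computed from the training set (illustrative values)
mean = np.array([0.5, 1.2, 0.0], dtype=np.float32)
std = np.array([0.1, 0.7, 2.2], dtype=np.float32)

def preprocess(x):
    # hls4ml applies no normalization itself, so the same scaling used
    # in training must be applied before calling the HLS model
    return ((x - mean) / std).astype(np.float32)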


1. Python API

Using hls4ml, you can quickly generate a simple configuration dictionary from a Keras model:

import hls4ml
config = hls4ml.utils.config_from_keras_model(model, granularity='model')

This Python dictionary can be edited as needed. More advanced configurations can be generated with additional arguments, for example for ONNX models:

import hls4ml
config = hls4ml.utils.config_from_onnx_model(
     model,
     granularity='name',
     default_precision='fixed<16,6>',
     backend='Vitis')

for Keras models:

import hls4ml
config = hls4ml.utils.config_from_keras_model(
     model,
     granularity='name',
     default_precision='fixed<16,6>',
     backend='oneAPI')

or for PyTorch models:

import hls4ml
config = hls4ml.utils.config_from_pytorch_model(
     model,
     granularity='name',
     default_precision='fixed<16,6>',
     backend='Catapult')

The 'name' granularity includes per-layer configuration based on the model. A 'name' granularity is generally recommended because it allows for more tuning, and also because it allows for automatic setting of precisions. The layer-level precisions with the 'name' granularity default to 'auto', which means that hls4ml will try to set them automatically (see Automatic precision inference). Note that layer-level settings take precedence over model-level settings. A 'name' granularity is required for QKeras and QONNX model parsing. Passing the backend to these functions is recommended because some configuration options depend on the backend. See config_from_keras_model and similar for more information on the various options. Note specifically the documentation of config_from_pytorch_model on how to handle differences in input data formats between PyTorch and Keras (hls4ml follows Keras conventions internally).
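
For instance, with 'name' granularity one can inspect the generated per-layer entries before editing them. A minimal sketch, building on the config created above (the layer name fc1 is illustrative and depends on the model):

print(config['LayerName'].keys())   # one entry per layer of the parsed model
print(config['LayerName']['fc1'])   # Precision (defaulting to 'auto'), ReuseFactor, etc.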

One can override specific values before using the configuration:

config['LayerName']['fc1']['ReuseFactor'] = 2

Or to set the precision of a specific layer’s weight:

config['LayerName']['fc1']['Precision']['weight'] = 'fixed<8,4>'

To better understand how the configuration hierarchy works, refer to the next section for more details.

Finally, one uses the configuration to create an HLS model:

hls_model = hls4ml.converters.convert_from_keras_model(
      model,
      hls_config=config,
      output_dir="my_project_dir",
      io_type='io_stream',
      backend='Vitis'
  )

See convert_from_keras_model for more information on the various options. Similar functions exist for ONNX and PyTorch.
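
After conversion, the model can be compiled into a shared library for bit-accurate emulation and compared against the original floating-point model. A minimal sketch, assuming X_test is a numpy array of test inputs:

import numpy as np

hls_model.compile()   # builds a shared library for software emulation
y_hls = hls_model.predict(np.ascontiguousarray(X_test))
# hls_model.build()   # optionally runs the full HLS synthesis flow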


2. YAML Configuration file

2.1 Top Level Configuration

One can also use YAML configuration files in hls4ml (*.yml). An example configuration file is here.

It looks like this:

# Project section
OutputDir: my-hls-test
ProjectName: myproject

# Model section (Keras model)
KerasJson: keras/KERAS_3layer.json
KerasH5:   keras/KERAS_3layer_weights.h5 # You can also use the h5 file from Keras's model.save() without supplying the json file.
InputData: keras/KERAS_3layer_input_features.dat
OutputPredictions: keras/KERAS_3layer_predictions.dat

# Backend section (Vivado backend)
Part: xcvu13p-flga2577-2-e
ClockPeriod: 5
IOType: io_parallel # options: io_parallel/io_stream

HLSConfig:
  Model:
    Precision: fixed<16,6>
    ReuseFactor: 1
    Strategy: Latency
  LayerType:
    Dense:
      ReuseFactor: 2
      Strategy: Resource
      Compression: True
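
A file like this can also be consumed from the Python API. A minimal sketch, assuming the configuration above is saved as keras-config.yml:

import hls4ml

hls_model = hls4ml.converters.convert_from_config('keras-config.yml')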

There are a number of configuration options available. Let’s go through them, starting with the basic setup parameters:

  • OutputDir: the output directory where you want your HLS project to appear

  • ProjectName: the name of the HLS project IP that is produced

  • KerasJson/KerasH5: for Keras, the model architecture and weights are stored in a json and h5 file. The paths to those files are required here. We also support a Keras model file obtained directly from model.save(); in this case you can just supply the h5 file in the KerasH5 field.

  • InputData/OutputPredictions: paths to the model's input data and output predictions. If none is supplied, hls4ml will create artificial data for simulation. The data used in the example above can be found here. We also support npy data files. We welcome suggestions on more input data types to support.

The backend-specific section of the configuration varies between backends. You can get a starting point for the necessary settings using, for example, hls4ml.templates.get_backend('Vivado').create_initial_config() (see the sketch after this list). For the Vivado backend the options are:

  • Part: the particular FPGA part number that you are considering, here it’s a Xilinx Virtex UltraScale+ VU13P FPGA

  • ClockPeriod: the clock period, in ns, at which your algorithm runs

Then you have some optimization parameters for how your algorithm runs:

  • IOType: your options are io_parallel or io_stream which defines the type of data structure used for inputs, intermediate activations between layers, and outputs. For io_parallel, arrays are used that, in principle, can be fully unrolled and are typically implemented in RAMs. For io_stream, HLS streams are used, which are a more efficient/scalable mechanism to represent data that are produced and consumed in a sequential manner. Typically, HLS streams are implemented with FIFOs instead of RAMs. For more information see here.

  • HLSConfig: the detailed configuration of precision and parallelism, including:

    • ReuseFactor: in the case that you are pipelining, this defines the pipeline interval or initiation interval

    • ParallelizationFactor: The number of output “pixels” to compute in parallel in convolutional layers. Increasing this parameter results in a significant increase in the resources required on the FPGA.

    • Strategy: Optimization strategy on FPGA, either “Latency”, “Resource” or “Unrolled”. If none is supplied, hls4ml uses “Latency” as the default. Note that a ReuseFactor larger than 1 should be specified when using the “Resource” or “Unrolled” strategy. An example of using a larger reuse factor can be found here.

    • PipelineStyle: Set the top level pipeline style. Valid options are “auto”, “pipeline” and “dataflow”. If unspecified, it defaults to “auto”.

    • PipelineInterval: Optionally override the desired initiation interval of the design. Only valid in combination with “pipeline” style. If unspecified, it is left to the compiler to decide, ideally matching the largest reuse factor of the network.

    • Precision: this defines the precision of your inputs, outputs, weights and biases. It is denoted by fixed<X,Y>, where Y is the number of bits representing the signed number above the binary point (i.e. the integer part), and X is the total number of bits; for example, fixed<16,6> is a 16-bit signed fixed-point type with 6 integer bits and 10 fractional bits. Additionally, integer types (int<N>, where N is a bit-size from 1 to 1024) can also be used. The format follows ap_fixed and ap_int conventions. This can be configured more finely per layer, as described below. In the per-layer configuration (but not globally) one can also use 'auto' precision.
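
As mentioned above, a backend's create_initial_config() provides a starting point for these backend-specific settings. A minimal sketch (the exact module path of get_backend may differ between hls4ml versions):

import hls4ml

backend_config = hls4ml.templates.get_backend('Vivado').create_initial_config()
print(backend_config)   # default Part, ClockPeriod, IOType, etc. for the Vivado backend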

2.2 Per-Layer Configuration

In the hls4ml configuration file, it is possible to specify the model Precision and ReuseFactor with finer granularity.

Under the HLSConfig heading, these can be set for the Model, per LayerType, per LayerName, and for named variables within the layer (for precision only). The most basic configuration may look like this:

HLSConfig:
  Model:
    Precision: fixed<16,6>
    ReuseFactor: 1

This configuration uses fixed<16,6> for every variable and a ReuseFactor of 1 throughout.

To specify a different precision for all Dense layers:

HLSConfig:
  Model:
    Precision: fixed<16,6>
    ReuseFactor: 1
  LayerType:
    Dense:
      Precision: fixed<14,5>

In this case, all variables in any Dense layers will be represented with fixed<14,5> while any other layer types will use fixed<16,6>.

A specific layer can be targeted like this:

HLSConfig:
   Model:
     Precision: fixed<16,6>
     ReuseFactor: 16
   LayerName:
     dense1:
       Precision:
         weight: fixed<14,2>
         bias: fixed<14,4>
         result: fixed<16,6>
       ReuseFactor: 12
       Strategy: Resource

In this case, the default model configuration will use fixed<16,6> and a ReuseFactor of 16. The layer named dense1 (defined in the user-provided model architecture file) will instead use different precisions for the weight, bias, and result (output) variables, a ReuseFactor of 12, and the Resource strategy (while the model default is the Latency strategy).
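
The same per-layer overrides can also be expressed through the Python API described earlier:

config['LayerName']['dense1']['Precision']['weight'] = 'fixed<14,2>'
config['LayerName']['dense1']['Precision']['bias'] = 'fixed<14,4>'
config['LayerName']['dense1']['Precision']['result'] = 'fixed<16,6>'
config['LayerName']['dense1']['ReuseFactor'] = 12
config['LayerName']['dense1']['Strategy'] = 'Resource'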

More than one layer can have a configuration specified, e.g.:

HLSConfig:
  Model:
   ...
  LayerName:
    dense1:
       ...
    batchnormalization1:
       ...
    dense2:
       ...

For more information on the optimization parameters and what they mean, you can visit the Concepts section.


Detailed Configuration in Converted HLS Code

NOTE: this section is developer-oriented.

After you create your project, you have the opportunity to do more configuration if you so choose.

In your project, the file <OutputDir>/firmware/<ProjectName>.cpp is your top level file. It has the network architecture constructed for you. An example is here and the important snippet is:

layer2_t layer2_out[N_LAYER_2];
#pragma HLS ARRAY_PARTITION variable=layer2_out complete dim=0
nnet::dense_latency<input_t, layer2_t, config2>(input_1, layer2_out, w2, b2);

layer3_t layer3_out[N_LAYER_2];
#pragma HLS ARRAY_PARTITION variable=layer3_out complete dim=0
nnet::relu<layer2_t, layer3_t, relu_config3>(layer2_out, layer3_out);

layer4_t layer4_out[N_LAYER_4];
#pragma HLS ARRAY_PARTITION variable=layer4_out complete dim=0
nnet::dense_latency<layer3_t, layer4_t, config4>(layer3_out, layer4_out, w4, b4);

nnet::sigmoid<layer4_t, result_t, sigmoid_config5>(layer4_out, layer5_out);

You can see, for this simple fully-connected network, the computation (nnet::dense_latency) and activation (nnet::relu/nnet::sigmoid) calls for each layer. Each layer also has its own configuration parameters, e.g. config2.

In your project, the file <OutputDir>/firmware/parameters.h stores the configuration options for each layer of the network. An example is here. So, for example, the detailed configuration options for an example DNN layer are:

//hls-fpga-machine-learning insert layer-config
struct config2 : nnet::dense_config {
    static const unsigned n_in = N_INPUT_1_1;
    static const unsigned n_out = N_LAYER_2;
    static const unsigned io_type = nnet::io_parallel;
    static const unsigned reuse_factor = 1;
    static const unsigned n_zeros = 0;
    static const unsigned n_nonzeros = 320;
    static const bool store_weights_in_bram = false;
    typedef ap_fixed<16,6> accum_t;
    typedef model_default_t bias_t;
    typedef model_default_t weight_t;
    typedef ap_uint<1> index_t;
};

It is at this stage that a user can configure the network's HLS implementation in even finer detail.