Distributed Arithmetic
Distributed Arithmetic (DA) is a strategy for constant-matrix-vector multiplication (CMVM) operations used in hls4ml. The implementation is provided by an external library, da4ml, which can be installed with pip install hls4ml[da]. The library transforms the CMVM operations into an adder graph with common subexpression elimination to reduce the overall complexity. As the CMVM operation is fully unrolled, reuse_factor must be 1 (the default) for the corresponding CMVM operations [*]. Compared to the traditional Latency strategy CMVM kernels, DA can usually reduce LUT usage by up to 30% and remove all DSP usage.
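A minimal configuration sketch is shown below. It assumes the Keras frontend and the strategy identifier 'distributed_arithmetic'; the toy model, output directory, and dimensions are illustrative, so verify the exact strategy string against your installed hls4ml version:

    import hls4ml
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Sequential

    # A toy model; any supported architecture is configured the same way.
    model = Sequential([Dense(16, activation='relu', input_shape=(8,)), Dense(4)])

    # Model-level configuration generated by hls4ml.
    config = hls4ml.utils.config_from_keras_model(model, granularity='model')

    # Select DA for all CMVM kernels ('distributed_arithmetic' is assumed here).
    config['Model']['Strategy'] = 'distributed_arithmetic'
    config['Model']['ReuseFactor'] = 1  # DA requires fully unrolled kernels

    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        backend='Vitis',  # DA is only available for Vivado/Vitis HLS backends
        output_dir='da_prj',
    )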
When the DA strategy is used, the CMVM operations will be implemented bit-exactly, and the accumulator precision setting will not be used.
Currently, the DA strategy is only available for the Vivado/Vitis HLS backends. The following layers are supported (see the per-layer configuration sketch after this list):

* Dense
* Convolutional (1D, 2D)
* EinsumDense
* Multi-head attention (implemented as multiple EinsumDense layers)
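DA can also be enabled for individual layers rather than the whole model. The sketch below assumes name-level granularity and a layer called 'dense_1' (a hypothetical name; check your model's layer names), with the same assumed strategy string as above:

    # Per-layer selection: only the listed layer types accept the DA strategy;
    # the remaining layers keep the default strategy.
    config = hls4ml.utils.config_from_keras_model(model, granularity='name')
    config['LayerName']['dense_1']['Strategy'] = 'distributed_arithmetic'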
Although supporting them is possible in principle, RNN layers are not yet supported by the DA strategy.
For more details, please refer to the da4ml repository or the paper.