High Granularity Quantization (HGQ2)

Note

New projects are encouraged to use HGQ2 instead of the original HGQ (whose documentation has moved here). HGQ2 is a major improvement over HGQ, with more supported layers, more quantizer options, and better performance. Because HGQ2 is built on Keras v3, it can be used natively with the JAX, PyTorch, and TensorFlow backends.


HGQ2 (High Granularity Quantization 2) is a quantization-aware training framework built on Keras v3, targeting real-time deep learning applications on edge devices like FPGAs. It provides a comprehensive set of tools for creating and training quantized neural networks with minimal effort.

HGQ2 implements a gradient-based algorithm for automatic bitwidth optimization combined with quantization-aware training. By leveraging gradients, it allows bitwidth optimization at arbitrary granularity, down to the per-weight and per-activation level.
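
To illustrate the idea (as a toy sketch, not HGQ2's actual implementation), the quantizer below carries one trainable bitwidth per element; rounding uses a straight-through estimator so that gradients reach both the data and the bitwidths. All names here are hypothetical.

import keras
from keras import ops

class ToyTrainableBitwidthQuantizer(keras.layers.Layer):
   """Toy per-element fixed-point quantizer with trainable bitwidths."""

   def build(self, input_shape):
      # One trainable bitwidth per element -> per-weight / per-activation
      # granularity when applied element-wise
      self.bits = self.add_weight(
         name='bits',
         shape=input_shape[1:],
         initializer=keras.initializers.Constant(4.0),
      )

   def call(self, x):
      # Quantization step implied by the current bitwidths
      step = ops.power(2.0, -self.bits)
      y = x / step
      # Straight-through estimator: the forward pass rounds, the backward
      # pass treats rounding as identity, so gradients flow to x and,
      # through `step`, to the bitwidths as well
      y = y + ops.stop_gradient(ops.round(y) - y)
      return y * step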

(Figure: HGQ overview)

Key Features

  • Multi-backend support: Works with TensorFlow, JAX, and PyTorch through Keras v3

  • Flexible quantization: Supports different quantization schemes including fixed-point and minifloat

  • Hardware synthesis: Direct integration with hls4ml for FPGA deployment

  • Trainable quantization parameters: Optimize bitwidths through gradient-based methods

  • Effective Bit-Operations (EBOP): Accurate estimation, during training, of the resources the deployed firmware will consume (see the sketch after this list)

  • Advanced layer support: Quantization and hardware synthesis support for advanced layers such as einsum, einsum dense, and multi-head attention
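
To make the EBOP idea concrete, here is a back-of-the-envelope sketch (a hypothetical helper, not part of the HGQ2 API): the cost of a dense layer is approximated by charging each multiplication roughly the product of its two operands' bitwidths.

import numpy as np

def dense_ebops(bw_x: np.ndarray, bw_w: np.ndarray) -> float:
   """Approximate EBOPs of y = x @ w.

   Each multiply of x[i] * w[i, j] is charged roughly
   bits(x[i]) * bits(w[i, j]) bit-operations.
   """
   # bw_x: (n_in,) activation bitwidths; bw_w: (n_in, n_out) weight bitwidths
   return float(np.einsum('i,ij->', bw_x, bw_w))

bw_x = np.full(16, 8.0)                      # uniform 8-bit activations
bw_w = np.random.uniform(2.0, 6.0, (16, 4))  # heterogeneous weight bitwidths
print(dense_ebops(bw_x, bw_w))               # estimated bit-operations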

Simple example
import keras
from hgq.layers import QDense, QConv2D
from hgq.config import LayerConfigScope, QuantizerConfigScope

# Set up the quantization configuration
# These values are the defaults, shown here just for demonstration
with (
   # Scope setting the default quantization type and overflow mode everywhere;
   # the next scope overrides it for the 'datalane' place
   QuantizerConfigScope(place='all', default_q_type='kbi', overflow_mode='SAT_SYM'),
   QuantizerConfigScope(place='datalane', default_q_type='kif', overflow_mode='WRAP'),
   # Scope enabling EBOPs and setting the beta0 value
   LayerConfigScope(enable_ebops=True, beta0=1e-5),
):
   model = keras.Sequential([
      QConv2D(32, (3, 3), activation='relu'),
      keras.layers.MaxPooling2D((2, 2)),
      keras.layers.Flatten(),
      QDense(10)
   ])

... # Training, evaluation, and anything else you want to do with the model
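
# As one concrete (hypothetical) illustration of the elided step above,
# using randomly generated placeholder data:
import numpy as np

x_train = np.random.rand(256, 28, 28, 1).astype('float32')
y_train = np.random.randint(0, 10, size=(256,))

model.compile(
   optimizer='adam',
   loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
   metrics=['accuracy'],
)
model.fit(x_train, y_train, epochs=2, batch_size=32)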

import hls4ml

model_hls = hls4ml.converters.convert_from_keras_model(model, ...)
# Model-wise precision propagation is performed automatically for HGQ models
# to ensure bit-exactness. Do NOT pass a precision config unless you know
# what you are doing.

model_hls.compile()

Note

In general, do not pass any precision configuration to hls4ml.converters.convert_from_keras_model. HGQ-defined models automatically invoke model-wise precision propagation to ensure bit-exactness between the Keras model and the generated HLS code (see here for more details).
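
As a minimal sketch of a conversion call that follows this advice (the output directory and backend values below are illustrative, not prescribed):

import hls4ml

# No hls_config / precision overrides: HGQ2 propagates precisions itself
model_hls = hls4ml.converters.convert_from_keras_model(
   model,
   output_dir='hgq2_prj',
   backend='Vitis',
)
model_hls.compile()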