High Granularity Quantization (HGQ)
High Granularity Quantization (HGQ) is a library that performs gradient-based automatic bitwidth optimization and quantization-aware training for neural networks to be deployed on FPGAs. By leveraging gradients, it can optimize bitwidths at arbitrary granularity, down to the per-weight and per-activation level.
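HGQ's actual machinery is more involved, but the core idea can be sketched in a few lines: make the number of fractional bits a trainable parameter, quantize with straight-through (surrogate-gradient) rounding, and add a bitwidth penalty so the optimizer trades precision against resources. The toy layer below is an illustration under those assumptions, not HGQ's implementation; the names `ste_round`, `ToyQuantizer`, and `fbits` are invented for this sketch.

```python
import tensorflow as tf

def ste_round(x):
    # Round in the forward pass, identity gradient in the backward pass
    return x + tf.stop_gradient(tf.round(x) - x)

class ToyQuantizer(tf.keras.layers.Layer):
    """Toy quantizer with one trainable bitwidth per activation.
    Illustrative sketch only -- not HGQ's actual implementation."""

    def __init__(self, beta=1e-5, **kwargs):
        super().__init__(**kwargs)
        self.beta = beta  # strength of the bitwidth penalty

    def build(self, input_shape):
        # One fractional-bitwidth parameter per activation element
        self.fbits = self.add_weight(
            name='fbits', shape=input_shape[1:],
            initializer=tf.keras.initializers.Constant(8.0))

    def call(self, x):
        scale = 2.0 ** ste_round(self.fbits)  # quantization step is 2**-fbits
        q = ste_round(x * scale) / scale      # fixed-point rounding with STE
        # Penalize total bitwidth: gradients from this term push fbits down,
        # while gradients of the task loss (flowing through the STE) resist
        # where precision actually matters.
        self.add_loss(self.beta * tf.reduce_sum(tf.nn.relu(self.fbits)))
        return q
```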
Conversion of models made with the HGQ library is fully supported. HGQ models are first converted to the proxy model format, which hls4ml can then parse bit-accurately. Below is an example of how to create a model with HGQ and convert it to an hls4ml model.
```python
import keras
from HGQ.layers import HDense, HDenseBatchNorm, HQuantize
from HGQ import ResetMinMax, FreeBOPs

# beta sets the strength of the resource (BOPs) regularization
model = keras.models.Sequential([
    HQuantize(beta=1.e-5),
    HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
    HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
    HDense(10, beta=1.e-5),
])

opt = keras.optimizers.Adam(learning_rate=0.001)
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

# ResetMinMax resets the recorded value ranges between epochs;
# FreeBOPs monitors the model's BOPs during training
callbacks = [ResetMinMax(), FreeBOPs()]

model.fit(..., callbacks=callbacks)
```
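The `...` above stands for the usual `fit` arguments. For a self-contained run, a hypothetical invocation with random stand-in data could look like the following (`x_train`, `y_train`, and all shapes here are assumptions, not part of the original example):

```python
import numpy as np

# Hypothetical stand-in dataset: 1024 samples, 16 features, 10 classes
x_train = np.random.rand(1024, 16).astype('float32')
y_train = np.random.randint(0, 10, size=1024)

model.fit(x_train, y_train, epochs=10, batch_size=32, callbacks=callbacks)
```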
After training, record the value ranges of each layer on the training data and convert the model:

```python
from HGQ import trace_minmax, to_proxy_model
from hls4ml.converters import convert_from_keras_model

# Record the numerical ranges seen by each layer on the training data
trace_minmax(model, x_train, cover_factor=1.0)

# Convert the trained HGQ model to the bit-accurate proxy model format
proxy = to_proxy_model(model, aggressive=True)

model_hls = convert_from_keras_model(proxy, backend='vivado', output_dir=..., part=...)
```
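Because the proxy model is bit-accurate with the generated HLS model, one way to sanity-check the conversion is to compare the two on some held-out data. A sketch, assuming an `x_test` array (`compile()` and `predict()` are standard hls4ml model calls):

```python
import numpy as np

model_hls.compile()                # build the C simulation library
y_proxy = proxy.predict(x_test)    # Keras-side reference output
y_hls = model_hls.predict(x_test)  # C-simulation output

# Up to output flattening, the two should agree bit-exactly
np.testing.assert_array_almost_equal(y_hls.reshape(y_proxy.shape), y_proxy)
```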
An interactive example of HGQ can be found in the Kaggle notebook. Full documentation can be found at calad0i.github.io/HGQ.