Part 1: Getting started
from tensorflow.keras.utils import to_categorical
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
import numpy as np
%matplotlib inline
seed = 0
np.random.seed(seed)
import tensorflow as tf
tf.random.set_seed(seed)
import os
os.environ['PATH'] = os.environ['XILINX_VITIS'] + '/bin:' + os.environ['PATH']
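The setup cell assumes that the XILINX_VITIS environment variable is already defined; when it is not, the lookup raises a KeyError. A minimal sketch of a guarded version follows. The helper name add_vitis_to_path is hypothetical (it is not part of hls4ml), and it takes the environment mapping as an argument so it can be tried on a plain dictionary first.

```python
import os


def add_vitis_to_path(env):
    """Prepend $XILINX_VITIS/bin to PATH in `env`.

    Hypothetical helper for illustration. Returns False, leaving `env`
    untouched, when XILINX_VITIS is not defined.
    """
    vitis_root = env.get('XILINX_VITIS')
    if vitis_root is None:
        return False
    env['PATH'] = os.path.join(vitis_root, 'bin') + os.pathsep + env.get('PATH', '')
    return True


# In the notebook, apply it to the real process environment.
if not add_vitis_to_path(os.environ):
    print('XILINX_VITIS is not set; point it at your Vitis installation before synthesis.')
```

With this guard the training and emulation steps can still run on a machine without Vitis; only the synthesis step needs the tools on the PATH.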
Fetch the jet tagging dataset from OpenML
data = fetch_openml('hls4ml_lhc_jets_hlf')
X, y = data['data'], data['target']
Let’s print some information about the dataset
Print the feature names and the dataset shape
print(data['feature_names'])
print(X.shape, y.shape)
print(X[:5])
print(y[:5])
As you saw above, the y target is an array of strings, e.g. [‘g’, ‘w’, …].
We need to convert it to a “one-hot” encoding for training.
Then we split the dataset into a combined training/validation set and a test set, and standardize the features.
le = LabelEncoder()
y = le.fit_transform(y)
y = to_categorical(y, 5)
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(y[:5])
scaler = StandardScaler()
X_train_val = scaler.fit_transform(X_train_val)
X_test = scaler.transform(X_test)
np.save('X_train_val.npy', X_train_val)
np.save('X_test.npy', X_test)
np.save('y_train_val.npy', y_train_val)
np.save('y_test.npy', y_test)
np.save('classes.npy', le.classes_)
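To make the encoding step concrete, here is a NumPy-only sketch of what the LabelEncoder and to_categorical calls produce together. The four labels are toy values chosen for illustration, not rows from the dataset.

```python
import numpy as np

# Toy string labels standing in for the jet classes (illustrative values only).
labels = np.array(['g', 'w', 'q', 'g'])

# Step 1 (the role of LabelEncoder): map each distinct string to an integer index.
# np.unique returns the sorted distinct labels and the index of each input label.
classes, indices = np.unique(labels, return_inverse=True)

# Step 2 (the role of to_categorical): turn each index into a row with a single 1.
one_hot = np.eye(len(classes))[indices]

print(classes)
print(indices)
print(one_hot)
```

Each row of one_hot has exactly one non-zero entry, in the column of the class that row belongs to.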
Now construct a model
We’ll use 3 hidden layers with 64, then 32, then 32 neurons. Each layer will use relu activation.
Add an output layer with 5 neurons (one for each class), then finish with Softmax activation.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l1
from callbacks import all_callbacks
model = Sequential()
model.add(Dense(64, input_shape=(16,), name='fc1', kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(Activation(activation='relu', name='relu1'))
model.add(Dense(32, name='fc2', kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(Activation(activation='relu', name='relu2'))
model.add(Dense(32, name='fc3', kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(Activation(activation='relu', name='relu3'))
model.add(Dense(5, name='output', kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(Activation(activation='softmax', name='softmax'))
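As a rough illustration of what this architecture computes, the following NumPy sketch runs a forward pass through layers of the same sizes. The weights are random placeholders and the biases are zero, so the outputs are meaningless as predictions; the sketch only shows the shapes and the order of operations.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [16, 64, 32, 32, 5]  # inputs, three hidden layers, output classes

# Random placeholder weights and zero biases; a trained model supplies real values.
weights = [rng.normal(scale=0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]


def forward(x):
    """Dense + ReLU for each hidden layer, then Dense + softmax for the output."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)
    logits = x @ weights[-1] + biases[-1]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Three fake input vectors with 16 features each.
probs = forward(rng.normal(size=(3, 16)))
print(probs.shape)
print(probs.sum(axis=1))
```

The softmax at the end turns the 5 output values into per-class probabilities that sum to 1 for every input.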
Train the model
We’ll use the Adam optimizer with categorical cross-entropy loss.
The callbacks will decay the learning rate and save the model into a directory ‘model_1’.
The model isn’t very complex, so this should take just a few minutes even on the CPU.
If you’ve restarted the notebook kernel after training once, set train = False to load the trained model.
train = True
if train:
    # `learning_rate` replaces the deprecated `lr` argument name
    adam = Adam(learning_rate=0.0001)
    model.compile(optimizer=adam, loss=['categorical_crossentropy'], metrics=['accuracy'])
    callbacks = all_callbacks(
        stop_patience=1000,
        lr_factor=0.5,
        lr_patience=10,
        lr_epsilon=0.000001,
        lr_cooldown=2,
        lr_minimum=0.0000001,
        outputDir='model_1',
    )
    model.fit(
        X_train_val,
        y_train_val,
        batch_size=1024,
        epochs=10,
        validation_split=0.25,
        shuffle=True,
        callbacks=callbacks.callbacks,
    )
else:
    from tensorflow.keras.models import load_model

    model = load_model('model_1/KERAS_check_best_model.h5')
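The learning-rate decay mentioned above can be pictured with a small pure-Python simulation. This is a sketch under an assumption: that the tutorial's all_callbacks helper applies a reduce-on-plateau rule driven by lr_factor, lr_patience and lr_minimum. The function plateau_schedule is hypothetical and deliberately ignores the cooldown and epsilon settings.

```python
def plateau_schedule(val_losses, lr=1e-4, factor=0.5, patience=10, min_lr=1e-7):
    """Return the learning rate in effect after each epoch.

    Simplified reduce-on-plateau rule (hypothetical, for illustration):
    when the loss has not improved for `patience` consecutive epochs,
    multiply the learning rate by `factor`, never going below `min_lr`.
    """
    best = float('inf')
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Made-up validation losses that stop improving after the second epoch.
losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]
print(plateau_schedule(losses, patience=2))
```

With a patience of 10 and only 10 training epochs, as configured above, such a rule would have little opportunity to trigger; it matters more for longer training runs.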
Check performance
Check the accuracy and make a ROC curve
import plotting
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
y_keras = model.predict(X_test)
print("Accuracy: {}".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_keras, axis=1))))
plt.figure(figsize=(9, 9))
_ = plotting.makeRoc(y_test, y_keras, le.classes_)
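The accuracy line above compares the most probable predicted class with the true class for every test sample. A small NumPy sketch of the same calculation, using made-up targets and probabilities for three classes:

```python
import numpy as np

# Toy one-hot targets and predicted probabilities (illustrative values only).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.6, 0.2]])

# argmax recovers the class index from a one-hot row or a probability row.
true_class = np.argmax(y_true, axis=1)
pred_class = np.argmax(y_prob, axis=1)

# Accuracy is the fraction of samples where the two indices agree.
accuracy = np.mean(true_class == pred_class)
print(accuracy)
```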
Convert the model to FPGA firmware with hls4ml
Now we will go through the steps to convert the model we trained into low-latency FPGA firmware with hls4ml. First, we will evaluate its classification performance to make sure we haven’t lost accuracy by using fixed-point data types. Then we will synthesize the model with Vitis HLS and check the latency and FPGA resource usage.
Make an hls4ml config & model
The hls4ml Neural Network inference library is controlled through a configuration dictionary. In this example we’ll use the simplest variation; later exercises will look at more advanced configuration.
import hls4ml
config = hls4ml.utils.config_from_keras_model(model, granularity='model', backend='Vitis')
print("-----------------------------------")
print("Configuration")
plotting.print_dict(config)
print("-----------------------------------")
hls_model = hls4ml.converters.convert_from_keras_model(
model, hls_config=config, backend='Vitis', output_dir='model_1/hls4ml_prj', part='xcu250-figd2104-2L-e'
)
Let’s visualise what we created. The model architecture is shown, annotated with the shapes and data types.
hls4ml.utils.plot_model(hls_model, show_shapes=True, show_precision=True, to_file=None)
Compile, predict
Now we need to check that this model’s performance is still good. We compile the hls_model, and then use hls_model.predict to execute the FPGA firmware with bit-accurate emulation on the CPU.
hls_model.compile()
X_test = np.ascontiguousarray(X_test)
y_hls = hls_model.predict(X_test)
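This check matters because the HLS model computes with fixed-point numbers rather than floating point. The sketch below shows the effect on single values, assuming a signed 16-bit format with 6 integer bits (sign included). That format is an illustrative assumption; the precision actually used is the one printed in the configuration above. The helper to_fixed_point is hypothetical, and it saturates out-of-range values, which is a simplification that may differ from the overflow behaviour of the generated firmware.

```python
import math


def to_fixed_point(x, width=16, integer=6):
    """Quantize x to a signed fixed-point grid (hypothetical helper).

    `width` is the total number of bits; `integer` of them, including the
    sign bit, sit to the left of the binary point.
    """
    scale = 2 ** (width - integer)         # number of steps per unit
    raw = math.floor(x * scale)            # truncate toward minus infinity
    lo, hi = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    raw = max(lo, min(hi, raw))            # saturate instead of overflowing
    return raw / scale


for value in (0.1, -0.1, 0.5, 100.0):
    print(value, '->', to_fixed_point(value))
```

Values that are not multiples of the step size are shifted slightly, and values outside the representable range cannot be stored at all, which is why the accuracy is compared against Keras in the next step.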
Compare
That was easy! Now let’s see how the performance compares to Keras:
print("Keras Accuracy: {}".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_keras, axis=1))))
print("hls4ml Accuracy: {}".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_hls, axis=1))))
fig, ax = plt.subplots(figsize=(9, 9))
_ = plotting.makeRoc(y_test, y_keras, le.classes_)
plt.gca().set_prop_cycle(None) # reset the colors
_ = plotting.makeRoc(y_test, y_hls, le.classes_, linestyle='--')
from matplotlib.lines import Line2D
lines = [Line2D([0], [0], ls='-'), Line2D([0], [0], ls='--')]
from matplotlib.legend import Legend
leg = Legend(ax, lines, labels=['keras', 'hls4ml'], loc='lower right', frameon=False)
ax.add_artist(leg)
Synthesize
Now we’ll actually use Vitis HLS to synthesize the model. We can run the build using a method of our hls_model object.
After running this step, we can integrate the generated IP into a workflow to compile for a specific FPGA board.
In this case, we’ll just review the reports that Vitis HLS generates, checking the latency and resource usage.
This can take several minutes.
While the C synthesis is running, we can monitor its progress in the log file by opening a terminal from the notebook home and executing:
tail -f model_1/hls4ml_prj/vitis_hls.log
hls_model.build(csim=False)
Check the reports
Print out the reports generated by Vitis HLS. Pay attention to the Latency and the ‘Utilization Estimates’ sections.
hls4ml.report.read_vivado_report('model_1/hls4ml_prj/')
Exercise
Since ReuseFactor = 1, we expect each multiplication used in the inference of our neural network to use 1 DSP. Is this what we see? (Note that the Softmax layer should use 5 DSPs, or 1 per class.)
Calculate how many multiplications are performed for the inference of this network…
(We’ll discuss the outcome)
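One way to approach the calculation: a fully connected layer with n_in inputs and n_out outputs performs n_in × n_out multiplications, and the bias additions and activations contribute none. A minimal sketch for the layer sizes used in this model follows; the Softmax layer is left out of the count.

```python
# Layer widths of the model above: 16 inputs, hidden layers of 64, 32, 32, and 5 outputs.
layer_sizes = [16, 64, 32, 32, 5]

# One multiplication per weight in each Dense layer.
mults_per_layer = [n_in * n_out
                   for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_mults = sum(mults_per_layer)

print(mults_per_layer)
print(total_mults)
```

Comparing this total with the DSP count in the report is the starting point for the discussion.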