hls4ml.optimization.dsp_aware_pruning package
Subpackages
- hls4ml.optimization.dsp_aware_pruning.keras package
- Submodules
- hls4ml.optimization.dsp_aware_pruning.keras.builder module
- hls4ml.optimization.dsp_aware_pruning.keras.config module
- hls4ml.optimization.dsp_aware_pruning.keras.masking module
- hls4ml.optimization.dsp_aware_pruning.keras.reduction module
- hls4ml.optimization.dsp_aware_pruning.keras.regularizers module
- hls4ml.optimization.dsp_aware_pruning.keras.utils module
- Module contents
- hls4ml.optimization.dsp_aware_pruning.objectives package
Submodules
hls4ml.optimization.dsp_aware_pruning.attributes module
- class hls4ml.optimization.dsp_aware_pruning.attributes.LayerAttributes(name, layer_type, inbound_layers, weight_shape, input_shape, output_shape, optimizable, optimization_attributes, args)
Bases:
object
A class for storing layer information
- Parameters:
name (string) – Layer name
layer_type (keras.Layer) – Layer type (e.g. Dense, Conv2D etc.)
inbound_layers (list) – List of parent nodes, identified by name
weight_shape (tuple) – Layer weight shape
input_shape (tuple) – Layer input shape
output_shape (tuple) – Layer output shape
optimizable (bool) – Should optimizations (pruning, weight sharing) be applied to this layer
optimization_attributes (OptimizationAttributes) – Type of optimization, pruning or weight sharing, block shape and pattern offset
args (dict) – Additional information, e.g. hls4mlAttributes; dictionary so it can be generic enough for different platforms
- update_args(updates)
- class hls4ml.optimization.dsp_aware_pruning.attributes.OptimizationAttributes(structure_type=SUPPORTED_STRUCTURES.UNSTRUCTURED, pruning=False, weight_sharing=False, block_shape=(1, 1), pattern_offset=1, consecutive_patterns=1)
Bases:
object
A class for storing layer optimization attributes
- Parameters:
structure_type (enum) – Targeted structure - unstructured, structured, pattern, block
pruning (boolean) – Should pruning be applied to the layer
weight_sharing (boolean) – Should weight sharing be applied to the layer
block_shape (tuple) – Block shape if structure_type == block
pattern_offset (int) – Length of each pattern if structure_type == pattern
consecutive_patterns (int) – How many consecutive patterns are grouped together if structure_type == pattern
Notes
In the case of hls4ml, pattern_offset is equivalent to the number of weights processed in parallel
The pattern_offset is n_in * n_out / reuse_factor; default case (=1) is equivalent to no unrolling
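As a concrete illustration of these attributes, the sketch below builds pattern-pruning attributes for a hypothetical Dense layer with n_in = 64, n_out = 32 and reuse_factor = 32, so pattern_offset = 64 * 32 / 32 = 64. The SUPPORTED_STRUCTURES import path and the PATTERN member are inferred from the default value in the signature above and should be checked against the installed version.

```python
from hls4ml.optimization.dsp_aware_pruning.attributes import OptimizationAttributes
from hls4ml.optimization.dsp_aware_pruning.config import SUPPORTED_STRUCTURES

# Hypothetical Dense layer: n_in = 64, n_out = 32, reuse_factor = 32,
# so pattern_offset = n_in * n_out / reuse_factor = 64 * 32 / 32 = 64
opt_attributes = OptimizationAttributes(
    structure_type=SUPPORTED_STRUCTURES.PATTERN,
    pruning=True,
    weight_sharing=False,
    pattern_offset=64,
    consecutive_patterns=1,
)
```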
- hls4ml.optimization.dsp_aware_pruning.attributes.get_attributes_from_keras_model(model)
Given a Keras model, builds a dictionary of class attributes. Additional arguments (e.g. reuse factor) depend on the target hardware platform and are inserted later. Per-layer pruning type (structured, pattern etc.) depends on the pruning objective and is inserted later.
- Parameters:
model (keras.model) – Model to extract attributes from
- Returns:
Each key corresponds to a layer name; values are instances of LayerAttributes
- Return type:
model_attributes (dict)
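A minimal usage sketch; the two-layer model is purely illustrative, and the attribute names on LayerAttributes are assumed to match the constructor arguments listed above.

```python
from tensorflow import keras

from hls4ml.optimization.dsp_aware_pruning.attributes import get_attributes_from_keras_model

# Small illustrative model
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(16,)),
    keras.layers.Dense(5, activation='softmax'),
])

model_attributes = get_attributes_from_keras_model(model)
for name, attributes in model_attributes.items():
    print(name, attributes.weight_shape, attributes.optimizable)
```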
- hls4ml.optimization.dsp_aware_pruning.attributes.get_attributes_from_keras_model_and_hls4ml_config(model, config)
Given a Keras model and an hls4ml configuration, builds a dictionary of class attributes. Per-layer pruning type (structured, pattern etc.) depends on the pruning objective and is inserted later.
- Parameters:
model (keras.model) – Model to extract attributes from
config (dict) – hls4ml configuration dictionary
- Returns:
Each key corresponds to a layer name; values are LayerAttributes instances
- Return type:
model_attributes (dict)
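The same idea, with hardware-specific attributes filled in from an hls4ml configuration; config_from_keras_model is the standard way to obtain the dictionary, and the model is again illustrative.

```python
import hls4ml
from tensorflow import keras

from hls4ml.optimization.dsp_aware_pruning.attributes import get_attributes_from_keras_model_and_hls4ml_config

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(16,)),
    keras.layers.Dense(5, activation='softmax'),
])

# Per-layer hls4ml configuration, so each layer carries its own precision and reuse factor
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
model_attributes = get_attributes_from_keras_model_and_hls4ml_config(model, config)
```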
- class hls4ml.optimization.dsp_aware_pruning.attributes.hls4mlAttributes(n_in, n_out, io_type, strategy, weight_precision, output_precision, reuse_factor, parallelization_factor=1)
Bases:
object
A class for storing hls4ml information of a single layer
- Parameters:
n_in (int) – Number of inputs (rows) for Dense matrix multiplication
n_out (int) – Number of outputs (cols) for Dense matrix multiplication
io_type (string) – io_parallel or io_stream
strategy (string) – Resource or Latency
weight_precision (FixedPrecisionType) – Layer weight precision
output_precision (FixedPrecisionType) – Layer output precision
reuse_factor (int) – Layer reuse factor
parallelization_factor (int) – Layer parallelization factor - [applicable to io_parallel Conv2D]
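A construction sketch for a hypothetical Dense layer; FixedPrecisionType is assumed to come from hls4ml.model.types, and all numeric values are placeholders.

```python
from hls4ml.model.types import FixedPrecisionType

from hls4ml.optimization.dsp_aware_pruning.attributes import hls4mlAttributes

# Hypothetical Dense layer mapped to io_parallel with the Resource strategy
hls_attributes = hls4mlAttributes(
    n_in=64,
    n_out=32,
    io_type='io_parallel',
    strategy='Resource',
    weight_precision=FixedPrecisionType(width=16, integer=6),
    output_precision=FixedPrecisionType(width=16, integer=6),
    reuse_factor=32,
    parallelization_factor=1,
)
```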
hls4ml.optimization.dsp_aware_pruning.config module
hls4ml.optimization.dsp_aware_pruning.knapsack module
- hls4ml.optimization.dsp_aware_pruning.knapsack.solve_knapsack(values, weights, capacity, implementation='CBC_MIP', **kwargs)
A function for solving the Knapsack problem
- Parameters:
values – A one-dimensional array, where each entry is the value of an item
weights – A matrix; each row represents the weights of every item in a given knapsack
capacity – A one-dimensional array; each entry is the maximum weight (capacity) of a knapsack
implementation – Algorithm used to solve the Knapsack problem: dynamic programming, greedy, branch and bound, or CBC MIP
time_limit – Limit (in seconds) after which the CBC or Branch & Bound solver should stop looking for a solution and return the best one found so far
scaling_factor – Scaling factor for floating-point values in CBC or B&B
- Returns:
tuple containing
optimal_value (float): The total value of the items selected for the knapsack
selected_items (list): A list of indices, corresponding to the selected elements
Notes
- The general formulation of the Knapsack problem for N items and M knapsacks is:
max v.T @ x  subject to  A @ x <= W,
where v ~ (N, 1), x ~ (N, 1), A ~ (M, N), W ~ (M, 1), each x_i ∈ {0, 1},
and <= is the generalized, element-wise inequality for vectors
- Supported implementations:
- Dynamic programming:
Optimal solution
Time complexity: O(nW)
Suitable for single-dimensional constraints and a medium number of items, with integer weights
- Branch and bound:
Optimal
Solved using Google OR-Tools
Suitable for multi-dimensional constraints and a large number of items
- CBC MIP:
Solution sub-optimal, but often better than greedy
Solved using Google OR-Tools, with the CBC MIP Solver
Suitable for multi-dimensional constraints and a very high number of items
- Greedy:
Solution sub-optimal
Time complexity: O(mn)
Suitable for highly dimensional constraints or a very high number of items
- Most implementations require integer values of weights and capacities;
for pruning & weight sharing this is never a problem. In case non-integer weights and capacities are required, all values should be scaled by an appropriate scaling factor.
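A small numerical sketch with three items and two constraints; the values and weights are made up, and the greedy implementation is chosen here only to avoid the OR-Tools dependency of the default CBC MIP solver (the 'greedy' keyword is assumed to match the implementation names listed above).

```python
import numpy as np

from hls4ml.optimization.dsp_aware_pruning.knapsack import solve_knapsack

# Three items, two knapsacks (e.g. two different resource budgets)
values = np.array([10.0, 7.0, 5.0])
weights = np.array([
    [4, 3, 2],   # consumption of resource A per item
    [5, 2, 2],   # consumption of resource B per item
])
capacity = np.array([6, 7])

optimal_value, selected_items = solve_knapsack(values, weights, capacity, implementation='greedy')
print(optimal_value, selected_items)
```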
hls4ml.optimization.dsp_aware_pruning.scheduler module
- class hls4ml.optimization.dsp_aware_pruning.scheduler.BinaryScheduler(initial_sparsity=0, final_sparsity=1.0, threshold=0.01)
Bases:
OptimizationScheduler
Sparsity is updated by binary halving of the search space; the lower and upper bounds are constantly updated. In the update step, sparsity is incremented to the midpoint between the previous sparsity and the target sparsity (upper bound). In the repair step, sparsity is decremented to the midpoint between the lower bound and the previous sparsity.
- repair_step()
Method used when the neural architecture does not satisfy the performance requirement for a given sparsity. Then, the target sparsity is decreased according to the rule.
Examples
ConstantScheduler, sparsity = 0.5, increment = 0.05 -> sparsity = 0.55 [see ConstantScheduler for explanation]
BinaryScheduler, sparsity = 0.75, target = 1.0, previous = 0.5 -> sparsity = (0.5 + 0.75) / 2 = 0.625
- Returns:
tuple containing
updated (boolean) - Has the sparsity changed? If not, the optimization algorithm can stop
sparsity (float) - Updated sparsity
- update_step()
Increments the current sparsity, according to the rule.
Examples
ConstantScheduler, sparsity = 0.5, increment = 0.05 -> sparsity = 0.55
BinaryScheduler, sparsity = 0.5, target = 1.0 -> sparsity = 0.75
- Returns:
tuple containing
updated (boolean) - Has the sparsity changed? If not, the optimization algorithm can stop
sparsity (float) - Updated sparsity
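Mirroring the examples above, a minimal sketch of how the two steps move the sparsity; the exact stopping behaviour controlled by threshold is an assumption.

```python
from hls4ml.optimization.dsp_aware_pruning.scheduler import BinaryScheduler

# Start from 0% sparsity, target 100%
scheduler = BinaryScheduler(initial_sparsity=0, final_sparsity=1.0, threshold=0.01)

updated, sparsity = scheduler.update_step()   # midpoint of (0, 1.0) -> 0.5
updated, sparsity = scheduler.update_step()   # midpoint of (0.5, 1.0) -> 0.75

# The model failed to meet the target at 0.75, so back off towards the lower bound
updated, sparsity = scheduler.repair_step()   # midpoint of (0.5, 0.75) -> 0.625

# `updated` presumably turns False once the interval shrinks below `threshold`,
# signalling that the optimization loop can stop
```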
- class hls4ml.optimization.dsp_aware_pruning.scheduler.ConstantScheduler(initial_sparsity=0, final_sparsity=1.0, update_step=0.05)
Bases:
OptimizationScheduler
- Sparsity is updated by a constant term, until:
the sparsity target is reached, OR
the optimization algorithm stops requesting state updates
- repair_step()
Method used when the neural architecture does not satisfy the performance requirement for a given sparsity. Then, the target sparsity is decreased according to the rule.
Examples
ConstantScheduler, sparsity = 0.5, increment = 0.05 -> sparsity = 0.55 [see ConstantScheduler for explanation]
BinaryScheduler, sparsity = 0.75, target = 1.0, previous = 0.5 -> sparsity = (0.5 + 0.75) / 2 = 0.625
- Returns:
tuple containing
updated (boolean) - Has the sparsity changed? If not, the optimization algorithm can stop
sparsity (float) - Updated sparsity
- update_step()
Increments the current sparsity, according to the rule.
Examples
ConstantScheduler, sparsity = 0.5, increment = 0.05 -> sparsity = 0.55
BinaryScheduler, sparsity = 0.5, target = 1.0 -> sparsity = 0.75
- Returns:
tuple containing
updated (boolean) - Has the sparsity changed? If not, the optimization algorithm can stop
sparsity (float) - Updated sparsity
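The corresponding sketch for the constant schedule, matching the documented example above.

```python
from hls4ml.optimization.dsp_aware_pruning.scheduler import ConstantScheduler

scheduler = ConstantScheduler(initial_sparsity=0.5, final_sparsity=1.0, update_step=0.05)
updated, sparsity = scheduler.update_step()   # 0.5 + 0.05 -> 0.55
```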
- class hls4ml.optimization.dsp_aware_pruning.scheduler.OptimizationScheduler(initial_sparsity=0, final_sparsity=1)
Bases:
ABC
Baseline class handling logic regarding target sparsity and its updates at every step
- get_sparsity()
- abstract repair_step()
Method used when the neural architecture does not satisfy the performance requirement for a given sparsity. Then, the target sparsity is decreased according to the rule.
Examples
ConstantScheduler, sparsity = 0.5, increment = 0.05 -> sparsity = 0.55 [see ConstantScheduler for explanation]
BinaryScheduler, sparsity = 0.75, target = 1.0, previous = 0.5 -> sparsity = (0.5 + 0.75) / 2 = 0.625
- Returns:
tuple containing
updated (boolean) - Has the sparsity changed? If not, the optimization algorithm can stop
sparsity (float) - Updated sparsity
- abstract update_step()
Increments the current sparsity, according to the rule.
Examples
ConstantScheduler, sparsity = 0.5, increment = 0.05 -> sparsity = 0.55
BinaryScheduler, sparsity = 0.5, target = 1.0 -> sparsity = 0.75
- Returns:
tuple containing
updated (boolean) - Has the sparsity changed? If not, the optimization algorithm can stop
sparsity (float) - Updated sparsity
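Since OptimizationScheduler is an abstract base class, a custom schedule only needs to implement the two abstract methods. A minimal sketch of a hypothetical scheduler follows; the self.sparsity attribute name is an assumption based on the constructor arguments and get_sparsity() above.

```python
from hls4ml.optimization.dsp_aware_pruning.scheduler import OptimizationScheduler

class GeometricScheduler(OptimizationScheduler):
    '''Hypothetical scheduler: closes half of the remaining gap to the target on
    success and backs off by a fixed fraction on failure.'''

    def __init__(self, initial_sparsity=0, final_sparsity=1.0, backoff=0.9):
        super().__init__(initial_sparsity, final_sparsity)
        self.target = final_sparsity
        self.backoff = backoff

    def update_step(self):
        # Assumes the base class stores the current value in self.sparsity
        new_sparsity = self.sparsity + 0.5 * (self.target - self.sparsity)
        updated = new_sparsity > self.sparsity
        self.sparsity = new_sparsity
        return updated, self.sparsity

    def repair_step(self):
        self.sparsity *= self.backoff
        return True, self.sparsity
```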
- class hls4ml.optimization.dsp_aware_pruning.scheduler.PolynomialScheduler(maximum_steps, initial_sparsity=0, final_sparsity=1.0, decay_power=3)
Bases:
OptimizationScheduler
- Sparsity is updated following a polynomial decay, until:
the sparsity target is reached, OR
the optimization algorithm stops requesting state updates
- For more information, see Zhu & Gupta (2016) -
‘To prune, or not to prune: exploring the efficacy of pruning for model compression’
Note that the implementation is slightly different, since the TensorFlow Prune API depends on the total number of epochs and update frequency.
In certain cases, a model might underperform at the current sparsity level, but perform better at a higher sparsity. In this case, the polynomial schedule will simply jump to the next sparsity level. The model's performance over several sparsity levels is tracked, and optimization is stopped after high loss over several trials (see the top-level pruning/optimization function). A sketch of the reference polynomial schedule is given after this class entry.
- repair_step()
Method used when the neural architecture does not satisfy the performance requirement for a given sparsity. Then, the target sparsity is decreased according to the rule.
Examples
ConstantScheduler, sparsity = 0.5, increment = 0.05 -> sparsity = 0.55 [see ConstantScheduler for explanation]
BinaryScheduler, sparsity = 0.75, target = 1.0, previous = 0.5 -> sparsity = (0.5 + 0.75) / 2 = 0.625
- Returns:
tuple containing
updated (boolean) - Has the sparsity changed? If not, the optimization algorithm can stop
sparsity (float) - Updated sparsity
- update_step()
Increments the current sparsity, according to the rule.
Examples
ConstantScheduler, sparsity = 0.5, increment = 0.05 -> sparsity = 0.55
BinaryScheduler, sparsity = 0.5, target = 1.0 -> sparsity = 0.75
- Returns:
tuple containing
updated (boolean) - Has the sparsity changed? If not, the optimization algorithm can stop
sparsity (float) - Updated sparsity
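For reference, the Zhu & Gupta schedule that this scheduler approximates can be sketched as below; as noted in the class description, the hls4ml implementation differs slightly in how steps are counted, so this is only the reference formula.

```python
def reference_polynomial_sparsity(step, maximum_steps, initial_sparsity=0.0, final_sparsity=1.0, decay_power=3):
    '''Polynomial decay from Zhu & Gupta (2016): sparsity ramps quickly at first
    and flattens out as it approaches the final target.'''
    progress = min(step / maximum_steps, 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** decay_power
```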
Module contents
- hls4ml.optimization.dsp_aware_pruning.optimize_keras_model_for_hls4ml(keras_model, hls_config, objective, scheduler, X_train, y_train, X_val, y_val, batch_size, epochs, optimizer, loss_fn, validation_metric, increasing, rtol, callbacks=None, ranking_metric='l1', local=False, verbose=False, rewinding_epochs=1, cutoff_bad_trials=3, directory='hls4ml-optimization', tuner='Bayesian', knapsack_solver='CBC_MIP', regularization_range=[1e-06, 1.8478497974222906e-06, 3.414548873833601e-06, 6.30957344480193e-06, 1.165914401179831e-05, 2.1544346900318823e-05, 3.9810717055349695e-05, 7.356422544596421e-05, 0.00013593563908785255, 0.00025118864315095795, 0.00046415888336127773, 0.0008576958985908938, 0.001584893192461114, 0.0029286445646252374, 0.0054116952654646375, 0.01])
Top-level function for optimizing a Keras model, given an hls4ml configuration and one or more hardware objectives
- Parameters:
keras_model (keras.Model) – Model to be optimized
hls_config (dict) – hls4ml configuration, obtained from hls4ml.utils.config.config_from_keras_model(…)
objective (hls4ml.optimization.objectives.ObjectiveEstimator) – Hardware or user-defined objective of optimization
scheduler (OptimizationScheduler) – Sparsity scheduler; choose between constant, polynomial and binary
X_train (np.array) – Training inputs
y_train (np.array) – Training labels
X_val (np.array) – Validation inputs
y_val (np.array) – Validation labels
batch_size (int) – Batch size during training
epochs (int) – Maximum number of epochs to fine-tune model, in one iteration of pruning
optimizer (keras.optimizers.Optimizer or equivalent-string description) – Optimizer used during training
loss_fn (keras.losses.Loss or equivalent loss description) – Loss function used during training
validation_metric (keras.metrics.Metric or equivalent string description) – Validation metric, used as a baseline
increasing (boolean) – If the metric improves with increased values; e.g. accuracy -> increasing = True, MSE -> increasing = False
rtol (float) – Relative tolerance; pruning stops when the pruned model's validation metric drops below (for increasing metrics) or rises above (for decreasing metrics) rtol * baseline_validation_metric
callbacks (list of keras.callbacks.Callback)
ranking_metric (string) – Metric used for ranking weights and structures; currently supported: l1, l2, saliency and oracle
local (boolean) – Layer-wise or global pruning
verbose (boolean) – Display debug logs during model optimization
rewinding_epochs (int) – Number of epochs to retrain model without weight freezing, allows regrowth of previously pruned weights
cutoff_bad_trials (int) – After how many bad trials (performance below threshold), should model pruning / weight sharing stop
directory (string) – Directory to store temporary results
tuner (str) – Tuning algorithm, choose between Bayesian, Hyperband and None
knapsack_solver (str) – Algorithm to solve Knapsack problem when optimizing; default usually works well; for very large networks, greedy algorithm might be more suitable
regularization_range (list) – List of suitable hyperparameters for weight decay
- Returns:
Optimized model
- Return type:
keras.Model
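A minimal end-to-end sketch, assuming a trained Keras classifier and its data already exist (model, X_train, y_train, X_val and y_val are placeholders), and that ParameterEstimator is available from the objectives package listed above; the exact objective to use depends on the targeted hardware and installed version.

```python
import hls4ml

from hls4ml.optimization.dsp_aware_pruning import optimize_keras_model_for_hls4ml
from hls4ml.optimization.dsp_aware_pruning.objectives import ParameterEstimator  # assumed objective; see the objectives package
from hls4ml.optimization.dsp_aware_pruning.scheduler import PolynomialScheduler

# `model`, `X_train`, `y_train`, `X_val`, `y_val` are placeholders for an existing
# Keras classifier and its training / validation data
hls_config = hls4ml.utils.config_from_keras_model(model, granularity='name')

optimized_model = optimize_keras_model_for_hls4ml(
    model,
    hls_config,
    ParameterEstimator,
    PolynomialScheduler(maximum_steps=10),
    X_train, y_train, X_val, y_val,
    batch_size=128,
    epochs=10,
    optimizer='adam',
    loss_fn='categorical_crossentropy',
    validation_metric='accuracy',
    increasing=True,   # higher accuracy is better
    rtol=0.98,         # tolerate roughly a 2% relative drop from the baseline accuracy
)
```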