In-Situ High-Speed Computer Vision
Deploying neural networks for in-situ inference on frame grabber FPGAs in high-speed imaging
Medium article
Fusion Paper
Image by: Daisuke Shiraki
Background
Many scientific domains use high-speed imaging to aid experimentation and discovery. From analyzing fusion magnetohydrodynamics to detecting crystal structure in transmission electron microscopy, these experiments run at kHz to MHz rates and need fast, in-situ inference.
Strategy
Typically, dedicated PCIe frame grabber devices are paired with high-speed cameras to handle such high throughput, and a protocol such as CoaXPress is used to transmit the raw camera data between the systems. Many frame grabbers implement this protocol as well as additional pixel preprocessing stages on an FPGA device, for which the manufacturer makes the reference design available. Moreover, open-source codesign workflows like hls4ml enable easy translation and deployment of neural networks to FPGA devices, and have demonstrated latencies on the order of nanoseconds to microseconds. We leverage these existing tools to construct a framework for quick neural network deployment to frame grabber FPGAs.
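To see why nanosecond-to-microsecond inference matters here, it helps to convert frame rates into the time available between frames. The sketch below is illustrative arithmetic only: the function name and the sample frame rates are assumptions chosen to span the kHz-to-MHz range, not values from a specific camera.

```python
def frame_period_us(frame_rate_hz: float) -> float:
    """Return the time between consecutive frames, in microseconds."""
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return 1e6 / frame_rate_hz


# Illustrative frame rates spanning the kHz-to-MHz range (assumed values).
for rate_hz in (1e3, 1e4, 1e5, 1e6):
    period = frame_period_us(rate_hz)
    print(f"{rate_hz:>9.0f} Hz -> {period:8.1f} us between frames")
```

At 1 MHz a new frame arrives every microsecond, so a design has to accept a new input at least that often to keep up. A pipelined implementation may still have a total latency longer than one frame period.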
Results
We construct a framework for easy deployment of hls4ml neural networks to the frame grabbers for use in real-time control, data reduction, manufacturing, or other applications. The framework comes complete with two comprehensive tutorials, C/RTL testbenches, and pre-written HDL to enable easy inference latency benchmarking. We have successfully applied this framework to the field of fusion magnetohydrodynamics with more applications in progress.
Wildfire Project
Enabling Embedded Systems and IoT with hls4ml
Background
Wildfires pose an increasingly urgent global threat, as evidenced by recent devastating events in Maui, Hawaii, and across Alaska. To address this challenge, robust and reliable AI-based wildfire detection models are imperative. Our ongoing research has yielded significant advancements in video and image-based wildfire detection and ember detection AI models aimed at early prevention efforts.
Strategy
Recognizing the computational demands of these models, we propose leveraging Field Programmable Gate Arrays (FPGAs) due to their proven flexibility and parallel computation advantages. FPGAs serve as efficient hardware accelerators for deploying deep learning models, ensuring timely and accurate wildfire detection.
Results
Our AI detection models are trained in several frameworks, including PyTorch and TensorFlow-Keras, and we rely on hls4ml to implement them on FPGAs. The project focuses on demonstrating these models on FPGAs through hls4ml, enabling rapid and efficient wildfire detection and prevention strategies.
High-Speed Camera + 4D TEM
Enabling Material Science with hls4ml
Image by: Joshua Agar
Background
4D Scanning Transmission Electron Microscopy (4D-STEM) is a powerful technique for atomic-resolution imaging. One common imaging mode captures a 2D diffraction image at each pixel position in real space. The direct electron detectors used can reach 4K resolution at frame rates up to 5,000 frames per second. This has led to an orders-of-magnitude increase in the volume and velocity of the data collected, creating challenges in how to efficiently extract actionable information.
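A rough calculation shows the scale of the data volume. The sketch below estimates the raw, uncompressed data rate of a detector. The 4096 x 4096 pixel frame size and the 8-bit pixel depth are assumptions made for illustration (the text only says "4K resolution"); only the 5,000 frames-per-second figure comes from the description above.

```python
def raw_data_rate_gb_per_s(width_px: int, height_px: int,
                           bits_per_pixel: int,
                           frames_per_second: float) -> float:
    """Estimate the uncompressed detector data rate in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_frame = width_px * height_px * bits_per_pixel / 8
    return bytes_per_frame * frames_per_second / 1e9


# Assumed frame geometry and bit depth; frame rate taken from the text.
rate = raw_data_rate_gb_per_s(width_px=4096, height_px=4096,
                              bits_per_pixel=8, frames_per_second=5000)
print(f"Raw data rate under these assumptions: {rate:.1f} GB/s")
```

Under these assumptions the detector produces tens of gigabytes per second, which is why reducing the data close to the sensor is attractive. A deeper pixel format would scale the figure proportionally.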
Strategy
We propose and demonstrate a machine learning hardware implementation for real-time crystal structure, rotation, and strain detection in 4D-STEM. It builds on a novel deep neural network (DNN) called a cycle-consistent spatial-transforming autoencoder (CC-ST-AE), which is capable of learning affine transformations on real and simulated data. We then use distillation to train a smaller, quantized, easily deployable version of the model to enable real-time inference and high throughput.
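To illustrate what quantization means for a model destined for an FPGA, the sketch below rounds a value onto a signed fixed-point grid with a given total width and integer width, following the convention of HLS `ap_fixed<W,I>` types (the integer width includes the sign bit). The bit widths, the round-to-nearest rule, and the saturating overflow are assumptions made for illustration; the quantization scheme actually used for the distilled model is not described here.

```python
import math


def quantize_fixed(value: float, total_bits: int, int_bits: int) -> float:
    """Round `value` to a signed fixed-point grid and saturate on overflow.

    total_bits: overall word width (W).
    int_bits:   integer bits including the sign bit (I).
    """
    frac_bits = total_bits - int_bits
    step = 2.0 ** -frac_bits                  # smallest representable increment
    lowest = -(2.0 ** (int_bits - 1))         # most negative representable value
    highest = 2.0 ** (int_bits - 1) - step    # most positive representable value
    rounded = math.floor(value / step + 0.5) * step  # round to nearest, ties up
    return min(max(rounded, lowest), highest)        # saturate instead of wrapping


# Hypothetical 8-bit format with 2 integer bits (6 fractional bits).
for weight in (0.3, -0.7, 5.0):
    print(weight, "->", quantize_fixed(weight, total_bits=8, int_bits=2))
```

Narrower formats cost less FPGA logic and memory but introduce larger rounding error, which is the trade-off that training a quantized student model is meant to absorb.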
Results
We use hls4ml to synthesize the distilled model and optimize the implementation to meet the required latency constraint of 100 µs. We then integrate the neural network into the readout path of the imaging system, onboard a Euresys CoaXPress frame grabber, to minimize I/O-related overhead. This work provides a proof of concept for real-time crystal structure detection in 4D-STEM, significantly increasing the potential for fast materials characterization and discovery.
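One way to read the 100 µs constraint is as a clock-cycle budget. The sketch below assumes a hypothetical 200 MHz FPGA clock and hypothetical cycle counts. Neither figure comes from this project; they only show how a latency reported in cycles could be checked against the budget.

```python
def max_cycles_within_budget(budget_us: float, clock_mhz: float) -> int:
    """Largest whole number of clock cycles that fits in the latency budget."""
    # At a clock of N MHz there are N cycles per microsecond.
    return int(budget_us * clock_mhz)


def meets_budget(latency_cycles: int, clock_mhz: float, budget_us: float) -> bool:
    """Check whether a latency given in clock cycles fits the budget."""
    return latency_cycles / clock_mhz <= budget_us


BUDGET_US = 100.0   # latency constraint stated in the text
CLOCK_MHZ = 200.0   # assumed clock frequency, for illustration only

print("Cycle budget:", max_cycles_within_budget(BUDGET_US, CLOCK_MHZ))
print("15000 cycles ok?", meets_budget(15_000, CLOCK_MHZ, BUDGET_US))
print("25000 cycles ok?", meets_budget(25_000, CLOCK_MHZ, BUDGET_US))
```

A slower clock shrinks the cycle budget proportionally, so the same network may need more parallelism to stay within 100 µs.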