The SuperSONIC project implements server infrastructure for inference-as-a-service applications in large high energy physics (HEP) and multi-messenger astrophysics (MMA) experiments. The server infrastructure is designed for deployment on Kubernetes clusters equipped with GPUs.

SuperSONIC GitHub repository: fastmachinelearning/SuperSONIC.

Why “inference-as-a-service”?

The computing demands of modern scientific experiments are growing at a faster rate than the performance improvements of traditional processors (CPUs). This trend is driven by increasing data collection rates, tightening latency requirements, and rising complexity of algorithms, particularly those based on machine learning. Such a computing landscape strongly motivates the adoption of specialized coprocessors, such as FPGAs, GPUs, and TPUs.

In the “inference-as-a-service” model, the data processing workflows (“clients”) off-load computationally intensive steps, such as neural network inference, to a remote “server” equipped with coprocessors. This design makes it possible to optimize both data processing throughput and coprocessor utilization by dynamically balancing the ratio of CPUs to coprocessors. Numerous R&D efforts implementing this paradigm in HEP and MMA experiments are grouped under the name SONIC (Services for Optimized Network Inference on Coprocessors).
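To make the CPU-to-coprocessor balancing concrete, the toy calculation below estimates how many clients a single coprocessor can serve and how many coprocessors a given set of clients needs. It is only a sketch under simple assumptions: every client sends inference requests at the same steady rate, and every coprocessor has a fixed request capacity. The function names and all numbers are hypothetical and are not taken from SuperSONIC or any experiment.

```python
import math


def clients_per_coprocessor(client_rate_hz, coprocessor_capacity_hz):
    """Largest number of clients one coprocessor can serve without saturating.

    Both arguments are hypothetical steady-state rates in requests per second.
    """
    if client_rate_hz <= 0 or coprocessor_capacity_hz <= 0:
        raise ValueError("rates must be positive")
    return math.floor(coprocessor_capacity_hz / client_rate_hz)


def coprocessors_needed(n_clients, client_rate_hz, coprocessor_capacity_hz):
    """Smallest number of coprocessors that covers the total client demand."""
    if n_clients < 0:
        raise ValueError("n_clients must be non-negative")
    if client_rate_hz <= 0 or coprocessor_capacity_hz <= 0:
        raise ValueError("rates must be positive")
    total_demand_hz = n_clients * client_rate_hz
    return math.ceil(total_demand_hz / coprocessor_capacity_hz)


def utilization(n_clients, n_coprocessors, client_rate_hz, coprocessor_capacity_hz):
    """Fraction of the total coprocessor capacity in use, capped at 1.0."""
    if n_coprocessors <= 0:
        raise ValueError("n_coprocessors must be positive")
    demand_hz = n_clients * client_rate_hz
    capacity_hz = n_coprocessors * coprocessor_capacity_hz
    return min(1.0, demand_hz / capacity_hz)


if __name__ == "__main__":
    # Hypothetical numbers: each client issues 2 requests/s,
    # each coprocessor sustains 500 requests/s.
    print(clients_per_coprocessor(2, 500))
    print(coprocessors_needed(1000, 2, 500))
    print(utilization(1000, 5, 2, 500))
```

In this simplified picture, adding clients raises coprocessor utilization until the server saturates, after which more coprocessors are needed to keep client throughput from dropping. A server-side autoscaler adjusts this ratio at run time rather than fixing it in advance.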


SuperSONIC: a case for shared server infrastructure

A key feature of the SONIC approach is the decoupling of clients from servers and the standardization of communication between them. While client-side implementations may vary across applications, the server-side infrastructure can remain largely the same, since the server functionality requirements (load balancing, autoscaling, etc.) are not experiment-specific.

The purpose of the SuperSONIC project is to develop server infrastructure that can be reused across scientific experiments with only small differences in configuration.

Experiments that use SuperSONIC

The experiments listed below are developing workflows with inference-as-a-service implementations compatible with SuperSONIC. We are open to collaboration and encourage other experiments to try SuperSONIC for their inference-as-a-service needs.

CMS Experiment at the Large Hadron Collider (CERN).

CMS is testing the inference-as-a-service approach in Run 3 offline processing workflows, off-loading inference to GPUs for machine learning models such as ParticleNet, DeepMET, DeepTau, and ParT. In addition, non-ML tracking algorithms such as LST and Patatrack are being adapted for as-a-service deployment.

CMS Detector

ATLAS Experiment at the Large Hadron Collider (CERN).

ATLAS implements the inference-as-a-service approach for tracking algorithms such as Exa.TrkX and Traccc.

ATLAS Detector

IceCube Neutrino Observatory at the South Pole.

IceCube uses the SONIC approach to accelerate event classifier algorithms based on convolutional neural networks (CNNs).


Deployment sites

SuperSONIC has been successfully tested at the computing clusters listed below. We welcome developer help to add more computing centers to this list.