PQuantML: A Tool for End-to-End Hardware-aware Model Compression
Authors:
Roope Niemi,
Anastasiia Petrovych,
Arghya Ranjan Das,
Enrico Lupi,
Chang Sun,
Dimitrios Danopoulos,
Marlon Joshua Helbing,
Mia Liu,
Sebastian Dittmeier,
Michael Kagan,
Vladimir Loncar,
Maurizio Pierini
Abstract:
PQuantML is a new open-source, hardware-aware neural network model compression library tailored to end-to-end workflows. Motivated by the need to deploy performant models in environments with strict latency constraints, PQuantML simplifies the training of compressed models by providing a unified interface for applying pruning and quantization, either jointly or individually. The library implements multiple pruning methods at different granularities, as well as fixed-point quantization with support for High-Granularity Quantization. We evaluate PQuantML on representative tasks such as jet substructure classification, so-called jet tagging, an edge-computing problem related to real-time LHC data processing. Using various pruning methods with fixed-point quantization, PQuantML achieves substantial parameter and bit-width reductions while maintaining accuracy. We further compare the resulting compression against existing tools such as QKeras and HGQ.
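To make the two compression techniques named in the abstract concrete, the following is a minimal, self-contained sketch of unstructured magnitude pruning combined with fixed-point weight quantization. It is a generic illustration of these standard methods, not PQuantML's actual API; the function names, bit-width defaults, and NumPy-based implementation are assumptions for demonstration only.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Generic unstructured pruning: zero out the smallest-magnitude
    fraction of weights (illustrative, not PQuantML's implementation)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def fixed_point_quantize(weights, total_bits=8, frac_bits=6):
    """Round weights onto a signed fixed-point grid with the given
    word length and fractional bits (bit widths chosen arbitrarily)."""
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(weights * scale), qmin, qmax)
    return q / scale

# Joint application: prune first, then quantize the surviving weights.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.5, size=(4, 4))
w_compressed = fixed_point_quantize(magnitude_prune(w, sparsity=0.5))
print(w_compressed)
```

Applying the two steps jointly, as above, is one plausible ordering; in practice, compression libraries typically interleave pruning and quantization with training so the network can recover accuracy lost at each step.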
Submitted 27 March, 2026;
originally announced March 2026.