Showing 1–36 of 36 results for author: Loncar, V

Searching in archive cs.
  1. arXiv:2603.26595  [pdf, ps, other]

    cs.LG hep-ex

    PQuantML: A Tool for End-to-End Hardware-aware Model Compression

    Authors: Roope Niemi, Anastasiia Petrovych, Arghya Ranjan Das, Enrico Lupi, Chang Sun, Dimitrios Danopoulos, Marlon Joshua Helbing, Mia Liu, Sebastian Dittmeier, Michael Kagan, Vladimir Loncar, Maurizio Pierini

    Abstract: PQuantML is a new open-source, hardware-aware neural network model compression library tailored to end-to-end workflows. Motivated by the need to deploy performant models to environments with strict latency constraints, PQuantML simplifies training of compressed models by providing a unified interface to apply pruning and quantization, either jointly or individually. The library implements multipl…

    Submitted 27 March, 2026; originally announced March 2026.

  2. arXiv:2602.22248  [pdf, ps, other]

    physics.ins-det cs.AR eess.SP hep-ex

    Machine Learning on Heterogeneous, Edge, and Quantum Hardware for Particle Physics (ML-HEQUPP)

    Authors: Julia Gonski, Jenni Ott, Shiva Abbaszadeh, Sagar Addepalli, Matteo Cremonesi, Jennet Dickinson, Giuseppe Di Guglielmo, Erdem Yigit Ertorer, Lindsey Gray, Ryan Herbst, Christian Herwig, Tae Min Hong, Benedikt Maier, Maryam Bayat Makou, David Miller, Mark S. Neubauer, Cristián Peña, Dylan Rankin, Seon-Hee Seo, Giordon Stark, Alexander Tapper, Audrey Corbeil Therrien, Ioannis Xiotidis, Keisuke Yoshihara, et al. (98 additional authors not shown)

    Abstract: The next generation of particle physics experiments will face a new era of challenges in data acquisition, due to unprecedented data rates and volumes along with extreme environments and operational constraints. Harnessing this data for scientific discovery demands real-time inference and decision-making, intelligent data reduction, and efficient processing architectures beyond current capabilitie…

    Submitted 10 March, 2026; v1 submitted 24 February, 2026; originally announced February 2026.

    Comments: 125 pages, 51 figures

  3. arXiv:2602.15751  [pdf, ps, other]

    hep-ex cs.LG

    Enabling Low-Latency Machine learning on Radiation-Hard FPGAs with hls4ml

    Authors: Katya Govorkova, Julian Garcia Pardinas, Vladimir Loncar, Victoria Nguyen, Sebastian Schmitt, Marco Pizzichemi, Loris Martinazzoli, Eluned Anne Smith

    Abstract: This paper presents the first demonstration of a viable, ultra-fast, radiation-hard machine learning (ML) application on FPGAs, which could be used in future high-energy physics experiments. We present a three-fold contribution, with the PicoCal calorimeter, planned for the LHCb Upgrade II experiment, used as a test case. First, we develop a lightweight autoencoder to compress a 32-sample timing r…

    Submitted 17 February, 2026; originally announced February 2026.

  4. arXiv:2512.15946  [pdf, ps, other]

    cs.LG cs.AR hep-ex

    AIE4ML: An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines

    Authors: Dimitrios Danopoulos, Enrico Lupi, Chang Sun, Sebastian Dittmeier, Michael Kagan, Vladimir Loncar, Maurizio Pierini

    Abstract: Efficient AI inference on AMD's Versal AI Engine (AIE) is challenging due to tightly coupled VLIW execution, explicit datapaths, and local memory management. Prior work focused on first-generation AIE kernel optimizations, without tackling full neural network execution across the 2D array. In this work, we present AIE4ML, the first comprehensive framework for converting AI models automatically int…

    Submitted 9 January, 2026; v1 submitted 17 December, 2025; originally announced December 2025.

  5. arXiv:2512.06208  [pdf, ps, other]

    cs.AR cs.LG hep-ex

    SparsePixels: Efficient Convolution for Sparse Data on FPGAs

    Authors: Ho Fung Tsoi, Dylan Rankin, Vladimir Loncar, Philip Harris

    Abstract: Inference of standard convolutional neural networks (CNNs) on FPGAs often incurs high latency and a long initiation interval due to the deep nested loops required to densely convolve every input pixel regardless of its feature value. However, input features can be spatially sparse in some image data, where semantic information may occupy only a small fraction of the pixels and most computation wou…

    Submitted 15 December, 2025; v1 submitted 5 December, 2025; originally announced December 2025.

    Comments: Under review

  6. arXiv:2512.01463  [pdf, ps, other]

    cs.AR cs.LG hep-ex

    hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware

    Authors: Jan-Frederik Schulte, Benjamin Ramhorst, Chang Sun, Jovan Mitrevski, Nicolò Ghielmetti, Enrico Lupi, Dimitrios Danopoulos, Vladimir Loncar, Javier Duarte, David Burnette, Lauri Laatu, Stylianos Tzelepis, Konstantinos Axiotis, Quentin Berthet, Haoyan Wang, Paul White, Suleyman Demirsoy, Marco Colombo, Thea Aarrestad, Sioni Summers, Maurizio Pierini, Giuseppe Di Guglielmo, Jennifer Ngadiuba, Javier Campos, Ben Hawks, et al. (28 additional authors not shown)

    Abstract: We present hls4ml, a free and open-source platform that translates machine learning (ML) models from modern deep learning frameworks into high-level synthesis (HLS) code that can be integrated into full designs for field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs). With its flexible and modular design, hls4ml supports a large number of deep learning framewo…

    Submitted 1 December, 2025; originally announced December 2025.

  7. arXiv:2511.05615  [pdf, ps, other]

    cs.LG cs.AI cs.AR physics.ins-det

    wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation

    Authors: Benjamin Hawks, Jason Weitz, Dmitri Demler, Karla Tame-Narvaez, Dennis Plotnikov, Mohammad Mehdi Rahimifar, Hamza Ezzaoui Rahali, Audrey C. Therrien, Donovan Sproule, Elham E Khoda, Keegan A. Smith, Russell Marroquin, Giuseppe Di Guglielmo, Nhan Tran, Javier Duarte, Vladimir Loncar

    Abstract: As machine learning (ML) is increasingly implemented in hardware to address real-time challenges in scientific applications, the development of advanced toolchains has significantly reduced the time required to iterate on various designs. These advancements have solved major obstacles, but also exposed new challenges. For example, processes that were not previously considered bottlenecks, such as…

    Submitted 6 November, 2025; originally announced November 2025.

    Comments: 30 pages, 18 figures

    Report number: FERMILAB-PUB-25-0359-CSAID

  8. arXiv:2507.04535  [pdf, ps, other]

    cs.AR cs.LG hep-ex

    da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

    Authors: Chang Sun, Zhiqiang Que, Vladimir Loncar, Wayne Luk, Maria Spiropulu

    Abstract: Neural networks with a latency requirement on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed on FPGAs fully unrolled and pipelined. A bottleneck for the deployment of such neural networks is area utilization, which is directly related to the required constant matrix-vector multiplication (CMVM) operations. In this work, we propose an efficie…

    Submitted 6 July, 2025; originally announced July 2025.

    ACM Class: B.2.4; B.6

  9. arXiv:2503.02112  [pdf, other]

    cs.LG astro-ph.IM

    Building Machine Learning Challenges for Anomaly Detection in Science

    Authors: Elizabeth G. Campolongo, Yuan-Tang Chou, Ekaterina Govorkova, Wahid Bhimji, Wei-Lun Chao, Chris Harris, Shih-Chieh Hsu, Hilmar Lapp, Mark S. Neubauer, Josephine Namayanja, Aneesh Subramanian, Philip Harris, Advaith Anand, David E. Carlyn, Subhankar Ghosh, Christopher Lawrence, Eric Moreno, Ryan Raikman, Jiaman Wu, Ziheng Zhang, Bayu Adhi, Mohammad Ahmadi Gharehtoragh, Saúl Alonso Monsalve, Marta Babicz, Furqan Baig, et al. (126 additional authors not shown)

    Abstract: Scientific discoveries are often made by finding a pattern or object that was not predicted by the known rules of science. Oftentimes, these anomalous events or objects that do not conform to the norms are an indication that the rules of science governing the data are incomplete, and something new needs to be present to explain these unexpected outliers. The challenge of finding anomalies can be c…

    Submitted 29 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: 17 pages, 6 figures; to be submitted to Nature Communications

  10. arXiv:2502.02304  [pdf, ps, other]

    hep-ex cs.DC cs.LG physics.ins-det

    Comparative Analysis of FPGA and GPU Performance for Machine Learning-Based Track Reconstruction at LHCb

    Authors: Fotis I. Giasemis, Vladimir Lončar, Bertrand Granado, Vladimir Vava Gligorov

    Abstract: In high-energy physics, the increasing luminosity and detector granularity at the Large Hadron Collider are driving the need for more efficient data processing solutions. Machine Learning has emerged as a promising tool for reconstructing charged particle tracks, due to its potentially linear computational scaling with detector hits. The recent implementation of a graph neural network-based track…

    Submitted 30 April, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  11. arXiv:2501.04845  [pdf, ps, other]

    physics.ins-det cs.LG hep-ex nucl-ex

    Intelligent experiments through real-time AI: Fast Data Processing and Autonomous Detector Control for sPHENIX and future EIC detectors

    Authors: J. Kvapil, G. Borca-Tasciuc, H. Bossi, K. Chen, Y. Chen, Y. Corrales Morales, H. Da Costa, C. Da Silva, C. Dean, J. Durham, S. Fu, C. Hao, P. Harris, O. Hen, H. Jheng, Y. Lee, P. Li, X. Li, Y. Lin, M. X. Liu, V. Loncar, J. P. Mitrevski, A. Olvera, M. L. Purschke, J. S. Renck, et al. (8 additional authors not shown)

    Abstract: This R&D project, initiated by the DOE Nuclear Physics AI-Machine Learning initiative in 2022, leverages AI to address data processing challenges in high-energy nuclear experiments (RHIC, LHC, and future EIC). Our focus is on developing a demonstrator for real-time processing of high-rate data streams from sPHENIX experiment tracking detectors. The limitations of a 15 kHz maximum trigger rate imp…

    Submitted 8 January, 2025; originally announced January 2025.

    Comments: proceedings for 42nd International Conference on High Energy Physics (ICHEP2024), 18-24 July 2024, Prague, Czech Republic

    Report number: LA-UR-24-30394

  12. arXiv:2411.09851  [pdf, other]

    hep-ex cs.LG physics.data-an

    SymbolFit: Automatic Parametric Modeling with Symbolic Regression

    Authors: Ho Fung Tsoi, Dylan Rankin, Cecile Caillol, Miles Cranmer, Sridhara Dasu, Javier Duarte, Philip Harris, Elliot Lipeles, Vladimir Loncar

    Abstract: We introduce SymbolFit, a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data while simultaneously providing uncertainty estimates in a single run. Traditionally, constructing a parametric model to accurately describe binned data has been a manual and iterative process, requiring an adequate functional form to be det…

    Submitted 10 May, 2025; v1 submitted 14 November, 2024; originally announced November 2024.

    Comments: 52 pages, 35 figures. Under review. The API can be used out-of-the-box and is available at https://github.com/hftsoi/symbolfit

    Journal ref: Comput. Softw. Big Sci. 9, 12 (2025)

  13. arXiv:2409.05207  [pdf, other]

    cs.LG

    Low Latency Transformer Inference on FPGAs for Physics Applications with hls4ml

    Authors: Zhixing Jiang, Dennis Yin, Yihui Chen, Elham E Khoda, Scott Hauck, Shih-Chieh Hsu, Ekaterina Govorkova, Philip Harris, Vladimir Loncar, Eric A. Moreno

    Abstract: This study presents an efficient implementation of transformer architectures in Field-Programmable Gate Arrays (FPGAs) using hls4ml. We demonstrate the strategy for implementing the multi-head attention, softmax, and normalization layer and evaluate three distinct models. Their deployment on VU13P FPGA chip achieved latency less than 2 μs, demonstrating the potential for real-time applications. HLS4…

    Submitted 8 September, 2024; originally announced September 2024.

  14. arXiv:2406.19522  [pdf, other]

    cs.LG

    Reliable edge machine learning hardware for scientific applications

    Authors: Tommaso Baldi, Javier Campos, Ben Hawks, Jennifer Ngadiuba, Nhan Tran, Daniel Diaz, Javier Duarte, Ryan Kastner, Andres Meza, Melissa Quinnan, Olivia Weng, Caleb Geniesse, Amir Gholami, Michael W. Mahoney, Vladimir Loncar, Philip Harris, Joshua Agar, Shuyu Qin

    Abstract: Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: IEEE VLSI Test Symposium 2024 (VTS)

    Report number: FERMILAB-CONF-24-0116-CSAID

  15. arXiv:2405.00645  [pdf, ps, other]

    cs.LG physics.ins-det

    HGQ: High Granularity Quantization for Real-time Neural Networks on FPGAs

    Authors: Chang Sun, Zhiqiang Que, Thea K. Årrestad, Vladimir Loncar, Jennifer Ngadiuba, Wayne Luk, Maria Spiropulu

    Abstract: Neural networks with sub-microsecond inference latency are required by many critical applications. Targeting such applications deployed on FPGAs, we present High Granularity Quantization (HGQ), a quantization-aware training framework that optimizes parameter bit-widths through gradient descent. Unlike conventional methods, HGQ determines the optimal bit-width for each parameter independently, maki…

    Submitted 19 December, 2025; v1 submitted 1 May, 2024; originally announced May 2024.

    Report number: FERMILAB-PUB-24-0213-CMS, CaltechAUTHORS:10.7907/hq8jd-rhg30

  16. arXiv:2402.01876  [pdf, other]

    hep-ex cs.LG physics.ins-det

    Ultrafast jet classification on FPGAs for the HL-LHC

    Authors: Patrick Odagiu, Zhiqiang Que, Javier Duarte, Johannes Haller, Gregor Kasieczka, Artur Lobanov, Vladimir Loncar, Wayne Luk, Jennifer Ngadiuba, Maurizio Pierini, Philipp Rincke, Arpita Seksaria, Sioni Summers, Andre Sznajder, Alexander Tapper, Thea K. Aarrestad

    Abstract: Three machine learning models are used to perform jet origin classification. These models are optimized for deployment on a field-programmable gate array device. In this context, we demonstrate how latency and resource consumption scale with the input size and choice of algorithm. Moreover, the models proposed here are designed to work on the type of data and under the foreseen conditions at the C…

    Submitted 4 July, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 13 pages, 3 figures, 3 tables. Mach. Learn.: Sci. Technol (2024)

    Report number: FERMILAB-PUB-24-0030-CMS-CSAID-PPD

  17. arXiv:2402.01047  [pdf, other]

    cs.LG cs.AR hep-ex

    Ultra Fast Transformers on FPGAs for Particle Physics Experiments

    Authors: Zhixing Jiang, Dennis Yin, Elham E Khoda, Vladimir Loncar, Ekaterina Govorkova, Eric Moreno, Philip Harris, Scott Hauck, Shih-Chieh Hsu

    Abstract: This work introduces a highly efficient implementation of the transformer architecture on a Field-Programmable Gate Array (FPGA) by using the hls4ml tool. Given the demonstrated effectiveness of transformer models in addressing a wide range of problems, their application in experimental triggers within particle physics becomes a subject of significant interest. In this work, we have imple…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 6 pages, 2 figures

    Journal ref: Machine Learning and the Physical Sciences Workshop, NeurIPS 2023

  18. arXiv:2401.09949  [pdf, other]

    cs.LG hep-ex physics.ins-det

    SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning for Compression

    Authors: Ho Fung Tsoi, Vladimir Loncar, Sridhara Dasu, Philip Harris

    Abstract: Compact symbolic expressions have been shown to be more efficient than neural network models in terms of resource consumption and inference speed when implemented on custom hardware such as FPGAs, while maintaining comparable accuracy~\cite{tsoi2023symbolic}. These capabilities are highly valuable in environments with stringent computational resource constraints, such as high-energy physics experi…

    Submitted 3 January, 2025; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 21 pages, 9 figures. to be published in MLST

    Journal ref: Mach. Learn. Sci. Tech. 6 (2025) 1, 015021

  19. FPGA Resource-aware Structured Pruning for Real-Time Neural Networks

    Authors: Benjamin Ramhorst, Vladimir Loncar, George A. Constantinides

    Abstract: Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many more application areas. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning and quantization, have been proposed in literature. Pruning sparsifies a neural network, reducing the number of multipl…

    Submitted 12 December, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

  20. arXiv:2305.04099  [pdf, other]

    cs.LG hep-ex physics.ins-det

    Symbolic Regression on FPGAs for Fast Machine Learning Inference

    Authors: Ho Fung Tsoi, Adrian Alan Pol, Vladimir Loncar, Ekaterina Govorkova, Miles Cranmer, Sridhara Dasu, Peter Elmer, Philip Harris, Isobel Ojalvo, Maurizio Pierini

    Abstract: The high-energy physics community is investigating the potential of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs) to enhance physics sensitivity while still meeting data processing time constraints. In this contribution, we introduce a novel end-to-end procedure that utilizes a machine learning technique called symbolic regression (SR). It searches the equati…

    Submitted 17 January, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: 9 pages. Accepted to 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023)

    Journal ref: EPJ Web of Conferences 295, 09036 (2024)

  21. arXiv:2301.07247  [pdf, other]

    cs.CV cs.LG cs.NE

    Tailor: Altering Skip Connections for Resource-Efficient Inference

    Authors: Olivia Weng, Gabriel Marcano, Vladimir Loncar, Alireza Khodamoradi, Nojan Sheybani, Andres Meza, Farinaz Koushanfar, Kristof Denolf, Javier Mauricio Duarte, Ryan Kastner

    Abstract: Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements. In this paper, we show that skip connections can be optimized for hardware when tackled with a hardware-software codesign approach. We argue that while a network's sk…

    Submitted 15 September, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

  22. arXiv:2207.00559  [pdf, other]

    cs.LG hep-ex physics.ins-det stat.ML

    Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

    Authors: Elham E Khoda, Dylan Rankin, Rafael Teixeira de Lima, Philip Harris, Scott Hauck, Shih-Chieh Hsu, Michael Kagan, Vladimir Loncar, Chaitanya Paikara, Richa Rao, Sioni Summers, Caterina Vernieri, Aaron Wang

    Abstract: Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neura…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: 12 pages, 6 figures, 5 tables

  23. arXiv:2206.07527  [pdf, other]

    cs.LG cs.AR cs.PL stat.ML

    QONNX: Representing Arbitrary-Precision Quantized Neural Networks

    Authors: Alessandro Pappalardo, Yaman Umuroglu, Michaela Blott, Jovan Mitrevski, Ben Hawks, Nhan Tran, Vladimir Loncar, Sioni Summers, Hendrik Borras, Jules Muhizi, Matthew Trahms, Shih-Chieh Hsu, Scott Hauck, Javier Duarte

    Abstract: We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantiz…

    Submitted 24 June, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: 9 pages, 5 figures, Contribution to 4th Workshop on Accelerated Machine Learning (AccML) at HiPEAC 2022 Conference

    Report number: FERMILAB-CONF-22-471-SCD

  24. arXiv:2205.07690  [pdf, other]

    cs.CV cs.AR cs.LG physics.ins-det stat.ML

    Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

    Authors: Nicolò Ghielmetti, Vladimir Loncar, Maurizio Pierini, Marcel Roed, Sioni Summers, Thea Aarrestad, Christoffer Petersson, Hampus Linander, Jennifer Ngadiuba, Kelvin Lin, Philip Harris

    Abstract: In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx Z…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: 11 pages, 6 tables, 5 figures

  25. arXiv:2202.04499  [pdf, other]

    hep-ex cs.LG

    Lightweight Jet Reconstruction and Identification as an Object Detection Task

    Authors: Adrian Alan Pol, Thea Aarrestad, Ekaterina Govorkova, Roi Halily, Anat Klempner, Tal Kopetz, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Olya Sirkin, Sioni Summers

    Abstract: We apply object detection techniques based on deep convolutional blocks to end-to-end jet identification and reconstruction tasks encountered at the CERN Large Hadron Collider (LHC). Collision events produced at the LHC and represented as an image composed of calorimeter and tracker cells are given as an input to a Single Shot Detection network. The algorithm, named PFJet-SSD performs simultaneous…

    Submitted 9 February, 2022; originally announced February 2022.

  26. arXiv:2106.14089  [pdf, other]

    cs.LG cs.AR physics.ins-det

    Accelerating Recurrent Neural Networks for Gravitational Wave Experiments

    Authors: Zhiqiang Que, Erwei Wang, Umar Marikar, Eric Moreno, Jennifer Ngadiuba, Hamza Javed, Bartłomiej Borzyszkowski, Thea Aarrestad, Vladimir Loncar, Sioni Summers, Maurizio Pierini, Peter Y Cheung, Wayne Luk

    Abstract: This paper presents novel reconfigurable architectures for reducing the latency of recurrent neural networks (RNNs) that are used for detecting gravitational waves. Gravitational interferometers such as the LIGO detectors capture cosmic events such as black hole mergers which happen at unknown times and of varying durations, producing time-series data. We have developed a new architecture capable…

    Submitted 26 June, 2021; originally announced June 2021.

    Comments: Accepted at the 2021 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP)

  27. arXiv:2105.01683  [pdf, other]

    physics.ins-det cs.LG hep-ex

    A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC

    Authors: Giuseppe Di Guglielmo, Farah Fahim, Christian Herwig, Manuel Blanco Valentin, Javier Duarte, Cristian Gingu, Philip Harris, James Hirschauer, Martin Kwok, Vladimir Loncar, Yingyi Luo, Llovizna Miranda, Jennifer Ngadiuba, Daniel Noonan, Seda Ogrenci-Memik, Maurizio Pierini, Sioni Summers, Nhan Tran

    Abstract: Despite advances in the programmable logic capabilities of modern trigger systems, a significant bottleneck remains in the amount of data to be transported from the detector to off-detector logic where trigger decisions are made. We demonstrate that a neural network autoencoder model can be implemented in a radiation tolerant ASIC to perform lossy data compression alleviating the data transmission…

    Submitted 4 May, 2021; originally announced May 2021.

    Comments: 9 pages, 8 figures, 3 tables

    Report number: FERMILAB-PUB-21-217-CMS-E-SCD

    Journal ref: IEEE Trans. Nucl. Sci. 68, 2179 (2021)

  28. arXiv:2103.05579  [pdf, other]

    cs.LG cs.AR physics.ins-det

    hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

    Authors: Farah Fahim, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo Jindariani, Nhan Tran, Luca P. Carloni, Giuseppe Di Guglielmo, Philip Harris, Jeffrey Krupa, Dylan Rankin, Manuel Blanco Valentin, Josiah Hester, Yingyi Luo, John Mamish, Seda Orgrenci-Memik, Thea Aarrestad, Hamza Javed, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, Javier Duarte, Scott Hauck, Shih-Chieh Hsu, et al. (5 additional authors not shown)

    Abstract: Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-h…

    Submitted 23 March, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

    Comments: 10 pages, 8 figures, TinyML Research Symposium 2021

    Report number: FERMILAB-CONF-21-080-SCD

  29. arXiv:2101.05108  [pdf, other]

    cs.LG cs.CV hep-ex physics.ins-det stat.ML

    Fast convolutional neural networks on FPGAs with hls4ml

    Authors: Thea Aarrestad, Vladimir Loncar, Nicolò Ghielmetti, Maurizio Pierini, Sioni Summers, Jennifer Ngadiuba, Christoffer Petersson, Hampus Linander, Yutaro Iiyama, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Dylan Rankin, Sergo Jindariani, Kevin Pedro, Nhan Tran, Mia Liu, Edward Kreinar, Zhenbin Wu, Duc Hoang

    Abstract: We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of 5 μs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Num…

    Submitted 29 April, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: 18 pages, 18 figures, 4 tables

    Journal ref: Mach. Learn.: Sci. Technol. 2 045015 (2021)

  30. arXiv:2012.01563  [pdf, other]

    physics.ins-det cs.LG hep-ex physics.comp-ph

    Accelerated Charged Particle Tracking with Graph Neural Networks on FPGAs

    Authors: Aneesh Heintz, Vesal Razavimaleki, Javier Duarte, Gage DeZoort, Isobel Ojalvo, Savannah Thais, Markus Atkinson, Mark Neubauer, Lindsey Gray, Sergo Jindariani, Nhan Tran, Philip Harris, Dylan Rankin, Thea Aarrestad, Vladimir Loncar, Maurizio Pierini, Sioni Summers, Jennifer Ngadiuba, Mia Liu, Edward Kreinar, Zhenbin Wu

    Abstract: We develop and study FPGA implementations of algorithms for charged particle tracking based on graph neural networks. The two complementary FPGA designs are based on OpenCL, a framework for writing programs that execute across heterogeneous platforms, and hls4ml, a high-level-synthesis-based compiler for neural network to firmware conversion. We evaluate and compare the resource usage, latency, an…

    Submitted 30 November, 2020; originally announced December 2020.

    Comments: 8 pages, 4 figures, To appear in Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020)

    Report number: FERMILAB-CONF-20-622-CMS-SCD

  31. arXiv:2008.03601  [pdf, other]

    physics.ins-det cs.LG hep-ex

    Distance-Weighted Graph Neural Networks on FPGAs for Real-Time Particle Reconstruction in High Energy Physics

    Authors: Yutaro Iiyama, Gianluca Cerminara, Abhijay Gupta, Jan Kieseler, Vladimir Loncar, Maurizio Pierini, Shah Rukh Qasim, Marcel Rieger, Sioni Summers, Gerrit Van Onsem, Kinga Wozniak, Jennifer Ngadiuba, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Dylan Rankin, Sergo Jindariani, Mia Liu, Kevin Pedro, Nhan Tran, Edward Kreinar, Zhenbin Wu

    Abstract: Graph neural networks have been shown to achieve excellent performance for several crucial tasks in particle physics, such as charged particle tracking, jet tagging, and clustering. An important domain for the application of these networks is the FPGA-based first layer of real-time data filtering at the CERN Large Hadron Collider, which has strict latency and resource constraints. We discuss how t…

    Submitted 3 February, 2021; v1 submitted 8 August, 2020; originally announced August 2020.

    Comments: 15 pages, 4 figures

    Report number: FERMILAB-PUB-20-405-E-SCD

    Journal ref: Frontiers in Big Data 3 (2021) 44

  32. arXiv:2006.10159  [pdf, other]

    physics.ins-det cs.LG eess.IV eess.SP hep-ex

    Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors

    Authors: Claudionor N. Coelho Jr., Aki Kuusela, Shan Li, Hao Zhuang, Thea Aarrestad, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Adrian Alan Pol, Sioni Summers

    Abstract: Although the quest for more accurate solutions is pushing deep learning research towards larger and more complex algorithms, edge devices demand efficient inference and therefore reduction in model size, latency and energy consumption. One technique to limit model size is quantization, which implies using fewer bits to represent weights and biases. Such an approach usually results in a decline in…

    Submitted 21 June, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Journal ref: Nature Machine Intelligence, Volume 3 (2021)

  33. arXiv:2003.06308  [pdf, other]

    cs.LG eess.SP hep-ex

    Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML

    Authors: Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Duc Hoang, Sergo Jindariani, Edward Kreinar, Mia Liu, Vladimir Loncar, Jennifer Ngadiuba, Kevin Pedro, Maurizio Pierini, Dylan Rankin, Sheila Sagear, Sioni Summers, Nhan Tran, Zhenbin Wu

    Abstract: We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parame…

    Submitted 29 June, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

    Comments: Update to MLST journal version

    Report number: FERMILAB-PUB-20-167-PPD-SCD

    Journal ref: Mach. Learn.: Sci. Technol. 2, 015001 (2020)

  34. arXiv:2002.02534  [pdf, other]

    physics.comp-ph astro-ph.IM cs.LG hep-ex

    Fast inference of Boosted Decision Trees in FPGAs for particle physics

    Authors: Sioni Summers, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Duc Hoang, Sergo Jindariani, Edward Kreinar, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Dylan Rankin, Nhan Tran, Zhenbin Wu

    Abstract: We describe the implementation of Boosted Decision Trees in the hls4ml library, which allows the translation of a trained model into FPGA firmware through an automated conversion process. Thanks to its fully on-chip implementation, hls4ml performs inference of Boosted Decision Tree models with extremely low latency. With a typical latency less than 100 ns, this solution is suitable for FPGA-based…

    Submitted 19 February, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

    Journal ref: JINST 15 P05026 (2020)

  35. arXiv:1709.04423  [pdf, other]

    physics.comp-ph cond-mat.quant-gas cs.MS nlin.PS

    OpenMP GNU and Intel Fortran programs for solving the time-dependent Gross-Pitaevskii equation

    Authors: Luis E. Young-S., Paulsamy Muruganandam, Sadhan K. Adhikari, Vladimir Loncar, Dusan Vudragovic, Antun Balaz

    Abstract: We present Open Multi-Processing (OpenMP) version of Fortran 90 programs for solving the Gross-Pitaevskii (GP) equation for a Bose-Einstein condensate in one, two, and three spatial dimensions, optimized for use with GNU and Intel compilers. We use the split-step Crank-Nicolson algorithm for imaginary- and real-time propagation, which enables efficient calculation of stationary and non-stationary…

    Submitted 13 September, 2017; originally announced September 2017.

    Comments: 5 pages, 2 figures; to download the programs, click 'Other formats' and download the source

    Journal ref: Comput. Phys. Commun. 220 (2017) 503

  36. arXiv:1610.05329  [pdf, ps, other]

    cond-mat.quant-gas cs.MS nlin.PS physics.comp-ph

    OpenMP, OpenMP/MPI, and CUDA/MPI C programs for solving the time-dependent dipolar Gross-Pitaevskii equation

    Authors: Vladimir Loncar, Luis E. Young-S., Srdjan Skrbic, Paulsamy Muruganandam, Sadhan K. Adhikari, Antun Balaz

    Abstract: We present new versions of the previously published C and CUDA programs for solving the dipolar Gross-Pitaevskii equation in one, two, and three spatial dimensions, which calculate stationary and non-stationary solutions by propagation in imaginary or real time. Presented programs are improved and parallelized versions of previous programs, divided into three packages according to the type of para…

    Submitted 1 August, 2022; v1 submitted 17 October, 2016; originally announced October 2016.

    Comments: 8 pages, 6 figures; to download the programs, click "Other formats" and download the source

    Journal ref: Comput. Phys. Commun. 209 (2016) 190