Skip to main content

Showing 1–50 of 84 results for author: Brooks, D

Searching in archive cs. Search in all archives.
.
  1. arXiv:2602.18568  [pdf, ps, other

    cs.AR cs.AI

    RPU -- A Reasoning Processing Unit

    Authors: Matthew Adiletta, Gu-Yeon Wei, David Brooks

    Abstract: Large language model (LLM) inference performance is increasingly bottlenecked by the memory wall. While GPUs continue to scale raw compute throughput, they struggle to deliver scalable performance for memory bandwidth bound workloads. This challenge is amplified by emerging reasoning LLM applications, where long output sequences, low arithmetic intensity, and tight latency constraints demand signi… ▽ More

    Submitted 23 February, 2026; v1 submitted 20 February, 2026; originally announced February 2026.

    Comments: To Appear in HPCA, 2026

  2. arXiv:2512.12106  [pdf, ps, other

    cs.AR

    DreamRAM: A Fine-Grained Configurable Design Space Modeling Tool for Custom 3D Die-Stacked DRAM

    Authors: Victor Cai, Jennifer Zhou, Haebin Do, David Brooks, Gu-Yeon Wei

    Abstract: 3D die-stacked DRAM has emerged as a key technology for delivering high bandwidth and high density for applications such as high-performance computing, graphics, and machine learning. However, different applications place diverse and sometimes diverging demands on power, performance, and area that cannot be universally satisfied with fixed commodity DRAM designs. Die stacking creates the opportuni… ▽ More

    Submitted 12 December, 2025; originally announced December 2025.

    Comments: Design, Automation and Test in Europe Conference (DATE 2026)

  3. arXiv:2511.04681  [pdf, ps, other

    astro-ph.CO cs.LG

    Dark Energy Survey Year 3 results: Simulation-based $w$CDM inference from weak lensing and galaxy clustering maps with deep learning: Analysis design

    Authors: A. Thomsen, J. Bucko, T. Kacprzak, V. Ajani, J. Fluri, A. Refregier, D. Anbajagane, F. J. Castander, A. Ferté, M. Gatti, N. Jeffrey, A. Alarcon, A. Amon, K. Bechtol, M. R. Becker, G. M. Bernstein, A. Campos, A. Carnero Rosell, C. Chang, R. Chen, A. Choi, M. Crocce, C. Davis, J. DeRose, S. Dodelson , et al. (77 additional authors not shown)

    Abstract: Data-driven approaches using deep learning are emerging as powerful techniques to extract non-Gaussian information from cosmological large-scale structure. This work presents the first simulation-based inference (SBI) pipeline that combines weak lensing and galaxy clustering maps in a realistic Dark Energy Survey Year 3 (DES Y3) configuration and serves as preparation for a forthcoming analysis of… ▽ More

    Submitted 18 February, 2026; v1 submitted 6 November, 2025; originally announced November 2025.

    Comments: 39 pages, 14 figures

  4. arXiv:2510.15596  [pdf, ps, other

    cs.DC

    PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training

    Authors: Alicia Golden, Michael Kuchnik, Samuel Hsia, Zachary DeVito, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

    Abstract: Large model training beyond tens of thousands of GPUs is an uncharted territory. At such scales, disruptions to the training process are not a matter of if, but a matter of when -- a stochastic process degrading training productivity. Dynamic runtime variation will become increasingly more frequent as training scales up and GPUs are operated in increasingly power-limited and thermally-stressed env… ▽ More

    Submitted 17 October, 2025; originally announced October 2025.

  5. arXiv:2510.07606  [pdf, ps, other

    cs.LG eess.SP

    Transformer-Based Indirect Structural Health Monitoring of Rail Infrastructure with Attention-Driven Detection and Localization of Transient Defects

    Authors: Sizhe Ma, Katherine A. Flanigan, Mario Bergés, James D. Brooks

    Abstract: Indirect structural health monitoring (iSHM) for broken rail detection using onboard sensors presents a cost-effective paradigm for railway track assessment, yet reliably detecting small, transient anomalies (2-10 cm) remains a significant challenge due to complex vehicle dynamics, signal noise, and the scarcity of labeled data limiting supervised approaches. This study addresses these issues thro… ▽ More

    Submitted 8 October, 2025; originally announced October 2025.

    Comments: Preprint presented at the 15th International Workshop on Structural Health Monitoring (IWSHM)

  6. arXiv:2510.01743  [pdf, ps, other

    cs.GR

    MIRAGE: Patient-Specific Mixed Reality Coaching for MRI via Depth-Only Markerless Registration and Immersive VR

    Authors: Daniel Brooks, Emily Carter, Hu Guo, Rajesh Nair

    Abstract: Magnetic resonance imaging (MRI) is an indispensable diagnostic tool, yet the confined bore and acoustic noise can evoke considerable anxiety and claustrophobic reactions. High anxiety leads to motion artifacts, incomplete scans and reliance on pharmacological sedation. MIRAGE (Mixed Reality Anxiety Guidance Environment) harnesses the latest mixed reality (MR) hardware to prepare patients for MRI… ▽ More

    Submitted 2 October, 2025; originally announced October 2025.

  7. arXiv:2505.14733  [pdf, ps, other

    cs.LG cs.AI

    The Energy Cost of Reasoning: Analyzing Energy Usage in LLMs with Test-time Compute

    Authors: Yunho Jin, Gu-Yeon Wei, David Brooks

    Abstract: Scaling large language models (LLMs) has driven significant advancements, yet it faces diminishing returns and escalating energy demands. This work explores how test-time compute (TTC) can serve as an energy-efficient complement to conventional scaling strategies by allocating additional computational resources at inference time rather than during training. Specifically, we investigate whether emp… ▽ More

    Submitted 9 November, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  8. arXiv:2505.06727  [pdf, ps, other

    cs.AR

    Modeling PFAS in Semiconductor Manufacturing to Quantify Trade-offs in Energy Efficiency and Environmental Impact of Computing Systems

    Authors: Mariam Elgamal, Abdulrahman Mahmoud, Gu-Yeon Wei, David Brooks, Gage Hills

    Abstract: The electronics and semiconductor industry is a prominent consumer of per- and poly-fluoroalkyl substances (PFAS), also known as forever chemicals. PFAS are persistent in the environment and can bioaccumulate to ecological and human toxic levels. Computer designers have an opportunity to reduce the use of PFAS in semiconductors and electronics manufacturing, including integrated circuits (IC), bat… ▽ More

    Submitted 10 May, 2025; originally announced May 2025.

  9. arXiv:2501.07139  [pdf, other

    cs.AI cs.PF

    FlexQuant: Elastic Quantization Framework for Locally Hosted LLM on Edge Devices

    Authors: Yuji Chai, Mujin Kwen, David Brooks, Gu-Yeon Wei

    Abstract: Deploying LLMs on edge devices presents serious technical challenges. Memory elasticity is crucial for edge devices with unified memory, where memory is shared and fluctuates dynamically. Existing solutions suffer from either poor transition granularity or high storage costs. We propose FlexQuant, a novel elasticity framework that generates an ensemble of quantized models, providing an elastic hos… ▽ More

    Submitted 13 January, 2025; originally announced January 2025.

  10. arXiv:2412.19821  [pdf, other

    cs.AR cs.AI cs.DC cs.LG

    Nanoscaling Floating-Point (NxFP): NanoMantissa, Adaptive Microexponents, and Code Recycling for Direct-Cast Compression of Large Language Models

    Authors: Yun-Chen Lo, Gu-Yeon Wei, David Brooks

    Abstract: As cutting-edge large language models (LLMs) continue to transform various industries, their fast-growing model size and sequence length have led to memory traffic and capacity challenges. Recently, AMD, Arm, Intel, Meta, Microsoft, NVIDIA, and Qualcomm have proposed a Microscaling standard (Mx), which augments block floating-point with microexponents to achieve promising perplexity-to-footprint t… ▽ More

    Submitted 15 December, 2024; originally announced December 2024.

    Comments: 12 pages, 12 figures

    ACM Class: I.2.7; E.4

  11. arXiv:2405.13858  [pdf, other

    cs.DC cs.AR cs.ET cs.LG

    Carbon Connect: An Ecosystem for Sustainable Computing

    Authors: Benjamin C. Lee, David Brooks, Arthur van Benthem, Udit Gupta, Gage Hills, Vincent Liu, Benjamin Pierce, Christopher Stewart, Emma Strubell, Gu-Yeon Wei, Adam Wierman, Yuan Yao, Minlan Yu

    Abstract: Computing is at a moment of profound opportunity. Emerging applications -- such as capable artificial intelligence, immersive virtual realities, and pervasive sensor systems -- drive unprecedented demand for computer. Despite recent advances toward net zero carbon emissions, the computing industry's gross energy usage continues to rise at an alarming rate, outpacing the growth of new energy instal… ▽ More

    Submitted 21 August, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  12. arXiv:2405.02803  [pdf, other

    cs.LG cs.DC

    Is Flash Attention Stable?

    Authors: Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

    Abstract: Training large-scale machine learning models poses distinct system challenges, given both the size and complexity of today's workloads. Recently, many organizations training state-of-the-art Generative AI models have reported cases of instability during training, often taking the form of loss spikes. Numeric deviation has emerged as a potential cause of this training instability, although quantify… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

  13. arXiv:2402.13513  [pdf, other

    cs.AR

    Guac: Energy-Aware and SSA-Based Generation of Coarse-Grained Merged Accelerators from LLVM-IR

    Authors: Iulian Brumar, Rodrigo Rocha, Alex Bernat, Devashree Tripathy, David Brooks, Gu-Yeon Wei

    Abstract: Designing accelerators for resource- and power-constrained applications is a daunting task. High-level Synthesis (HLS) addresses these constraints through resource sharing, an optimization at the HLS binding stage that maps multiple operations to the same functional unit. However, resource sharing is often limited to reusing instructions within a basic block. Instead of searching globally for th… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  14. arXiv:2402.05893  [pdf, other

    cs.HC

    Personalizing Driver Safety Interfaces via Driver Cognitive Factors Inference

    Authors: Emily S Sumner, Jonathan DeCastro, Jean Costa, Deepak E Gopinath, Everlyne Kimani, Shabnam Hakimi, Allison Morgan, Andrew Best, Hieu Nguyen, Daniel J Brooks, Bassam ul Haq, Andrew Patrikalakis, Hiroshi Yasuda, Kate Sieck, Avinash Balachandran, Tiffany Chen, Guy Rosman

    Abstract: Recent advances in AI and intelligent vehicle technology hold promise to revolutionize mobility and transportation, in the form of advanced driving assistance (ADAS) interfaces. Although it is widely recognized that certain cognitive factors, such as impulsivity and inhibitory control, are related to risky driving behavior, play a significant role in on-road risk-taking, existing systems fail to l… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 12 pages, 7 figures

  15. arXiv:2401.16732  [pdf, other

    cs.CR

    Flash: A Hybrid Private Inference Protocol for Deep CNNs with High Accuracy and Low Latency on CPU

    Authors: Hyeri Roh, Jinsu Yeo, Yeongil Ko, Gu-Yeon Wei, David Brooks, Woo-Seok Choi

    Abstract: This paper presents Flash, an optimized private inference (PI) hybrid protocol utilizing both homomorphic encryption (HE) and secure two-party computation (2PC), which can reduce the end-to-end PI latency for deep CNN models less than 1 minute with CPU. To this end, first, Flash proposes a low-latency convolution algorithm built upon a fast slot rotation operation and a novel data encoding scheme,… ▽ More

    Submitted 17 January, 2025; v1 submitted 29 January, 2024; originally announced January 2024.

  16. arXiv:2312.14385  [pdf, other

    cs.DC cs.LG cs.MM

    Generative AI Beyond LLMs: System Implications of Multi-Modal Generation

    Authors: Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

    Abstract: As the development of large-scale Generative AI models evolve beyond text (1D) generation to include image (2D) and video (3D) generation, processing spatial and temporal information presents unique challenges to quality, performance, and efficiency. We present the first work towards understanding this new system design space for multi-modal text-to-image (TTI) and text-to-video (TTV) generation m… ▽ More

    Submitted 5 May, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Published at 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

  17. arXiv:2311.14062  [pdf, other

    cs.CV

    Hardware Resilience Properties of Text-Guided Image Classifiers

    Authors: Syed Talal Wasim, Kabila Haile Soboka, Abdulrahman Mahmoud, Salman Khan, David Brooks, Gu-Yeon Wei

    Abstract: This paper presents a novel method to enhance the reliability of image classification models during deployment in the face of transient hardware errors. By utilizing enriched text embeddings derived from GPT-3 with question prompts per class and CLIP pretrained text encoder, we investigate their impact as an initialization for the classification layer. Our approach achieves a remarkable… ▽ More

    Submitted 5 December, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: Accepted at NeurIPS 2023

  18. arXiv:2311.08589  [pdf, other

    cs.DC cs.AR

    Carbon Responder: Coordinating Demand Response for the Datacenter Fleet

    Authors: Jiali Xing, Bilge Acun, Aditya Sundarrajan, David Brooks, Manoj Chakkaravarthy, Nikky Avila, Carole-Jean Wu, Benjamin C. Lee

    Abstract: The increasing integration of renewable energy sources results in fluctuations in carbon intensity throughout the day. To mitigate their carbon footprint, datacenters can implement demand response (DR) by adjusting their load based on grid signals. However, this presents challenges for private datacenters with diverse workloads and services. One of the key challenges is efficiently and fairly allo… ▽ More

    Submitted 14 November, 2023; originally announced November 2023.

  19. arXiv:2310.02784  [pdf, other

    cs.DC cs.AR cs.LG

    MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

    Authors: Samuel Hsia, Alicia Golden, Bilge Acun, Newsha Ardalani, Zachary DeVito, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

    Abstract: Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs. Our analysis, grounded in real-world large model training on datacenter-scale infrastructures, reveals that 14~32% of all GPU hours are spent on communication with no overlapping computation. To minimize this outstanding commun… ▽ More

    Submitted 10 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: ISCA 2024

  20. arXiv:2309.14396  [pdf, other

    cs.SE cs.LG cs.PL

    Guess & Sketch: Language Model Guided Transpilation

    Authors: Celine Lee, Abdulrahman Mahmoud, Michal Kurek, Simone Campanoni, David Brooks, Stephen Chong, Gu-Yeon Wei, Alexander M. Rush

    Abstract: Maintaining legacy software requires many software and systems engineering hours. Assembly code programs, which demand low-level control over the computer machine state and have no variable names, are particularly difficult for humans to analyze. Existing conventional program translators guarantee correctness, but are hand-engineered for the source and target programming languages in question. Lea… ▽ More

    Submitted 15 March, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

  21. arXiv:2308.11992  [pdf

    q-bio.TO cs.AI

    Critical Evaluation of Artificial Intelligence as Digital Twin of Pathologist for Prostate Cancer Pathology

    Authors: Okyaz Eminaga, Mahmoud Abbas, Christian Kunder, Yuri Tolkach, Ryan Han, James D. Brooks, Rosalie Nolley, Axel Semjonow, Martin Boegemann, Robert West, Jin Long, Richard Fan, Olaf Bettendorf

    Abstract: Prostate cancer pathology plays a crucial role in clinical management but is time-consuming. Artificial intelligence (AI) shows promise in detecting prostate cancer and grading patterns. We tested an AI-based digital twin of a pathologist, vPatho, on 2,603 histology images of prostate tissue stained with hematoxylin and eosin. We analyzed various factors influencing tumor-grade disagreement betwee… ▽ More

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Under Review

  22. arXiv:2307.01753  [pdf, other

    astro-ph.CO cs.LG physics.comp-ph physics.data-an

    Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies

    Authors: Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Hui Kong, Anna Porredon, Lado Samushia, Edmond Chaussidon, Alex Krolewski, Arnaud de Mattia, Florian Beutler, Jessica Nicole Aguilar, Steven Ahlen, Shadab Alam, Santiago Avila, Benedict Bahr-Kalus, Jose Bermejo-Climent, David Brooks, Todd Claybaugh, Shaun Cole, Kyle Dawson, Axel de la Macorra, Peter Doel, Andreu Font-Ribera, Jaime E. Forero-Romero, Satya Gontcho A Gontcho , et al. (24 additional authors not shown)

    Abstract: We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter $\fnl$. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range $0.2< z < 1.35$. We identify Galactic extinction, survey depth, and astronomical seeing as the… ▽ More

    Submitted 25 June, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

    Comments: 21 pages, 17 figures, 7 tables (Appendix excluded). Published in MNRAS

  23. arXiv:2306.08162  [pdf, other

    cs.CL cs.AI cs.LG

    INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation

    Authors: Yuji Chai, John Gkountouras, Glenn G. Ko, David Brooks, Gu-Yeon Wei

    Abstract: We introduce a method that dramatically reduces fine-tuning VRAM requirements and rectifies quantization errors in quantized Large Language Models. First, we develop an extremely memory-efficient fine-tuning (EMEF) method for quantized models using Low-Rank Adaptation (LoRA), and drawing upon it, we construct an error-correcting algorithm designed to minimize errors induced by the quantization pro… ▽ More

    Submitted 13 June, 2023; originally announced June 2023.

  24. arXiv:2306.06000  [pdf, other

    cs.AR cs.AI

    S$^{3}$: Increasing GPU Utilization during Generative Inference for Higher Throughput

    Authors: Yunho Jin, Chun-Feng Wu, David Brooks, Gu-Yeon Wei

    Abstract: Generating texts with a large language model (LLM) consumes massive amounts of memory. Apart from the already-large model parameters, the key/value (KV) cache that holds information about previous tokens in a sequence can grow to be even larger than the model itself. This problem is exacerbated in one of the current LLM serving frameworks which reserves the maximum sequence length of memory for th… ▽ More

    Submitted 9 June, 2023; originally announced June 2023.

  25. arXiv:2305.03148  [pdf, other

    cs.AR cs.LG cs.NE

    CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning

    Authors: Sai Qian Zhang, Thierry Tambe, Nestor Cuevas, Gu-Yeon Wei, David Brooks

    Abstract: On-device learning allows AI models to adapt to user data, thereby enhancing service quality on edge platforms. However, training AI on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs). To address these issues, we propose utilizing embedded dynamic random-acce… ▽ More

    Submitted 22 December, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

  26. arXiv:2305.01831  [pdf, other

    cs.AR

    Design Space Exploration and Optimization for Carbon-Efficient Extended Reality Systems

    Authors: Mariam Elgamal, Doug Carmean, Elnaz Ansari, Okay Zed, Ramesh Peri, Srilatha Manne, Udit Gupta, Gu-Yeon Wei, David Brooks, Gage Hills, Carole-Jean Wu

    Abstract: As computing hardware becomes more specialized, designing environmentally sustainable computing systems requires accounting for both hardware and software parameters. Our goal is to design low carbon computing systems while maintaining a competitive level of performance and operational efficiency. Despite previous carbon modeling efforts for computing systems, there is a distinct lack of holistic… ▽ More

    Submitted 2 May, 2023; originally announced May 2023.

  27. arXiv:2304.00404  [pdf, other

    cs.DC cs.AR

    GreenScale: Carbon-Aware Systems for Edge Computing

    Authors: Young Geun Kim, Udit Gupta, Andrew McCrabb, Yonglak Son, Valeria Bertacco, David Brooks, Carole-Jean Wu

    Abstract: To improve the environmental implications of the growing demand of computing, future applications need to improve the carbon-efficiency of computing infrastructures. State-of-the-art approaches, however, do not consider the intermittent nature of renewable energy. The time and location-based carbon intensity of energy fueling computing has been ignored when determining how computation is carried o… ▽ More

    Submitted 1 April, 2023; originally announced April 2023.

  28. arXiv:2302.10872  [pdf, other

    cs.AR cs.IR cs.LG

    MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation

    Authors: Samuel Hsia, Udit Gupta, Bilge Acun, Newsha Ardalani, Pan Zhong, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

    Abstract: Deep learning recommendation systems serve personalized content under diverse tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation models rely on terabyte-scale embedding tables to learn user preferences over large bodies of contents. The reliance on a fixed embedding representation of embedding tables not only imposes significant memory capacity and bandw… ▽ More

    Submitted 21 February, 2023; originally announced February 2023.

    ACM Class: C.1; H.0

  29. arXiv:2301.11273  [pdf, other

    cs.SI cs.LG

    AlignGraph: A Group of Generative Models for Graphs

    Authors: Kimia Shayestehfard, Dana Brooks, Stratis Ioannidis

    Abstract: It is challenging for generative models to learn a distribution over graphs because of the lack of permutation invariance: nodes may be ordered arbitrarily across graphs, and standard graph alignment is combinatorial and notoriously expensive. We propose AlignGraph, a group of generative models that combine fast and efficient graph alignment methods with a family of deep generative models that are… ▽ More

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: 12 pages, 2 figures, 4 tables

  30. arXiv:2301.10999  [pdf, other

    cs.LG cs.PF

    PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices

    Authors: Yuji Chai, Devashree Tripathy, Chuteng Zhou, Dibakar Gope, Igor Fedorov, Ramon Matas, David Brooks, Gu-Yeon Wei, Paul Whatmough

    Abstract: The ability to accurately predict deep neural network (DNN) inference performance metrics, such as latency, power, and memory footprint, for an arbitrary DNN on a target hardware platform is essential to the design of DNN based models. This ability is critical for the (manual or automatic) design, optimization, and deployment of practical DNNs for a specific hardware deployment platform. Unfortuna… ▽ More

    Submitted 26 January, 2023; originally announced January 2023.

  31. arXiv:2301.10904  [pdf, other

    cs.CR cs.DC cs.LG

    GPU-based Private Information Retrieval for On-Device Machine Learning Inference

    Authors: Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Yang Li, Liangzhen Lai, Ilias Leontiadis, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, G. Edward Suh

    Abstract: On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables each on the or… ▽ More

    Submitted 25 September, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

  32. arXiv:2212.00827  [pdf, other

    cs.LG cs.PF

    Architectural Implications of Embedding Dimension during GCN on CPU and GPU

    Authors: Matthew Adiletta, David Brooks, Gu-Yeon Wei

    Abstract: Graph Neural Networks (GNNs) are a class of neural networks designed to extract information from the graphical structure of data. Graph Convolutional Networks (GCNs) are a widely used type of GNN for transductive graph learning problems which apply convolution to learn information from graphs. GCN is a challenging algorithm from an architecture perspective due to inherent sparsity, low data reuse,… ▽ More

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: 10 pages, 7 figures

  33. arXiv:2209.14657  [pdf, other

    eess.IV cs.CV

    Correlated Feature Aggregation by Region Helps Distinguish Aggressive from Indolent Clear Cell Renal Cell Carcinoma Subtypes on CT

    Authors: Karin Stacke, Indrani Bhattacharya, Justin R. Tse, James D. Brooks, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Renal cell carcinoma (RCC) is a common cancer that varies in clinical behavior. Indolent RCC is often low-grade without necrosis and can be monitored without treatment. Aggressive RCC is often high-grade and can cause metastasis and death if not promptly detected and treated. While most kidney cancers are detected on CT scans, grading is based on histology from invasive biopsy or surgery. Determin… ▽ More

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: Submitted to Medical Image Analysis

  34. arXiv:2209.12127  [pdf, other

    cs.LG

    SpeedLimit: Neural Architecture Search for Quantized Transformer Models

    Authors: Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung

    Abstract: While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an u… ▽ More

    Submitted 13 October, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

  35. arXiv:2205.06437  [pdf, other

    cs.CR

    Impala: Low-Latency, Communication-Efficient Private Deep Learning Inference

    Authors: Woo-Seok Choi, Brandon Reagen, Gu-Yeon Wei, David Brooks

    Abstract: This paper proposes Impala, a new cryptographic protocol for private inference in the client-cloud setting. Impala builds upon recent solutions that combine the complementary strengths of homomorphic encryption (HE) and secure multi-party computation (MPC). A series of protocol optimizations are developed to reduce both communication and performance bottlenecks. First, we remove MPC's overwhelming… ▽ More

    Submitted 12 May, 2022; originally announced May 2022.

  36. arXiv:2205.03325  [pdf, other

    cs.AR cs.RO

    OMU: A Probabilistic 3D Occupancy Mapping Accelerator for Real-time OctoMap at the Edge

    Authors: Tianyu Jia, En-Yu Yang, Yu-Shun Hsiao, Jonathan Cruz, David Brooks, Gu-Yeon Wei, Vijay Janapa Reddi

    Abstract: Autonomous machines (e.g., vehicles, mobile robots, drones) require sophisticated 3D mapping to perceive the dynamic environment. However, maintaining a real-time 3D map is expensive both in terms of compute and memory requirements, especially for resource-constrained edge machines. Probabilistic OctoMap is a reliable and memory-efficient 3D dense map model to represent the full environment, with… ▽ More

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: 2022 Design Automation and Test in Europe Conference (DATE), March 14-23, 2022, Virtual

  37. arXiv:2203.06732  [pdf, other

    q-bio.QM cs.CE q-bio.MN

    BioSimulators: a central registry of simulation engines and services for recommending specific tools

    Authors: Bilal Shaikh, Lucian P. Smith, Dan Vasilescu, Gnaneswara Marupilla, Michael Wilson, Eran Agmon, Henry Agnew, Steven S. Andrews, Azraf Anwar, Moritz E. Beber, Frank T. Bergmann, David Brooks, Lutz Brusch, Laurence Calzone, Kiri Choi, Joshua Cooper, John Detloff, Brian Drawert, Michel Dumontier, G. Bard Ermentrout, James R. Faeder, Andrew P. Freiburger, Fabian Fröhlich, Akira Funahashi, Alan Garny , et al. (46 additional authors not shown)

    Abstract: Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find… ▽ More

    Submitted 13 March, 2022; originally announced March 2022.

    Comments: 6 pages, 2 figures

  38. arXiv:2203.02833  [pdf, other

    cs.CR cs.AI

    Tabula: Efficiently Computing Nonlinear Activation Functions for Secure Neural Network Inference

    Authors: Maximilian Lam, Michael Mitzenmacher, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks

    Abstract: Multiparty computation approaches to secure neural network inference commonly rely on garbled circuits for securely executing nonlinear activation functions. However, garbled circuits require excessive communication between server and client, impose significant storage overheads, and incur large runtime penalties. To reduce these costs, we propose an alternative to garbled circuits: Tabula, an alg… ▽ More

    Submitted 16 June, 2024; v1 submitted 5 March, 2022; originally announced March 2022.

  39. Carbon Explorer: A Holistic Approach for Designing Carbon Aware Datacenters

    Authors: Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Kiwan Maeng, Manoj Chakkaravarthy, Udit Gupta, David Brooks, Carole-Jean Wu

    Abstract: Technology companies have been leading the way to a renewable energy transformation, by investing in renewable energy sources to reduce the carbon footprint of their datacenters. In addition to helping build new solar and wind farms, companies make power purchase agreements or purchase carbon offsets, rather than relying on renewable energy every hour of the day, every day of the week (24/7). Rely… ▽ More

    Submitted 21 February, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: Published at ASPLOS'23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2

    ACM Class: C.0; B.0

  40. arXiv:2201.08603  [pdf, other

    cs.AR

    Trireme: Exploring Hierarchical Multi-Level Parallelism for Domain Specific Hardware Acceleration

    Authors: Georgios Zacharopoulos, Adel Ejjeh, Ying Jing, En-Yu Yang, Tianyu Jia, Iulian Brumar, Jeremy Intan, Muhammad Huzaifa, Sarita Adve, Vikram Adve, Gu-Yeon Wei, David Brooks

    Abstract: The design of heterogeneous systems that include domain specific accelerators is a challenging and time-consuming process. While taking into account area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution… ▽ More

    Submitted 21 January, 2022; originally announced January 2022.

    Comments: 20 pages

  41. arXiv:2112.02164  [pdf, other

    eess.IV cs.CV

    Bridging the gap between prostate radiology and pathology through machine learning

    Authors: Indrani Bhattacharya, David S. Lim, Han Lin Aung, Xingchen Liu, Arun Seetharaman, Christian A. Kunder, Wei Shao, Simon J. C. Soerensen, Richard E. Fan, Pejman Ghanouni, Katherine J. To'o, James D. Brooks, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Prostate cancer is the second deadliest cancer for American men. While Magnetic Resonance Imaging (MRI) is increasingly used to guide targeted biopsies for prostate cancer diagnosis, its utility remains limited due to high rates of false positives and false negatives as well as low inter-reader agreements. Machine learning methods to detect and localize cancer on prostate MRI can help standardize… ▽ More

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: Indrani Bhattacharya and David S. Lim contributed equally as first authors. Geoffrey A. Sonn and Mirabela Rusu contributed equally as senior authors

  42. arXiv:2111.09222  [pdf, other

    cs.AR

    Early DSE and Automatic Generation of Coarse Grained Merged Accelerators

    Authors: Iulian Brumar, Georgios Zacharopoulos, Yuan Yao, Saketh Rama, Gu-Yeon Wei, David Brooks

    Abstract: Post-Moore's law area-constrained systems rely on accelerators to deliver performance enhancements. Coarse grained accelerators can offer substantial domain acceleration, but manual, ad-hoc identification of code to accelerate is prohibitively expensive. Because cycle-accurate simulators and high-level synthesis flows are so time-consuming, manual creation of high-utilization accelerators that exp… ▽ More

    Submitted 17 November, 2021; originally announced November 2021.

  43. arXiv:2111.04807  [pdf, ps, other

    eess.IV cs.CV cs.LG

    Unsupervised Approaches for Out-Of-Distribution Dermoscopic Lesion Detection

    Authors: Max Torop, Sandesh Ghimire, Wenqian Liu, Dana H. Brooks, Octavia Camps, Milind Rajadhyaksha, Jennifer Dy, Kivanc Kose

    Abstract: There are limited works showing the efficacy of unsupervised Out-of-Distribution (OOD) methods on complex medical data. Here, we present preliminary findings of our unsupervised OOD detection algorithm, SimCLR-LOF, as well as a recent state of the art approach (SSD), applied on medical images. SimCLR-LOF learns semantically meaningful features using SimCLR and uses LOF for scoring if a test sample… ▽ More

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: NeurIPS: Medical Imaging Meets NeurIPS Workshop

  44. arXiv:2111.00364  [pdf, other

    cs.LG cs.AI cs.AR

    Sustainable AI: Environmental Implications, Challenges and Opportunities

    Authors: Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga Behram, James Huang, Charles Bai, Michael Gschwind, Anurag Gupta, Myle Ott, Anastasia Melnikov, Salvatore Candido, David Brooks, Geeta Chauhan, Benjamin Lee, Hsien-Hsin S. Lee, Bugra Akyildiz, Maximilian Balandat, Joe Spisak, Ravi Jain, Mike Rabbat, Kim Hazelwood

    Abstract: This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, w… ▽ More

    Submitted 9 January, 2022; v1 submitted 30 October, 2021; originally announced November 2021.

  45. arXiv:2110.12392  [pdf, other

    q-bio.NC cs.LG

    Variation is the Norm: Brain State Dynamics Evoked By Emotional Video Clips

    Authors: Ashutosh Singh, Christiana Westlin, Hedwig Eisenbarth, Elizabeth A. Reynolds Losin, Jessica R. Andrews-Hanna, Tor D. Wager, Ajay B. Satpute, Lisa Feldman Barrett, Dana H. Brooks, Deniz Erdogmus

    Abstract: For the last several decades, emotion research has attempted to identify a "biomarker" or consistent pattern of brain activity to characterize a single category of emotion (e.g., fear) that will remain consistent across all instances of that category, regardless of individual and context. In this study, we investigated variation rather than consistency during emotional experiences while people wat… ▽ More

    Submitted 24 October, 2021; originally announced October 2021.

  46. arXiv:2109.01188  [pdf, other

    cs.ET cs.AR

    NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded Non-Volatile Memories

    Authors: Lillian Pentecost, Alexander Hankin, Marco Donato, Mark Hempstead, Gu-Yeon Wei, David Brooks

    Abstract: Repeated off-chip memory accesses to DRAM drive up operating power for data-intensive applications, and SRAM technology scaling and leakage power limits the efficiency of embedded memories. Future on-chip storage will need higher density and energy efficiency, and the actively expanding field of emerging, embeddable non-volatile memory (eNVM) technologies is providing many potential candidates to… ▽ More

    Submitted 11 January, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: 18 pages, 14 figures, 3 tables

    ACM Class: B.3; I.6

  47. arXiv:2106.11757  [pdf, other

    cs.DC

    Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories

    Authors: Mohammad Mehdi Sharifi, Lillian Pentecost, Ramin Rajaei, Arman Kazemi, Qiuwen Lou, Gu-Yeon Wei, David Brooks, Kai Ni, X. Sharon Hu, Michael Niemier, Marco Donato

    Abstract: The memory wall bottleneck is a key challenge across many data-intensive applications. Multi-level FeFET-based embedded non-volatile memories are a promising solution for denser and more energy-efficient on-chip memory. However, reliable multi-level cell storage requires careful optimizations to minimize the design overhead costs. In this work, we investigate the interplay between FeFET device cha… ▽ More

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted at ISLPED 2021

  48. arXiv:2106.06089  [pdf, other

    cs.CR cs.AI

    Gradient Disaggregation: Breaking Privacy in Federated Learning by Reconstructing the User Participant Matrix

    Authors: Maximilian Lam, Gu-Yeon Wei, David Brooks, Vijay Janapa Reddi, Michael Mitzenmacher

    Abstract: We show that aggregated model updates in federated learning may be insecure. An untrusted central server may disaggregate user updates from sums of updates across participants given repeated observations, enabling the server to recover privileged information about individual users' private training data via traditional gradient inference attacks. Our method revolves around reconstructing participa… ▽ More

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: ICML 2021

  49. arXiv:2105.12882  [pdf, other

    cs.RO

    MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles

    Authors: Yu-Shun Hsiao, Zishen Wan, Tianyu Jia, Radhika Ghosal, Abdulrahman Mahmoud, Arijit Raychowdhury, David Brooks, Gu-Yeon Wei, Vijay Janapa Reddi

    Abstract: Safety and resilience are critical for autonomous unmanned aerial vehicles (UAVs). We introduce MAVFI, the micro aerial vehicles (MAVs) resilience analysis methodology to assess the effect of silent data corruption (SDC) on UAVs' mission metrics, such as flight time and success rate, for accurately measuring system resilience. To enhance the safety and resilience of robot systems bound by size, we… ▽ More

    Submitted 30 January, 2023; v1 submitted 26 May, 2021; originally announced May 2021.

    Comments: 6 pages, 9 figures; The first two authors have equal contributions; Accepted as a conference paper in DATE 2023

  50. arXiv:2105.08820  [pdf, other

    cs.AR cs.AI cs.DC

    RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance

    Authors: Udit Gupta, Samuel Hsia, Jeff Zhang, Mark Wilkening, Javin Pombra, Hsien-Hsin S. Lee, Gu-Yeon Wei, Carole-Jean Wu, David Brooks

    Abstract: Deep learning recommendation systems must provide high quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing… ▽ More

    Submitted 22 May, 2021; v1 submitted 18 May, 2021; originally announced May 2021.