
Showing 1–11 of 11 results for author: Damani, S

Searching in archive cs.
  1. arXiv:2603.29010  [pdf, ps, other]

    cs.LG cs.AI

    Improving Efficiency of GPU Kernel Optimization Agents using a Domain-Specific Language and Speed-of-Light Guidance

    Authors: Siva Kumar Sastry Hari, Vignesh Balaji, Sana Damani, Qijing Huang, Christos Kozyrakis

    Abstract: Optimizing GPU kernels with LLM agents is an iterative process over a large design space. Every candidate must be generated, compiled, validated, and profiled, so fewer trials will save both runtime and cost. We make two key observations. First, the abstraction level that agents operate at is important. If it is too low, the LLM wastes reasoning on low-impact details. If it is too high, it may mis…

    Submitted 30 March, 2026; originally announced March 2026.

  2. arXiv:2603.19173  [pdf, ps, other]

    cs.LG cs.AI

    SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

    Authors: Edward Lin, Sahil Modi, Siva Kumar Sastry Hari, Qijing Huang, Zhifan Ye, Nestor Qin, Fengzhe Zhou, Yuan Zhang, Jingquan Wang, Sana Damani, Dheeraj Peri, Ouye Xie, Aditya Kane, Moshe Maor, Michael Behar, Triston Cao, Rishabh Mehta, Vartika Singh, Vikram Sharma Mailthody, Terry Chen, Zihao Ye, Hanfeng Chen, Tianqi Chen, Vinod Grover, Wei Chen, et al. (8 additional authors not shown)

    Abstract: As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, a…

    Submitted 19 March, 2026; originally announced March 2026.

  3. arXiv:2602.14293  [pdf, ps, other]

    cs.LG cs.AI

    KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning

    Authors: Kris Shengjun Dong, Sahil Modi, Dima Nikiforov, Sana Damani, Edward Lin, Siva Kumar Sastry Hari, Christos Kozyrakis

    Abstract: Optimizing CUDA code across multiple generations of GPU architectures is challenging, as achieving peak performance requires an extensive exploration of an increasingly complex, hardware-specific optimization space. Traditional compilers are constrained by fixed heuristics, whereas finetuning Large Language Models (LLMs) can be expensive. However, agentic workflows for CUDA code optimization have…

    Submitted 15 February, 2026; originally announced February 2026.

    Comments: 15 pages, 33 pages with appendix

  4. arXiv:2511.12294  [pdf, ps, other]

    cs.SE

    ProofWright: Towards Agentic Formal Verification of CUDA

    Authors: Bodhisatwa Chatterjee, Drew Zagieboylo, Sana Damani, Siva Hari, Christos Kozyrakis

    Abstract: Large Language Models (LLMs) are increasingly used to automatically generate optimized CUDA kernels, substantially improving developer productivity. However, despite rapid generation, these kernels often contain subtle correctness bugs and lack formal safety guarantees. Runtime testing is inherently unreliable - limited input coverage and reward hacking can mask incorrect behavior - while manual f…

    Submitted 18 March, 2026; v1 submitted 15 November, 2025; originally announced November 2025.

  5. arXiv:2502.17780  [pdf]

    cs.DC eess.SY

    GPUArmor: A Hardware-Software Co-design for Efficient and Scalable Memory Safety on GPUs

    Authors: Mohamed Tarek Ibn Ziad, Sana Damani, Mark Stephenson, Stephen W. Keckler, Aamer Jaleel

    Abstract: Memory safety errors continue to pose a significant threat to current computing systems, and graphics processing units (GPUs) are no exception. A prominent class of memory safety algorithms is allocation-based solutions. The key idea is to maintain each allocation's metadata (base address and size) in a disjoint table and retrieve it at runtime to verify memory accesses. While several previous sol…

    Submitted 25 February, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: arXiv version of submission

  6. arXiv:2407.06549  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    AutoTask: Task Aware Multi-Faceted Single Model for Multi-Task Ads Relevance

    Authors: Shouchang Guo, Sonam Damani, Keng-hao Chang

    Abstract: Ads relevance models are crucial in determining the relevance between user search queries and ad offers, often framed as a classification problem. The complexity of modeling increases significantly with multiple ad types and varying scenarios that exhibit both similarities and differences. In this work, we introduce a novel multi-faceted attention model that performs task aware feature combination…

    Submitted 9 July, 2024; originally announced July 2024.

  7. arXiv:2406.19486  [pdf, other]

    cs.CL cs.AI cs.ET cs.LG eess.SP

    LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models

    Authors: Shouchang Guo, Sonam Damani, Keng-hao Chang

    Abstract: In prompt tuning, a prefix or suffix text is added to the prompt, and the embeddings (soft prompts) or token indices (hard prompts) of the prefix/suffix are optimized to gain more control over language models for specific tasks. This approach eliminates the need for hand-crafted prompt engineering or explicit model fine-tuning. Prompt tuning is significantly more parameter-efficient than model fin…

    Submitted 27 June, 2024; originally announced June 2024.

  8. arXiv:2302.08687  [pdf, other]

    cs.AR cs.AI cs.LG

    VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs

    Authors: Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna

    Abstract: Deep Learning (DL) acceleration support in CPUs has recently gained a lot of traction, with several companies (Arm, Intel, IBM) announcing products with specialized matrix engines accessible via GEMM instructions. CPUs are pervasive and need to handle diverse requirements across DL workloads running in edge/HPC/cloud platforms. Therefore, as DL workloads embrace sparsity to reduce the computations…

    Submitted 23 February, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: This paper is accepted to HPCA 2023

  9. arXiv:1907.02014  [pdf, other]

    cs.OH cs.CV cs.CY

    Using AI for Economic Upliftment of Handicraft Industry

    Authors: Nitya Raviprakash, Sonam Damani, Ankush Chatterjee, Meghana Joshi, Puneet Agrawal

    Abstract: The handicraft industry is a strong pillar of Indian economy which provides large-scale employment opportunities to artisans in rural and underprivileged communities. However, in this era of globalization, diverse modern designs have rendered traditional designs old and monotonous, causing an alarming decline of handicraft sales. For this age-old industry to survive the global competition, it is i…

    Submitted 31 May, 2019; originally announced July 2019.

  10. arXiv:1811.08759  [pdf, other]

    cs.AI

    Using AI to Design Stone Jewelry

    Authors: Khyatti Gupta, Sonam Damani, Kedhar Nath Narahari

    Abstract: Jewelry has been an integral part of human culture since ages. One of the most popular styles of jewelry is created by putting together precious and semi-precious stones in diverse patterns. While technology is finding its way in the production process of such jewelry, designing it remains a time-consuming and involved task. In this paper, we propose a unique approach using optimization methods co…

    Submitted 21 November, 2018; originally announced November 2018.

  11. arXiv:1810.12097  [pdf, other]

    cs.CL

    Ruuh: A Deep Learning Based Conversational Social Agent

    Authors: Sonam Damani, Nitya Raviprakash, Umang Gupta, Ankush Chatterjee, Meghana Joshi, Khyatti Gupta, Kedhar Nath Narahari, Puneet Agrawal, Manoj Kumar Chinnakotla, Sneha Magapu, Abhishek Mathur

    Abstract: Dialogue systems and conversational agents are becoming increasingly popular in the modern society but building an agent capable of holding intelligent conversation with its users is a challenging problem for artificial intelligence. In this demo, we demonstrate a deep learning based conversational social agent called "Ruuh" (facebook.com/Ruuh) designed by a team at Microsoft India to converse on…

    Submitted 22 October, 2018; originally announced October 2018.

    Comments: 2 pages, 1 figure