Showing 1–50 of 80 results for author: Agarwal, D

Searching in archive cs.
  1. arXiv:2602.22271  [pdf, ps, other]

    cs.LG math.PR math.ST

    Support Tokens, Stability Margins, and a New Foundation for Robust LLMs

    Authors: Deepak Agarwal, Dhyey Dharmendrakumar Mavani, Suyash Gupta, Karthik Sethuraman, Tejas Dharamsi

    Abstract: Self-attention is usually described as a flexible, content-adaptive way to mix a token with information from its past. We reinterpret causal self-attention transformers, the backbone of modern foundation models, within a probabilistic framework, much as classical PCA is extended to probabilistic PCA. This reformulation reveals a key structural consequence of the underlying change of variables: a b…

    Submitted 21 March, 2026; v1 submitted 25 February, 2026; originally announced February 2026.

    Comments: 45 pages, 9 figures

    ACM Class: I.2.7; G.3; G.4

  2. arXiv:2601.21386  [pdf, ps, other]

    cs.SD cs.AI

    Understanding Frechet Speech Distance for Synthetic Speech Quality Evaluation

    Authors: June-Woo Kim, Dhruv Agarwal, Federica Cerina

    Abstract: Objective evaluation of synthetic speech quality remains a critical challenge. Human listening tests are the gold standard, but costly and impractical at scale. Fréchet Distance has emerged as a promising alternative, yet its reliability depends heavily on the choice of embeddings and experimental settings. In this work, we comprehensively evaluate Fréchet Speech Distance (FSD) and its variant Spe…

    Submitted 29 January, 2026; originally announced January 2026.

    Comments: accepted to ICASSP 2026

  3. arXiv:2511.18728  [pdf]

    cs.LG

    Reinforcement Learning for Self-Healing Material Systems

    Authors: Maitreyi Chatterjee, Devansh Agarwal, Biplab Chatterjee

    Abstract: The transition to autonomous material systems necessitates adaptive control methodologies to maximize structural longevity. This study frames the self-healing process as a Reinforcement Learning (RL) problem within a Markov Decision Process (MDP), enabling agents to autonomously derive optimal policies that efficiently balance structural integrity maintenance against finite resource consumption. A…

    Submitted 23 November, 2025; originally announced November 2025.

    Comments: Accepted to INCOM 2026. This is the camera-ready version

  4. arXiv:2511.18727  [pdf]

    cs.LG

    LogSyn: A Few-Shot LLM Framework for Structured Insight Extraction from Unstructured General Aviation Maintenance Logs

    Authors: Devansh Agarwal, Maitreyi Chatterjee, Biplab Chatterjee

    Abstract: Aircraft maintenance logs hold valuable safety data but remain underused due to their unstructured text format. This paper introduces LogSyn, a framework that uses Large Language Models (LLMs) to convert these logs into structured, machine-readable data. Using few-shot in-context learning on 6,169 records, LogSyn performs Controlled Abstraction Generation (CAG) to summarize problem-resolution narr…

    Submitted 7 February, 2026; v1 submitted 23 November, 2025; originally announced November 2025.

    Comments: Accepted in Proceedings of the 3rd INCOM 2026

  5. arXiv:2511.04550  [pdf]

    cs.CR cs.LG

    Confidential Computing for Cloud Security: Exploring Hardware based Encryption Using Trusted Execution Environments

    Authors: Dhruv Deepak Agarwal, Aswani Kumar Cherukuri

    Abstract: The growth of cloud computing has revolutionized data processing and storage capacities to new levels of scalability and flexibility. But in the process, it has created a huge challenge of security, especially in terms of safeguarding sensitive data. Classical security practices, including encryption at rest and during transit, fail to protect data in use and expose it to various possible brea…

    Submitted 6 November, 2025; originally announced November 2025.

  6. arXiv:2509.15516  [pdf, ps, other]

    eess.AS cs.SD

    The Universal Personalizer: Few-Shot Dysarthric Speech Recognition via Meta-Learning

    Authors: Dhruuv Agarwal, Harry Zhang, Yang Yu, Quan Wang

    Abstract: Personalizing dysarthric ASR is hindered by demanding enrollment collection and per-user training. We propose a hybrid meta-training method for a single model, enabling zero-shot and few-shot on-the-fly personalization via in-context learning (ICL). On Euphonia, it achieves 13.9% Word Error Rate (WER), surpassing speaker-independent baselines (17.5%). On SAP Test-1, our 5.3% WER outperforms the ch…

    Submitted 22 February, 2026; v1 submitted 18 September, 2025; originally announced September 2025.

  7. arXiv:2508.16626  [pdf, ps, other]

    cs.CY

    Pothole Detection and Analysis System (PoDAS) for Real Time Data Using Sensor Networks

    Authors: Jinesh Mehta, Vinayak Mathur, Dhruv Agarwal, Atish Sharma, Krishna Prakasha

    Abstract: Potholes are a major nuisance on the city roads leading to several problems and losses in productivity. Local authorities have cited a lack of geographic localization of these potholes as one of the rate-limiting factors for repairs. This study proposes a novel low-cost wireless sensor-based end-to-end system called PoDAS (Pothole Detection and Analysis System) which can be deployed across major c…

    Submitted 14 August, 2025; originally announced August 2025.

    Comments: Published in Journal of Engineering and Applied Sciences

    Journal ref: Journal of Engineering and Applied Sciences 12(12): 3090-3097, 2017

  8. arXiv:2508.12630  [pdf, ps, other]

    cs.CL

    Semantic Anchoring in Agentic Memory: Leveraging Linguistic Structures for Persistent Conversational Context

    Authors: Maitreyi Chatterjee, Devansh Agarwal

    Abstract: Large Language Models (LLMs) have demonstrated impressive fluency and task competence in conversational settings. However, their effectiveness in multi-session and long-term interactions is hindered by limited memory persistence. Typical retrieval-augmented generation (RAG) systems store dialogue history as dense vectors, which capture semantic similarity but neglect finer linguistic structures su…

    Submitted 18 August, 2025; originally announced August 2025.

    Comments: Paper is currently in peer review

  9. arXiv:2508.08641  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    MiGrATe: Mixed-Policy GRPO for Adaptation at Test-Time

    Authors: Peter Phan, Dhruv Agarwal, Kavitha Srinivas, Horst Samulowitz, Pavan Kapanipathi, Andrew McCallum

    Abstract: Large language models (LLMs) are increasingly being applied to black-box optimization tasks, from program synthesis to molecule design. Prior work typically leverages in-context learning to iteratively guide the model towards better solutions. Such methods, however, often struggle to balance exploration of new solution spaces with exploitation of high-reward ones. Recently, test-time training (TTT…

    Submitted 12 August, 2025; originally announced August 2025.

  10. arXiv:2507.00310  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise

    Authors: Dhruv Agarwal, Bodhisattwa Prasad Majumder, Reece Adamson, Megha Chakravorty, Satvika Reddy Gavireddy, Aditya Parashar, Harshit Surana, Bhavana Dalvi Mishra, Andrew McCallum, Ashish Sabharwal, Peter Clark

    Abstract: The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent works in ASD explore the use of large language models (LLMs) in goal-driven settings, relying on human-specified research questions to guide hypothesis generation. However, scientific discovery may be accelerated further by allowing the AI system to d…

    Submitted 12 February, 2026; v1 submitted 30 June, 2025; originally announced July 2025.

    Comments: Accepted to NeurIPS 2025: https://neurips.cc/virtual/2025/loc/san-diego/poster/116398

  11. arXiv:2506.11007  [pdf, other]

    cs.SE cs.AI

    Impact of Comments on LLM Comprehension of Legacy Code

    Authors: Rock Sabetto, Emily Escamilla, Devesh Agarwal, Sujay Kandwal, Justin F. Brunelle, Scott Rosen, Nitin Naik, Samruddhi Thaker, Eric O. Scott, Jacob Zimmer, Amit Madan, Arun Sridharan, Doug Wendt, Michael Doyle, Christopher Glasz, Jasper Phillips, William Macke, Colin Diggs, Michael Bartholf, Zachary Robin, Paul Ursino

    Abstract: Large language models (LLMs) have been increasingly integrated into software engineering and maintenance tasks due to their high performance with software engineering tasks and robust understanding of modern programming languages. However, the ability of LLMs to comprehend code written with legacy languages remains a research gap challenged by real-world legacy systems lacking or containing inaccu…

    Submitted 23 April, 2025; originally announced June 2025.

  12. LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting

    Authors: Pai Zhu, Quan Wang, Dhruuv Agarwal, Kurt Partridge

    Abstract: Custom keyword spotting (KWS) allows detecting user-defined spoken keywords from streaming audio. This is achieved by comparing the embeddings from voice enrollments and input audio. State-of-the-art custom KWS models are typically trained contrastively using utterances whose keywords are randomly sampled from training dataset. These KWS models often struggle with confusing keywords, such as "blue…

    Submitted 28 May, 2025; originally announced May 2025.

    Journal ref: Proc. Interspeech 2025, 2675-2679

  13. arXiv:2505.21548  [pdf, ps, other]

    cs.CL cs.AI cs.CY physics.soc-ph

    Fluent but Foreign: Even Regional LLMs Lack Cultural Alignment

    Authors: Dhruv Agarwal, Anya Shukla, Sunayana Sitaram, Aditya Vashistha

    Abstract: Large language models (LLMs) are used worldwide, yet exhibit Western cultural tendencies. Many countries are now building "regional" or "sovereign" LLMs, but it remains unclear whether they reflect local values and practices or merely speak local languages. Using India as a case study, we evaluate six Indic and six global LLMs on two dimensions -- values and practices -- grounded in nationally…

    Submitted 22 January, 2026; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: Under review

  14. arXiv:2505.18878  [pdf, other]

    cs.CL cs.AI

    CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions

    Authors: Kung-Hsiang Huang, Akshara Prabhakar, Onkar Thorat, Divyansh Agarwal, Prafulla Kumar Choubey, Yixin Mao, Silvio Savarese, Caiming Xiong, Chien-Sheng Wu

    Abstract: While AI agents hold transformative potential in business, effective performance benchmarking is hindered by the scarcity of public, realistic business data on widely used platforms. Existing benchmarks often lack fidelity in their environments, data, and agent-user interactions, with limited coverage of diverse business scenarios and industries. To address these gaps, we introduce CRMArena-Pro, a…

    Submitted 24 May, 2025; originally announced May 2025.

  15. GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples

    Authors: Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang

    Abstract: Spoken Keyword Spotting (KWS) is the task of distinguishing between the presence and absence of a keyword in audio. The accuracy of a KWS model hinges on its ability to correctly classify examples close to the keyword and non-keyword boundary. These boundary examples are often scarce in training data, limiting model performance. In this paper, we propose a method to systematically generate adversa…

    Submitted 24 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted at Interspeech 2025

    Journal ref: Proc. Interspeech 2025, 2680-2684

  16. arXiv:2505.09819  [pdf, other]

    cs.HC cs.CV cs.LG eess.SY

    Visual Feedback of Pattern Separability Improves Myoelectric Decoding Performance of Upper Limb Prostheses

    Authors: Ruichen Yang, György M. Lévay, Christopher L. Hunt, Dániel Czeiner, Megan C. Hodgson, Damini Agarwal, Rahul R. Kaliki, Nitish V. Thakor

    Abstract: State-of-the-art upper limb myoelectric prostheses often use pattern recognition (PR) control systems that translate electromyography (EMG) signals into desired movements. As prosthesis movement complexity increases, users often struggle to produce sufficiently distinct EMG patterns for reliable classification. Existing training typically involves heuristic, trial-and-error user adjustments to sta…

    Submitted 15 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  17. arXiv:2504.12417  [pdf]

    cs.AI

    Interpretable AI-driven Guidelines for Type 2 Diabetes Treatment from Observational Data

    Authors: Dewang Kumar Agarwal, Dimitris J. Bertsimas

    Abstract: Objective: Create precise, structured, data-backed guidelines for type 2 diabetes treatment progression, suitable for clinical adoption. Research Design and Methods: Our training cohort was composed of patient (with type 2 diabetes) visits from Boston Medical Center (BMC) from 1998 to 2014. We divide visits into 4 groups based on the patient's treatment regimen before the visit, and further divi…

    Submitted 16 April, 2025; originally announced April 2025.

  18. arXiv:2503.06550  [pdf, other]

    cs.CL

    BingoGuard: LLM Content Moderation Tools with Risk Levels

    Authors: Fan Yin, Philippe Laban, Xiangyu Peng, Yilun Zhou, Yixin Mao, Vaibhav Vats, Linnea Ross, Divyansh Agarwal, Caiming Xiong, Chien-Sheng Wu

    Abstract: Malicious content generated by large language models (LLMs) can pose varying degrees of harm. Although existing LLM-based moderators can detect harmful content, they struggle to assess risk levels and may miss lower-risk outputs. Accurate risk assessment allows platforms with different safety thresholds to tailor content filtering and rejection. In this paper, we introduce per-topic severity rubri…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 10 pages, 4 figures, 4 tables. ICLR 2025 poster

  19. arXiv:2411.09969  [pdf, ps, other]

    cs.HC cs.AI

    Steering AI-Driven Personalization of Scientific Text for General Audiences

    Authors: Taewook Kim, Dhruv Agarwal, Jordan Ackerman, Manaswi Saha

    Abstract: Digital media platforms (e.g., science blogs) offer opportunities to communicate scientific content to general audiences at scale. However, these audiences vary in their scientific expertise, literacy levels, and personal backgrounds, making effective science communication challenging. To address this challenge, we designed TranSlider, an AI-powered tool that generates personalized translations of…

    Submitted 9 August, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

    Comments: 28 pages, 7 figures, 1 table. Accepted to PACM HCI (CSCW 2025)

  20. arXiv:2410.23252  [pdf, other]

    cs.CL

    Evaluating Cultural and Social Awareness of LLM Web Agents

    Authors: Haoyi Qiu, Alexander R. Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, Chien-Sheng Wu

    Abstract: As large language models (LLMs) expand into performing as agents for real-world applications beyond traditional NLP tasks, evaluating their robustness becomes increasingly important. However, existing benchmarks often overlook critical dimensions like cultural and social awareness. To address these, we introduce CASA, a benchmark designed to assess LLM agents' sensitivity to cultural and social no…

    Submitted 8 March, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: NAACL 2025 Findings

  21. GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting

    Authors: Pai Zhu, Jacob W. Bartel, Dhruuv Agarwal, Kurt Partridge, Hyun Jin Park, Quan Wang

    Abstract: We propose GE2E-KWS -- a generalized end-to-end training and evaluation framework for customized keyword spotting. Specifically, enrollment utterances are separated and grouped by keywords from the training batch and their embedding centroids are compared to all other test utterance embeddings to compute the loss. This simulates runtime enrollment and verification stages, and improves convergence…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 8 pages, 6 figures, 2 tables. The paper is accepted in IEEE Spoken Language Technology (SLT) 2024

    Journal ref: 2024 IEEE Spoken Language Technology Workshop (SLT)

  22. arXiv:2410.07168  [pdf, other]

    cs.CL cs.SD eess.AS

    Sylber: Syllabic Embedding Representation of Speech from Raw Audio

    Authors: Cheol Jun Cho, Nicholas Lee, Akshat Gupta, Dhruv Agarwal, Ethan Chen, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Syllables are compositional units of spoken language that efficiently structure human speech perception and production. However, current neural speech representations lack such structure, resulting in dense token sequences that are costly to process. To bridge this gap, we propose a new model, Sylber, that produces speech representations with clean and robust syllabic structure. Specifically, we p…

    Submitted 2 March, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: Accepted at ICLR 2025

  23. ViDAS: Vision-based Danger Assessment and Scoring

    Authors: Pranav Gupta, Advith Krishnan, Naman Nanda, Ananth Eswar, Deeksha Agarwal, Pratham Gohil, Pratyush Goel

    Abstract: We present a novel dataset aimed at advancing danger analysis and assessment by addressing the challenge of quantifying danger in video content and identifying how human-like a Large Language Model (LLM) evaluator is for the same. This is achieved by compiling a collection of 100 YouTube videos featuring various events. Each video is annotated by human participants who provided danger ratings on a…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: Preprint

  24. AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances

    Authors: Dhruv Agarwal, Mor Naaman, Aditya Vashistha

    Abstract: Large language models (LLMs) are being increasingly integrated into everyday products and services, such as coding tools and writing assistants. As these embedded AI applications are deployed globally, there is a growing concern that the AI models underlying these applications prioritize Western values. This paper investigates what happens when a Western-centric AI model provides writing suggestio…

    Submitted 12 March, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: Accepted at CHI 2025

  25. Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: The keyword spotting (KWS) problem requires large amounts of real speech training data to achieve high accuracy across diverse populations. Utilizing large amounts of text-to-speech (TTS) synthesized data can reduce the cost and time associated with KWS development. However, TTS data may contain artifacts not present in real speech, which the KWS model can exploit (overfit), leading to degraded ac…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

    Journal ref: Proc. Synthetic Data's Transformative Role in Foundational Speech Models 2024, 86-90

  26. Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model

    Authors: Hyun Jin Park, Dhruuv Agarwal, Neng Chen, Rentao Sun, Kurt Partridge, Justin Chen, Harry Zhang, Pai Zhu, Jacob Bartel, Kyle Kastner, Gary Wang, Andrew Rosenberg, Quan Wang

    Abstract: This paper explores the use of TTS synthesized training data for the KWS (keyword spotting) task while minimizing development cost and time. Keyword spotting models require a huge amount of training data to be accurate, and obtaining such training data can be costly. In the current state of the art, TTS models can generate large amounts of natural-sounding data, which can help reduce cost and time f…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: to be published in a Workshop at Interspeech 2024, Synthetic Data's Transformative Role in Foundational Speech Models

    Journal ref: Proc. Synthetic Data's Transformative Role in Foundational Speech Models 2024, 16-20

  27. Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments

    Authors: Pai Zhu, Dhruuv Agarwal, Jacob W. Bartel, Kurt Partridge, Hyun Jin Park, Quan Wang

    Abstract: One of the challenges in developing a high quality custom keyword spotting (KWS) model is the lengthy and expensive process of collecting training data covering a wide range of languages, phrases and speaking styles. We introduce Synth4Kws - a framework to leverage Text to Speech (TTS) synthesized data for custom KWS in different resource settings. With no real data, we found increasing TTS phrase…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures, 2 tables. The paper is accepted in Interspeech SynData4GenAI 2024 Workshop - https://syndata4genai.org/#call-for-papers

    Journal ref: Proc. Synthetic Data's Transformative Role in Foundational Speech Models 2024, 11-15

  28. arXiv:2407.01725  [pdf, other]

    cs.CL cs.AI cs.LG

    DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

    Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark

    Abstract: Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systemat…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Website: https://github.com/allenai/discoverybench

  29. arXiv:2406.12998  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Coding Speech through Vocal Tract Kinematics

    Authors: Cheol Jun Cho, Peter Wu, Tejas S. Prabhune, Dhruv Agarwal, Gopala K. Anumanchipalli

    Abstract: Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- Speech Articulatory Coding (SPARC). SPARC co…

    Submitted 14 December, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Journal ref: IEEE Journal of Selected Topics in Signal Processing, vol. 18, no. 8, pp. 1427-1440, Dec. 2024

  30. arXiv:2406.10750  [pdf, other]

    cs.HC

    EchoGuide: Active Acoustic Guidance for LLM-Based Eating Event Analysis from Egocentric Videos

    Authors: Vineet Parikh, Saif Mahmud, Devansh Agarwal, Ke Li, François Guimbretière, Cheng Zhang

    Abstract: Self-recording eating behaviors is a step towards a healthy lifestyle recommended by many health professionals. However, the current practice of manually recording eating activities using paper records or smartphone apps is often unsustainable and inaccurate. Smart glasses have emerged as a promising wearable form factor for tracking eating behaviors, but existing systems primarily identify when e…

    Submitted 31 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted at ISWC '24

  31. SonicID: User Identification on Smart Glasses with Acoustic Sensing

    Authors: Ke Li, Devansh Agarwal, Ruidong Zhang, Vipin Gunda, Tianjun Mo, Saif Mahmud, Boao Chen, François Guimbretière, Cheng Zhang

    Abstract: Smart glasses have become more prevalent as they provide an increasing number of applications for users. They store various types of private information or can access it via connections established with other devices. Therefore, there is a growing need for user identification on smart glasses. In this paper, we introduce a low-power and minimally-obtrusive system called SonicID, designed to authen…

    Submitted 24 October, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 27 pages, 6 tables, 9 figures

  32. MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses

    Authors: Saif Mahmud, Devansh Agarwal, Ashwin Ajit, Qikang Liang, Thalia Viranda, Francois Guimbretiere, Cheng Zhang

    Abstract: We introduce MunchSonic, an AI-powered active acoustic sensing system integrated into eyeglasses to track fine-grained dietary actions. MunchSonic emits inaudible ultrasonic waves from the eyeglass frame, with the reflected signals capturing detailed positions and movements of body parts, including the mouth, jaw, arms, and hands involved in eating. These signals are processed by a deep learning p…

    Submitted 2 August, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: 8 pages, 7 figures

  33. arXiv:2405.20254  [pdf, other]

    cs.HC cs.CY

    Conversational Agents to Facilitate Deliberation on Harmful Content in WhatsApp Groups

    Authors: Dhruv Agarwal, Farhana Shahid, Aditya Vashistha

    Abstract: WhatsApp groups have become a hotbed for the propagation of harmful content including misinformation, hate speech, polarizing content, and rumors, especially in Global South countries. Given the platform's end-to-end encryption, moderation responsibilities lie on group admins and members, who rarely contest such content. Another approach is fact-checking, which is unscalable, and can only contest…

    Submitted 16 August, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted at CSCW 2024

  34. arXiv:2404.16251  [pdf, other]

    cs.CR cs.AI cs.CL

    Prompt Leakage effect and defense strategies for multi-turn LLM interactions

    Authors: Divyansh Agarwal, Alexander R. Fabbri, Ben Risher, Philippe Laban, Shafiq Joty, Chien-Sheng Wu

    Abstract: Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property, and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats and mitigation strategies is lacking, especially for multi-turn LLM interactions. In this paper, we systematically investigate LLM vulnerabilities a…

    Submitted 29 July, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  35. arXiv:2404.13924  [pdf, other]

    cs.HC cs.ET

    ActSonic: Recognizing Everyday Activities from Inaudible Acoustic Wave Around the Body

    Authors: Saif Mahmud, Vineet Parikh, Qikang Liang, Ke Li, Ruidong Zhang, Ashwin Ajit, Vipin Gunda, Devansh Agarwal, François Guimbretière, Cheng Zhang

    Abstract: We present ActSonic, an intelligent, low-power active acoustic sensing system integrated into eyeglasses that can recognize 27 different everyday activities (e.g., eating, drinking, toothbrushing) from inaudible acoustic waves around the body. It requires only a pair of miniature speakers and microphones mounted on each hinge of the eyeglasses to emit ultrasonic waves, creating an acoustic aura ar…

    Submitted 25 November, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 8, Issue 4, November 2024, IMWUT/UbiComp 2025

  36. arXiv:2404.12980  [pdf, other]

    cs.HC

    Ring-a-Pose: A Ring for Continuous Hand Pose Tracking

    Authors: Tianhong Catherine Yu, Guilin Hu, Ruidong Zhang, Hyunchul Lim, Saif Mahmud, Chi-Jung Lee, Ke Li, Devansh Agarwal, Shuyang Nie, Jinseok Oh, François Guimbretière, Cheng Zhang

    Abstract: We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems. It is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three use…

    Submitted 11 November, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  37. arXiv:2404.12541  [pdf, other]

    cs.CV

    GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models

    Authors: Sai Sree Harsha, Ambareesh Revanur, Dhwanit Agarwal, Shradha Agrawal

    Abstract: Video editing methods based on diffusion models that rely solely on a text prompt for the edit are hindered by the limited expressive power of text prompts. Thus, incorporating a reference target image as a visual guide becomes desirable for precise control over edit. Also, most existing methods struggle to accurately edit a video when the shape and size of the object in the target image differ fr…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: CVPRw 2024

  38. arXiv:2402.13610  [pdf, other]

    cs.CL cs.AI cs.LG

    Data-driven Discovery with Large Generative Models

    Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Sanchaita Hazra, Ashish Sabharwal, Peter Clark

    Abstract: With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs) to develop automated systems for end-to-end data-driven discovery -- a paradigm encompassing the search and verification of hypotheses purely from a se…

    Submitted 21 February, 2024; originally announced February 2024.

  39. EchoWrist: Continuous Hand Pose Tracking and Hand-Object Interaction Recognition Using Low-Power Active Acoustic Sensing On a Wristband

    Authors: Chi-Jung Lee, Ruidong Zhang, Devansh Agarwal, Tianhong Catherine Yu, Vipin Gunda, Oliver Lopez, James Kim, Sicheng Yin, Boao Dong, Ke Li, Mose Sakashita, Francois Guimbretiere, Cheng Zhang

    Abstract: Our hands serve as a fundamental means of interaction with the world around us. Therefore, understanding hand poses and interaction context is critical for human-computer interaction. We present EchoWrist, a low-power wristband that continuously estimates 3D hand pose and recognizes hand-object interactions using active acoustic sensing. EchoWrist is equipped with two speakers emitting inaudible s…

    Submitted 29 March, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  40. One Style Does Not Regulate All: Moderation Practices in Public and Private WhatsApp Groups

    Authors: Farhana Shahid, Dhruv Agarwal, Aditya Vashistha

    Abstract: WhatsApp is the largest social media platform in the Global South and is a virulent force in global misinformation and political propaganda. Due to end-to-end encryption, WhatsApp can barely review any content and mostly relies on volunteer moderation by group admins. Yet, little is known about how WhatsApp group admins manage their groups, what factors and values influence moderation decisions, and…

    Submitted 2 January, 2025; v1 submitted 15 January, 2024; originally announced January 2024.

  41. arXiv:2311.15516  [pdf, other]

    eess.SY cs.AI cs.LG

    Active Foundational Models for Fault Diagnosis of Electrical Motors

    Authors: Sriram Anbalagan, Sai Shashank GP, Deepesh Agarwal, Balasubramaniam Natarajan, Babji Srinivasan

    Abstract: Fault detection and diagnosis of electrical motors are of utmost importance in ensuring the safe and reliable operation of several industrial systems. Detection and diagnosis of faults at the incipient stage allows corrective actions to be taken in order to reduce the severity of faults. The existing data-driven deep learning approaches for machine fault diagnosis rely extensively on huge amounts…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: 30 pages, 2 figures, 7 tables

  42. arXiv:2311.15301  [pdf]

    eess.IV cs.CV cs.LG

    Eye Disease Prediction using Ensemble Learning and Attention on OCT Scans

    Authors: Gauri Naik, Nandini Narvekar, Dimple Agarwal, Nishita Nandanwar, Himangi Pande

    Abstract: Eye diseases have posed significant challenges for decades, but advancements in technology have opened new avenues for their detection and treatment. Machine learning and deep learning algorithms have become instrumental in this domain, particularly when combined with Optical Coherence Tomography (OCT) imaging. We propose a novel method for efficient detection of eye diseases from OCT images. Our t…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: Full paper accepted at FICC (Springer) 2024

  43. Bring Your Own KG: Self-Supervised Program Synthesis for Zero-Shot KGQA

    Authors: Dhruv Agarwal, Rajarshi Das, Sopan Khosla, Rashmi Gangadharaiah

    Abstract: We present BYOKG, a universal question-answering (QA) system that can operate on any knowledge graph (KG), requires no human-annotated training data, and can be ready to use within a day -- attributes that are out-of-scope for current KGQA systems. BYOKG draws inspiration from the remarkable ability of humans to comprehend information present in an unseen KG through exploration -- starting at rand…

    Submitted 21 May, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

  44. arXiv:2309.14556  [pdf, other]

    cs.CL cs.AI cs.HC

    Art or Artifice? Large Language Models and the False Promise of Creativity

    Authors: Tuhin Chakrabarty, Philippe Laban, Divyansh Agarwal, Smaranda Muresan, Chien-Sheng Wu

    Abstract: Researchers have argued that large language models (LLMs) exhibit high-quality writing capabilities from blogs to stories. However, evaluating objectively the creativity of a piece of writing is challenging. Inspired by the Torrance Test of Creative Thinking (TTCT), which measures creativity as a process, we use the Consensual Assessment Technique [3] and propose the Torrance Test of Creative Writ…

    Submitted 8 March, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: ACM CHI 2024

  45. arXiv:2307.16891  [pdf, other]

    eess.SY cs.AI cs.LG

    Foundational Models for Fault Diagnosis of Electrical Motors

    Authors: Sriram Anbalagan, Deepesh Agarwal, Balasubramaniam Natarajan, Babji Srinivasan

    Abstract: A majority of recent advancements related to the fault diagnosis of electrical motors are based on the assumption that training and testing data are drawn from the same distribution. However, the data distribution can vary across different operating conditions during real-world operating scenarios of electrical motors. Consequently, this assumption limits the practical implementation of existing s…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 7 pages, 1 figure, 5 tables, submitted to IEEE PESGRE 2023

  46. arXiv:2307.04610  [pdf, other]

    cs.CV

    SPLAL: Similarity-based pseudo-labeling with alignment loss for semi-supervised medical image classification

    Authors: Md Junaid Mahmood, Pranaw Raj, Divyansh Agarwal, Suruchi Kumari, Pravendra Singh

    Abstract: Medical image classification is a challenging task due to the scarcity of labeled samples and class imbalance caused by the high variance in disease prevalence. Semi-supervised learning (SSL) methods can mitigate these challenges by leveraging both labeled and unlabeled data. However, SSL methods for medical image classification need to address two key challenges: (1) estimating reliable pseudo-la…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Under Review

  47. Machine Reading Comprehension using Case-based Reasoning

    Authors: Dung Thai, Dhruv Agarwal, Mudit Chaudhary, Wenlong Zhao, Rajarshi Das, Manzil Zaheer, Jay-Yoon Lee, Hannaneh Hajishirzi, Andrew McCallum

    Abstract: We present an accurate and interpretable method for answer extraction in machine reading comprehension that is reminiscent of case-based reasoning (CBR) from classical AI. Our method (CBR-MRC) builds upon the hypothesis that contextualized answers to similar questions share semantic similarities with each other. Given a test question, CBR-MRC first retrieves a set of similar cases from a nonparame…

    Submitted 5 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 9 pages, 2 figures

  48. arXiv:2305.14540  [pdf, other]

    cs.CL

    LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

    Authors: Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu

    Abstract: With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency benchmarks, we find that a few large language models (LLMs) perform competitively on classification benchmarks for factual inconsistency de…

    Submitted 23 May, 2023; originally announced May 2023.

  49. arXiv:2303.05031  [pdf, other]

    cs.CV

    CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing

    Authors: Ambareesh Revanur, Debraj Basu, Shradha Agrawal, Dhwanit Agarwal, Deepak Pai

    Abstract: Edit fidelity is a significant issue in open-world controllable generative image editing. Recently, CLIP-based approaches have traded off simplicity to alleviate these problems by introducing spatial attention in a handpicked layer of a StyleGAN. In this paper, we propose CoralStyleCLIP, which incorporates a multi-layer attention-guided blending strategy in the feature space of StyleGAN2 for obtai…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  50. arXiv:2212.08841  [pdf, other]

    cs.CL cs.IR

    AugTriever: Unsupervised Dense Retrieval and Domain Adaptation by Scalable Data Augmentation

    Authors: Rui Meng, Ye Liu, Semih Yavuz, Divyansh Agarwal, Lifu Tu, Ning Yu, Jianguo Zhang, Meghana Bhat, Yingbo Zhou

    Abstract: Dense retrievers have made significant strides in text retrieval and open-domain question answering. However, most of these achievements have relied heavily on extensive human-annotated supervision. In this study, we aim to develop unsupervised methods for improving dense retrieval models. We propose two approaches that enable annotation-free and scalable training by creating pseudo query-document…

    Submitted 29 October, 2024; v1 submitted 17 December, 2022; originally announced December 2022.

    Comments: DCAI24, October 25, 2024, Boise, ID