arXiv:2604.07930 [pdf, ps, other]

Unified Supervision for Walmart's Sponsored Search Retrieval via Joint Semantic Relevance and Behavioral Engagement Modeling

Authors: Shasvat Desai, Md Omar Faruk Rokon, Jhalak Nilesh Acharya, Isha Shah, Hong Yao, Utkarsh Porwal, Kuang-chih Lee

Abstract: Modern search systems rely on a fast first stage retriever to fetch relevant items from a massive catalog of items. Deployed search systems often use user engagement signals to supervise bi-encoder retriever training at scale, because these signals are continuously logged from real traffic and require no additional annotation effort. However, engagement is an imperfect proxy for semantic relevance… ▽ More Modern search systems rely on a fast first stage retriever to fetch relevant items from a massive catalog of items. Deployed search systems often use user engagement signals to supervise bi-encoder retriever training at scale, because these signals are continuously logged from real traffic and require no additional annotation effort. However, engagement is an imperfect proxy for semantic relevance. Items may receive interactions due to popularity, promotion, attractive visuals, titles, or price, despite weak query-item relevance. These limitations are further accentuated in Walmart's e-commerce sponsored search. User engagement on ad items is often structurally sparse because the frequency with which an ad is shown depends on factors beyond relevance such as whether the advertiser is currently running that ad, the outcome of the auction for available ad slots, bid competitiveness, and advertiser budget. Thus, even highly relevant query ad pairs can have limited engagement signals simply due to limited impressions. We propose a bi-encoder training framework for Walmart's sponsored search retrieval in e-commerce that uses semantic relevance as the primary supervision signal, with engagement used only as a preference signal among relevant items. Concretely, we construct a context-rich training target by combining 1. graded relevance labels from a cascade of cross-encoder teacher models, 2. a multichannel retrieval prior score derived from the rank positions and cross-channel agreement of retrieval systems running in production, and 3. user engagement applied only to semantically relevant items to refine preferences. Our approach outperforms the current production system in both offline evaluation and online AB tests, yielding consistent gains in average relevance and NDCG. △ Less

Submitted 15 April, 2026; v1 submitted 9 April, 2026; originally announced April 2026.

Comments: Accepted to SIGIR 2026, Industry Track

arXiv:2603.09012 [pdf, ps, other]

doi 10.1145/3772363.3798685

"Who wants to be nagged by AI?": Investigating the Effects of Agreeableness on Older Adults' Perception of LLM-Based Voice Assistants' Explanations

Authors: Niharika Mathur, Hasibur Rahman, Smit Desai

Abstract: LLM-based voice assistants (VAs) increasingly support older adults aging in place, yet how an assistant's agreeableness shapes explanation perception remains underexplored. We conducted a study(N=70) examining how VA agreeableness influences older adults' perceptions of explanations across routine and emergency home scenarios. High-agreeableness assistants were perceived as more trustworthy, empat… ▽ More LLM-based voice assistants (VAs) increasingly support older adults aging in place, yet how an assistant's agreeableness shapes explanation perception remains underexplored. We conducted a study(N=70) examining how VA agreeableness influences older adults' perceptions of explanations across routine and emergency home scenarios. High-agreeableness assistants were perceived as more trustworthy, empathetic, and likable, but these benefits diminished in emergencies where clarity outweighed warmth. Agreeableness did not affect perceived intelligence, suggesting social tone and competence are separable dimensions. Real-time environmental explanations outperformed history-based ones, and agreeable older adults penalized low-agreeableness assistants more strongly. These findings show the need to move beyond a one-size-fits-all approach to AI explainability, while balancing personality, context, and audience. △ Less

Submitted 9 March, 2026; originally announced March 2026.

Comments: To be published as a poster extended abstract at CHI 2026

arXiv:2603.08164 [pdf, ps, other]

The Differential Effects of Agreeableness and Extraversion on Older Adults' Perceptions of Conversational AI Explanations in Assistive Settings

Authors: Niharika Mathur, Hasibur Rahman, Smit Desai

Abstract: Large Language Model-based Voice Assistants (LLM-VAs) are increasingly deployed in assistive settings for older adults, yet little is known about how an agent's personality shapes user perceptions of its explanations. This paper presents a mixed factorial experiment (N=140) examining how agreeableness and extraversion in an LLM-VA ("Robin") influence older adults' perceptions across seven measures… ▽ More Large Language Model-based Voice Assistants (LLM-VAs) are increasingly deployed in assistive settings for older adults, yet little is known about how an agent's personality shapes user perceptions of its explanations. This paper presents a mixed factorial experiment (N=140) examining how agreeableness and extraversion in an LLM-VA ("Robin") influence older adults' perceptions across seven measures: empathy, likeability, trust, reliance, satisfaction, intention to adopt, and perceived intelligence. Results reveal that high agreeableness drove stronger empathy perceptions, while low agreeableness consistently penalized likeability. Importantly, perceived intelligence remained unaffected by personality, suggesting that personality shapes sociability without altering competence perceptions. Real-time environmental explanations outperformed conversational history explanations on five measures, with advantages concentrated in emergency contexts. Notably, highly agreeable participants were especially critical of low-agreeableness agents, revealing a user-agent personality congruence effect. These findings offer design implications for personality-aware, context-sensitive LLM-VAs in assistive settings. △ Less

Submitted 9 March, 2026; originally announced March 2026.

arXiv:2602.22340 [pdf, ps, other]

Conversational Successes and Breakdowns in Everyday Smart Glasses Use

Authors: Xiuqi Tommy Zhu, Xiaoan Liu, Casper Harteveld, Smit Desai, Eileen McGivney

Abstract: Non-Display Smart Glasses hold the potential to support everyday activities by combining continuous environmental sensing with voice-only interaction powered by large language models (LLMs). Understanding how conversational successes and breakdowns arise in everyday contexts can better inform the design of future voice-only interfaces. To investigate this, we conducted a month-long collaborative a… ▽ More Non-Display Smart Glasses hold the potential to support everyday activities by combining continuous environmental sensing with voice-only interaction powered by large language models (LLMs). Understanding how conversational successes and breakdowns arise in everyday contexts can better inform the design of future voice-only interfaces. To investigate this, we conducted a month-long collaborative autoethnography (n=2) to identify patterns of successes and breakdowns when using such devices. We then compare these patterns with prior findings on voice-only interactions to highlight the unique affordances and opportunities offered by non-display smart glasses. △ Less

Submitted 2 April, 2026; v1 submitted 25 February, 2026; originally announced February 2026.

arXiv:2602.09162 [pdf, ps, other]

Boltzmann Reinforcement Learning for Noise resilience in Analog Ising Machines

Authors: Aditya Choudhary, Saaketh Desai, Prasad Iyer

Abstract: Analog Ising machines (AIMs) have emerged as a promising paradigm for combinatorial optimization, utilizing physical dynamics to solve Ising problems with high energy efficiency. However, the performance of traditional optimization and sampling algorithms on these platforms is often limited by inherent measurement noise. We introduce BRAIN (Boltzmann Reinforcement for Analog Ising Networks), a dis… ▽ More Analog Ising machines (AIMs) have emerged as a promising paradigm for combinatorial optimization, utilizing physical dynamics to solve Ising problems with high energy efficiency. However, the performance of traditional optimization and sampling algorithms on these platforms is often limited by inherent measurement noise. We introduce BRAIN (Boltzmann Reinforcement for Analog Ising Networks), a distribution learning framework that utilizes variational reinforcement learning to approximate the Boltzmann distribution. By shifting from state-by-state sampling to aggregating information across multiple noisy measurements, BRAIN is resilient to Gaussian noise characteristic of AIMs. We evaluate BRAIN across diverse combinatorial topologies, including the Curie-Weiss and 2D nearest-neighbor Ising systems. We find that under realistic 3\% Gaussian measurement noise, BRAIN maintains 98\% ground state fidelity, whereas Markov Chain Monte Carlo (MCMC) methods degrade to 51\% fidelity. Furthermore, BRAIN reaches the MCMC-equivalent solution up to 192x faster under these conditions. BRAIN exhibits $\mathcal{O}(N^{1.55})$ scaling up to 65,536 spins and maintains robustness against severe measurement uncertainty up to 40\%. Beyond ground state optimization, BRAIN accurately captures thermodynamic phase transitions and metastable states, providing a scalable and noise-resilient method for utilizing analog computing architectures in complex optimizations. △ Less

Submitted 9 February, 2026; originally announced February 2026.

arXiv:2602.02917 [pdf, ps, other]

Weighted Temporal Decay Loss for Learning Wearable PPG Data with Sparse Clinical Labels

Authors: Yunsung Chung, Keum San Chun, Migyeong Gwak, Han Feng, Yingshuo Liu, Chanho Lim, Viswam Nathan, Nassir Marrouche, Sharanya Arcot Desai

Abstract: Advances in wearable computing and AI have increased interest in leveraging PPG for health monitoring over the past decade. One of the biggest challenges in developing health algorithms based on such biosignals is the sparsity of clinical labels, which makes biosignals temporally distant from lab draws less reliable for supervision. To address this problem, we introduce a simple training strategy… ▽ More Advances in wearable computing and AI have increased interest in leveraging PPG for health monitoring over the past decade. One of the biggest challenges in developing health algorithms based on such biosignals is the sparsity of clinical labels, which makes biosignals temporally distant from lab draws less reliable for supervision. To address this problem, we introduce a simple training strategy that learns a biomarker-specific decay of sample weight over the time gap between a segment and its ground truth label and uses this weight in the loss with a regularizer to prevent trivial solutions. On smartwatch PPG from 450 participants across 10 biomarkers, the approach improves over baselines. In the subject-wise setting, the proposed approach averages 0.715 AUPRC, compared to 0.674 for a fine-tuned self-supervised baseline and 0.626 for a feature-based Random Forest. A comparison of four decay families shows that a simple linear decay function is most robust on average. Beyond accuracy, the learned decay rates summarize how quickly each biomarker's PPG evidence becomes stale, providing an interpretable view of temporal sensitivity. △ Less

Submitted 2 February, 2026; originally announced February 2026.

Comments: ICASSP 2026

arXiv:2601.12215 [pdf, ps, other]

Wavelet-Driven Masked Multiscale Reconstruction for PPG Foundation Models

Authors: Megha Thukral, Cyrus Tanade, Simon A. Lee, Juhyeon Lee, Hao Zhou, Keum San Chun, Migyeong Gwak, Viswam Nathan, Md Mahbubur Rahman, Li Zhu, Mehrab Bin Morshed, Subramaniam Venkatraman, Sharanya Arcot Desai

Abstract: Wearable foundation models have the potential to transform digital health by learning transferable representations from large-scale biosignals collected in everyday settings. While recent progress has been made in large-scale pretraining, most approaches overlook the spectral structure of photoplethysmography (PPG) signals, wherein physiological rhythms unfold across multiple frequency bands. Moti… ▽ More Wearable foundation models have the potential to transform digital health by learning transferable representations from large-scale biosignals collected in everyday settings. While recent progress has been made in large-scale pretraining, most approaches overlook the spectral structure of photoplethysmography (PPG) signals, wherein physiological rhythms unfold across multiple frequency bands. Motivated by the insight that many downstream health-related tasks depend on multi-resolution features spanning fine-grained waveform morphology to global rhythmic dynamics, we introduce Masked Multiscale Reconstruction (MMR) for PPG representation learning - a self-supervised pretraining framework that explicitly learns from hierarchical time-frequency scales of PPG data. The pretraining task is designed to reconstruct randomly masked out coefficients obtained from a wavelet-based multiresolution decomposition of PPG signals, forcing the transformer encoder to integrate information across temporal and spectral scales. We pretrain our model with MMR using ~17 million unlabeled 10-second PPG segments from ~32,000 smartwatch users. On 17 of 19 diverse health-related tasks, MMR trained on large-scale wearable PPG data improves over or matches state-of-the-art open-source PPG foundation models, time-series foundation models, and other self-supervised baselines. Extensive analysis of our learned embeddings and systematic ablations underscores the value of wavelet-based representations, showing that they capture robust and physiologically-grounded features. Together, these results highlight the potential of MMR as a step toward generalizable PPG foundation models. △ Less

Submitted 17 January, 2026; originally announced January 2026.

arXiv:2511.12404 [pdf, ps, other]

SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs

Authors: Shail Desai, Aditya Pawar, Li Lin, Xin Wang, Shu Hu

Abstract: Artificial Intelligence (AI) has made it possible for anyone to create images, audio, and video with unprecedented ease, enriching education, communication, and creative expression. At the same time, the rapid rise of AI-generated media has introduced serious risks, including misinformation, identity misuse, and the erosion of public trust as synthetic content becomes increasingly indistinguishabl… ▽ More Artificial Intelligence (AI) has made it possible for anyone to create images, audio, and video with unprecedented ease, enriching education, communication, and creative expression. At the same time, the rapid rise of AI-generated media has introduced serious risks, including misinformation, identity misuse, and the erosion of public trust as synthetic content becomes increasingly indistinguishable from real media. Although deepfake detection has advanced, many existing tools remain closed-source, limited in modality, or lacking transparency and educational value, making it difficult for users to understand how detection decisions are made. To address these gaps, we introduce SynthGuard, an open, user-friendly platform for detecting and analyzing AI-generated multimedia using both traditional detectors and multimodal large language models (MLLMs). SynthGuard provides explainable inference, unified image and audio support, and an interactive interface designed to make forensic analysis accessible to researchers, educators, and the public. The SynthGuard platform is available at: https://in-engr-nova.it.purdue.edu/ △ Less

Submitted 15 November, 2025; originally announced November 2025.

arXiv:2511.04681 [pdf, ps, other]

doi 10.1103/3sj1-1l9f

Dark Energy Survey Year 3 results: Simulation-based $w$CDM inference from weak lensing and galaxy clustering maps with deep learning: Analysis design

Authors: A. Thomsen, J. Bucko, T. Kacprzak, V. Ajani, J. Fluri, A. Refregier, D. Anbajagane, F. J. Castander, A. Ferté, M. Gatti, N. Jeffrey, A. Alarcon, A. Amon, K. Bechtol, M. R. Becker, G. M. Bernstein, A. Campos, A. Carnero Rosell, C. Chang, R. Chen, A. Choi, M. Crocce, C. Davis, J. DeRose, S. Dodelson , et al. (77 additional authors not shown)

Abstract: Data-driven approaches using deep learning are emerging as powerful techniques to extract non-Gaussian information from cosmological large-scale structure. This work presents the first simulation-based inference (SBI) pipeline that combines weak lensing and galaxy clustering maps in a realistic Dark Energy Survey Year 3 (DES Y3) configuration and serves as preparation for a forthcoming analysis of… ▽ More Data-driven approaches using deep learning are emerging as powerful techniques to extract non-Gaussian information from cosmological large-scale structure. This work presents the first simulation-based inference (SBI) pipeline that combines weak lensing and galaxy clustering maps in a realistic Dark Energy Survey Year 3 (DES Y3) configuration and serves as preparation for a forthcoming analysis of the survey data. We develop a scalable forward model based on the CosmoGridV1 suite of N-body simulations to generate over one million self-consistent mock realizations of DES Y3 at the map level. Leveraging this large dataset, we train deep graph convolutional neural networks on the full survey footprint in spherical geometry to learn low-dimensional features that approximately maximize mutual information with target parameters. These learned compressions enable neural density estimation of the implicit likelihood via normalizing flows in a ten-dimensional parameter space spanning cosmological $w$CDM, intrinsic alignment, and linear galaxy bias parameters, while marginalizing over baryonic, photometric redshift, and shear bias nuisances. To ensure robustness, we extensively validate our inference pipeline using synthetic observations derived from both systematic contaminations in our forward model and independent Buzzard galaxy catalogs. Our forecasts yield significant improvements in cosmological parameter constraints, achieving $2-3\times$ higher figures of merit in the $Ω_m - S_8$ plane relative to our implementation of baseline two-point statistics and effectively breaking parameter degeneracies through probe combination. These results demonstrate the potential of SBI analyses powered by deep learning for upcoming Stage-IV wide-field imaging surveys. △ Less

Submitted 18 February, 2026; v1 submitted 6 November, 2025; originally announced November 2025.

Comments: 39 pages, 14 figures

arXiv:2510.25785 [pdf, ps, other]

HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series

Authors: Simon A. Lee, Cyrus Tanade, Hao Zhou, Juhyeon Lee, Megha Thukral, Minji Han, Rachel Choi, Md Sazzad Hissain Khan, Baiying Lu, Migyeong Gwak, Mehrab Bin Morshed, Viswam Nathan, Md Mahbubur Rahman, Li Zhu, Subramaniam Venkatraman, Sharanya Arcot Desai

Abstract: Wearable sensors provide abundant physiological time series, yet the principles governing their predictive utility remain unclear. We hypothesize that temporal resolution is a fundamental axis of representation learning, with different clinical and behavioral outcomes relying on structure at distinct scales. To test this resolution hypothesis, we introduce HiMAE (Hierarchical Masked Autoencoder),… ▽ More Wearable sensors provide abundant physiological time series, yet the principles governing their predictive utility remain unclear. We hypothesize that temporal resolution is a fundamental axis of representation learning, with different clinical and behavioral outcomes relying on structure at distinct scales. To test this resolution hypothesis, we introduce HiMAE (Hierarchical Masked Autoencoder), a self supervised framework that combines masked autoencoding with a hierarchical convolutional encoder decoder. HiMAE produces multi resolution embeddings that enable systematic evaluation of which temporal scales carry predictive signal, transforming resolution from a hyperparameter into a probe for interpretability. Across classification, regression, and generative benchmarks, HiMAE consistently outperforms state of the art foundation models that collapse scale, while being orders of magnitude smaller. HiMAE is an efficient representation learner compact enough to run entirely on watch, achieving sub millisecond inference on smartwatch class CPUs for true edge inference. Together, these contributions position HiMAE as both an efficient self supervised learning method and a discovery tool for scale sensitive structure in wearable health. △ Less

Submitted 28 October, 2025; originally announced October 2025.

arXiv:2510.22503 [pdf, ps, other]

LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery

Authors: Nikhil Abhyankar, Sanchit Kabra, Saaketh Desai, Chandan K. Reddy

Abstract: Materials discovery requires navigating vast chemical and structural spaces while satisfying multiple, often conflicting, objectives. We present LLM-guided Evolution for MAterials discovery (LLEMA), a unified framework that couples the scientific knowledge embedded in large language models with chemistry-informed evolutionary rules and memory-based refinement. At each iteration, an LLM proposes cr… ▽ More Materials discovery requires navigating vast chemical and structural spaces while satisfying multiple, often conflicting, objectives. We present LLM-guided Evolution for MAterials discovery (LLEMA), a unified framework that couples the scientific knowledge embedded in large language models with chemistry-informed evolutionary rules and memory-based refinement. At each iteration, an LLM proposes crystallographically specified candidates under explicit property constraints; a surrogate-augmented oracle estimates physicochemical properties; and a multi-objective scorer updates success/failure memories to guide subsequent generations. Evaluated on 14 realistic tasks that span electronics, energy, coatings, optics, and aerospace, LLEMA discovers candidates that are chemically plausible, thermodynamically stable, and property-aligned, achieving higher hit rates and improved Pareto front quality relative to generative and LLM-only baselines. Ablation studies confirm the importance of rule-guided generation, memory-based refinement, and surrogate prediction. By enforcing synthesizability and multi-objective trade-offs, LLEMA provides a principled approach to accelerating practical materials discovery. Project website: https://scientific-discovery.github.io/llema-project/ △ Less

Submitted 5 March, 2026; v1 submitted 25 October, 2025; originally announced October 2025.

Comments: ICLR 2026

arXiv:2509.19147 [pdf, ps, other]

Generative Propaganda

Authors: Madeleine I. G. Daepp, Alejandro Cuevas, Robert Osazuwa Ness, Vickie Yu-Ping Wang, Bharat Kumar Nayak, Dibyendu Mishra, Ti-Chung Cheng, Shaily Desai, Joyojeet Pal

Abstract: Generative propaganda is the use of generative artificial intelligence (AI) to shape public opinion. To characterize its use in real-world settings, we conducted interviews with defenders (e.g., factcheckers, journalists, officials) in Taiwan and creators (e.g., influencers, political consultants, advertisers) as well as defenders in India, centering two places characterized by high levels of onli… ▽ More Generative propaganda is the use of generative artificial intelligence (AI) to shape public opinion. To characterize its use in real-world settings, we conducted interviews with defenders (e.g., factcheckers, journalists, officials) in Taiwan and creators (e.g., influencers, political consultants, advertisers) as well as defenders in India, centering two places characterized by high levels of online propaganda. The term "deepfakes", we find, exerts outsized discursive power in shaping defenders' expectations of misuse and, in turn, the interventions that are prioritized. To better characterize the space of generative propaganda, we develop a taxonomy that distinguishes between obvious versus hidden and promotional versus derogatory use. Deception was neither the main driver nor the main impact vector of AI's use; instead, Indian creators sought to persuade rather than to deceive, often making AI's use obvious in order to reduce legal and reputational risks, while Taiwan's defenders saw deception as a subset of broader efforts to distort the prevalence of strategic narratives online. AI was useful and used, however, in producing efficiency gains in communicating across languages and modes, and in evading human and algorithmic detection. Security researchers should reconsider threat models to clearly differentiate deepfakes from promotional and obvious uses, to complement and bolster the social factors that constrain misuse by internal actors, and to counter efficiency gains globally. △ Less

Submitted 23 September, 2025; originally announced September 2025.

Comments: Working Paper

ACM Class: K.4.2

arXiv:2509.12979 [pdf, ps, other]

Universal share based quantum multi secret image sharing scheme

Authors: Dipak K. Rabari, Yogesh K. Meghrajani, Laxmi S. Desai

Abstract: Image security for information has become increasingly critical as internet become more prevalent due to hacking and unauthorized access. To ensure the security of confidential image data, image encryption using visual cryptography plays a crucial role. To share multiple images using visual cryptography, the company organizer utilizes the concept of a universal or common share. Likewise, quantum c… ▽ More Image security for information has become increasingly critical as internet become more prevalent due to hacking and unauthorized access. To ensure the security of confidential image data, image encryption using visual cryptography plays a crucial role. To share multiple images using visual cryptography, the company organizer utilizes the concept of a universal or common share. Likewise, quantum computing is an emerging technology that facilitates secure communication. The ability of quantum computers to solve certain mathematical problems efficiently threatens the security of many current encryption algorithms. Hence, to leverage the strengths of quantum computing and visual cryptography, this research introduces a novel universal share-based quantum multi-secret sharing technique for secure image communication. Quantum computing enables the scheme to exhibit high resilience to different eavesdropping threats. Consequently, the proposed method offers robust security solution for sharing confidential images across a range of applications, including enterprise data access and military communications. △ Less

Submitted 16 September, 2025; originally announced September 2025.

arXiv:2509.09870 [pdf, ps, other]

Vibe Check: Understanding the Effects of LLM-Based Conversational Agents' Personality and Alignment on User Perceptions in Goal-Oriented Tasks

Authors: Hasibur Rahman, Smit Desai

Abstract: Large language models (LLMs) enable conversational agents (CAs) to express distinctive personalities, raising new questions about how such designs shape user perceptions. This study investigates how personality expression levels and user-agent personality alignment influence perceptions in goal-oriented tasks. In a between-subjects experiment (N=150), participants completed travel planning with CA… ▽ More Large language models (LLMs) enable conversational agents (CAs) to express distinctive personalities, raising new questions about how such designs shape user perceptions. This study investigates how personality expression levels and user-agent personality alignment influence perceptions in goal-oriented tasks. In a between-subjects experiment (N=150), participants completed travel planning with CAs exhibiting low, medium, or high expression across the Big Five traits, controlled via our novel Trait Modulation Keys framework. Results revealed an inverted-U relationship: medium expression produced the most positive evaluations across Intelligence, Enjoyment, Anthropomorphism, Intention to Adopt, Trust, and Likeability, significantly outperforming both extremes. Personality alignment further enhanced outcomes, with Extraversion and Emotional Stability emerging as the most influential traits. Cluster analysis identified three distinct compatibility profiles, with "Well-Aligned" users reporting substantially positive perceptions. These findings demonstrate that personality expression and strategic trait alignment constitute optimal design targets for CA personality, offering design implications as LLM-based CAs become increasingly prevalent. △ Less

Submitted 11 September, 2025; originally announced September 2025.

arXiv:2508.21098 [pdf, ps, other]

TrInk: Ink Generation with Transformer Network

Authors: Zezhong Jin, Shubhang Desai, Xu Chen, Biyi Fang, Zhuoyi Huang, Zhe Li, Chong-Xin Gan, Xiao Tu, Man-Wai Mak, Yan Lu, Shujie Liu

Abstract: In this paper, we propose TrInk, a Transformer-based model for ink generation, which effectively captures global dependencies. To better facilitate the alignment between the input text and generated stroke points, we introduce scaled positional embeddings and a Gaussian memory mask in the cross-attention module. Additionally, we design both subjective and objective evaluation pipelines to comprehe… ▽ More In this paper, we propose TrInk, a Transformer-based model for ink generation, which effectively captures global dependencies. To better facilitate the alignment between the input text and generated stroke points, we introduce scaled positional embeddings and a Gaussian memory mask in the cross-attention module. Additionally, we design both subjective and objective evaluation pipelines to comprehensively assess the legibility and style consistency of the generated handwriting. Experiments demonstrate that our Transformer-based model achieves a 35.56\% reduction in character error rate (CER) and an 29.66% reduction in word error rate (WER) on the IAM-OnDB dataset compared to previous methods. We provide an demo page with handwriting samples from TrInk and baseline models at: https://akahello-a11y.github.io/trink-demo/ △ Less

Submitted 27 August, 2025; originally announced August 2025.

Comments: Accepted to EMNLP 2025 Main Conference

arXiv:2508.14318 [pdf, ps, other]

Power Stabilization for AI Training Datacenters

Authors: Esha Choukse, Brijesh Warrier, Scot Heath, Luz Belmont, April Zhao, Hassan Ali Khan, Brian Harry, Matthew Kappel, Russell J. Hewett, Kushal Datta, Yu Pei, Caroline Lichtenberger, John Siegler, David Lukofsky, Zaid Kahn, Gurpreet Sahota, Andy Sullivan, Charles Frederick, Hien Thai, Rebecca Naughton, Daniel Jurnove, Justin Harp, Reid Carper, Nithish Mahalingam, Srini Varkala , et al. (32 additional authors not shown)

Abstract: Large Artificial Intelligence (AI) training workloads spanning several tens of thousands of GPUs present unique power management challenges. These arise due to the high variability in power consumption during the training. Given the synchronous nature of these jobs, during every iteration there is a computation-heavy phase, where each GPU works on the local data, and a communication-heavy phase wh… ▽ More Large Artificial Intelligence (AI) training workloads spanning several tens of thousands of GPUs present unique power management challenges. These arise due to the high variability in power consumption during the training. Given the synchronous nature of these jobs, during every iteration there is a computation-heavy phase, where each GPU works on the local data, and a communication-heavy phase where all the GPUs synchronize on the data. Because compute-heavy phases require much more power than communication phases, large power swings occur. The amplitude of these power swings is ever increasing with the increase in the size of training jobs. An even bigger challenge arises from the frequency spectrum of these power swings which, if harmonized with critical frequencies of utilities, can cause physical damage to the power grid infrastructure. Therefore, to continue scaling AI training workloads safely, we need to stabilize the power of such workloads. This paper introduces the challenge with production data and explores innovative solutions across the stack: software, GPU hardware, and datacenter infrastructure. We present the pros and cons of each of these approaches and finally present a multi-pronged approach to solving the challenge. The proposed solutions are rigorously tested using a combination of real hardware and Microsoft's in-house cloud power simulator, providing critical insights into the efficacy of these interventions under real-world conditions. △ Less

Submitted 21 August, 2025; v1 submitted 19 August, 2025; originally announced August 2025.

arXiv:2506.07400 [pdf, ps, other]

doi 10.1109/MIPR67560.2025.00078

MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models

Authors: Philip R. Liu, Sparsh Bansal, Jimmy Dinh, Aditya Pawar, Ramani Satishkumar, Shail Desai, Neeraj Gupta, Xin Wang, Shu Hu

Abstract: The integration of deep learning-based glaucoma detection with large language models (LLMs) presents an automated strategy to mitigate ophthalmologist shortages and improve clinical reporting efficiency. However, applying general LLMs to medical imaging remains challenging due to hallucinations, limited interpretability, and insufficient domain-specific medical knowledge, which can potentially red… ▽ More The integration of deep learning-based glaucoma detection with large language models (LLMs) presents an automated strategy to mitigate ophthalmologist shortages and improve clinical reporting efficiency. However, applying general LLMs to medical imaging remains challenging due to hallucinations, limited interpretability, and insufficient domain-specific medical knowledge, which can potentially reduce clinical accuracy. Although recent approaches combining imaging models with LLM reasoning have improved reporting, they typically rely on a single generalist agent, restricting their capacity to emulate the diverse and complex reasoning found in multidisciplinary medical teams. To address these limitations, we propose MedChat, a multi-agent diagnostic framework and platform that combines specialized vision models with multiple role-specific LLM agents, all coordinated by a director agent. This design enhances reliability, reduces hallucination risk, and enables interactive diagnostic reporting through an interface tailored for clinical review and educational use. Code available at https://github.com/Purdue-M2/MedChat. △ Less

Submitted 16 December, 2025; v1 submitted 8 June, 2025; originally announced June 2025.

Journal ref: Proc. 2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 456-462, 2025

arXiv:2506.04659 [pdf, ps, other]

Multi-Tool Analysis of User Interface & Accessibility in Deployed Web-Based Chatbots

Authors: Mukesh Rajmohan, Smit Desai, Sanchari Das

Abstract: In this work, we present a multi-tool evaluation of 106 deployed web-based chatbots, across domains like healthcare, education and customer service, comprising both standalone applications and embedded widgets using automated tools (Google Lighthouse, PageSpeed Insights, SiteImprove Accessibility Checker) and manual audits (Microsoft Accessibility Insights). Our analysis reveals that over 80% of c… ▽ More In this work, we present a multi-tool evaluation of 106 deployed web-based chatbots, across domains like healthcare, education and customer service, comprising both standalone applications and embedded widgets using automated tools (Google Lighthouse, PageSpeed Insights, SiteImprove Accessibility Checker) and manual audits (Microsoft Accessibility Insights). Our analysis reveals that over 80% of chatbots exhibit at least one critical accessibility issue, and 45% suffer from missing semantic structures or ARIA role misuse. Furthermore, we found that accessibility scores correlate strongly across tools (e.g., Lighthouse vs PageSpeed Insights, r = 0.861), but performance scores do not (r = 0.436), underscoring the value of a multi-tool approach. We offer a replicable evaluation insights and actionable recommendations to support the development of user-friendly conversational interfaces. △ Less

Submitted 5 June, 2025; originally announced June 2025.

Comments: 9 pages, 6 figures. Submitted to ACM Conversational User Interfaces (CUI) 2025

ACM Class: H.5.2; K.4.2

arXiv:2506.02539 [pdf, ps, other]

VerificAgent: Domain-Specific Memory Verification for Scalable Oversight of Aligned Computer-Use Agents

Authors: Thong Q. Nguyen, Shubhang Desai, Raja Hasnain Anwar, Firoz Shaik, Vishwas Suryanarayanan, Vishal Chowdhary

Abstract: Continual memory augmentation lets computer-using agents (CUAs) learn from prior interactions, but unvetted memories can encode domain-inappropriate or unsafe heuristics--spurious rules that drift from user intent and safety constraints. We introduce VerificAgent, a scalable oversight framework that treats persistent memory as an explicit alignment surface. VerificAgent combines (1) an expert-cura… ▽ More Continual memory augmentation lets computer-using agents (CUAs) learn from prior interactions, but unvetted memories can encode domain-inappropriate or unsafe heuristics--spurious rules that drift from user intent and safety constraints. We introduce VerificAgent, a scalable oversight framework that treats persistent memory as an explicit alignment surface. VerificAgent combines (1) an expert-curated seed of domain knowledge, (2) iterative, trajectory-based memory growth during training, and (3) a post-hoc human fact-checking pass to sanitize accumulated memories before deployment. Evaluated on OSWorld productivity tasks and additional adversarial stress tests, VerificAgent improves task reliability, reduces hallucination-induced failures, and preserves interpretable, auditable guidance--without additional model fine-tuning. By letting humans correct high-impact errors once, the verified memory acts as a frozen safety contract that future agent actions must satisfy. Our results suggest that domain-scoped, human-verified memory offers a scalable oversight mechanism for CUAs, complementing broader alignment strategies by limiting silent policy drift and anchoring agent behavior to the norms and safety constraints of the target domain. △ Less

Submitted 7 August, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

arXiv:2506.00241 [pdf, ps, other]

doi 10.1145/3772318.3791301

Balancing Efficiency and Empathy: Healthcare Providers' Perspectives on AI-Supported Workflows for Serious Illness Conversations in the Emergency Department

Authors: Menglin Zhao, Zhuorui Yong, Ruijia Guan, Kai-Wei Chang, Adrian Haimovich, Kei Ouchi, Timothy Bickmore, Zhan Zhang, Bingsheng Yao, Dakuo Wang, Smit Desai

Abstract: Serious Illness Conversations (SICs), discussions about values and care preferences for patients with life-threatening illness, rarely occur in Emergency Departments (EDs), despite evidence that early conversations improve care alignment and reduce unnecessary interventions. We interviewed 11 ED providers to identify challenges in SICs and opportunities for technology support, with a focus on AI.… ▽ More Serious Illness Conversations (SICs), discussions about values and care preferences for patients with life-threatening illness, rarely occur in Emergency Departments (EDs), despite evidence that early conversations improve care alignment and reduce unnecessary interventions. We interviewed 11 ED providers to identify challenges in SICs and opportunities for technology support, with a focus on AI. Our analysis revealed a four-stage SIC workflow (identification, preparation, conduction, documentation) and barriers at each stage, including fragmented patient information, limited time and space, lack of conversational guidance, and burdensome documentation. Providers expressed interest in AI systems for synthesizing information, supporting real-time conversations, and automating documentation, but emphasized concerns about preserving human connection and clinical autonomy. This tension highlights the need for technologies that enhance efficiency without undermining the interpersonal nature of SICs. We propose design guidelines for ambient and peripheral AI systems to support providers while preserving the essential humanity of these conversations. △ Less

Submitted 31 March, 2026; v1 submitted 30 May, 2025; originally announced June 2025.

Comments: To appear at ACM CHI'26

arXiv:2504.00698 [pdf]

Command A: An Enterprise-Ready Large Language Model

Authors: Team Cohere, :, Aakanksha, Arash Ahmadian, Marwan Ahmed, Jay Alammar, Milad Alizadeh, Yazeed Alnumay, Sophia Althammer, Arkady Arkhangorodsky, Viraat Aryabumi, Dennis Aumiller, Raphaël Avalos, Zahara Aviv, Sammie Bae, Saurabh Baji, Alexandre Barbet, Max Bartolo, Björn Bebensee, Neeral Beladia, Walter Beller-Morales, Alexandre Bérard, Andrew Berneshawi, Anna Bialas, Phil Blunsom , et al. (205 additional authors not shown)

Abstract: In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top of the range performance. It offers best-in-class Retrieval Augmented Genera… ▽ More In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top of the range performance. It offers best-in-class Retrieval Augmented Generation (RAG) capabilities with grounding and tool use to automate sophisticated business processes. These abilities are achieved through a decentralised training approach, including self-refinement algorithms and model merging techniques. We also include results for Command R7B which shares capability and architectural similarities to Command A. Weights for both models have been released for research purposes. This technical report details our original training pipeline and presents an extensive evaluation of our models across a suite of enterprise-relevant tasks and public benchmarks, demonstrating excellent performance and efficiency. △ Less

Submitted 14 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

Comments: 55 pages

arXiv:2503.14603 [pdf, other]

Command R7B Arabic: A Small, Enterprise Focused, Multilingual, and Culturally Aware Arabic LLM

Authors: Yazeed Alnumay, Alexandre Barbet, Anna Bialas, William Darling, Shaan Desai, Joan Devassy, Kyle Duffy, Stephanie Howe, Olivia Lasche, Justin Lee, Anirudh Shrinivason, Jennifer Tracey

Abstract: Building high-quality large language models (LLMs) for enterprise Arabic applications remains challenging due to the limited availability of digitized Arabic data. In this work, we present a data synthesis and refinement strategy to help address this problem, namely, by leveraging synthetic data generation and human-in-the-loop annotation to expand our Arabic training corpus. We further present ou… ▽ More Building high-quality large language models (LLMs) for enterprise Arabic applications remains challenging due to the limited availability of digitized Arabic data. In this work, we present a data synthesis and refinement strategy to help address this problem, namely, by leveraging synthetic data generation and human-in-the-loop annotation to expand our Arabic training corpus. We further present our iterative post training recipe that is essential to achieving state-of-the-art performance in aligning the model with human preferences, a critical aspect to enterprise use cases. The culmination of this effort is the release of a small, 7B, open-weight model that outperforms similarly sized peers in head-to-head comparisons and on Arabic-focused benchmarks covering cultural knowledge, instruction following, RAG, and contextual faithfulness. △ Less

Submitted 18 March, 2025; originally announced March 2025.

arXiv:2502.20513 [pdf, ps, other]

Personas Evolved: Designing Ethical LLM-Based Conversational Agent Personalities

Authors: Smit Desai, Mateusz Dubiel, Nima Zargham, Thomas Mildner, Laura Spillner

Abstract: The emergence of Large Language Models (LLMs) has revolutionized Conversational User Interfaces (CUIs), enabling more dynamic, context-aware, and human-like interactions across diverse domains, from social sciences to healthcare. However, the rapid adoption of LLM-based personas raises critical ethical and practical concerns, including bias, manipulation, and unforeseen social consequences. Unlike… ▽ More The emergence of Large Language Models (LLMs) has revolutionized Conversational User Interfaces (CUIs), enabling more dynamic, context-aware, and human-like interactions across diverse domains, from social sciences to healthcare. However, the rapid adoption of LLM-based personas raises critical ethical and practical concerns, including bias, manipulation, and unforeseen social consequences. Unlike traditional CUIs, where personas are carefully designed with clear intent, LLM-based personas generate responses dynamically from vast datasets, making their behavior less predictable and harder to govern. This workshop aims to bridge the gap between CUI and broader AI communities by fostering a cross-disciplinary dialogue on the responsible design and evaluation of LLM-based personas. Bringing together researchers, designers, and practitioners, we will explore best practices, develop ethical guidelines, and promote frameworks that ensure transparency, inclusivity, and user-centered interactions. By addressing these challenges collaboratively, we seek to shape the future of LLM-driven CUIs in ways that align with societal values and expectations. △ Less

Submitted 27 February, 2025; originally announced February 2025.

arXiv:2502.16030 [pdf, other]

Real Time Offside Detection using a Single Camera in Soccer

Authors: Shounak Desai

Abstract: Technological advancements in soccer have surged over the past decade, transforming aspects of the sport. Unlike binary rules, many soccer regulations, such as the "Offside Rule," rely on subjective interpretation rather than straightforward True or False criteria. The on-field referee holds ultimate authority in adjudicating these nuanced decisions. A significant breakthrough in soccer officiatin… ▽ More Technological advancements in soccer have surged over the past decade, transforming aspects of the sport. Unlike binary rules, many soccer regulations, such as the "Offside Rule," rely on subjective interpretation rather than straightforward True or False criteria. The on-field referee holds ultimate authority in adjudicating these nuanced decisions. A significant breakthrough in soccer officiating is the Video Assistant Referee (VAR) system, leveraging a network of 20-30 cameras within stadiums to minimize human errors. VAR's operational scope typically encompasses 10-30 cameras, ensuring high decision accuracy but at a substantial cost. This report proposes an innovative approach to offside detection using a single camera, such as the broadcasting camera, to mitigate expenses associated with sophisticated technological setups. △ Less

Submitted 21 February, 2025; originally announced February 2025.

Comments: 5 pages 11 figures

arXiv:2502.11554 [pdf, ps, other]

Toward Metaphor-Fluid Conversation Design for Voice User Interfaces

Authors: Smit Desai, Jessie Chin, Dakuo Wang, Benjamin Cowan, Michael Twidale

Abstract: Metaphors play a critical role in shaping user experiences with Voice User Interfaces (VUIs), yet existing designs often rely on static, human-centric metaphors that fail to adapt to diverse contexts and user needs. This paper introduces Metaphor-Fluid Design, a novel approach that dynamically adjusts metaphorical representations based on conversational use-contexts. We compare this approach to a… ▽ More Metaphors play a critical role in shaping user experiences with Voice User Interfaces (VUIs), yet existing designs often rely on static, human-centric metaphors that fail to adapt to diverse contexts and user needs. This paper introduces Metaphor-Fluid Design, a novel approach that dynamically adjusts metaphorical representations based on conversational use-contexts. We compare this approach to a Default VUI, which characterizes the present implementation of commercial VUIs commonly designed around the persona of an assistant, offering a uniform interaction style across contexts. In Study 1 (N=130), metaphors were mapped to four key use-contexts-commands, information seeking, sociality, and error recovery-along the dimensions of formality and hierarchy, revealing distinct preferences for task-specific metaphorical designs. Study 2 (N=91) evaluates a Metaphor-Fluid VUI against a Default VUI, showing that the Metaphor-Fluid VUI enhances perceived intention to adopt, enjoyment, and likability by aligning better with user expectations for different contexts. However, individual differences in metaphor preferences highlight the need for personalization. These findings challenge the one-size-fits-all paradigm of VUI design and demonstrate the potential of Metaphor-Fluid Design to create more adaptive and engaging human-AI interactions. △ Less

Submitted 23 October, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

arXiv:2502.08134 [pdf, other]

A Survey on Data Curation for Visual Contrastive Learning: Why Crafting Effective Positive and Negative Pairs Matters

Authors: Shasvat Desai, Debasmita Ghose, Deep Chakraborty

Abstract: Visual contrastive learning aims to learn representations by contrasting similar (positive) and dissimilar (negative) pairs of data samples. The design of these pairs significantly impacts representation quality, training efficiency, and computational cost. A well-curated set of pairs leads to stronger representations and faster convergence. As contrastive pre-training sees wider adoption for solv… ▽ More Visual contrastive learning aims to learn representations by contrasting similar (positive) and dissimilar (negative) pairs of data samples. The design of these pairs significantly impacts representation quality, training efficiency, and computational cost. A well-curated set of pairs leads to stronger representations and faster convergence. As contrastive pre-training sees wider adoption for solving downstream tasks, data curation becomes essential for optimizing its effectiveness. In this survey, we attempt to create a taxonomy of existing techniques for positive and negative pair curation in contrastive learning, and describe them in detail. △ Less

Submitted 12 February, 2025; originally announced February 2025.

Comments: 9 pages, 3 figures

arXiv:2502.05115 [pdf, ps, other]

"It Felt Like I Was Left in the Dark": Exploring Information Needs and Design Opportunities for Family Caregivers of Older Adult Patients in Critical Care Settings

Authors: Shihan Fu, Bingsheng Yao, Smit Desai, Yuqi Hu, Yuling Sun, Samantha Stonbraker, Yanjun Gao, Elizabeth M. Goldberg, Dakuo Wang

Abstract: Older adult patients constitute a rapidly growing subgroup of Intensive Care Unit (ICU) patients. In these situations, their family caregivers are expected to represent the unconscious patients to access and interpret patients' medical information. However, caregivers currently have to rely on overloaded clinicians for information updates and typically lack the health literacy to understand comple… ▽ More Older adult patients constitute a rapidly growing subgroup of Intensive Care Unit (ICU) patients. In these situations, their family caregivers are expected to represent the unconscious patients to access and interpret patients' medical information. However, caregivers currently have to rely on overloaded clinicians for information updates and typically lack the health literacy to understand complex medical information. Our project aims to explore the information needs of caregivers of ICU older adult patients, from which we can propose design opportunities to guide future AI systems. The project begins with formative interviews with 11 caregivers to identify their challenges in accessing and interpreting medical information; From these findings, we then synthesize design requirements and propose an AI system prototype to cope with caregivers' challenges. The system prototype has two key features: a timeline visualization to show the AI extracted and summarized older adult patients' key medical events; and an LLM-based chatbot to provide context-aware informational support. We conclude our paper by reporting on the follow-up user evaluation of the system and discussing future AI-based systems for ICU caregivers of older adults. △ Less

Submitted 18 September, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

arXiv:2412.12347 [pdf, other]

AutoSciLab: A Self-Driving Laboratory For Interpretable Scientific Discovery

Authors: Saaketh Desai, Sadhvikas Addamane, Jeffrey Y. Tsao, Igal Brener, Laura P. Swiler, Remi Dingreville, Prasad P. Iyer

Abstract: Advances in robotic control and sensing have propelled the rise of automated scientific laboratories capable of high-throughput experiments. However, automated scientific laboratories are currently limited by human intuition in their ability to efficiently design and interpret experiments in high-dimensional spaces, throttling scientific discovery. We present AutoSciLab, a machine learning framewo… ▽ More Advances in robotic control and sensing have propelled the rise of automated scientific laboratories capable of high-throughput experiments. However, automated scientific laboratories are currently limited by human intuition in their ability to efficiently design and interpret experiments in high-dimensional spaces, throttling scientific discovery. We present AutoSciLab, a machine learning framework for driving autonomous scientific experiments, forming a surrogate researcher purposed for scientific discovery in high-dimensional spaces. AutoSciLab autonomously follows the scientific method in four steps: (i) generating high-dimensional experiments (x \in R^D) using a variational autoencoder (ii) selecting optimal experiments by forming hypotheses using active learning (iii) distilling the experimental results to discover relevant low-dimensional latent variables (z \in R^d, with d << D) with a 'directional autoencoder' and (iv) learning a human interpretable equation connecting the discovered latent variables with a quantity of interest (y = f(z)), using a neural network equation learner. We validate the generalizability of AutoSciLab by rediscovering a) the principles of projectile motion and b) the phase transitions within the spin-states of the Ising model (NP-hard problem). Applying our framework to an open-ended nanophotonics challenge, AutoSciLab uncovers a fundamentally novel method for directing incoherent light emission that surpasses the current state-of-the-art (Iyer et al. 2023b, 2020). △ Less

Submitted 16 December, 2024; originally announced December 2024.

Comments: Pre-print for paper accepted in AAAI

arXiv:2411.03140 [pdf]

The Impact of Medicaid Expansion on Medicare Quality Measures

Authors: Hala Algrain, Elizabeth Cardosa, Shekha Desai, Eugene Fong, Tanguy Ringoir, Huthaifa I. Ashqar

Abstract: The Affordable Care Act was signed into law in 2010, expanding Medicaid and improving access to care for millions of low-income Americans. Fewer uninsured individuals reduced the cost of uncompensated care, consequently improving the financial health of hospitals. We hypothesize that this amelioration in hospital finances resulted in a marked improvement of quality measures in states that chose to… ▽ More The Affordable Care Act was signed into law in 2010, expanding Medicaid and improving access to care for millions of low-income Americans. Fewer uninsured individuals reduced the cost of uncompensated care, consequently improving the financial health of hospitals. We hypothesize that this amelioration in hospital finances resulted in a marked improvement of quality measures in states that chose to expand Medicaid. To our knowledge, the impact of Medicaid expansion on the Medicare population has not been investigated. Using a difference-in-difference analysis, we compare readmission rates for four measures from the Hospital Readmission Reduction Program: acute myocardial infarction, pneumonia, heart failure, and coronary artery bypass graft surgery. Our analysis provides evidence that between 2013 and 2021 expansion states improved hospital quality relative to non-expansion states as it relates to acute myocardial infarction readmissions (p = 0.015) and coronary artery bypass graft surgery readmissions (p = 0.039). Our analysis provides some evidence that expanding Medicaid improved hospital quality, as measured by a reduction in readmission rates. Using visualizations, we provide some evidence that hospital quality improved for the other two measures as well. We believe that a refinement of our estimation method and an improved dataset will increase our chances of finding significant results for these two other measures. △ Less

Submitted 5 November, 2024; originally announced November 2024.

arXiv:2410.22744 [pdf, ps, other]

Designing AI Personalities: Enhancing Human-Agent Interaction Through Thoughtful Persona Design

Authors: Nima Zargham, Mateusz Dubiel, Smit Desai, Thomas Mildner, Hanz-Joachim Belz

Abstract: In the rapidly evolving field of artificial intelligence (AI) agents, designing the agent's characteristics is crucial for shaping user experience. This workshop aims to establish a research community focused on AI agent persona design for various contexts, such as in-car assistants, educational tools, and smart home environments. We will explore critical aspects of persona design, such as voice,… ▽ More In the rapidly evolving field of artificial intelligence (AI) agents, designing the agent's characteristics is crucial for shaping user experience. This workshop aims to establish a research community focused on AI agent persona design for various contexts, such as in-car assistants, educational tools, and smart home environments. We will explore critical aspects of persona design, such as voice, embodiment, and demographics, and their impact on user satisfaction and engagement. Through discussions and hands-on activities, we aim to propose practices and standards that enhance the ecological validity of agent personas. Topics include the design of conversational interfaces, the influence of agent personas on user experience, and approaches for creating contextually appropriate AI agents. This workshop will provide a platform for building a community dedicated to developing AI agent personas that better fit diverse, everyday interactions. △ Less

Submitted 30 October, 2024; originally announced October 2024.

Comments: 8 pages, the workshop accepted at the 23rd International Conference on Mobile and Ubiquitous Multimedia (MUM 2024)

arXiv:2410.03342 [pdf, other]

doi 10.1140/epjp/s13360-025-06397-8

A meta-analysis of impact factors of astrophysics journals

Authors: Rayani Venkat Sai Rithvik, Shantanu Desai

Abstract: We calculate the 2024 impact factors for the 38 most widely used journals in Astrophysics, using the citations collated by NASA/ADS (Astrophysics Data System) and compare them to the official impact factors. This includes journals which publish papers outside of astrophysics such as PRD, EPJC, Nature, etc. We also propose a new metric to gauge the impact factor based on the median number of citati… ▽ More We calculate the 2024 impact factors for the 38 most widely used journals in Astrophysics, using the citations collated by NASA/ADS (Astrophysics Data System) and compare them to the official impact factors. This includes journals which publish papers outside of astrophysics such as PRD, EPJC, Nature, etc. We also propose a new metric to gauge the impact factor based on the median number of citations in a journal and calculate the same for all the journals. We find that the ADS-based impact factors are mostly in agreement, albeit higher than the official impact factors for most journals. The journals with the maximum fractional difference in median-based and old impact factors are JHEAP and PTEP. We find the maximum difference between the ADS and official impact factor for Nature. △ Less

Submitted 4 May, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

Comments: 10 pages, 2 figures. More journals added. Also added miscellaneous publication statistics as well as APC details. Accepted for publication in EPJP

arXiv:2409.08449 [pdf, other]

Beyond Functionality: Co-Designing Voice User Interfaces for Older Adults' Well-being

Authors: Xinhui Hu, Smit Desai, Morgan Lundy, Jessie Chin

Abstract: The global population is rapidly aging, necessitating technologies that promote healthy aging. Voice User Interfaces (VUIs), leveraging natural language interaction, offer a promising solution for older adults due to their ease of use. However, current design practices often overemphasize functionality, neglecting older adults' complex aspirations, psychological well-being, and social connectednes… ▽ More The global population is rapidly aging, necessitating technologies that promote healthy aging. Voice User Interfaces (VUIs), leveraging natural language interaction, offer a promising solution for older adults due to their ease of use. However, current design practices often overemphasize functionality, neglecting older adults' complex aspirations, psychological well-being, and social connectedness. To address this gap, we conducted co-design sessions with 20 older adults employing an empathic design approach. Half of the participants interacted with a probe involving health information learning, while the others focused on a probe related to exercise. This method engaged participants in collaborative activities to uncover non-functional requirements early in the design process. Results indicate that when encouraged to share their needs within a social context, older adults revealed a range of sensory, aesthetic, hedonic, and social preferences and, more importantly, the specific personas of VUIs. These insights inform the relative importance of these factors in VUI design. △ Less

Submitted 12 September, 2024; originally announced September 2024.

arXiv:2409.00018 [pdf, ps, other]

Analysis of nonlocal smart beams following fractional-order constitutive relations

Authors: Shubham Desai, Sai Sidhardh

Abstract: In this study, we develop a fractional-calculus based constitutive model for capturing nonlocal interactions over the multiphysics response in solids. More specifically, we develop constitutive relations for nonlocal piezoelectricity incorporating fractional-order kinematic relations to capture the long-range interactions over electrical and mechanical field variables. This study breaks new ground… ▽ More In this study, we develop a fractional-calculus based constitutive model for capturing nonlocal interactions over the multiphysics response in solids. More specifically, we develop constitutive relations for nonlocal piezoelectricity incorporating fractional-order kinematic relations to capture the long-range interactions over electrical and mechanical field variables. This study breaks new ground by developing fractional-order constitutive models for a two-way multiphysics (electro-mechanical) coupling, specifically the direct and converse piezoelectric effect. It is expected that long-range interactions over each field variable (elastic and electrical) can be leveraged to develop metastructures with enhanced multiphysics coupling. To better illustrate this, we choose the example of a smart beam composed of a nonlocal substrate and a piezoelectric layer. We establish the analytical and numerical framework to analyze nonlocal smart beams based on variational principles. The fractional-Finite Element (f-FE) numerical solver, facilitating multiphysics coupling, undergoes comprehensive validation through multiple case studies. Finally, detailed studies point towards tuning the multiphysics coupling possible via nonlocal interactions across the domain. △ Less

Submitted 16 August, 2024; originally announced September 2024.

Comments: 36 pages, 21 figures

arXiv:2408.16465 [pdf, other]

Human and LLM-Based Voice Assistant Interaction: An Analytical Framework for User Verbal and Nonverbal Behaviors

Authors: Szeyi Chan, Shihan Fu, Jiachen Li, Bingsheng Yao, Smit Desai, Mirjana Prpa, Dakuo Wang

Abstract: Recent progress in large language model (LLM) technology has significantly enhanced the interaction experience between humans and voice assistants (VAs). This project aims to explore a user's continuous interaction with LLM-based VA (LLM-VA) during a complex task. We recruited 12 participants to interact with an LLM-VA during a cooking task, selected for its complexity and the requirement for cont… ▽ More Recent progress in large language model (LLM) technology has significantly enhanced the interaction experience between humans and voice assistants (VAs). This project aims to explore a user's continuous interaction with LLM-based VA (LLM-VA) during a complex task. We recruited 12 participants to interact with an LLM-VA during a cooking task, selected for its complexity and the requirement for continuous interaction. We observed that users show both verbal and nonverbal behaviors, though they know that the LLM-VA can not capture those nonverbal signals. Despite the prevalence of nonverbal behavior in human-human communication, there is no established analytical methodology or framework for exploring it in human-VA interactions. After analyzing 3 hours and 39 minutes of video recordings, we developed an analytical framework with three dimensions: 1) behavior characteristics, including both verbal and nonverbal behaviors, 2) interaction stages--exploration, conflict, and integration--that illustrate the progression of user interactions, and 3) stage transition throughout the task. This analytical framework identifies key verbal and nonverbal behaviors that provide a foundation for future research and practical applications in optimizing human and LLM-VA interactions. △ Less

Submitted 3 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

arXiv:2407.16083 [pdf]

Self-driving lab discovers principles for steering spontaneous emission

Authors: Saaketh Desai, Sadhvikas Addamane, Jeffery Y. Tsao, Igal Brener, Remi Dingreville, Prasad P. Iyer

Abstract: We developed an autonomous experimentation platform to accelerate interpretable scientific discovery in ultrafast nanophotonics, targeting a novel method to steer spontaneous emission from reconfigurable semiconductor metasurfaces. Controlling spontaneous emission is crucial for clean-energy solutions in illumination, thermal radiation engineering, and remote sensing. Despite the potential of reco… ▽ More We developed an autonomous experimentation platform to accelerate interpretable scientific discovery in ultrafast nanophotonics, targeting a novel method to steer spontaneous emission from reconfigurable semiconductor metasurfaces. Controlling spontaneous emission is crucial for clean-energy solutions in illumination, thermal radiation engineering, and remote sensing. Despite the potential of reconfigurable semiconductor metasurfaces with embedded sources for spatiotemporal control, achieving arbitrary far-field control remains challenging. Here, we present a self-driving lab (SDL) platform that addresses this challenge by discovering the governing equations for predicting the far-field emission profile from light-emitting metasurfaces. We discover that both the spatial gradient (grating-like) and the curvature (lens-like) of the local refractive index are key factors in steering spontaneous emission. The SDL employs a machine-learning framework comprising: (1) a variational autoencoder for generating complex spatial refractive index profiles, (2) an active learning agent for guiding experiments with real-time closed-loop feedback, and (3) a neural network-based equation learner to uncover structure-property relationships. The SDL demonstrated a four-fold enhancement in peak emission directivity (up to 77%) over a 72° field of view within ~300 experiments. Our findings reveal that combinations of positive gratings and lenses are as effective as negative lenses and gratings for all emission angles, offering a novel strategy for controlling spontaneous emission beyond conventional Fourier optics. △ Less

Submitted 24 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: 25 pages, 4 figures in main text, 5 figures in supplementary information

arXiv:2406.10290 [pdf, other]

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understand… ▽ More The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on various task performances, including LLM tasks, LMM tasks, and, critically, trust and safety. There is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.07458 [pdf, other]

Examining Humanness as a Metaphor to Design Voice User Interfaces

Authors: Smit Desai, Mateusz Dubiel, Luis A. Leiva

Abstract: Voice User Interfaces (VUIs) increasingly leverage 'humanness' as a foundational design metaphor, adopting roles like 'assistants,' 'teachers,' and 'secretaries' to foster natural interactions. Yet, this approach can sometimes misalign user trust and reinforce societal stereotypes, leading to socio-technical challenges that might impede long-term engagement. This paper explores an alternative appr… ▽ More Voice User Interfaces (VUIs) increasingly leverage 'humanness' as a foundational design metaphor, adopting roles like 'assistants,' 'teachers,' and 'secretaries' to foster natural interactions. Yet, this approach can sometimes misalign user trust and reinforce societal stereotypes, leading to socio-technical challenges that might impede long-term engagement. This paper explores an alternative approach to navigate these challenges-incorporating non-human metaphors in VUI design. We report on a study with 240 participants examining the effects of human versus non-human metaphors on user perceptions within health and finance domains. Results indicate a preference for the human metaphor (doctor) over the non-human (health encyclopedia) in health contexts for its perceived enjoyability and likeability. In finance, however, user perceptions do not significantly differ between human (financial advisor) and non-human (calculator) metaphors. Importantly, our research reveals that the explicit awareness of a metaphor's use influences adoption intentions, with a marked preference for non-human metaphors when their metaphorical nature is not disclosed. These findings highlight context-specific conversation design strategies required in integrating non-human metaphors into VUI design, suggesting tradeoffs and design considerations that could enhance user engagement and adoption. △ Less

Submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted to appear in the proceedings of CUI 2024

arXiv:2401.13970 [pdf, ps, other]

doi 10.1145/3613905.3636287

CUI@CHI 2024: Building Trust in CUIs-From Design to Deployment

Authors: Smit Desai, Christina Wei, Jaisie Sin, Mateusz Dubiel, Nima Zargham, Shashank Ahire, Martin Porcheron, Anastasia Kuzminykh, Minha Lee, Heloisa Candello, Joel Fischer, Cosmin Munteanu, Benjamin R Cowan

Abstract: Conversational user interfaces (CUIs) have become an everyday technology for people the world over, as well as a booming area of research. Advances in voice synthesis and the emergence of chatbots powered by large language models (LLMs), notably ChatGPT, have pushed CUIs to the forefront of human-computer interaction (HCI) research and practice. Now that these technologies enable an elemental leve… ▽ More Conversational user interfaces (CUIs) have become an everyday technology for people the world over, as well as a booming area of research. Advances in voice synthesis and the emergence of chatbots powered by large language models (LLMs), notably ChatGPT, have pushed CUIs to the forefront of human-computer interaction (HCI) research and practice. Now that these technologies enable an elemental level of usability and user experience (UX), we must turn our attention to higher-order human factors: trust and reliance. In this workshop, we aim to bring together a multidisciplinary group of researchers and practitioners invested in the next phase of CUI design. Through keynotes, presentations, and breakout sessions, we will share our knowledge, identify cutting-edge resources, and fortify an international network of CUI scholars. In particular, we will engage with the complexity of trust and reliance as attitudes and behaviours that emerge when people interact with conversational agents. △ Less

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2312.05410 [pdf, other]

Rethinking materials simulations: Blending direct numerical simulations with neural operators

Authors: Vivek Oommen, Khemraj Shukla, Saaketh Desai, Remi Dingreville, George Em Karniadakis

Abstract: Direct numerical simulations (DNS) are accurate but computationally expensive for predicting materials evolution across timescales, due to the complexity of the underlying evolution equations, the nature of multiscale spatio-temporal interactions, and the need to reach long-time integration. We develop a new method that blends numerical solvers with neural operators to accelerate such simulations.… ▽ More Direct numerical simulations (DNS) are accurate but computationally expensive for predicting materials evolution across timescales, due to the complexity of the underlying evolution equations, the nature of multiscale spatio-temporal interactions, and the need to reach long-time integration. We develop a new method that blends numerical solvers with neural operators to accelerate such simulations. This methodology is based on the integration of a community numerical solver with a U-Net neural operator, enhanced by a temporal-conditioning mechanism that enables accurate extrapolation and efficient time-to-solution predictions of the dynamics. We demonstrate the effectiveness of this framework on simulations of microstructure evolution during physical vapor deposition modeled via the phase-field method. Such simulations exhibit high spatial gradients due to the co-evolution of different material phases with simultaneous slow and fast materials dynamics. We establish accurate extrapolation of the coupled solver with up to 16.5$\times$ speed-up compared to DNS. This methodology is generalizable to a broad range of evolutionary models, from solid mechanics, to fluid dynamics, geophysics, climate, and more. △ Less

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2310.20608 [pdf, other]

Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback

Authors: Max Balsells, Marcel Torne, Zihan Wang, Samedh Desai, Pulkit Agrawal, Abhishek Gupta

Abstract: Ideally, we would place a robot in a real-world environment and leave it there improving on its own by gathering more experience autonomously. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to the challenge of sample complexity, even sample-efficient techniques are hampered by two major challenges - the d… ▽ More Ideally, we would place a robot in a real-world environment and leave it there improving on its own by gathering more experience autonomously. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to the challenge of sample complexity, even sample-efficient techniques are hampered by two major challenges - the difficulty of providing well "shaped" rewards, and the difficulty of continual reset-free training. In this work, we describe a system for real-world reinforcement learning that enables agents to show continual improvement by training directly in the real world without requiring painstaking effort to hand-design reward functions or reset mechanisms. Our system leverages occasional non-expert human-in-the-loop feedback from remote users to learn informative distance functions to guide exploration while leveraging a simple self-supervised learning algorithm for goal-directed policy learning. We show that in the absence of resets, it is particularly important to account for the current "reachability" of the exploration policy when deciding which regions of the space to explore. Based on this insight, we instantiate a practical learning system - GEAR, which enables robots to simply be placed in real-world environments and left to train autonomously without interruption. The system streams robot experience to a web interface only requiring occasional asynchronous feedback from remote, crowdsourced, non-expert humans in the form of binary comparative feedback. We evaluate this system on a suite of robotic tasks in simulation and demonstrate its effectiveness at learning behaviors both in simulation and the real world. Project website https://guided-exploration-autonomous-rl.github.io/GEAR/. △ Less

Submitted 31 October, 2023; originally announced October 2023.

Comments: Project website https://guided-exploration-autonomous-rl.github.io/GEAR/

arXiv:2310.00354 [pdf, other]

doi 10.1186/s12903-024-04120-0

AI-Dentify: Deep learning for proximal caries detection on bitewing x-ray -- HUNT4 Oral Health Study

Authors: Javier Pérez de Frutos, Ragnhild Holden Helland, Shreya Desai, Line Cathrine Nymoen, Thomas Langø, Theodor Remman, Abhijit Sen

Abstract: Background: Dental caries diagnosis requires the manual inspection of diagnostic bitewing images of the patient, followed by a visual inspection and probing of the identified dental pieces with potential lesions. Yet the use of artificial intelligence, and in particular deep-learning, has the potential to aid in the diagnosis by providing a quick and informative analysis of the bitewing images.… ▽ More Background: Dental caries diagnosis requires the manual inspection of diagnostic bitewing images of the patient, followed by a visual inspection and probing of the identified dental pieces with potential lesions. Yet the use of artificial intelligence, and in particular deep-learning, has the potential to aid in the diagnosis by providing a quick and informative analysis of the bitewing images. Methods: A dataset of 13,887 bitewings from the HUNT4 Oral Health Study were annotated individually by six different experts, and used to train three different object detection deep-learning architectures: RetinaNet (ResNet50), YOLOv5 (M size), and EfficientDet (D0 and D1 sizes). A consensus dataset of 197 images, annotated jointly by the same six dentist, was used for evaluation. A five-fold cross validation scheme was used to evaluate the performance of the AI models. Results: he trained models show an increase in average precision and F1-score, and decrease of false negative rate, with respect to the dental clinicians. When compared against the dental clinicians, the YOLOv5 model shows the largest improvement, reporting 0.647 mean average precision, 0.548 mean F1-score, and 0.149 mean false negative rate. Whereas the best annotators on each of these metrics reported 0.299, 0.495, and 0.164 respectively. Conclusion: Deep-learning models have shown the potential to assist dental professionals in the diagnosis of caries. Yet, the task remains challenging due to the artifacts natural to the bitewing images. △ Less

Submitted 22 March, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

Comments: 24 pages, 5 figure, 7 tables

ACM Class: I.2.10; I.2.1

Journal ref: BMC Oral Health 24, 344 (2024)

arXiv:2309.12583 [pdf]

doi 10.1145/3571884.3603755

Using ChatGPT in HCI Research -- A Trioethnography

Authors: Smit Desai, Tanusree Sharma, Pratyasha Saha

Abstract: This paper explores the lived experience of using ChatGPT in HCI research through a month-long trioethnography. Our approach combines the expertise of three HCI researchers with diverse research interests to reflect on our daily experience of living and working with ChatGPT. Our findings are presented as three provocations grounded in our collective experiences and HCI theories. Specifically, we e… ▽ More This paper explores the lived experience of using ChatGPT in HCI research through a month-long trioethnography. Our approach combines the expertise of three HCI researchers with diverse research interests to reflect on our daily experience of living and working with ChatGPT. Our findings are presented as three provocations grounded in our collective experiences and HCI theories. Specifically, we examine (1) the emotional impact of using ChatGPT, with a focus on frustration and embarrassment, (2) the absence of accountability and consideration of future implications in design, and raise (3) questions around bias from a Global South perspective. Our work aims to inspire critical discussions about utilizing ChatGPT in HCI research and advance equitable and inclusive technological development. △ Less

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.11681 [pdf]

doi 10.1145/3637365

Like My Aunt Dorothy: Effects of Conversational Styles on Perceptions, Acceptance and Metaphorical Descriptions of Voice Assistants during Later Adulthood

Authors: Jessie Chin, Smit Desai, Sheny Lin, Shannon Mejia

Abstract: Little research has investigated the design of conversational styles of voice assistants (VA) for adults in their later adulthood with varying personalities. In this Wizard of Oz experiment, 34 middle-aged (50 to 64 years old) and 24 older adults (65 to 80 years old) participated in a user study at a simulated home, interacting with a VA using either formal or informal language. Older adults with… ▽ More Little research has investigated the design of conversational styles of voice assistants (VA) for adults in their later adulthood with varying personalities. In this Wizard of Oz experiment, 34 middle-aged (50 to 64 years old) and 24 older adults (65 to 80 years old) participated in a user study at a simulated home, interacting with a VA using either formal or informal language. Older adults with higher agreeableness perceived VA as being more likeable than middle-aged adults. Middle-aged adults showed similar technology acceptance toward the informal and formal VA, and older adults preferred using informal VA, especially those with low agreeableness. Further, while both middle-aged and older adults frequently anthropomorphized VAs by using human metaphors for them, older adults compared formal VA with professionals (e.g., librarians, teachers) and informal VA with their close ones (e.g., spouses, relatives). Overall, the conversational style showed differential effects on the perceptions of middle-aged and older adults, suggesting personalized design implications. △ Less

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2308.10714 [pdf, other]

CXL Memory as Persistent Memory for Disaggregated HPC: A Practical Approach

Authors: Yehonatan Fridman, Suprasad Mutalik Desai, Navneet Singh, Thomas Willhalm, Gal Oren

Abstract: In the landscape of High-Performance Computing (HPC), the quest for efficient and scalable memory solutions remains paramount. The advent of Compute Express Link (CXL) introduces a promising avenue with its potential to function as a Persistent Memory (PMem) solution in the context of disaggregated HPC systems. This paper presents a comprehensive exploration of CXL memory's viability as a candidat… ▽ More In the landscape of High-Performance Computing (HPC), the quest for efficient and scalable memory solutions remains paramount. The advent of Compute Express Link (CXL) introduces a promising avenue with its potential to function as a Persistent Memory (PMem) solution in the context of disaggregated HPC systems. This paper presents a comprehensive exploration of CXL memory's viability as a candidate for PMem, supported by physical experiments conducted on cutting-edge multi-NUMA nodes equipped with CXL-attached memory prototypes. Our study not only benchmarks the performance of CXL memory but also illustrates the seamless transition from traditional PMem programming models to CXL, reinforcing its practicality. To substantiate our claims, we establish a tangible CXL prototype using an FPGA card embodying CXL 1.1/2.0 compliant endpoint designs (Intel FPGA CXL IP). Performance evaluations, executed through the STREAM and STREAM-PMem benchmarks, showcase CXL memory's ability to mirror PMem characteristics in App-Direct and Memory Mode while achieving impressive bandwidth metrics with Intel 4th generation Xeon (Sapphire Rapids) processors. The results elucidate the feasibility of CXL memory as a persistent memory solution, outperforming previously established benchmarks. In contrast to published DCPMM results, our CXL-DDR4 memory module offers comparable bandwidth to local DDR4 memory configurations, albeit with a moderate decrease in performance. The modified STREAM-PMem application underscores the ease of transitioning programming models from PMem to CXL, thus underscoring the practicality of adopting CXL memory. △ Less

Submitted 21 August, 2023; originally announced August 2023.

Comments: 12 pages, 9 figures

arXiv:2307.11049 [pdf, other]

Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback

Authors: Marcel Torne, Max Balsells, Zihan Wang, Samedh Desai, Tao Chen, Pulkit Agrawal, Abhishek Gupta

Abstract: Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision-making tasks requiring expansive exploration requires either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective guidance in the loop to direct the exploration process, but prior methods to… ▽ More Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision-making tasks requiring expansive exploration requires either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective guidance in the loop to direct the exploration process, but prior methods to leverage this guidance require constant synchronous high-quality human feedback, which is expensive and impractical to obtain. In this work, we present a technique called Human Guided Exploration (HuGE), which uses low-quality feedback from non-expert users that may be sporadic, asynchronous, and noisy. HuGE guides exploration for reinforcement learning not only in simulation but also in the real world, all without meticulous reward specification. The key concept involves bifurcating human feedback and policy learning: human feedback steers exploration, while self-supervised learning from the exploration data yields unbiased policies. This procedure can leverage noisy, asynchronous human feedback to learn policies with no hand-crafted reward design or exploration bonuses. HuGE is able to learn a variety of challenging multi-stage robotic navigation and manipulation tasks in simulation using crowdsourced feedback from non-expert users. Moreover, this paradigm can be scaled to learning directly on real-world robots, using occasional, asynchronous feedback from human supervisors. △ Less

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2305.15534 [pdf, other]

doi 10.1145/3593013.3594112

Representation Online Matters: Practical End-to-End Diversification in Search and Recommender Systems

Authors: Pedro Silva, Bhawna Juneja, Shloka Desai, Ashudeep Singh, Nadia Fawaz

Abstract: As the use of online platforms continues to grow across all demographics, users often express a desire to feel represented in the content. To improve representation in search results and recommendations, we introduce end-to-end diversification, ensuring that diverse content flows throughout the various stages of these systems, from retrieval to ranking. We develop, experiment, and deploy scalable… ▽ More As the use of online platforms continues to grow across all demographics, users often express a desire to feel represented in the content. To improve representation in search results and recommendations, we introduce end-to-end diversification, ensuring that diverse content flows throughout the various stages of these systems, from retrieval to ranking. We develop, experiment, and deploy scalable diversification mechanisms in multiple production surfaces on the Pinterest platform, including Search, Related Products, and New User Homefeed, to improve the representation of different skin tones in beauty and fashion content. Diversification in production systems includes three components: identifying requests that will trigger diversification, ensuring diverse content is retrieved from the large content corpus during the retrieval stage, and finally, balancing the diversity-utility trade-off in a self-adjusting manner in the ranking stage. Our approaches, which evolved from using Strong-OR logical operator to bucketized retrieval at the retrieval stage and from greedy re-rankers to multi-objective optimization using determinantal point processes for the ranking stage, balances diversity and utility while enabling fast iterations and scalable expansion to diversification over multiple dimensions. Our experiments indicate that these approaches significantly improve diversity metrics, with a neutral to a positive impact on utility metrics and improved user satisfaction, both qualitatively and quantitatively, in production. An accessible PDF of this article is available at https://drive.google.com/file/d/1p5PkqC-sdtX19Y_IAjZCtiSxSEX1IP3q/view △ Less

Submitted 26 May, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23), June 12--15, 2023, Chicago, IL, USA

arXiv:2305.13776 [pdf, other]

Counterspeeches up my sleeve! Intent Distribution Learning and Persistent Fusion for Intent-Conditioned Counterspeech Generation

Authors: Rishabh Gupta, Shaily Desai, Manvi Goel, Anil Bandhakavi, Tanmoy Chakraborty, Md. Shad Akhtar

Abstract: Counterspeech has been demonstrated to be an efficacious approach for combating hate speech. While various conventional and controlled approaches have been studied in recent years to generate counterspeech, a counterspeech with a certain intent may not be sufficient in every scenario. Due to the complex and multifaceted nature of hate speech, utilizing multiple forms of counter-narratives with var… ▽ More Counterspeech has been demonstrated to be an efficacious approach for combating hate speech. While various conventional and controlled approaches have been studied in recent years to generate counterspeech, a counterspeech with a certain intent may not be sufficient in every scenario. Due to the complex and multifaceted nature of hate speech, utilizing multiple forms of counter-narratives with varying intents may be advantageous in different circumstances. In this paper, we explore intent-conditioned counterspeech generation. At first, we develop IntentCONAN, a diversified intent-specific counterspeech dataset with 6831 counterspeeches conditioned on five intents, i.e., informative, denouncing, question, positive, and humour. Subsequently, we propose QUARC, a two-stage framework for intent-conditioned counterspeech generation. QUARC leverages vector-quantized representations learned for each intent category along with PerFuMe, a novel fusion module to incorporate intent-specific information into the model. Our evaluation demonstrates that QUARC outperforms several baselines by an average of 10% across evaluation metrics. An extensive human evaluation supplements our hypothesis of better and more appropriate responses than comparative systems. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: ACL 2023

arXiv:2305.04506 [pdf, other]

Pedestrian Behavior Maps for Safety Advisories: CHAMP Framework and Real-World Data Analysis

Authors: Ross Greer, Samveed Desai, Lulua Rakla, Akshay Gopalkrishnan, Afnan Alofi, Mohan Trivedi

Abstract: It is critical for vehicles to prevent any collisions with pedestrians. Current methods for pedestrian collision prevention focus on integrating visual pedestrian detectors with Automatic Emergency Braking (AEB) systems which can trigger warnings and apply brakes as a pedestrian enters a vehicle's path. Unfortunately, pedestrian-detection-based systems can be hindered in certain situations such as… ▽ More It is critical for vehicles to prevent any collisions with pedestrians. Current methods for pedestrian collision prevention focus on integrating visual pedestrian detectors with Automatic Emergency Braking (AEB) systems which can trigger warnings and apply brakes as a pedestrian enters a vehicle's path. Unfortunately, pedestrian-detection-based systems can be hindered in certain situations such as night-time or when pedestrians are occluded. Our system addresses such issues using an online, map-based pedestrian detection aggregation system where common pedestrian locations are learned after repeated passes of locations. Using a carefully collected and annotated dataset in La Jolla, CA, we demonstrate the system's ability to learn pedestrian zones and generate advisory notices when a vehicle is approaching a pedestrian despite challenges like dark lighting or pedestrian occlusion. Using the number of correct advisories, false advisories, and missed advisories to define precision and recall performance metrics, we evaluate our system and discuss future positive effects with further data collection. We have made our code available at https://github.com/s7desai/ped-mapping, and a video demonstration of the CHAMP system at https://youtu.be/dxeCrS_Gpkw. △ Less

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2303.17971 [pdf, other]

Rule Enforcing Through Ordering

Authors: David Sychrovský, Sameer Desai, Martin Loebl

Abstract: In many real world situations, like minor traffic offenses in big cities, a central authority is tasked with periodic administering punishments to a large number of individuals. Common practice is to give each individual a chance to suffer a smaller fine and be guaranteed to avoid the legal process with probable considerably larger punishment. However, thanks to the large number of offenders and a… ▽ More In many real world situations, like minor traffic offenses in big cities, a central authority is tasked with periodic administering punishments to a large number of individuals. Common practice is to give each individual a chance to suffer a smaller fine and be guaranteed to avoid the legal process with probable considerably larger punishment. However, thanks to the large number of offenders and a limited capacity of the central authority, the individual risk is typically small and a rational individual will not choose to pay the fine. Here we show that if the central authority processes the offenders in a publicly known order, it properly incentives the offenders to pay the fine. We show analytically and on realistic experiments that our mechanism promotes non-cooperation and incentives individuals to pay. Moreover, the same holds for an arbitrary coalition. We quantify the expected total payment the central authority receives, and show it increases considerably. △ Less

Submitted 24 October, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

Comments: Accepted at the 14th Conference on Decision and Game Theory for Security (GameSec-23)

arXiv:2303.06008 [pdf]

A detailed review of blockchain and cryptocurrency

Authors: Nayak Bhatia, Sanchit Bansal, Smit Desai

Abstract: Cryptocurrency is something that we have all heard about recently, most likely preceded by bitcoin, and how much its prices have boomed over the decade. These cryptocurrencies are actually based on blockchain, a secure datatype, and recently popular form of technology. This paper gives a detailed review about the concept of blockchain and its potential applications, especially elaborating on crypt… ▽ More Cryptocurrency is something that we have all heard about recently, most likely preceded by bitcoin, and how much its prices have boomed over the decade. These cryptocurrencies are actually based on blockchain, a secure datatype, and recently popular form of technology. This paper gives a detailed review about the concept of blockchain and its potential applications, especially elaborating on cryptocurrency, and it also contains a detailed case study of blockchain Dubai. △ Less

Submitted 10 March, 2023; originally announced March 2023.

Showing 1–50 of 99 results for author: Desai, S