-
"Who wants to be nagged by AI?": Investigating the Effects of Agreeableness on Older Adults' Perception of LLM-Based Voice Assistants' Explanations
Authors:
Niharika Mathur,
Hasibur Rahman,
Smit Desai
Abstract:
LLM-based voice assistants (VAs) increasingly support older adults aging in place, yet how an assistant's agreeableness shapes explanation perception remains underexplored. We conducted a study (N=70) examining how VA agreeableness influences older adults' perceptions of explanations across routine and emergency home scenarios. High-agreeableness assistants were perceived as more trustworthy, empathetic, and likable, but these benefits diminished in emergencies where clarity outweighed warmth. Agreeableness did not affect perceived intelligence, suggesting social tone and competence are separable dimensions. Real-time environmental explanations outperformed history-based ones, and agreeable older adults penalized low-agreeableness assistants more strongly. These findings highlight the need to move beyond a one-size-fits-all approach to AI explainability, balancing personality, context, and audience.
Submitted 9 March, 2026;
originally announced March 2026.
-
The Differential Effects of Agreeableness and Extraversion on Older Adults' Perceptions of Conversational AI Explanations in Assistive Settings
Authors:
Niharika Mathur,
Hasibur Rahman,
Smit Desai
Abstract:
Large Language Model-based Voice Assistants (LLM-VAs) are increasingly deployed in assistive settings for older adults, yet little is known about how an agent's personality shapes user perceptions of its explanations. This paper presents a mixed factorial experiment (N=140) examining how agreeableness and extraversion in an LLM-VA ("Robin") influence older adults' perceptions across seven measures: empathy, likeability, trust, reliance, satisfaction, intention to adopt, and perceived intelligence. Results reveal that high agreeableness drove stronger empathy perceptions, while low agreeableness consistently penalized likeability. Importantly, perceived intelligence remained unaffected by personality, suggesting that personality shapes sociability without altering competence perceptions. Real-time environmental explanations outperformed conversational history explanations on five measures, with advantages concentrated in emergency contexts. Notably, highly agreeable participants were especially critical of low-agreeableness agents, revealing a user-agent personality congruence effect. These findings offer design implications for personality-aware, context-sensitive LLM-VAs in assistive settings.
Submitted 9 March, 2026;
originally announced March 2026.
-
Unsupervised Physics-Informed Operator Learning through Multi-Stage Curriculum Training
Authors:
Paolo Marcandelli,
Natansh Mathur,
Stefano Markidis,
Martina Siena,
Stefano Mariani
Abstract:
Solving partial differential equations remains a central challenge in scientific machine learning. Neural operators offer a promising route by learning mappings between function spaces and enabling resolution-independent inference, yet they typically require supervised data. Physics-informed neural networks address this limitation through unsupervised training with physical constraints but often suffer from unstable convergence and limited generalization capability. To overcome these issues, we introduce a multi-stage physics-informed training strategy that achieves convergence by progressively enforcing boundary conditions in the loss landscape and subsequently incorporating interior residuals. At each stage the optimizer is re-initialized, acting as a continuation mechanism that restores stability and prevents gradient stagnation. We further propose the Physics-Informed Spline Fourier Neural Operator (PhIS-FNO), combining Fourier layers with Hermite spline kernels for smooth residual evaluation. Across canonical benchmarks, PhIS-FNO attains a level of accuracy comparable to that of supervised learning, using labeled information only along a narrow boundary region, establishing staged, spline-based optimization as a robust paradigm for physics-informed operator learning.
Submitted 2 February, 2026;
originally announced February 2026.
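The staged curriculum can be illustrated on a toy 1D Poisson problem. This is only a hedged sketch under invented assumptions: the paper trains a Fourier neural operator, whereas here plain gradient descent fits grid values of a single solution, and restarting each stage's loop stands in for re-initializing the optimizer.

```python
import math

# Toy 1D Poisson problem: u''(x) = f(x) on [0,1], u(0) = u(1) = 0.
# With f(x) = -pi^2 sin(pi x), the exact solution is u(x) = sin(pi x).
N = 9
h = 1.0 / (N - 1)
xs = [i * h for i in range(N)]
f = [-math.pi ** 2 * math.sin(math.pi * x) for x in xs]

def boundary_loss(u):
    return u[0] ** 2 + u[-1] ** 2

def residual_loss(u):
    # finite-difference PDE residual at interior nodes
    r = [(u[i - 1] - 2 * u[i] + u[i + 1]) / h ** 2 - f[i] for i in range(1, N - 1)]
    return sum(ri ** 2 for ri in r) / (N - 2)

def numgrad(loss, u, eps=1e-6):
    # forward-difference gradient; fine for a tiny illustrative problem
    base = loss(u)
    g = []
    for j in range(N):
        up = list(u)
        up[j] += eps
        g.append((loss(up) - base) / eps)
    return g

def descend(loss, u, lr, steps):
    # Each stage starts a fresh loop: for plain gradient descent this is the
    # analogue of re-initializing the optimizer between curriculum stages.
    for _ in range(steps):
        g = numgrad(loss, u)
        u = [ui - lr * gi for ui, gi in zip(u, g)]
    return u

u = [0.5] * N  # initial guess that violates the boundary conditions
# Stage 1: enforce the boundary conditions alone.
u = descend(boundary_loss, u, lr=0.1, steps=100)
# Stage 2: fresh optimizer state, interior residual plus a boundary penalty.
u = descend(lambda v: residual_loss(v) + 100.0 * boundary_loss(v), u,
            lr=5e-5, steps=8000)

err = max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xs))
```

After stage 1 the boundary values are pinned but the interior is still wrong; stage 2 then fits the PDE residual from a well-conditioned starting point, which is the continuation idea in miniature.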
-
SonoEdit: Null-Space Constrained Knowledge Editing for Pronunciation Correction in LLM-Based TTS
Authors:
Ayush Pratap Singh,
Harshit Singh,
Nityanand Mathur,
Akshat Mandloi,
Sudarshan Kamath
Abstract:
Neural text-to-speech (TTS) systems systematically mispronounce low-resource proper nouns, particularly non-English names, brands, and geographic locations, due to their underrepresentation in predominantly English training corpora. Existing solutions typically rely on expensive multilingual data collection, supervised finetuning, or manual phonetic annotation, which limits the deployment of TTS systems in linguistically diverse settings. We introduce SonoEdit, a model editing technique that surgically corrects pronunciation errors in pre-trained TTS models without retraining. Instead of costly finetuning or explicit phoneme injection, we propose a parsimonious alternative based on Null-Space Pronunciation Editing, which performs a single-shot parameter update to modify the pronunciation of specific words while provably preserving all other model behavior. We first adapt Acoustic Causal Tracing to identify the Transformer layers responsible for text-to-pronunciation mapping. We then apply Null-Space Constrained Editing to compute a closed-form weight update that corrects the target pronunciation while remaining mathematically orthogonal to the subspace governing general speech generation. This constrained update steers the model's acoustic output toward a desired pronunciation exemplar while guaranteeing zero first-order change on a preserved speech corpus.
Submitted 23 January, 2026;
originally announced January 2026.
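The null-space constraint at the heart of the method admits a small self-contained illustration. Everything below is invented for the sketch (a 3-dimensional toy layer and a single preserved key); the paper presumably builds the preserved subspace from a whole speech corpus and derives the update in closed form, but the projection step looks like this:

```python
# A toy 2x3 weight update and one preserved input direction k.
k = [1.0, 2.0, 0.0]            # activation whose output must not change
dW = [[0.5, -0.2, 0.3],        # proposed edit, rows = output units
      [0.1, 0.4, -0.6]]

nk = sum(x * x for x in k)
# Projector onto the null space of k:  P = I - k k^T / (k^T k)
P = [[(1.0 if i == j else 0.0) - k[i] * k[j] / nk for j in range(3)]
     for i in range(3)]
# Constrained update dW' = dW @ P: each row is made orthogonal to k.
dWp = [[sum(row[m] * P[m][j] for m in range(3)) for j in range(3)] for row in dW]

def apply(W, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

print(apply(dWp, k))          # ~[0.0, 0.0]: zero first-order change on k
print(apply(dWp, [0, 0, 1]))  # the edit still acts on other inputs
```

Replacing the single vector k with the span of activations from a preserved corpus gives the "mathematically orthogonal to the preserved subspace" guarantee the abstract describes, at first order.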
-
Taming the Real-world Complexities in CPT E/M Coding with Large Language Models
Authors:
Islam Nassar,
Yang Lin,
Yuan Jin,
Rongxin Zhu,
Chang Wei Tan,
Zenan Zhai,
Nitika Mathur,
Thanh Tien Vu,
Xu Zhong,
Long Duong,
Yuan-Fang Li
Abstract:
Evaluation and Management (E/M) coding, under the Current Procedural Terminology (CPT) taxonomy, documents medical services provided to patients by physicians. Because these codes are used primarily for billing, it is in physicians' best interest to assign accurate CPT E/M codes. Automating this coding task will help alleviate physicians' documentation burden, improve billing efficiency, and ultimately enable better patient care. However, a number of real-world complexities have made E/M coding automation a challenging task. In this paper, we elaborate on some of the key complexities and present ProFees, our LLM-based framework that tackles them, followed by a systematic evaluation. On an expert-curated real-world dataset, ProFees achieves an increase in coding accuracy of more than 36% over a commercial CPT E/M coding system and almost 5% over our strongest single-prompt baseline, demonstrating its effectiveness in addressing these real-world complexities.
Submitted 28 October, 2025;
originally announced October 2025.
-
Sometimes You Need Facts, and Sometimes a Hug: Understanding Older Adults' Preferences for Explanations in LLM-Based Conversational AI Systems
Authors:
Niharika Mathur,
Tamara Zubatiy,
Agata Rozga,
Jodi Forlizzi,
Elizabeth Mynatt
Abstract:
Designing Conversational AI systems to support older adults requires these systems to explain their behavior in ways that align with older adults' preferences and context. While prior work has emphasized the importance of AI explainability in building user trust, relatively little is known about older adults' requirements and perceptions of AI-generated explanations. To address this gap, we conducted an exploratory Speed Dating study with 23 older adults to understand their responses to contextually grounded AI explanations. Our findings reveal the highly context-dependent nature of explanations, shaped by conversational cues such as the content, tone, and framing of the explanation. We also found that explanations are often interpreted as interactive, multi-turn conversational exchanges with the AI, and can be helpful in calibrating urgency, guiding actionability, and providing insights into older adults' daily lives for their family members. We conclude by discussing implications for designing context-sensitive and personalized explanations in Conversational AI systems.
Submitted 28 February, 2026; v1 submitted 8 October, 2025;
originally announced October 2025.
-
"It feels like hard work trying to talk to it": Understanding Older Adults' Experiences of Encountering and Repairing Conversational Breakdowns with AI Systems
Authors:
Niharika Mathur,
Tamara Zubatiy,
Agata Rozga,
Elizabeth Mynatt
Abstract:
Designing Conversational AI systems to support older adults requires more than usability and reliability; it also necessitates robustness in handling conversational breakdowns. In this study, we investigate how older adults navigate and repair such breakdowns while interacting with a voice-based AI system deployed in their homes for medication management. Through a 20-week in-home deployment with 7 older adult participant dyads, we analyzed 844 recorded interactions to identify conversational breakdowns and user-initiated repair strategies. Through findings gleaned from post-deployment interviews, we reflect on the nature of these breakdowns and older adults' experiences of mitigating them. We identify four types of conversational breakdowns and demonstrate how older adults draw on their situated knowledge and environment to make sense of and recover from these disruptions, highlighting the cognitive effort required in doing so. Our findings emphasize the collaborative nature of interactions in human-AI contexts, and point to the need for AI systems to better align with users' expectations for memory, their routines, and external resources in their environment. We conclude by discussing opportunities for AI systems to integrate contextual knowledge from older adults' sociotechnical environment and to facilitate more meaningful and user-centered interactions.
Submitted 8 October, 2025;
originally announced October 2025.
-
HIRE: Lightweight High-Resolution Image Feature Enrichment for Multimodal LLMs
Authors:
Nikitha SR,
Aradhya Neeraj Mathur,
Tarun Ram Menta,
Rishabh Jain,
Mausoom Sarkar
Abstract:
The integration of high-resolution image features in modern multimodal large language models has demonstrated significant improvements in fine-grained visual understanding tasks, achieving high performance across multiple benchmarks. Since these features are obtained from large image encoders like ViT, they come with a significant increase in computational costs due to multiple calls to these encoders. In this work, we first develop an intuition for feature upsampling as a natural extension of high-resolution feature generation. Through extensive experiments and ablations, we demonstrate how a shallow feature enricher can achieve competitive results with tremendous reductions in training and inference time as well as computational cost, including up to a 1.5x saving in FLOPs.
Submitted 21 June, 2025;
originally announced June 2025.
-
Bayesian Quantum Orthogonal Neural Networks for Anomaly Detection
Authors:
Natansh Mathur,
Brian Coyle,
Nishant Jain,
Snehal Raj,
Akshat Tandon,
Jasper Simon Krauser,
Rainer Stoessel
Abstract:
Identification of defects or anomalies in 3D objects is a crucial task to ensure correct functionality. In this work, we combine Bayesian learning with recent developments in quantum and quantum-inspired machine learning, specifically orthogonal neural networks, to tackle this anomaly detection problem for an industrially relevant use case. Bayesian learning enables uncertainty quantification of predictions, while orthogonality in weight matrices enables smooth training. We develop orthogonal (quantum) versions of 3D convolutional neural networks and show that these models can successfully detect anomalies in 3D objects. To test the feasibility of incorporating quantum computers into a quantum-enhanced anomaly detection pipeline, we perform hardware experiments with our models on IBM's 127-qubit Brisbane device, testing the effect of noise and limited measurement shots.
Submitted 25 April, 2025;
originally announced April 2025.
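The orthogonality that makes these networks train smoothly can be sketched classically: quantum orthogonal layers are typically built from two-mode (RBS-style) rotations, whose classical analogue is a product of Givens rotations. The sizes and angles below are arbitrary; the point is that the layer is exactly orthogonal by construction, so it preserves norms and cannot amplify or kill gradients.

```python
import math

def givens(n, i, j, theta):
    """n x n rotation in the (i, j) plane: the classical analogue of an
    RBS-style two-mode rotation used in orthogonal quantum layers."""
    G = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    c, s = math.cos(theta), math.sin(theta)
    G[i][i], G[j][j] = c, c
    G[i][j], G[j][i] = -s, s
    return G

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# An orthogonal "layer": a product of plane rotations with trainable angles.
n = 4
angles = [(0, 1, 0.3), (2, 3, -1.1), (1, 2, 0.7), (0, 3, 0.25)]
W = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
for i, j, th in angles:
    W = matmul(givens(n, i, j, th), W)

# Orthogonality holds for any angles: W^T W = I, so the layer preserves
# vector norms exactly, regardless of how the angles are trained.
WtW = matmul([[W[j][i] for j in range(n)] for i in range(n)], W)
```

Training updates only the angles, so the weight matrix stays on the orthogonal manifold without any projection or regularization step.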
-
Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
Authors:
Brian Thompson,
Nitika Mathur,
Daniel Deutsch,
Huda Khayrallah
Abstract:
Selecting an automatic metric that best emulates human annotators is often non-trivial, because there is no clear definition of "best emulates." A meta-metric is required to compare the human judgments to the automatic metric scores, and metric rankings depend on the choice of meta-metric. We propose Soft Pairwise Accuracy (SPA), a new meta-metric that builds on Pairwise Accuracy (PA) but incorporates the statistical significance of both the human judgments and the metric scores. We show that SPA is more stable than PA with respect to changes in the number of systems/segments used for evaluation. We also show that PA can only assign a small set of distinct output values to metrics, and this results in many metrics being artificially assigned the exact same PA score. We demonstrate that SPA fixes this issue. Finally, we show that SPA is more discriminative than PA, producing more statistically significant comparisons between metrics. SPA was selected as the official system-level metric for the 2024 WMT Metrics Shared Task.
Submitted 4 October, 2024; v1 submitted 14 September, 2024;
originally announced September 2024.
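A toy comparison makes the PA/SPA distinction concrete. This sketch assumes SPA's per-pair term is 1 − |p_human − p_metric| with one-sided p-values (here from an exhaustive sign-flip permutation test over paired segment differences); the systems and scores are fabricated for illustration.

```python
from itertools import product

def one_sided_p(diffs):
    """Paired sign-flip permutation test: p-value for mean(diffs) > 0."""
    observed = sum(diffs) / len(diffs)
    count, total = 0, 0
    for signs in product((1, -1), repeat=len(diffs)):
        m = sum(s * d for s, d in zip(signs, diffs)) / len(diffs)
        count += (m >= observed)
        total += 1
    return count / total

# Toy data: per-segment scores for 3 systems from humans and from a metric.
human = [[0.9, 0.8, 0.85, 0.7, 0.95, 0.8, 0.75, 0.9],
         [0.7, 0.75, 0.8, 0.6, 0.85, 0.7, 0.65, 0.8],
         [0.5, 0.6, 0.55, 0.5, 0.6, 0.55, 0.5, 0.6]]
metric = [[0.8, 0.7, 0.9, 0.65, 0.9, 0.85, 0.7, 0.85],
          [0.75, 0.7, 0.85, 0.55, 0.8, 0.75, 0.6, 0.75],
          [0.55, 0.5, 0.6, 0.45, 0.65, 0.5, 0.45, 0.55]]

pa_terms, spa_terms = [], []
for a in range(3):
    for b in range(a + 1, 3):
        dh = [x - y for x, y in zip(human[a], human[b])]
        dm = [x - y for x, y in zip(metric[a], metric[b])]
        # PA: does the metric rank the pair the same way as the humans?
        pa_terms.append(float((sum(dh) > 0) == (sum(dm) > 0)))
        # SPA: compare the *confidence* of the two rankings, not just the sign.
        spa_terms.append(1.0 - abs(one_sided_p(dh) - one_sided_p(dm)))

pa = sum(pa_terms) / len(pa_terms)
spa = sum(spa_terms) / len(spa_terms)
```

Here the metric agrees with the humans on every pairwise ranking, so PA saturates at 1.0, while SPA still registers the small mismatch in how confidently one pair is separated.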
-
MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification
Authors:
Phu Pham,
Aradhya N. Mathur,
Ojaswa Sharma,
Aniket Bera
Abstract:
The field of text-to-3D content generation has made significant progress in generating realistic 3D objects, with existing methodologies like Score Distillation Sampling (SDS) offering promising guidance. However, these methods often encounter the "Janus" problem: multi-face ambiguities arising from imprecise guidance. Additionally, while recent advancements in 3D Gaussian splatting have shown its efficacy in representing 3D volumes, optimization of this representation remains largely unexplored. This paper introduces a unified framework for text-to-3D content generation that addresses these critical gaps. Our approach utilizes multi-view guidance to iteratively form the structure of the 3D model, progressively enhancing detail and accuracy. We also introduce a novel densification algorithm that aligns Gaussians close to the surface, optimizing the structural integrity and fidelity of the generated models. Extensive experiments validate our approach, demonstrating that it produces high-quality visual outputs with minimal time cost. Notably, our method achieves high-quality results within half an hour of training, offering a substantial efficiency gain over most existing methods, which require hours of training time to achieve comparable results.
Submitted 10 September, 2024;
originally announced September 2024.
-
Curvy: A Parametric Cross-section based Surface Reconstruction
Authors:
Aradhya N. Mathur,
Apoorv Khattar,
Ojaswa Sharma
Abstract:
In this work, we present a novel approach for reconstructing shape point clouds using planar sparse cross-sections with the help of generative modeling. We present unique challenges pertaining to the representation and reconstruction in this problem setting. Most methods in the classical literature lack the ability to generalize based on object class and employ complex mathematical machinery to reconstruct reliable surfaces. We present a simple learnable approach to generate a large number of points from a small number of input cross-sections over a large dataset. We use a compact parametric polyline representation using adaptive splitting to represent the cross-sections and perform learning using a Graph Neural Network to reconstruct the underlying shape in an adaptive manner, reducing the dependence on the number of cross-sections provided.
Submitted 1 September, 2024;
originally announced September 2024.
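Adaptive splitting of a parametric cross-section can be sketched with a simple midpoint-error criterion: keep subdividing the parameter interval until the chord approximates the curve within tolerance. This is a generic stand-in for illustration only; the paper's polylines are learned, and its splitting rule is presumably more sophisticated.

```python
import math

def adaptive_polyline(curve, t0, t1, tol):
    """Recursively split [t0, t1] until the chord's midpoint is within tol
    of the curve's midpoint: a minimal adaptive-splitting criterion."""
    p0, p1 = curve(t0), curve(t1)
    tm = 0.5 * (t0 + t1)
    pm = curve(tm)
    # deviation of the curve midpoint from the chord midpoint
    cx, cy = 0.5 * (p0[0] + p1[0]), 0.5 * (p0[1] + p1[1])
    if math.hypot(pm[0] - cx, pm[1] - cy) <= tol:
        return [p0, p1]
    left = adaptive_polyline(curve, t0, tm, tol)
    right = adaptive_polyline(curve, tm, t1, tol)
    return left + right[1:]  # drop the duplicated midpoint

# Sample a half circle: flat regions would get fewer segments, tight
# curvature more, which keeps the representation compact.
circle = lambda t: (math.cos(t), math.sin(t))
pts = adaptive_polyline(circle, 0.0, math.pi, 0.01)
```

On a shape with mixed curvature the same rule allocates vertices unevenly, which is the compactness argument behind an adaptive parametric representation.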
-
Categorizing Sources of Information for Explanations in Conversational AI Systems for Older Adults Aging in Place
Authors:
Niharika Mathur,
Tamara Zubatiy,
Elizabeth Mynatt
Abstract:
As the permeability of AI systems in interpersonal domains like the home expands, their technical capabilities of generating explanations are required to be aligned with user expectations for transparency and reasoning. This paper presents insights from our ongoing work in understanding the effectiveness of explanations in Conversational AI systems for older adults aging in place and their family caregivers. We argue that in collaborative and multi-user environments like the home, AI systems will make recommendations based on a host of information sources to generate explanations. These sources may be more or less salient based on user mental models of the system and the specific task. We highlight the need for cross-technological collaboration between AI systems and other available sources of information in the home to generate multiple explanations for a single user query. Through example scenarios in a caregiving home setting, this paper provides an initial framework for categorizing these sources and informing a potential design space for AI explanations surrounding everyday tasks in the home.
Submitted 7 June, 2024;
originally announced June 2024.
-
Training-efficient density quantum machine learning
Authors:
Brian Coyle,
Snehal Raj,
Natansh Mathur,
El Amine Cherrat,
Nishant Jain,
Skander Kazdaghli,
Iordanis Kerenidis
Abstract:
Quantum machine learning (QML) requires powerful, flexible and efficiently trainable models to be successful in solving challenging problems. We introduce density quantum neural networks, a model family that prepares mixtures of trainable unitaries, with a distributional constraint over coefficients. This framework balances expressivity and efficient trainability, especially on quantum hardware. For expressivity, the Hastings-Campbell Mixing lemma converts benefits from linear combination of unitaries into density models with similar performance guarantees but shallower circuits. For trainability, commuting-generator circuits enable density model construction with efficiently extractable gradients. The framework connects to various facets of QML including post-variational and measurement-based learning. In classical settings, density models naturally integrate the mixture of experts formalism, and offer natural overfitting mitigation. The framework is versatile: we uplift several quantum models into density versions to improve model performance, or trainability, or both. These include Hamming weight-preserving and equivariant models, among others. Extensive numerical experiments validate our findings.
Submitted 23 May, 2025; v1 submitted 30 May, 2024;
originally announced May 2024.
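The basic density-model construction, a convex mixture of trainable unitaries with a distributional (here, softmax) constraint on the weights, fits in a few lines. This is a single-qubit toy with made-up angles; trace preservation of the resulting channel is the point of the check at the end.

```python
import math

def mm(A, B):  # 2x2 complex matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(A):    # conjugate transpose
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def rot(theta):  # a simple trainable single-qubit unitary (real rotation)
    c, s = math.cos(theta), math.sin(theta)
    return [[complex(c), complex(-s)], [complex(s), complex(c)]]

# Distributional constraint on mixture weights via softmax over logits.
logits = [0.4, -0.2, 1.0]
zs = [math.exp(l) for l in logits]
alphas = [z / sum(zs) for z in zs]
unitaries = [rot(0.3), rot(-0.9), rot(1.7)]

rho = [[complex(1), complex(0)], [complex(0), complex(0)]]  # pure state |0><0|
out = [[complex(0), complex(0)], [complex(0), complex(0)]]
for a, U in zip(alphas, unitaries):
    UrU = mm(mm(U, rho), dag(U))          # U rho U^dagger
    for i in range(2):
        for j in range(2):
            out[i][j] += a * UrU[i][j]    # convex mixture of channels

trace = (out[0][0] + out[1][1]).real  # mixed-unitary channels preserve trace
```

The output is a valid (generally mixed) density matrix for any logits and angles, which is what lets both the weights and the unitaries be trained freely.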
-
CraftSVG: Multi-Object Text-to-SVG Synthesis via Layout Guided Diffusion
Authors:
Ayan Banerjee,
Nityanand Mathur,
Josep Llados,
Umapada Pal,
Anjan Dutta
Abstract:
Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects, rather than comprehensive scenes comprising multiple elements. In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vector graphics depicting entire scenes from textual descriptions. Utilizing a pre-trained LLM for layout generation from text prompts, this framework introduces a technique for producing masked latents in specified bounding boxes for accurate object placement. It introduces a fusion mechanism for integrating attention maps and employs a diffusion U-Net for coherent composition, speeding up the drawing process. The resulting SVG is optimized using a pre-trained encoder and LPIPS loss with opacity modulation to maximize similarity. Additionally, this work explores the potential of primitive shapes in facilitating canvas completion in constrained environments. Through both qualitative and quantitative assessments, SVGCraft is demonstrated to surpass prior works in abstraction, recognizability, and detail, as evidenced by its performance metrics (CLIP-T: 0.4563, Cosine Similarity: 0.6342, Confusion: 0.66, Aesthetic: 6.7832). The code will be available at https://github.com/ayanban011/SVGCraft.
Submitted 28 November, 2025; v1 submitted 30 March, 2024;
originally announced April 2024.
-
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models
Authors:
Shyam Marjit,
Harshit Singh,
Nityanand Mathur,
Sayak Paul,
Chia-Mu Yu,
Pin-Yu Chen
Abstract:
In the realm of subject-driven text-to-image (T2I) generative models, recent developments like DreamBooth and BLIP-Diffusion have led to impressive results yet encounter limitations due to their intensive fine-tuning demands and substantial parameter requirements. While the low-rank adaptation (LoRA) module within DreamBooth offers a reduction in trainable parameters, it introduces a pronounced sensitivity to hyperparameters, leading to a compromise between parameter efficiency and the quality of T2I personalized image synthesis. Addressing these constraints, we introduce DiffuseKronA, a novel Kronecker product-based adaptation module that not only significantly reduces the parameter count by 35% and 99.947% compared to LoRA-DreamBooth and the original DreamBooth, respectively, but also enhances the quality of image synthesis. Crucially, DiffuseKronA mitigates the issue of hyperparameter sensitivity, delivering consistent high-quality generations across a wide range of hyperparameters, thereby diminishing the necessity for extensive fine-tuning. Furthermore, a more controllable decomposition makes DiffuseKronA more interpretable, and it can even achieve up to a 50% reduction with results comparable to LoRA-DreamBooth. Evaluated against diverse and complex input images and text prompts, DiffuseKronA consistently outperforms existing models, producing diverse images of higher quality with improved fidelity and a more accurate color distribution of objects, all the while upholding exceptional parameter efficiency, thus presenting a substantial advancement in the field of T2I generative modeling. Our project page, consisting of links to the code and pre-trained checkpoints, is available at https://diffusekrona.github.io/.
Submitted 28 February, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
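The parameter arithmetic behind a Kronecker-factorized update is easy to illustrate. The layer size and factor shapes below are made up for the sketch (real attention weights are far larger, and the factors are learned), but the counting argument is the same: a Kronecker product gives full-size support from two small factor matrices.

```python
def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

# Hypothetical 8x8 layer update factored as a 2x2 (x) 4x4 Kronecker product.
d_out, d_in = 8, 8
A = [[0.1 * (i + j + 1) for j in range(2)] for i in range(2)]   # 2 x 2 factor
B = [[0.01 * (i + j + 1) for j in range(4)] for i in range(4)]  # 4 x 4 factor
dW = kron(A, B)  # full 8 x 8 update built from only 4 + 16 parameters

params_kron = 2 * 2 + 4 * 4       # 20 trainable parameters
params_lora = 2 * (d_out + d_in)  # rank-2 LoRA on the same layer: 32
params_full = d_out * d_in        # dense update: 64
```

Unlike a rank-r LoRA update, whose rank is capped at r, the Kronecker product's rank is the product of the factors' ranks, which is one intuition for why the factorization can be both smaller and more expressive per parameter.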
-
RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation
Authors:
Aradhya N. Mathur,
Phu Pham,
Aniket Bera,
Ojaswa Sharma
Abstract:
3D generation has rapidly accelerated in the past decade owing to the progress in the field of generative modeling. Score Distillation Sampling (SDS) based rendering has improved 3D asset generation to a great extent. Further, the recent work on Denoising Diffusion Policy Optimization (DDPO) shows that the diffusion process is compatible with policy gradient methods, which have been used to improve 2D diffusion models with an aesthetic scoring function. We first show that this aesthetic scorer acts as a strong guide for a variety of SDS-based methods and demonstrate its effectiveness in text-to-3D synthesis. Further, we leverage the DDPO approach to improve the quality of the 3D rendering obtained from 2D diffusion models. Our approach, DDPO3D, employs the policy gradient method in tandem with aesthetic scoring. To the best of our knowledge, this is the first method that extends policy gradient methods to 3D score-based rendering, and it shows improvement across SDS-based methods such as DreamGaussian, which are currently driving research in text-to-3D synthesis. Our approach is compatible with score distillation-based methods, which would facilitate the integration of diverse reward functions into the generative process. Our project page can be accessed via https://ddpo3d.github.io.
Submitted 7 December, 2023;
originally announced December 2023.
-
CLIPDraw++: Text-to-Sketch Synthesis with Simple Primitives
Authors:
Nityanand Mathur,
Shyam Marjit,
Abhra Chaudhuri,
Anjan Dutta
Abstract:
With the goal of understanding the visual concepts that CLIP associates with text prompts, we show that the latent space of CLIP can be visualized solely in terms of linear transformations on simple geometric primitives like straight lines and circles. Although existing approaches achieve this by sketch-synthesis-through-optimization, they do so on the space of higher order Bézier curves, which exhibit a wastefully large set of structures that they can evolve into, as most of them are non-essential for generating meaningful sketches. We present CLIPDraw++, an algorithm that provides significantly better visualizations for CLIP text embeddings, using only simple primitive shapes like straight lines and circles. This constrains the set of possible outputs to linear transformations on these primitives, thereby exhibiting an inherently simpler mathematical form. The synthesis process of CLIPDraw++ can be tracked end-to-end, with each visual concept being expressed exclusively in terms of primitives. Project Page: https://clipdrawx.github.io/.
Submitted 8 July, 2025; v1 submitted 4 December, 2023;
originally announced December 2023.
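The optimisation the abstract describes, adjusting primitive parameters to maximise a similarity score, can be sketched with a single line segment. The `score` function below is a hypothetical stand-in for the real CLIP image-text similarity:

```python
import numpy as np

# Hypothetical stand-in for a CLIP similarity score: it rewards a line
# segment whose direction matches a fixed target direction.
target_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)

def score(p0, p1):
    d = p1 - p0
    d = d / (np.linalg.norm(d) + 1e-8)
    return float(d @ target_dir)

# One primitive: a straight line parameterised by its two endpoints.
p0 = np.array([0.0, 0.0])
p1 = np.array([1.0, 0.0])
eps, lr = 1e-4, 0.1

for _ in range(300):
    # Finite-difference ascent on the score w.r.t. p1 (p0 kept fixed),
    # mimicking gradient-based optimisation over primitive parameters.
    grad = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = eps
        grad[i] = (score(p0, p1 + e) - score(p0, p1 - e)) / (2 * eps)
    p1 += lr * grad
```

Because a line has only four scalar parameters, the search space is far smaller than that of a higher-order Bezier curve, which is the simplification the paper exploits.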
-
Improved Financial Forecasting via Quantum Machine Learning
Authors:
Sohum Thakkar,
Skander Kazdaghli,
Natansh Mathur,
Iordanis Kerenidis,
André J. Ferreira-Martins,
Samurai Brito
Abstract:
Quantum algorithms have the potential to enhance machine learning across a variety of domains and applications. In this work, we show how quantum machine learning can be used to improve financial forecasting. First, we use classical and quantum Determinantal Point Processes to enhance Random Forest models for churn prediction, improving precision by almost 6%. Second, we design quantum neural network architectures with orthogonal and compound layers for credit risk assessment, which match classical performance with significantly fewer parameters. Our results demonstrate that leveraging quantum ideas can effectively enhance the performance of machine learning, both today as quantum-inspired classical ML solutions, and even more in the future, with the advent of better quantum hardware.
Submitted 3 April, 2024; v1 submitted 31 May, 2023;
originally announced June 2023.
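The idea of using Determinantal Point Processes to inject diversity into Random Forest training can be sketched with a greedy determinant-maximisation surrogate (a cheap classical proxy for DPP sampling, not the paper's quantum sampler). The feature data and kernel below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feature matrix: 8 features, one nearly duplicated.
X = rng.standard_normal((100, 8))
X[:, 4] = X[:, 0] + 0.01 * rng.standard_normal(100)  # near-duplicate of feature 0

# Similarity kernel between features (absolute correlation).
C = np.abs(np.corrcoef(X, rowvar=False))

def greedy_diverse(kernel, k):
    """Greedily grow a subset that maximises the kernel submatrix
    determinant -- a classical surrogate for drawing a diverse subset
    from a DPP with this kernel."""
    chosen = []
    rest = list(range(kernel.shape[0]))
    for _ in range(k):
        best, best_det = None, -1.0
        for j in rest:
            sub = chosen + [j]
            det = np.linalg.det(kernel[np.ix_(sub, sub)])
            if det > best_det:
                best, best_det = j, det
        chosen.append(best)
        rest.remove(best)
    return chosen

subset = greedy_diverse(C, 4)
```

Redundant features make the submatrix nearly singular, so the near-duplicate pair is never selected together; feeding each tree such a diverse subset is the intuition behind DPP-enhanced forests.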
-
Assessing New Hires' Programming Productivity Through UMETRIX -- An Industry Case Study
Authors:
Sai Anirudh Karre,
Neeraj Mathur,
Y. Raghu Reddy
Abstract:
New hires (novice or experienced) usually undergo an onboarding program for a specific period to get acquainted with the processes of the hiring organization and reach expected programming productivity levels. This paper presents a programming productivity framework developed as the outcome of a three-year-long industry study with small to medium-scale organizations using a usability evaluation and code recommendation tool, UMETRIX, to manage new-hire programming productivity. We developed a programming productivity framework around this tool called "Utpada". Participating organizations expressed strong interest in relying on this framework to assess the skill gap among new hires. It helped identify under-performers early and strategize their upskilling plans per business needs. The participating organizations saw an 89% rise in quality code contributions by new hires during their probation period compared to traditional new hires. This framework is reproducible for any new-hire team size and can be easily integrated into existing programming productivity improvement programs.
Submitted 5 May, 2023;
originally announced May 2023.
-
Quantum Vision Transformers
Authors:
El Amine Cherrat,
Iordanis Kerenidis,
Natansh Mathur,
Jonas Landman,
Martin Strahm,
Yun Yvonna Li
Abstract:
In this work, quantum transformers are designed and analysed in detail by extending the state-of-the-art classical transformer neural network architectures known to be very performant in natural language processing and image analysis. Building upon previous work, which uses parametrised quantum circuits for data loading and orthogonal neural layers, we introduce three types of quantum transformers for training and inference, including a quantum transformer based on compound matrices, which guarantees a theoretical advantage of the quantum attention mechanism compared to its classical counterpart in terms of both asymptotic run time and the number of model parameters. These quantum architectures can be built using shallow quantum circuits and produce qualitatively different classification models. The three proposed quantum attention layers vary on the spectrum between closely following the classical transformers and exhibiting more quantum characteristics. As building blocks of the quantum transformer, we propose a novel method for loading a matrix as quantum states, as well as two new trainable quantum orthogonal layers adaptable to different levels of connectivity and quality of quantum computers. We performed extensive simulations of the quantum transformers on standard medical image datasets, which showed competitive, and at times better, performance compared to classical benchmarks, including best-in-class classical vision transformers. The quantum transformers we trained on these small-scale datasets require fewer parameters than standard classical benchmarks. Finally, we implemented our quantum transformers on superconducting quantum computers and obtained encouraging results for experiments with up to six qubits.
Submitted 20 February, 2024; v1 submitted 16 September, 2022;
originally announced September 2022.
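A trainable orthogonal layer of the kind the abstract mentions can be sketched classically as a composition of 2x2 Givens rotations, the classical analogue of the RBS-gate circuits used in this line of work. The "pyramid" wiring below is one plausible layout, not necessarily the paper's exact circuit:

```python
import numpy as np

def givens(n, i, j, theta):
    """Planar rotation acting on coordinates (i, j) of an n-vector."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i], G[j, j] = c, c
    G[i, j], G[j, i] = -s, s
    return G

def orthogonal_layer(thetas, n):
    """Compose nearest-neighbour 2x2 rotations into one n x n orthogonal
    matrix, mirroring how RBS-gate circuits build trainable orthogonal
    layers from shallow two-qubit gates."""
    W = np.eye(n)
    k = 0
    for i in range(n - 1):
        for j in range(i + 1):  # illustrative "pyramid" wiring
            W = givens(n, j, j + 1, thetas[k]) @ W
            k += 1
    return W

n = 4
n_params = n * (n - 1) // 2  # 6 angles fully parameterise SO(4) here
rng = np.random.default_rng(0)
W = orthogonal_layer(rng.uniform(0.0, 2.0 * np.pi, n_params), n)
```

The layer is orthogonal by construction, so it preserves vector norms exactly for any choice of angles, which is what makes such layers cheap to constrain during training.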
-
LIFI: Towards Linguistically Informed Frame Interpolation
Authors:
Aradhya Neeraj Mathur,
Devansh Batra,
Yaman Kumar,
Rajiv Ratn Shah,
Roger Zimmermann
Abstract:
In this work, we explore a new problem of frame interpolation for speech videos. Such content today forms a major mode of online communication. We try to solve this problem by using several deep learning video generation algorithms to generate the missing frames. We also provide examples where computer vision models, despite showing high performance on conventional non-linguistic metrics, fail to produce faithful interpolations of speech. With this motivation, we provide a new set of linguistically informed metrics specifically targeted at the problem of speech video interpolation. We also release several datasets to test the speech understanding of computer vision video generation models.
Submitted 2 December, 2020; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics
Authors:
Nitika Mathur,
Timothy Baldwin,
Trevor Cohn
Abstract:
Automatic metrics are fundamental for the development and evaluation of machine translation systems. Judging whether, and to what extent, automatic metrics concur with the gold standard of human evaluation is not a straightforward problem. We show that current methods for judging metrics are highly sensitive to the translations used for assessment, particularly the presence of outliers, which often leads to falsely confident conclusions about a metric's efficacy. Finally, we turn to pairwise system ranking, developing a method for thresholding performance improvement under an automatic metric against human judgements, which allows quantification of type I versus type II errors incurred, i.e., insignificant human differences in system quality that are accepted, and significant human differences that are rejected. Together, these findings suggest improvements to the protocols for metric evaluation and system performance evaluation in machine translation.
Submitted 12 June, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
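The thresholding procedure the abstract describes, accepting a metric difference as real and counting type I and type II errors against human judgements, can be sketched on toy data. The score differences and significance labels below are invented for illustration:

```python
import numpy as np

# Hypothetical per-system-pair data: absolute metric score difference and
# whether human evaluation found a significant quality difference.
metric_delta = np.array([0.1, 0.4, 0.05, 0.8, 0.02, 0.3, 0.6, 0.01])
human_significant = np.array([False, True, False, True, False, False, True, False])

def error_rates(threshold):
    """Accept a metric difference as 'real' when it exceeds the threshold.
    Type I: insignificant human difference accepted by the metric.
    Type II: significant human difference rejected by the metric."""
    accepted = metric_delta > threshold
    type_i = int(np.sum(accepted & ~human_significant))
    type_ii = int(np.sum(~accepted & human_significant))
    return type_i, type_ii

# Sweeping the threshold exposes the trade-off the paper quantifies:
# low thresholds accept spurious wins, high thresholds reject real ones.
trade_off = {t: error_rates(t) for t in (0.0, 0.2, 0.5)}
```

On this toy data, a zero threshold accepts every insignificant difference, while a strict 0.5 threshold starts rejecting genuinely significant ones.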
-
Multimodal Medical Volume Colorization from 2D Style
Authors:
Aradhya Neeraj Mathur,
Apoorv Khattar,
Ojaswa Sharma
Abstract:
Colorization involves the synthesis of colors on a target image while preserving structural content as well as the semantics of the target image. This is a well-explored problem in 2D with many state-of-the-art solutions. We propose a novel deep learning-based approach for the colorization of 3D medical volumes. Our system is capable of directly mapping the colors of a 2D photograph to a 3D MRI volume in real time, producing a high-fidelity color volume suitable for photo-realistic visualization. Since this work is the first of its kind, we discuss the full pipeline in detail and the challenges it brings for 3D medical data. The colorization of a medical MRI volume also entails modality conversion, which highlights the robustness of our approach in handling multi-modal data.
Submitted 6 April, 2020;
originally announced April 2020.
-
Load Balancing Optimization in LTE/LTE-A Cellular Networks: A Review
Authors:
Sumita Mishra,
Nidhi Mathur
Abstract:
During the past few decades, wireless technology has seen tremendous growth. The recent introduction of high-end mobile devices has further increased subscribers' demand for high bandwidth. Current cellular systems require manual configuration and management of networks, which is costly, time-consuming, and error-prone given the exponentially increasing number of mobile users and nodes. This has led to the introduction of self-organizing capabilities for network management with minimum human involvement, which is expected to permit higher end-user Quality of Service (QoS) along with lower operational and maintenance costs for telecom service providers. Self-organized cellular networks incorporate a collection of functions for automatic configuration, optimization, and maintenance of cellular networks. As mobile end users continue to use network resources while moving from one cell to another, the traffic load within a cell does not remain constant. Thus, load balancing, as part of a self-organized network solution, has become one of the most active and emerging fields of research in cellular networks. It involves the transfer of load from overloaded cells to neighbouring cells with free resources, yielding a more balanced load distribution that maintains appropriate end-user experience and network performance. In this paper, a review of various load balancing techniques currently used in mobile networks is presented, with special emphasis on techniques suitable for the self-optimization feature in future cellular networks.
Submitted 23 December, 2014;
originally announced December 2014.
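The handover-based balancing the abstract describes can be sketched as a simple threshold rule: hand users over from an overloaded cell to its least-loaded neighbour until the load falls below a target. Cell names, loads, and the threshold below are illustrative, not drawn from any LTE standard:

```python
# Toy network: active users per cell and the neighbour relation.
loads = {"A": 9, "B": 3, "C": 4}
neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
THRESHOLD = 6  # illustrative capacity target per cell

def balance(loads, neighbours, threshold):
    loads = dict(loads)  # do not mutate the caller's state
    # Visit cells from most to least loaded.
    for cell in sorted(loads, key=loads.get, reverse=True):
        while loads[cell] > threshold:
            # Hand one user over to the least-loaded neighbour.
            target = min(neighbours[cell], key=lambda n: loads[n])
            if loads[target] + 1 > threshold:
                break  # no neighbour has spare capacity
            loads[cell] -= 1
            loads[target] += 1
    return loads

balanced = balance(loads, neighbours, THRESHOLD)
```

The total number of users is conserved; only their cell assignment changes, which is the essence of mobility load balancing in self-organized networks.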