-
ODySSeI: An Open-Source End-to-End Framework for Automated Detection, Segmentation, and Severity Estimation of Lesions in Invasive Coronary Angiography Images
Authors:
Anand Choudhary,
Xiaowu Sun,
Thabo Mahendiran,
Ortal Senouf,
Denise Auberson,
Bernard De Bruyne,
Stephane Fournier,
Olivier Muller,
Emmanuel Abbé,
Pascal Frossard,
Dorina Thanou
Abstract:
Invasive Coronary Angiography (ICA) is the clinical gold standard for the assessment of coronary artery disease. However, its interpretation remains subjective and prone to intra- and inter-operator variability. In this work, we introduce ODySSeI: an Open-source end-to-end framework for automated Detection, Segmentation, and Severity estimation of lesions in ICA images. ODySSeI integrates deep learning-based lesion detection and lesion segmentation models trained using a novel Pyramidal Augmentation Scheme (PAS) to enhance robustness and real-time performance across diverse patient cohorts (2149 patients from Europe, North America, and Asia). Furthermore, we propose a quantitative coronary angiography-free Lesion Severity Estimation (LSE) technique that directly computes the Minimum Lumen Diameter (MLD) and diameter stenosis from the predicted lesion geometry. Extensive evaluation on both in-distribution and out-of-distribution clinical datasets demonstrates ODySSeI's strong generalizability. Our PAS yields large performance gains in highly complex tasks as compared to relatively simpler ones, notably, a 2.5-fold increase in lesion detection performance versus a 1-3\% increase in lesion segmentation performance over their respective baselines. Our LSE technique achieves high accuracy, with predicted MLD values differing by only $\pm$ 2-3 pixels from the corresponding ground truths. On average, ODySSeI processes a raw ICA image within only a few seconds on a CPU and in a fraction of a second on a GPU and is available as a plug-and-play web interface at swisscardia.epfl.ch. Overall, this work establishes ODySSeI as a comprehensive and open-source framework which supports automated, reproducible, and scalable ICA analysis for real-time clinical decision-making.
Submitted 20 March, 2026;
originally announced March 2026.
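The QCA-free severity computation described in the ODySSeI abstract can be sketched as a toy calculation over a lumen-diameter profile: the Minimum Lumen Diameter is the smallest diameter along the lesion, and percent diameter stenosis compares it to a healthy reference diameter. This is a hypothetical illustration: the function name, the proximal/distal reference-diameter heuristic, and all values are assumptions, not the paper's actual method.

```python
# Hypothetical sketch of a QCA-free lesion severity estimation step:
# given a per-centerline-point lumen diameter profile (in pixels) from a
# predicted lesion mask, compute the Minimum Lumen Diameter (MLD) and
# percent diameter stenosis. The reference-diameter heuristic below
# (mean of the lesion's two end diameters) is an assumption.

def lesion_severity(diameters_px):
    """diameters_px: lumen diameter (pixels) at each centerline point."""
    mld = min(diameters_px)
    # Assumed heuristic: healthy reference = mean of proximal/distal ends.
    ref = (diameters_px[0] + diameters_px[-1]) / 2.0
    stenosis_pct = (1.0 - mld / ref) * 100.0
    return mld, stenosis_pct

mld, ds = lesion_severity([14.0, 12.5, 7.0, 6.0, 8.5, 13.0, 14.0])
# With a 14-pixel reference and a 6-pixel MLD, stenosis is about 57%.
```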
-
When Fine-Tuning Fails and when it Generalises: Role of Data Diversity and Mixed Training in LLM-based TTS
Authors:
Anupam Purwar,
Aditya Choudhary
Abstract:
Large language models are increasingly adopted as semantic backbones for neural text-to-speech systems. However, frozen LLM representations are insufficient for modeling speaker-specific acoustic and perceptual characteristics. Our experiments on fine-tuning the language-model backbone of a TTS system show promise in improving voice consistency and signal-to-noise ratio (SNR) in the voice cloning task. Across multiple speakers, LoRA fine-tuning consistently outperforms the non-fine-tuned base Qwen-0.5B model across three complementary dimensions of speech quality. First, perceptual quality improves significantly, with DNS-MOS gains of up to 0.42 points for speakers whose training data exhibits sufficient acoustic variability. Second, speaker fidelity improves for all evaluated speakers, with consistent increases in voice similarity indicating that LoRA effectively adapts speaker identity representations without degrading linguistic modeling. Third, signal-level quality improves in most cases, with SNR increasing by as much as 34 percent. Crucially, these improvements are strongly governed by the characteristics of the training data: speakers with high variability in acoustic energy and perceptual quality achieve simultaneous gains in DNS-MOS, voice similarity, and SNR. Overall, this work establishes that LoRA fine-tuning is not merely a parameter-efficient optimization technique but an effective mechanism for speaker-level adaptation in compact LLM-based TTS systems. When supported by sufficiently diverse training data, the LoRA-adapted Qwen-0.5B consistently surpasses its frozen base model in perceptual quality and speaker similarity, with low latency when hosted as a quantized GGUF model.
Submitted 11 March, 2026;
originally announced March 2026.
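The LoRA mechanism referenced in this abstract applies a trainable low-rank update to a frozen base weight, W_eff = W + (alpha/r)·BA, so only the small matrices A and B are tuned. A minimal pure-Python sketch with toy shapes and values (not the paper's configuration):

```python
# Minimal illustration of LoRA: the frozen base weight W is augmented
# with a low-rank update B @ A scaled by alpha / r; only A and B train.
# Shapes (2x2 base, rank 1) and all values are toy assumptions.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(W, A, B, alpha, r):
    delta = matmul(B, A)              # low-rank update of rank r
    scale = alpha / r
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weight (2x2)
B = [[1.0], [0.0]]             # 2x1, trained
A = [[0.5, 0.5]]               # 1x2, trained (rank r = 1)
W_eff = lora_forward(W, A, B, alpha=2.0, r=1)
```

At inference the update can be merged into W once, so the adapted model adds no latency over the base model.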
-
MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings
Authors:
Anupam Purwar,
Aditya Choudhary
Abstract:
Current evaluation frameworks and benchmarks for LLM-powered agents focus on text-chat-driven agents; these frameworks do not expose the user's persona to the agent, and thus operate in a user-agnostic environment. Importantly, in the customer experience management domain, the agent's behaviour evolves as the agent learns about the user's personality. With the proliferation of real-time TTS and multi-modal language models, LLM-based agents are gradually going to become multi-modal. Towards this, we propose the MM-tau-p$^2$ benchmark with metrics for evaluating the robustness of multi-modal agents in a dual-control setting, with and without persona adaptation to the user, while also taking user inputs into the planning process to resolve a user query. In particular, our work shows that even with state-of-the-art frontier LLMs like GPT-5 and GPT-4.1, there are additional considerations, measured using metrics such as multi-modal robustness and turn overhead, when introducing multi-modality into LLM-based agents. Overall, MM-tau-p$^2$ builds on our prior work FOCAL and provides a holistic way of evaluating multi-modal agents in an automated way by introducing 12 novel metrics. We also provide estimates of these metrics on the telecom and retail domains using the LLM-as-judge approach with carefully crafted prompts and well-defined rubrics for evaluating each conversation.
Submitted 7 April, 2026; v1 submitted 10 March, 2026;
originally announced March 2026.
-
Boltzmann Reinforcement Learning for Noise resilience in Analog Ising Machines
Authors:
Aditya Choudhary,
Saaketh Desai,
Prasad Iyer
Abstract:
Analog Ising machines (AIMs) have emerged as a promising paradigm for combinatorial optimization, utilizing physical dynamics to solve Ising problems with high energy efficiency. However, the performance of traditional optimization and sampling algorithms on these platforms is often limited by inherent measurement noise. We introduce BRAIN (Boltzmann Reinforcement for Analog Ising Networks), a distribution learning framework that utilizes variational reinforcement learning to approximate the Boltzmann distribution. By shifting from state-by-state sampling to aggregating information across multiple noisy measurements, BRAIN is resilient to Gaussian noise characteristic of AIMs. We evaluate BRAIN across diverse combinatorial topologies, including the Curie-Weiss and 2D nearest-neighbor Ising systems. We find that under realistic 3\% Gaussian measurement noise, BRAIN maintains 98\% ground state fidelity, whereas Markov Chain Monte Carlo (MCMC) methods degrade to 51\% fidelity. Furthermore, BRAIN reaches the MCMC-equivalent solution up to 192x faster under these conditions. BRAIN exhibits $\mathcal{O}(N^{1.55})$ scaling up to 65,536 spins and maintains robustness against severe measurement uncertainty up to 40\%. Beyond ground state optimization, BRAIN accurately captures thermodynamic phase transitions and metastable states, providing a scalable and noise-resilient method for utilizing analog computing architectures in complex optimizations.
Submitted 9 February, 2026;
originally announced February 2026.
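The core intuition above, that aggregating many noisy measurements before forming a Boltzmann distribution is far more robust than trusting individual readouts, can be illustrated with a toy sketch. The energies, temperature, and noise level below are invented, and this is not the BRAIN algorithm itself (which uses variational reinforcement learning); it only demonstrates the noise-averaging effect.

```python
import math
import random

# Toy illustration: average many Gaussian-noisy energy readouts per state
# (as an aggregation-based method can), then form Boltzmann weights.
# All energies, the temperature, and the noise scale are invented.

random.seed(0)
true_E = {"00": -2.0, "01": 0.0, "10": 0.0, "11": -2.0}
T, sigma, n_meas = 1.0, 0.5, 400

def boltzmann(E, T):
    w = {s: math.exp(-e / T) for s, e in E.items()}
    Z = sum(w.values())
    return {s: v / Z for s, v in w.items()}

avg_E = {s: sum(e + random.gauss(0.0, sigma) for _ in range(n_meas)) / n_meas
         for s, e in true_E.items()}
p_est = boltzmann(avg_E, T)
p_true = boltzmann(true_E, T)
err = max(abs(p_est[s] - p_true[s]) for s in true_E)  # small after averaging
```

Averaging n readouts shrinks the effective energy noise by a factor of sqrt(n), which is why the estimated distribution stays close to the true one even at 3% or larger measurement noise.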
-
FOCAL: A Novel Benchmarking Technique for Multi-modal Agents
Authors:
Anupam Purwar,
Aditya Choudhary
Abstract:
With the recent advancements in reasoning capabilities, tool calling using MCP servers, and Audio Language Models (ALMs), the development and integration of multi-modal agents (with voice and text support) has come to the industry forefront. Cascading pipelines for voice agents still play a central role in the industry owing to the superior reasoning capabilities facilitated by LLMs. However, cascading pipelines are prone to error propagation through the pipeline. We propose a framework, FOCAL, to benchmark end-to-end reasoning, component-wise error propagation, and error analysis for automated as well as human-assisted testing of multi-modal agents (voice-to-voice + text input). We also introduce two novel metrics, the Reasoning and Semantic scores, to evaluate the efficacy of the agent in having meaningful conversations in voice mode.
Submitted 2 March, 2026; v1 submitted 12 January, 2026;
originally announced January 2026.
-
State-of-the-art Small Language Coder Model: Mify-Coder
Authors:
Abhinav Parmar,
Abhisek Panigrahi,
Abhishek Kumar Dwivedi,
Abhishek Bhattacharya,
Adarsh Ramachandra,
Aditya Choudhary,
Aditya Garg,
Aditya Raj,
Alankrit Bhatt,
Alpesh Yadav,
Anant Vishnu,
Ananthu Pillai,
Ankush Kumar,
Aryan Patnaik,
Aswatha Narayanan S,
Avanish Raj Singh,
Bhavya Shree Gadda,
Brijesh Pankajbhai Kachhadiya,
Buggala Jahnavi,
Chidurala Nithin Krishna,
Chintan Shah,
Chunduru Akshaya,
Debarshi Banerjee,
Debrup Dey,
Deepa R.
, et al. (71 additional authors not shown)
Abstract:
We present Mify-Coder, a 2.5B-parameter code model trained on 4.2T tokens using a compute-optimal strategy built on the Mify-2.5B foundation model. Mify-Coder achieves comparable accuracy and safety while significantly outperforming much larger baseline models on standard coding and function-calling benchmarks, demonstrating that compact models can match frontier-grade models in code generation and agent-driven workflows. Our training pipeline combines high-quality curated sources with synthetic data generated through agentically designed prompts, refined iteratively using enterprise-grade evaluation datasets. LLM-based quality filtering further enhances data density, enabling frugal yet effective training. Through disciplined exploration of CPT-SFT objectives, data mixtures, and sampling dynamics, we deliver frontier-grade code intelligence within a single continuous training trajectory. Empirical evidence shows that principled data and compute discipline allow smaller models to achieve competitive accuracy, efficiency, and safety compliance. Quantized variants of Mify-Coder enable deployment on standard desktop environments without requiring specialized hardware.
Submitted 26 December, 2025;
originally announced December 2025.
-
Multi-Resolution Model Fusion for Accelerating the Convolutional Neural Network Training
Authors:
Kewei Wang,
Claire Songhyun Lee,
Sunwoo Lee,
Vishu Gupta,
Jan Balewski,
Alex Sim,
Peter Nugent,
Ankit Agrawal,
Alok Choudhary,
Kesheng Wu,
Wei-keng Liao
Abstract:
Neural networks are rapidly gaining popularity in scientific research, but training the models is often very time-consuming. Particularly when the training data samples are large high-dimensional arrays, efficient training methodologies that can reduce the computational costs are crucial. To reduce the training cost, we propose a Multi-Resolution Model Fusion (MRMF) method that combines models trained on reduced-resolution data, which are then refined with data in the original resolution. We demonstrate that these reduced-resolution models and datasets can be generated quickly. More importantly, the proposed approach reduces the training time by speeding up model convergence in each fusion stage before switching to the final stage of fine-tuning with data in its original resolution. This strategy ensures the final model retains high-resolution insights while benefiting from the computational efficiency of lower-resolution training. Our experimental results demonstrate that the multi-resolution model fusion method can significantly reduce end-to-end training time while maintaining the same model accuracy. Evaluated on two real-world scientific applications, CosmoFlow and Neuron Inverter, the proposed method reduces the training time by up to 47% and 44%, respectively, compared to original-resolution training, while model accuracy is unaffected.
Submitted 29 October, 2025;
originally announced October 2025.
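The low-to-high-resolution warm-start idea behind MRMF can be illustrated in miniature: train on average-pooled data, then map the learned weights onto the full-resolution grid before fine-tuning. The pooling and weight-upsampling scheme here is an assumed simplification for a linear model, not the exact MRMF fusion rule.

```python
# Conceptual sketch: a linear model trained on average-pooled inputs is
# upsampled to initialize the full-resolution model before fine-tuning.
# The upsampled weights exactly reproduce the low-res model's output on
# the corresponding full-resolution sample, giving a faithful warm start.

def avg_pool(x, k):
    """Reduce resolution by averaging non-overlapping windows of size k."""
    return [sum(x[i:i + k]) / k for i in range(0, len(x), k)]

def upsample(w, k):
    """Map low-resolution weights onto the full-resolution grid."""
    return [wi / k for wi in w for _ in range(k)]

x_full = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0]
x_low = avg_pool(x_full, 2)      # cheaper training data at half resolution
w_low = [0.1, 0.2, 0.3, 0.4]     # weights after low-res training (toy values)
w_init = upsample(w_low, 2)      # warm start for full-resolution fine-tuning

low_out = sum(wi * xi for wi, xi in zip(w_low, x_low))
full_out = sum(wi * xi for wi, xi in zip(w_init, x_full))  # identical output
```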
-
GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters
Authors:
Anand Choudhary,
Yasser Sulaiman,
Lukas Mauch,
Ghouthi Boukli Hacene,
Fabien Cardinaux,
Antoine Bosselut
Abstract:
Sparse fine-tuning techniques adapt LLMs to downstream tasks by only tuning a sparse subset of model parameters. However, the effectiveness of sparse adaptation depends on optimally selecting the model parameters to be fine-tuned. In this work, we introduce a novel sparse fine-tuning technique named GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters, which fine-tunes only those model parameters which have the largest gradient magnitudes on downstream tasks and the smallest pre-trained magnitudes, intuitively prioritizing parameters that are highly task-relevant, but minimally disruptive to pre-trained knowledge. Our experimentation with LLaMA3 8B and Gemma 2B as base models shows that GaLLoP consistently improves or matches the in-distribution as well as out-of-distribution performance obtained via the usage of other leading parameter-efficient fine-tuning techniques, including LoRA, DoRA, and SAFT. Our analysis demonstrates that GaLLoP mitigates catastrophic forgetting and memorization of task data, as important pre-trained parameters remain unchanged, and stabilizes performance relative to other fine-tuning techniques, robustly generalizing across most random seeds.
Submitted 22 October, 2025;
originally announced October 2025.
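The GaLLoP selection criterion, large downstream-gradient magnitude combined with small pre-trained magnitude, can be sketched as follows. The rank-sum combination rule below is an illustrative stand-in; the paper's exact scoring may differ.

```python
# Hedged sketch of gradient-based sparse parameter selection: keep only
# parameters with large |gradient| on the downstream task AND small
# pretrained |weight|. Combining the two per-parameter rankings by their
# sum is an assumption made for illustration.

def gallop_mask(weights, grads, k):
    n = len(weights)
    by_grad = sorted(range(n), key=lambda i: -abs(grads[i]))  # rank 0 = largest |g|
    by_mag = sorted(range(n), key=lambda i: abs(weights[i]))  # rank 0 = smallest |w|
    grad_rank = {i: r for r, i in enumerate(by_grad)}
    mag_rank = {i: r for r, i in enumerate(by_mag)}
    chosen = set(sorted(range(n), key=lambda i: grad_rank[i] + mag_rank[i])[:k])
    return [i in chosen for i in range(n)]

w = [0.01, 2.0, 0.02, 1.5]    # pretrained magnitudes (toy)
g = [0.9, 0.8, 0.1, 0.05]     # task-gradient magnitudes (toy)
mask = gallop_mask(w, g, k=1) # picks index 0: high gradient, low magnitude
```

Only the masked parameters are then updated during fine-tuning, which is why important pre-trained parameters (large magnitude, low task gradient) stay untouched.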
-
REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis
Authors:
Alec K. Peltekian,
Halil Ertugrul Aktas,
Gorkem Durak,
Kevin Grudzinski,
Bradford C. Bemiss,
Carrie Richardson,
Jane E. Dematte,
G. R. Scott Budinger,
Anthony J. Esposito,
Alexander Misharin,
Alok Choudhary,
Ankit Agrawal,
Ulas Bagci
Abstract:
Mixture-of-Experts (MoE) architectures achieve scalable learning by routing inputs to specialized subnetworks through conditional computation. However, conventional MoE designs assume homogeneous expert capability and domain-agnostic routing, assumptions that are fundamentally misaligned with medical imaging, where anatomical structure and regional disease heterogeneity govern pathological patterns. We introduce Regional Expert Networks (REN), the first anatomically-informed MoE framework for medical image classification. REN encodes anatomical priors by training seven specialized experts, each dedicated to a distinct lung lobe or bilateral lung combination, enabling precise modeling of region-specific pathological variation. Multi-modal gating mechanisms dynamically integrate radiomics biomarkers with deep learning (DL) features extracted by convolutional (CNN), Transformer (ViT), and state-space (Mamba) architectures to weight expert contributions at inference. Applied to interstitial lung disease (ILD) classification on a 597-patient, 1,898-scan longitudinal cohort, REN achieves consistently superior performance: the radiomics-guided ensemble attains an average AUC of 0.8646 ± 0.0467, a 12.5% improvement over the SwinUNETR single-model baseline (AUC 0.7685, p=0.031). Lower-lobe experts reach AUCs of 0.88-0.90, outperforming DL baselines (CNN: 0.76-0.79) and mirroring known patterns of basal ILD progression. Evaluated under rigorous patient-level cross-validation, REN demonstrates strong generalizability and clinical interpretability, establishing a scalable, anatomically-guided framework potentially extensible to other structured medical imaging tasks. Code is available on our GitHub https://github.com/NUBagciLab/MoE-REN.
Submitted 30 March, 2026; v1 submitted 6 October, 2025;
originally announced October 2025.
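At its core, the gated expert fusion described above is a convex combination of per-region expert outputs under gate weights. A toy sketch with invented scores and gate logits, seven experts mirroring the lobe/bilateral regions (this is the generic MoE fusion pattern, not REN's exact gating network):

```python
import math

# Toy sketch of gated expert fusion: a softmax gate over region-level
# logits (e.g., driven by radiomics features) weights seven regional
# expert scores. All numeric values are invented for illustration.

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

expert_scores = [0.9, 0.88, 0.76, 0.79, 0.8, 0.7, 0.85]  # one score per region
gate_logits = [2.0, 1.8, 0.2, 0.3, 0.5, 0.1, 1.5]        # hypothetical gate inputs
weights = softmax(gate_logits)                            # sum to 1
fused = sum(w * s for w, s in zip(weights, expert_scores))
```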
-
Imaging-Based Mortality Prediction in Patients with Systemic Sclerosis
Authors:
Alec K. Peltekian,
Karolina Senkow,
Gorkem Durak,
Kevin M. Grudzinski,
Bradford C. Bemiss,
Jane E. Dematte,
Carrie Richardson,
Nikolay S. Markov,
Mary Carns,
Kathleen Aren,
Alexandra Soriano,
Matthew Dapas,
Harris Perlman,
Aaron Gundersheimer,
Kavitha C. Selvan,
John Varga,
Monique Hinchcliff,
Krishnan Warrior,
Catherine A. Gao,
Richard G. Wunderink,
GR Scott Budinger,
Alok N. Choudhary,
Anthony J. Esposito,
Alexander V. Misharin,
Ankit Agrawal
, et al. (1 additional authors not shown)
Abstract:
Interstitial lung disease (ILD) is a leading cause of morbidity and mortality in systemic sclerosis (SSc). Chest computed tomography (CT) is the primary imaging modality for diagnosing and monitoring lung complications in SSc patients. However, its role in disease progression and mortality prediction has not yet been fully clarified. This study introduces a novel, large-scale longitudinal chest CT analysis framework that utilizes radiomics and deep learning to predict mortality associated with lung complications of SSc. We collected and analyzed 2,125 CT scans from SSc patients enrolled in the Northwestern Scleroderma Registry, conducting mortality analyses at one, three, and five years using advanced imaging analysis techniques. Death labels were assigned based on recorded deaths over the one-, three-, and five-year intervals, confirmed by expert physicians. In our dataset, 181, 326, and 428 of the 2,125 CT scans were from patients who died within one, three, and five years, respectively. We used pre-trained ResNet-18, DenseNet-121, and Swin Transformer models, fine-tuned on the 2,125 CT images of SSc patients. The models achieved AUCs of 0.769, 0.801, and 0.709 for predicting mortality within one, three, and five years, respectively. Our findings highlight the potential of both radiomics and deep learning computational methods to improve early detection and risk assessment of SSc-related interstitial lung disease, marking a significant advancement in the literature.
Submitted 27 September, 2025;
originally announced September 2025.
-
i-LAVA: Insights on Low Latency Voice-2-Voice Architecture for Agents
Authors:
Anupam Purwar,
Aditya Choudhary
Abstract:
We experiment with a low-latency, end-to-end voice-to-voice communication model to optimize it for real-time conversational applications. By analyzing the components essential to a voice-to-voice (V-2-V) system, namely automatic speech recognition (ASR), text-to-speech (TTS), and dialog management, our work analyzes how to reduce processing time while maintaining high-quality interactions, identifying the levers for optimizing a V-2-V system. Our work identifies that the TTS component, which generates life-like voice full of emotion including natural pauses and exclamations, has the highest impact on the real-time factor (RTF). The V-2-V architecture we experiment with utilizes CSM-1b, which has the capability to understand the tone as well as the context of the conversation by ingesting both the audio and text of prior exchanges to generate contextually accurate speech. We explored optimizing the number of Residual Vector Quantization (RVQ) iterations performed by the TTS decoder, which comes at the cost of a decrease in the quality of the generated voice. Our experimental evaluations also demonstrate that for V-2-V implementations based on CSM, the most important optimizations come from reducing the number of RVQ iterations along with the number of codebooks used in Mimi.
Submitted 27 September, 2025; v1 submitted 25 September, 2025;
originally announced September 2025.
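The real-time factor (RTF) central to this abstract is conventionally the ratio of processing time to the duration of the audio handled; RTF below 1 means faster than real time. A minimal sketch with invented component timings, illustrating how a single slow stage (here TTS, consistent with the abstract's finding) dominates the pipeline RTF:

```python
# Minimal RTF illustration for a cascaded voice pipeline.
# All timings are invented; "asr"/"llm"/"tts" stand for the stages of a
# hypothetical cascade, not measurements from the paper.

def rtf(processing_s, audio_s):
    """Real-time factor: processing time divided by audio duration."""
    return processing_s / audio_s

components = {"asr": 0.30, "llm": 0.50, "tts": 1.80}  # seconds per 2 s of audio
audio_len = 2.0
per_component = {k: rtf(v, audio_len) for k, v in components.items()}
pipeline_rtf = rtf(sum(components.values()), audio_len)  # > 1: not real time
```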
-
Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records
Authors:
Mosbah Aouad,
Anirudh Choudhary,
Awais Farooq,
Steven Nevers,
Lusine Demirkhanyan,
Bhrandon Harris,
Suguna Pappu,
Christopher Gondi,
Ravishankar Iyer
Abstract:
Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, and early detection remains a major clinical challenge due to the absence of specific symptoms and reliable biomarkers. In this work, we propose a new multimodal approach that integrates longitudinal diagnosis code histories and routinely collected laboratory measurements from electronic health records to detect PDAC up to one year prior to clinical diagnosis. Our method combines neural controlled differential equations to model irregular lab time series, pretrained language models and recurrent networks to learn diagnosis code trajectory representations, and cross-attention mechanisms to capture interactions between the two modalities. We develop and evaluate our approach on a real-world dataset of nearly 4,700 patients and achieve significant improvements in AUC ranging from 6.5% to 15.5% over state-of-the-art methods. Furthermore, our model identifies diagnosis codes and laboratory panels associated with elevated PDAC risk, including both established and new biomarkers. Our code is available at https://github.com/MosbahAouad/EarlyPDAC-MML.
Submitted 18 August, 2025; v1 submitted 8 August, 2025;
originally announced August 2025.
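The cross-attention coupling of modalities mentioned above is, in its simplest single-query form, textbook scaled dot-product attention: a query from one modality attends over keys/values from the other. A toy pure-Python sketch (dimensions and embeddings invented, not the paper's architecture):

```python
import math

# Generic single-query cross-attention: a lab-trajectory embedding (query)
# attends over diagnosis-code embeddings (keys/values). All vectors are
# toy values chosen only to make the mechanism visible.

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def cross_attend(query, keys, values):
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]

lab_query = [1.0, 0.0]                # hypothetical lab-time-series embedding
code_keys = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical diagnosis-code embeddings
code_vals = [[5.0, 0.0], [0.0, 5.0]]
ctx = cross_attend(lab_query, code_keys, code_vals)  # leans toward code 0
```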
-
PunchPulse: A Physically Demanding Virtual Reality Boxing Game Designed with, for and by Blind and Low-Vision Players
Authors:
Sanchita S. Kamath,
Omar Khan,
Anurag Choudhary,
Jan Meyerhoff-Liang,
Soyoung Choi,
JooYoung Seo
Abstract:
Blind and low-vision (BLV) individuals experience lower levels of physical activity (PA) compared to sighted peers due to a lack of accessible, engaging exercise options. Existing solutions often rely on auditory cues but do not fully integrate rich sensory feedback or support spatial navigation, limiting their effectiveness. This study introduces PunchPulse, a virtual reality (VR) boxing exergame designed to motivate BLV users to reach and sustain moderate to vigorous physical activity (MVPA) levels. Over a seven-month, multi-phased study, PunchPulse was iteratively refined with three BLV co-designers, informed by two early pilot testers, and evaluated by six additional BLV user-study participants. Data collection included both qualitative (researcher observations, SOPI) and quantitative (MVPA zones, aid usage, completion times) measures of physical exertion and gameplay performance. The user study revealed that all participants reached moderate MVPA thresholds, with high levels of immersion and engagement observed. This work demonstrates the potential of VR as an inclusive medium for promoting meaningful PA in the BLV community and addresses a critical gap in accessible, intensity-driven exercise interventions.
Submitted 4 August, 2025;
originally announced August 2025.
-
Efficient Malware Detection with Optimized Learning on High-Dimensional Features
Authors:
Aditya Choudhary,
Sarthak Pawar,
Yashodhara Haribhakta
Abstract:
Malware detection using machine learning requires feature extraction from binary files, as models cannot process raw binaries directly. A common approach involves using LIEF for raw feature extraction and the EMBER vectorizer to generate 2381-dimensional feature vectors. However, the high dimensionality of these features introduces significant computational challenges. This study addresses these challenges by applying two dimensionality reduction techniques: XGBoost-based feature selection and Principal Component Analysis (PCA). We evaluate three reduced feature dimensions (128, 256, and 384), which correspond to approximately 5.4%, 10.8%, and 16.1% of the original 2381 features, across four models (XGBoost, LightGBM, Extra Trees, and Random Forest) using a unified training, validation, and testing split formed from the EMBER-2018, ERMDS, and BODMAS datasets. This approach ensures generalization and avoids dataset bias. Experimental results show that LightGBM trained on the 384-dimensional feature set after XGBoost feature selection achieves the highest accuracy of 97.52% on the unified dataset, providing an optimal balance between computational efficiency and detection performance. The best model, trained in 61 minutes using 30 GB of RAM and 19.5 GB of disk space, generalizes effectively to completely unseen datasets, maintaining 95.31% accuracy on TRITIUM and 93.98% accuracy on INFERNO. These findings present a scalable, compute-efficient approach for malware detection without compromising accuracy.
Submitted 18 June, 2025;
originally announced June 2025.
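The importance-based feature selection step described above amounts to keeping the top-k features ranked by a tree model's importances before training the downstream classifier. A toy sketch with stand-in importance values (the paper derives them from XGBoost, e.g. its `feature_importances_` attribute):

```python
# Illustrative dimensionality reduction: given per-feature importances
# (stand-in values here; in practice from a fitted tree model), keep only
# the top-k columns of the feature matrix. With k = 384 this mirrors the
# 2381 -> 384 reduction the abstract evaluates.

def select_top_k(X, importances, k):
    keep = sorted(range(len(importances)), key=lambda i: -importances[i])[:k]
    keep.sort()  # preserve the original feature order
    return [[row[i] for i in keep] for row in X], keep

X = [[0.1, 5.0, 0.0, 3.0],
     [0.2, 4.0, 0.0, 1.0]]
importances = [0.05, 0.60, 0.00, 0.35]   # hypothetical importance scores
X_red, kept = select_top_k(X, importances, k=2)  # keeps features 1 and 3
```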
-
Exoplanet Classification through Vision Transformers with Temporal Image Analysis
Authors:
Anupma Choudhary,
Sohith Bandari,
B. S. Kushvah,
C. Swastik
Abstract:
The classification of exoplanets has been a longstanding challenge in astronomy, requiring significant computational and observational resources. Traditional methods demand substantial effort, time, and cost, highlighting the need for advanced machine learning techniques to enhance classification efficiency. In this study, we propose a methodology that transforms raw light curve data from NASA's Kepler mission into Gramian Angular Fields (GAFs) and Recurrence Plots (RPs) using the Gramian Angular Difference Field and recurrence plot techniques. These transformed images serve as inputs to the Vision Transformer (ViT) model, leveraging its ability to capture intricate temporal dependencies. We assess the performance of the model through recall, precision, and F1 score metrics, using a 5-fold cross-validation approach to obtain a robust estimate of the model's performance and reduce evaluation bias. Our comparative analysis reveals that RPs outperform GAFs, with the ViT model achieving an 89.46$\%$ recall and an 85.09$\%$ precision rate, demonstrating its significant capability in accurately identifying exoplanetary transits. Despite using under-sampling techniques to address class imbalance, dataset size reduction remains a limitation. This study underscores the importance of further research into optimizing model architectures to enhance automation, performance, and generalization of the model.
Submitted 19 June, 2025;
originally announced June 2025.
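The two image encodings named above have compact closed forms: the Gramian Angular Difference Field rescales the series to [-1, 1], takes phi = arccos(x), and forms the matrix sin(phi_i - phi_j); a recurrence plot thresholds pairwise distances between time points. A pure-Python sketch with a toy transit-like light curve (libraries such as pyts provide production implementations):

```python
import math

# GADF and recurrence-plot encodings of a 1-D time series, written out
# from their definitions. The flux values and threshold are toy numbers.

def gadf(x):
    lo, hi = min(x), max(x)
    xs = [2 * (v - lo) / (hi - lo) - 1 for v in x]   # rescale to [-1, 1]
    phi = [math.acos(v) for v in xs]                  # angular encoding
    return [[math.sin(pi - pj) for pj in phi] for pi in phi]

def recurrence_plot(x, eps):
    return [[1 if abs(a - b) <= eps else 0 for b in x] for a in x]

flux = [1.00, 0.99, 0.95, 0.99, 1.00]   # toy transit-like dip
G = gadf(flux)                           # antisymmetric, zero diagonal
R = recurrence_plot(flux, eps=0.02)      # symmetric binary matrix
```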
-
Parallel Data Object Creation: Towards Scalable Metadata Management in High-Performance I/O Library
Authors:
Youjia Li,
Robert Latham,
Robert Ross,
Ankit Agrawal,
Alok Choudhary,
Wei-Keng Liao
Abstract:
High-level I/O libraries, such as HDF5 and PnetCDF, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. These I/O libraries store metadata, such as data types and dimensionality, along with the raw data in the same files. While these libraries are well-optimized for concurrent access to the raw data, they are designed neither to handle a large number of data objects efficiently nor to create different data objects independently by multiple processes, as they require applications to call data object creation APIs collectively with consistent metadata among all processes. Applications that process data gathered from remote sensors, such as particle collision experiments in high-energy physics, may generate data of different sizes from different sensors and desire to store them as separate data objects. For such applications, the I/O library's requirement of collective data object creation can become very expensive, as the cost of the metadata consistency check increases with the metadata volume as well as the number of processes. To address this limitation, using PnetCDF as an experimental platform, we investigate solutions in this paper that abide by the netCDF file format, as well as propose a new file header format that enables independent data object creation. The proposed file header consists of two sections: an index table and a list of metadata blocks. The index table contains references to the metadata blocks, and each block stores metadata of objects that can be created collectively or independently. The new design achieves scalable performance, cutting data object creation times by up to 582x when running on 4096 MPI processes to create 5,684,800 data objects in parallel. Additionally, the new method reduces the memory footprint, with each process requiring an amount of memory inversely proportional to the number of processes.
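The two-section header described above can be modeled in a few lines. The sketch below is an illustrative in-memory analogue only (class names and fields are assumptions, not the actual PnetCDF byte layout): each process fills its own metadata block, and the index table merely references the blocks, so object creation needs no global consistency check.

```python
# Toy in-memory model of a two-section header: an index table of
# references plus independent metadata blocks. All names here are
# illustrative assumptions, not the real PnetCDF on-disk format.

class MetadataBlock:
    def __init__(self):
        self.objects = []           # metadata of objects in this block

    def add_object(self, name, dtype, shape):
        # Independent creation: touches only this block's metadata.
        self.objects.append({"name": name, "dtype": dtype, "shape": shape})

class FileHeader:
    def __init__(self):
        self.index_table = []       # section 1: references to blocks

    def new_block(self):
        """Give a process (or collective group) its own block, so it
        can create objects without coordinating with other ranks."""
        block = MetadataBlock()     # section 2: one metadata block
        self.index_table.append(block)
        return block

header = FileHeader()
rank0, rank1 = header.new_block(), header.new_block()  # e.g. two MPI ranks
rank0.add_object("sensor_a", "float64", (1024,))
rank1.add_object("sensor_b", "int32", (512, 4))
print(len(header.index_table))     # 2
```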
Submitted 17 June, 2025;
originally announced June 2025.
-
An Efficient Method for Accurate Pose Estimation and Error Correction of Cuboidal Objects
Authors:
Utsav Rai,
Hardik Mehta,
Vismay Vakharia,
Aditya Choudhary,
Amit Parmar,
Rolif Lima,
Kaushik Das
Abstract:
The proposed system outlined in this paper is a solution to a use case that requires the autonomous picking of cuboidal objects from an organized or unorganized pile with high precision. This paper presents an efficient method for precise pose estimation of cuboid-shaped objects, which aims to reduce errors in target pose in a time-efficient manner. Typical pose estimation methods like global point cloud registrations are prone to minor pose errors for which local registration algorithms are generally used to improve pose accuracy. However, due to the execution time overhead and uncertainty in the error of the final achieved pose, an alternate, linear time approach is proposed for pose error estimation and correction. This paper presents an overview of the solution followed by a detailed description of individual modules of the proposed algorithm.
Submitted 8 May, 2025;
originally announced May 2025.
-
Towards Space Group Determination from EBSD Patterns: The Role of Deep Learning and High-throughput Dynamical Simulations
Authors:
Alfred Yan,
Muhammad Nur Talha Kilic,
Gert Nolze,
Ankit Agrawal,
Alok Choudhary,
Roberto dos Reis,
Vinayak Dravid
Abstract:
The design of novel materials hinges on the understanding of structure-property relationships. However, in recent times, our capability to synthesize a large number of materials has outpaced our speed at characterizing them. While the overall chemical constituents can be readily known during synthesis, the structural evolution and characterization of newly synthesized samples remains a bottleneck for the ultimate goal of high throughput nanomaterials discovery. Thus, scalable methods for crystal symmetry determination that can analyze a large volume of material samples within a short time-frame are especially needed. Kikuchi diffraction in the SEM is a promising technique for this due to its sensitivity to dynamical scattering, which may provide information beyond just the seven crystal systems and fourteen Bravais lattices. After diffraction patterns are collected from material samples, deep learning methods may be able to classify the space group symmetries using the patterns as input, which paired with the elemental composition, would help enable the determination of the crystal structure. To investigate the feasibility of this solution, neural networks were trained to predict the space group type of background corrected EBSD patterns. Our networks were first trained and tested on an artificial dataset of EBSD patterns of 5,148 different cubic phases, created through physics-based dynamical simulations. Next, Maximum Classifier Discrepancy, an unsupervised deep learning-based domain adaptation method, was utilized to train neural networks to make predictions for experimental EBSD patterns. We introduce a relabeling scheme, which enables our models to achieve accuracy scores higher than 90% on simulated and experimental data, suggesting that neural networks are capable of making predictions of crystal symmetry from an EBSD pattern.
Submitted 2 May, 2025; v1 submitted 30 April, 2025;
originally announced April 2025.
-
Implementation of Support Vector Machines using Reaction Networks
Authors:
Amey Choudhary,
Jiaxin Jin,
Abhishek Deshpande
Abstract:
Can machine learning algorithms be implemented using chemistry? We demonstrate that this is possible in the case of support vector machines (SVMs). SVMs are powerful tools for data classification, leveraging Vapnik-Chervonenkis theory to handle high-dimensional data and small datasets effectively. In this work, we propose a chemical reaction network scheme for implementing SVMs, utilizing the steady-state behavior of reaction network dynamics to model key computational aspects of SVMs. This approach introduces a novel biochemical framework for implementing machine learning algorithms in non-traditional computational environments.
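The core idea, steady states of mass-action kinetics as computation, can be illustrated with a toy network. The reactions below are an assumed example, not the paper's scheme; they compute a nonnegative weighted sum, one building block of an SVM decision function:

```python
import numpy as np

# Illustrative mass-action network (an assumption, not the paper's):
#   X_i -> X_i + Y   at rate w_i * [X_i]   (catalytic production)
#   Y   -> 0         at rate [Y]           (degradation)
# Steady state: d[Y]/dt = sum_i w_i x_i - y = 0  =>  y* = w . x

def crn_weighted_sum(w, x, dt=0.01, steps=5000):
    """Euler-integrate the ODE for [Y] until it settles at w . x."""
    y = 0.0
    for _ in range(steps):
        y += dt * (np.dot(w, x) - y)
    return y

w = np.array([0.5, 1.5])                   # nonnegative rate "weights"
x = np.array([2.0, 1.0])                   # input species concentrations
print(round(crn_weighted_sum(w, x), 3))    # 2.5  (= 0.5*2 + 1.5*1)
```

The restriction to nonnegative concentrations is why CRN constructions typically encode signed quantities with paired species; that refinement is beyond this sketch.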
Submitted 1 April, 2026; v1 submitted 24 March, 2025;
originally announced March 2025.
-
ICPR 2024 Competition on Rider Intention Prediction
Authors:
Shankar Gangisetty,
Abdul Wasi,
Shyam Nandan Rai,
C. V. Jawahar,
Sajay Raj,
Manish Prajapati,
Ayesha Choudhary,
Aaryadev Chandra,
Dev Chandan,
Shireen Chand,
Suvaditya Mukherjee
Abstract:
The recent surge in the vehicle market has led to an alarming increase in road accidents. This underscores the critical importance of enhancing road safety measures, particularly for vulnerable road users like motorcyclists. Hence, we introduce the rider intention prediction (RIP) competition that aims to address challenges in rider safety by proactively predicting maneuvers before they occur, thereby strengthening rider safety. This capability enables the riders to react to the potential incorrect maneuvers flagged by advanced driver assistance systems (ADAS). We collect a new dataset, namely, rider action anticipation dataset (RAAD) for the competition consisting of two tasks: single-view RIP and multi-view RIP. The dataset incorporates a spectrum of traffic conditions and challenging navigational maneuvers on roads with varying lighting conditions. For the competition, we received seventy-five registrations and five team submissions for inference of which we compared the methods of the top three performing teams on both the RIP tasks: one state-space model (Mamba2) and two learning-based approaches (SVM and CNN-LSTM). The results indicate that the state-space model outperformed the other methods across the entire dataset, providing a balanced performance across maneuver classes. The SVM-based RIP method showed the second-best performance when using random sampling and SMOTE. However, the CNN-LSTM method underperformed, primarily due to class imbalance issues, particularly struggling with minority classes. This paper details the proposed RAAD dataset and provides a summary of the submissions for the RIP 2024 competition.
Submitted 11 March, 2025;
originally announced March 2025.
-
Synthetic Feature Augmentation Improves Generalization Performance of Language Models
Authors:
Ashok Choudhary,
Cornelius Thiels,
Hojjat Salehinejad
Abstract:
Training and fine-tuning deep learning models, especially large language models (LLMs), on limited and imbalanced datasets poses substantial challenges. These issues often result in poor generalization, where models overfit to dominant classes and underperform on minority classes, leading to biased predictions and reduced robustness in real-world applications. To overcome these challenges, we propose augmenting features in the embedding space by generating synthetic samples using a range of techniques. By upsampling underrepresented classes, this method improves model performance and alleviates data imbalance. We validate the effectiveness of this approach across multiple open-source text classification benchmarks, demonstrating its potential to enhance model robustness and generalization in imbalanced data scenarios.
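The embedding-space augmentation described above can be sketched as SMOTE-style interpolation between minority-class embeddings. This is one generic instance of the idea; the paper evaluates a range of generation techniques.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_minority(embeddings, n_new):
    """Create synthetic minority samples by linear interpolation
    between random pairs of existing minority-class embeddings
    (a generic sketch, not the paper's exact generators)."""
    emb = np.asarray(embeddings)
    out = []
    for _ in range(n_new):
        i, j = rng.choice(len(emb), size=2, replace=False)
        lam = rng.random()                  # interpolation coefficient
        out.append(emb[i] + lam * (emb[j] - emb[i]))
    return np.stack(out)

minority = rng.normal(size=(5, 8))          # 5 minority-class embeddings
synthetic = augment_minority(minority, n_new=20)
print(synthetic.shape)                      # (20, 8)
```

The synthetic points are then appended to the minority class before training the downstream classifier, upsampling it toward balance.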
Submitted 10 January, 2025;
originally announced January 2025.
-
Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing
Authors:
Atharva Mutsaddi,
Aditya Choudhary
Abstract:
Plagiarism involves using another person's work or concepts without proper attribution, presenting them as original creations. With the growing amount of data communicated in regional languages such as Marathi -- one of India's regional languages -- it is crucial to design robust plagiarism detection systems tailored for low-resource languages. Language models like Bidirectional Encoder Representations from Transformers (BERT) have demonstrated exceptional capability in text representation and feature extraction, making them essential tools for semantic analysis and plagiarism detection. However, the application of BERT for low-resource languages remains under-explored, particularly in the context of plagiarism detection. This paper presents a method to enhance the accuracy of plagiarism detection for Marathi texts using BERT sentence embeddings in conjunction with Term Frequency-Inverse Document Frequency (TF-IDF) feature representation. This approach effectively captures statistical, semantic, and syntactic aspects of text features through a weighted voting ensemble of machine learning models.
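A weighted voting ensemble of this kind reduces to blending the two models' class-probability vectors. The sketch below uses toy probabilities and an assumed weight `w`; in practice the weight and the base classifiers would come from validation tuning, not from the paper's reported setup.

```python
import numpy as np

def weighted_vote(prob_tfidf, prob_bert, w=0.4):
    """Blend class probabilities from a TF-IDF-based classifier and a
    BERT-embedding-based classifier. w is an assumed hyperparameter."""
    return w * np.asarray(prob_tfidf) + (1 - w) * np.asarray(prob_bert)

# Toy probabilities for [not-plagiarised, plagiarised] on 2 documents:
p_tfidf = np.array([[0.8, 0.2], [0.3, 0.7]])
p_bert  = np.array([[0.6, 0.4], [0.1, 0.9]])
blended = weighted_vote(p_tfidf, p_bert)
print(blended.argmax(axis=1))               # [0 1]
```

Blending probabilities (soft voting) rather than hard labels lets the statistical TF-IDF view and the semantic BERT view each contribute in proportion to their confidence.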
Submitted 9 January, 2025;
originally announced January 2025.
-
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Authors:
Joseph Raj Vishal,
Divesh Basina,
Aarya Choudhary,
Bharatesh Chakravarthi
Abstract:
Recent advances in video question answering (VideoQA) offer promising applications, especially in traffic monitoring, where efficient video interpretation is critical. Within intelligent transportation systems (ITS), answering complex, real-time queries like "How many red cars passed in the last 10 minutes?" or "Was there an incident between 3:00 PM and 3:05 PM?" enhances situational awareness and decision-making. Despite progress in vision-language models, VideoQA remains challenging, especially in dynamic environments involving multiple objects and intricate spatiotemporal relationships. This study evaluates state-of-the-art VideoQA models using non-benchmark synthetic and real-world traffic sequences. The framework leverages GPT-4o to assess accuracy, relevance, and consistency across basic detection, temporal reasoning, and decomposition queries. VideoLLaMA-2 excelled with 57% accuracy, particularly in compositional reasoning and consistent answers. However, all models, including VideoLLaMA-2, faced limitations in multi-object tracking, temporal coherence, and complex scene interpretation, highlighting gaps in current architectures. These findings underscore VideoQA's potential in traffic monitoring but also emphasize the need for improvements in multi-object tracking, temporal reasoning, and compositional capabilities. Enhancing these areas could make VideoQA indispensable for incident detection, traffic flow management, and responsive urban planning. The study's code and framework are open-sourced for further exploration: https://github.com/joe-rabbit/VideoQA_Pilot_Study
Submitted 2 December, 2024;
originally announced December 2024.
-
COBRA: A Continual Learning Approach to Vision-Brain Understanding
Authors:
Xuan-Bac Nguyen,
Manuel Serna-Aguilera,
Arabinda Kumar Choudhary,
Pawan Sinha,
Xin Li,
Khoa Luu
Abstract:
Vision-Brain Understanding (VBU) aims to extract visual information perceived by humans from brain activity recorded through functional Magnetic Resonance Imaging (fMRI). Despite notable advancements in recent years, existing studies in VBU continue to face the challenge of catastrophic forgetting, where models lose knowledge from prior subjects as they adapt to new ones. Addressing continual learning in this field is, therefore, essential. This paper introduces a novel framework called Continual Learning for Vision-Brain (COBRA) to address continual learning in VBU. Our approach includes three novel modules: a Subject Commonality (SC) module, a Prompt-based Subject Specific (PSS) module, and a transformer-based module for fMRI, denoted as MRIFormer module. The SC module captures shared vision-brain patterns across subjects, preserving this knowledge as the model encounters new subjects, thereby reducing the impact of catastrophic forgetting. On the other hand, the PSS module learns unique vision-brain patterns specific to each subject. Finally, the MRIFormer module contains a transformer encoder and decoder that learns the fMRI features for VBU from common and specific patterns. In a continual learning setup, COBRA is trained in new PSS and MRIFormer modules for new subjects, leaving the modules of previous subjects unaffected. As a result, COBRA effectively addresses catastrophic forgetting and achieves state-of-the-art performance in both continual learning and vision-brain reconstruction tasks, surpassing previous methods.
Submitted 6 August, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
Quantum-Brain: Quantum-Inspired Neural Network Approach to Vision-Brain Understanding
Authors:
Hoang-Quan Nguyen,
Xuan-Bac Nguyen,
Hugh Churchill,
Arabinda Kumar Choudhary,
Pawan Sinha,
Samee U. Khan,
Khoa Luu
Abstract:
Vision-brain understanding aims to extract semantic information about brain signals from human perceptions. Existing deep learning methods for vision-brain understanding are usually introduced in a traditional learning paradigm missing the ability to learn the connectivities between brain regions. Meanwhile, the quantum computing theory offers a new paradigm for designing deep learning models. Motivated by the connectivities in the brain signals and the entanglement properties in quantum computing, we propose a novel Quantum-Brain approach, a quantum-inspired neural network, to tackle the vision-brain understanding problem. To compute the connectivity between areas in brain signals, we introduce a new Quantum-Inspired Voxel-Controlling module to learn the impact of a brain voxel on others represented in the Hilbert space. To effectively learn connectivity, a novel Phase-Shifting module is presented to calibrate the value of the brain signals. Finally, we introduce a new Measurement-like Projection module to present the connectivity information from the Hilbert space into the feature space. The proposed approach can learn to find the connectivities between fMRI voxels and enhance the semantic information obtained from human perceptions. Our experimental results on the Natural Scene Dataset benchmarks illustrate the effectiveness of the proposed method with Top-1 accuracies of 95.1% and 95.6% on image and brain retrieval tasks and an Inception score of 95.3% on fMRI-to-image reconstruction task. Our proposed quantum-inspired network brings a potential paradigm to solving the vision-brain problems via the quantum computing theory.
Submitted 14 August, 2025; v1 submitted 20 November, 2024;
originally announced November 2024.
-
KAT to KANs: A Review of Kolmogorov-Arnold Networks and the Neural Leap Forward
Authors:
Divesh Basina,
Joseph Raj Vishal,
Aarya Choudhary,
Bharatesh Chakravarthi
Abstract:
The curse of dimensionality poses a significant challenge to modern multilayer perceptron-based architectures, often causing performance stagnation and scalability issues. Addressing this limitation typically requires vast amounts of data. In contrast, Kolmogorov-Arnold Networks have gained attention in the machine learning community for their bold claim of being unaffected by the curse of dimensionality. This paper explores the Kolmogorov-Arnold representation theorem and the mathematical principles underlying Kolmogorov-Arnold Networks, which enable their scalability and high performance in high-dimensional spaces. We begin with an introduction to foundational concepts necessary to understand Kolmogorov-Arnold Networks, including interpolation methods and basis splines, which form their mathematical backbone. This is followed by an overview of perceptron architectures and the universal approximation theorem, a key principle guiding modern machine learning. We then examine the Kolmogorov-Arnold representation theorem itself, including its mathematical formulation and implications for overcoming dimensionality challenges. Next, we review the architecture and error-scaling properties of Kolmogorov-Arnold Networks, demonstrating how these networks achieve true freedom from the curse of dimensionality. Finally, we discuss the practical viability of Kolmogorov-Arnold Networks, highlighting scenarios where their unique capabilities position them to excel in real-world applications. This review aims to offer insights into Kolmogorov-Arnold Networks' potential to redefine scalability and performance in high-dimensional learning tasks.
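The B-splines mentioned as the mathematical backbone can be evaluated with the Cox-de Boor recursion. The toy check below verifies their partition-of-unity property on a uniform knot vector; it illustrates the primitive KANs place on network edges, and is not code from the review.

```python
import numpy as np

def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion for the i-th B-spline basis of degree k."""
    if k == 0:
        return float(knots[i] <= t < knots[i + 1])
    left = right = 0.0
    if knots[i + k] != knots[i]:
        left = (t - knots[i]) / (knots[i + k] - knots[i]) * \
               bspline_basis(i, k - 1, t, knots)
    if knots[i + k + 1] != knots[i + 1]:
        right = (knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1]) * \
                bspline_basis(i + 1, k - 1, t, knots)
    return left + right

knots = np.arange(8.0)                      # uniform knot vector 0..7
t = 3.5                                     # point inside the valid range
# Degree-2 bases form a partition of unity inside [knots[2], knots[5]]:
vals = [bspline_basis(i, 2, t, knots) for i in range(5)]
print(round(sum(vals), 6))                  # 1.0
```

Partition of unity and local support are exactly the properties that keep KAN edge functions well-conditioned: at any input, only a handful of basis functions are active, and their responses sum to one.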
Submitted 15 November, 2024;
originally announced November 2024.
-
Multi-agent Collaborative Perception for Robotic Fleet: A Systematic Review
Authors:
Apoorv Singh,
Gaurav Raut,
Alka Choudhary
Abstract:
Collaborative perception in multi-robot fleets is a way to incorporate the power of unity in robotic fleets. Collaborative perception refers to the collective ability of multiple entities or agents to share and integrate their sensory information for a more comprehensive understanding of their environment. In other words, it involves the collaboration and fusion of data from various sensors or sources to enhance perception and decision-making capabilities. By combining data from diverse sources, such as cameras, lidar, radar, or other sensors, the system can create a more accurate and robust representation of the environment. In this review paper, we have summarized findings from 20+ research papers on collaborative perception. Moreover, we discuss testing and evaluation frameworks commonly accepted in academia and industry for autonomous vehicles and autonomous mobile robots. Our experiments with the trivial perception module show an improvement of over 200% with collaborative perception compared to individual robot perception. Here's our GitHub repository that shows the benefits of collaborative perception: https://github.com/synapsemobility/synapseBEV
Submitted 22 March, 2024;
originally announced May 2024.
-
Current Topological and Machine Learning Applications for Bias Detection in Text
Authors:
Colleen Farrelly,
Yashbir Singh,
Quincy A. Hathaway,
Gunnar Carlsson,
Ashok Choudhary,
Rahul Paul,
Gianfranco Doretto,
Yassine Himeur,
Shadi Atalls,
Wathiq Mansoor
Abstract:
Institutional bias can impact patient outcomes, educational attainment, and legal system navigation. Written records often reflect bias, and once bias is identified, it is possible to refer individuals for training to reduce bias. Many machine learning tools exist to explore text data and create predictive models that can search written records to identify real-time bias. However, few previous studies investigate large language model embeddings and geometric models of biased text data to understand geometry's impact on bias modeling accuracy. To overcome this issue, this study utilizes the RedditBias database to analyze textual biases. Four transformer models, including BERT and RoBERTa variants, were explored. Post-embedding, t-SNE allowed two-dimensional visualization of the data. KNN classifiers differentiated bias types, with lower k-values proving more effective. Findings suggest BERT, particularly mini BERT, excels in bias classification, while multilingual models lag. The recommendation emphasizes refining monolingual models and exploring domain-specific biases.
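The final classification step above, a k-nearest-neighbour vote on low-dimensional embeddings, is easy to sketch. The example below runs plain KNN on synthetic 2-D clusters standing in for t-SNE coordinates of two bias types; the data and cluster layout are invented for illustration.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    d = np.linalg.norm(train_x - query, axis=1)
    votes = train_y[np.argsort(d)[:k]]
    return np.bincount(votes).argmax()

rng = np.random.default_rng(1)
# Two toy "bias type" clusters in a 2-D embedding plane:
cluster0 = rng.normal(loc=(-2, 0), scale=0.3, size=(20, 2))
cluster1 = rng.normal(loc=(+2, 0), scale=0.3, size=(20, 2))
X = np.vstack([cluster0, cluster1])
y = np.array([0] * 20 + [1] * 20)
print(knn_predict(X, y, np.array([-2.1, 0.1])))   # 0
print(knn_predict(X, y, np.array([2.0, -0.2])))   # 1
```

Small k works well here for the same reason the study reports: when embeddings separate the classes into tight clusters, the nearest few neighbours are almost always same-class.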
Submitted 22 November, 2023;
originally announced November 2023.
-
Design Of Rubble Analyzer Probe Using ML For Earthquake
Authors:
Abhishek Sebastian,
R Pragna,
K Vishal Vythianathan,
Dasaraju Sohan Sai,
U Shiva Sri Hari Al,
R Anirudh,
Apurv Choudhary
Abstract:
The earthquake rubble analyzer uses machine learning to detect human presence via ambient sounds, achieving 97.45% accuracy. It also provides real-time environmental data, aiding in assessing survival prospects for trapped individuals, which is crucial for post-earthquake rescue efforts.
Submitted 24 October, 2023;
originally announced November 2023.
-
SAM3D: Segment Anything Model in Volumetric Medical Images
Authors:
Nhat-Tan Bui,
Dinh-Hieu Hoang,
Minh-Triet Tran,
Gianfranco Doretto,
Donald Adjeroh,
Brijesh Patel,
Arabinda Choudhary,
Ngan Le
Abstract:
Image segmentation remains a pivotal component in medical image analysis, aiding in the extraction of critical information for precise diagnostic practices. With the advent of deep learning, automated image segmentation methods have risen to prominence, showcasing exceptional proficiency in processing medical imagery. Motivated by the Segment Anything Model (SAM)-a foundational model renowned for its remarkable precision and robust generalization capabilities in segmenting 2D natural images-we introduce SAM3D, an innovative adaptation tailored for 3D volumetric medical image analysis. Unlike current SAM-based methods that segment volumetric data by converting the volume into separate 2D slices for individual analysis, our SAM3D model processes the entire 3D volume image in a unified approach. Extensive experiments are conducted on multiple medical image datasets to demonstrate that our network attains competitive results compared with other state-of-the-art methods in 3D medical segmentation tasks while being significantly efficient in terms of parameters. Code and checkpoints are available at https://github.com/UARK-AICV/SAM3D.
Submitted 5 March, 2024; v1 submitted 7 September, 2023;
originally announced September 2023.
-
RACR-MIL: Rank-aware contextual reasoning for weakly supervised grading of squamous cell carcinoma using whole slide images
Authors:
Anirudh Choudhary,
Mosbah Aouad,
Krishnakant Saboo,
Angelina Hwang,
Jacob Kechter,
Blake Bordeaux,
Puneet Bhullar,
David DiCaudo,
Steven Nelson,
Nneka Comfere,
Emma Johnson,
Olayemi Sokumbi,
Jason Sluzevich,
Leah Swanson,
Dennis Murphree,
Aaron Mangold,
Ravishankar Iyer
Abstract:
Squamous cell carcinoma (SCC) is the most common cancer subtype, with an increasing incidence and a significant impact on cancer-related mortality. SCC grading using whole slide images is inherently challenging due to the lack of a reliable protocol and substantial tissue heterogeneity. We propose RACR-MIL, the first weakly-supervised SCC grading approach achieving robust generalization across multiple anatomies (skin, head and neck, lung). RACR-MIL is an attention-based multiple-instance learning framework that enhances grade-relevant contextual representation learning and addresses tumor heterogeneity through two key innovations: (1) a hybrid WSI graph that captures both local tissue context and non-local phenotypical dependencies between tumor regions, and (2) a rank-ordering constraint in the attention mechanism that consistently prioritizes higher-grade tumor regions, aligning with pathologists' diagnostic process. Our model achieves state-of-the-art performance across multiple SCC datasets, achieving 3-9% higher grading accuracy, resilience to class imbalance, and up to 16% improved tumor localization. In a pilot study, pathologists reported that RACR-MIL improved grading efficiency in 60% of cases, underscoring its potential as a clinically viable cancer diagnosis and grading assistant.
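One plausible reading of the rank-ordering constraint is a hinge penalty on attention scores: higher-grade regions should out-attend lower-grade ones by a margin. The snippet below shows only that generic form; the actual loss, margin value, and its integration into the MIL attention are the paper's, not reproduced here.

```python
def rank_order_penalty(attn_high, attn_low, margin=0.1):
    """Hinge penalty encouraging a higher-grade tumor region to receive
    more attention than a lower-grade one by at least `margin`.
    (A generic reading of the constraint; the margin is an assumption.)"""
    return max(0.0, margin - (attn_high - attn_low))

# Attention already ordered correctly -> no penalty:
print(rank_order_penalty(0.7, 0.2))             # 0.0
# Violated ordering -> positive penalty pushing the gap toward margin:
print(round(rank_order_penalty(0.3, 0.4), 3))   # 0.2
```

Summed over region pairs and added to the classification loss, a term of this shape steers the attention weights toward the pathologist-like ordering the abstract describes.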
Submitted 19 July, 2025; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Sampling-based Path Planning Algorithms: A Survey
Authors:
Alka Choudhary
Abstract:
Path planning is a classic problem for autonomous robots. To ensure safe and efficient point-to-point navigation, an appropriate algorithm should be chosen keeping the robot's dimensions and its classification in mind. Autonomous robots use path-planning algorithms to safely navigate a dynamic, dense, and unknown environment. Metrics to take into account when choosing a path-planning algorithm include safety, efficiency, lowest-cost path generation, and obstacle avoidance. Before path planning can take place, we need a map representation, which can be a discretized or an open configuration space. A discretized configuration space provides node/connectivity information from one point to another. In an open/free configuration space, it is up to the algorithm to create a list of nodes and then find a feasible path. Both types of maps are populated with obstacle positions using perception-based obstacle detection techniques to represent current obstacles from the perspective of the robot. For open configuration spaces, sampling-based planning algorithms are used. This paper aims to explore various types of sampling-based path-planning algorithms such as Probabilistic RoadMap (PRM) and Rapidly-exploring Random Trees (RRT). Both algorithms also have optimized versions, PRM* and RRT*, and this paper discusses how that optimization is achieved and why it is beneficial.
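A minimal RRT, one of the two sampler families surveyed, fits in a few dozen lines. The sketch below grows a tree in a 10x10 open configuration space with one disc obstacle; the step size, tolerances, and obstacle are arbitrary demo choices, not from the survey.

```python
import math
import random

def rrt(start, goal, is_free, n_iter=5000, step=0.5, goal_tol=1.0):
    """Minimal 2-D RRT: steer from the nearest tree node toward a
    random sample, keep collision-free nodes, stop near the goal."""
    random.seed(0)                          # deterministic demo
    nodes, parent = [start], {0: None}
    for _ in range(n_iter):
        sample = (random.uniform(0, 10), random.uniform(0, 10))
        i = min(range(len(nodes)),          # nearest node in the tree
                key=lambda j: math.dist(nodes[j], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        if d > step:                        # steer at most `step` toward sample
            sample = (nx + step * (sample[0] - nx) / d,
                      ny + step * (sample[1] - ny) / d)
        if not is_free(sample):
            continue                        # reject nodes inside obstacles
        parent[len(nodes)] = i
        nodes.append(sample)
        if math.dist(sample, goal) < goal_tol:
            return nodes, parent            # goal reached
    return None

# Free space: everything outside a disc obstacle centered at (5, 5).
free = lambda p: math.dist(p, (5, 5)) > 1.5
result = rrt((1, 1), (9, 9), free)
print(result is not None)
```

Following the `parent` links back from the final node recovers the path; RRT* additionally rewires these links to shorten it, which is the optimization the paper discusses.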
Submitted 23 April, 2023;
originally announced April 2023.
-
An Incremental Phase Mapping Approach for X-ray Diffraction Patterns using Binary Peak Representations
Authors:
Dipendra Jha,
K. V. L. V. Narayanachari,
Ruifeng Zhang,
Justin Liao,
Denis T. Keane,
Wei-keng Liao,
Alok Choudhary,
Yip-Wah Chung,
Michael Bedzyk,
Ankit Agrawal
Abstract:
Despite the huge advancement in knowledge discovery and data mining techniques, the X-ray diffraction (XRD) analysis process has mostly remained untouched and still involves manual investigation, comparison, and verification. Due to the large volume of XRD samples from high-throughput XRD experiments, it has become impossible for domain scientists to process them manually. Recently, they have started leveraging standard clustering techniques to reduce the number of XRD pattern representations requiring manual effort for labeling and verification. Nevertheless, these standard clustering techniques do not handle problem-specific aspects such as peak shifting, adjacent peaks, background noise, and mixed phases; hence, they result in incorrect composition-phase diagrams that complicate further steps. Here, we leverage data mining techniques along with domain expertise to handle these issues. In this paper, we introduce an incremental phase mapping approach based on binary peak representations using a new threshold-based fuzzy dissimilarity measure. The proposed approach first applies an incremental phase computation algorithm on discrete binary peak representations of XRD samples, followed by hierarchical clustering or manual merging of similar pure phases to obtain the final composition-phase diagram. We evaluate our method on the composition space of two ternary alloy systems, Co-Ni-Ta and Co-Ti-Ta. Our results are verified by domain scientists and closely resemble the manually computed ground-truth composition-phase diagrams. The proposed approach takes us closer towards achieving the goal of complete end-to-end automated XRD analysis.
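To make the idea of a threshold-based fuzzy dissimilarity on binary peak representations concrete, the following is a minimal sketch; the symmetric matching rule and the shift tolerance are illustrative assumptions, not the paper's exact measure.

```python
def peak_dissimilarity(peaks_a, peaks_b, shift_tol=2):
    """Fuzzy dissimilarity between two binary peak representations.

    peaks_a / peaks_b: sets of bin indices where a diffraction peak occurs.
    A peak in one pattern counts as matched if the other pattern has a
    peak within shift_tol bins, which absorbs small peak shifts.
    Returns a value in [0, 1]: 0 = every peak matched, 1 = none matched.
    """
    def unmatched(src, dst):
        return sum(1 for p in src
                   if not any(abs(p - q) <= shift_tol for q in dst))
    total = len(peaks_a) + len(peaks_b)
    if total == 0:
        return 0.0
    return (unmatched(peaks_a, peaks_b) + unmatched(peaks_b, peaks_a)) / total
```

Two patterns of the same phase whose peaks shift by a bin or two then score near 0 and can be merged during incremental clustering, while patterns of distinct phases score near 1.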
Submitted 8 November, 2022;
originally announced November 2022.
-
A Case Study on Parallel HDF5 Dataset Concatenation for High Energy Physics Data Analysis
Authors:
Sunwoo Lee,
Kai-yuan Hou,
Kewei Wang,
Saba Sehrish,
Marc Paterno,
James Kowalkowski,
Quincey Koziol,
Robert Ross,
Ankit Agrawal,
Alok Choudhary,
Wei-keng Liao
Abstract:
In High Energy Physics (HEP), experimentalists generate large volumes of data that, when analyzed, help us better understand the fundamental particles and their interactions. This data is often captured in many files of small size, creating a data management challenge for scientists. In order to better facilitate data management, transfer, and analysis on large-scale platforms, it is advantageous to aggregate data further into a smaller number of larger files. However, this translation process can consume significant time and resources, and if performed incorrectly, the resulting aggregated files can be inefficient for highly parallel access during analysis on large-scale platforms. In this paper, we present our case study on parallel I/O strategies and HDF5 features for reducing data aggregation time, making effective use of compression, and ensuring efficient access to the resulting data during analysis at scale. In this case study, we focus on detector data from NOvA, a large-scale HEP experiment generating many terabytes of data. The lessons learned from our case study inform the handling of similar datasets, thus expanding community knowledge related to this common data management task.
Submitted 2 May, 2022;
originally announced May 2022.
-
Neuronal diversity can improve machine learning for physics and beyond
Authors:
Anshul Choudhary,
Anil Radhakrishnan,
John F. Lindner,
Sudeshna Sinha,
William L. Ditto
Abstract:
Diversity conveys advantages in nature, yet homogeneous neurons typically comprise the layers of artificial neural networks. Here we construct neural networks from neurons that learn their own activation functions, quickly diversify, and subsequently outperform their homogeneous counterparts on image classification and nonlinear regression tasks. Sub-networks instantiate the neurons, which meta-learn especially efficient sets of nonlinear responses. Examples include conventional neural networks classifying digits and forecasting a van der Pol oscillator and physics-informed Hamiltonian neural networks learning Hénon-Heiles stellar orbits and the swing of a video-recorded pendulum clock. Such \textit{learned diversity} provides examples of dynamical systems selecting diversity over uniformity and elucidates the role of diversity in natural and artificial systems.
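A single-neuron sketch of the idea, assuming the activation is a learnable blend of tanh and ReLU (the paper meta-learns activations with sub-networks; this simplification only illustrates how activation parameters can be trained by gradient descent and drift toward the shape the data favors):

```python
import math

def blended_activation(x, a, b):
    """Per-neuron activation: a learnable blend of tanh and ReLU."""
    return a * math.tanh(x) + b * max(0.0, x)

def learn_activation(xs, ys, lr=0.05, steps=500):
    """Fit the blend coefficients (a, b) of one neuron by gradient
    descent on mean squared error. Because the blend is linear in
    (a, b), the gradients below are exact."""
    a, b = 0.5, 0.5                      # start as an even blend
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            err = blended_activation(x, a, b) - y
            ga += 2 * err * math.tanh(x) / len(xs)
            gb += 2 * err * max(0.0, x) / len(xs)
        a -= lr * ga
        b -= lr * gb
    return a, b
```

Trained on data generated by a pure ReLU, the blend drifts toward b near 1 and a near 0; with many such neurons in a layer, different gradients drive the coefficients apart, i.e. the neurons diversify.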
Submitted 30 August, 2023; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Reinforcement Learning based Disease Progression Model for Alzheimer's Disease
Authors:
Krishnakant V. Saboo,
Anirudh Choudhary,
Yurui Cao,
Gregory A. Worrell,
David T. Jones,
Ravishankar K. Iyer
Abstract:
We model Alzheimer's disease (AD) progression by combining differential equations (DEs) and reinforcement learning (RL) with domain knowledge. DEs provide relationships between some, but not all, factors relevant to AD. We assume that the missing relationships must satisfy general criteria about the working of the brain, e.g., maximizing cognition while minimizing the cost of supporting cognition. This allows us to extract the missing relationships by using RL to optimize an objective (reward) function that captures the above criteria. We use our model consisting of DEs (as a simulator) and the trained RL agent to predict individualized 10-year AD progression using baseline (year 0) features on synthetic and real data. The model was comparable or better at predicting 10-year cognition trajectories than state-of-the-art learning-based models. Our interpretable model demonstrated, and provided insights into, "recovery/compensatory" processes that mitigate the effect of AD, even though those processes were not explicitly encoded in the model. Our framework combines DEs with RL for modelling AD progression and has broad applicability for understanding other neurological disorders.
Submitted 2 November, 2021; v1 submitted 30 June, 2021;
originally announced June 2021.
-
Creating and Implementing a Smart Speaker
Authors:
Sanskar Jethi,
Avinash Kumar Choudhary,
Yash Gupta,
Abhishek Chaudhary
Abstract:
We have seen significant advancements in Artificial Intelligence and Machine Learning in the 21st century. It has enabled a new technology where we can have a human-like conversation with the machines. The most significant use of this speech recognition and contextual understanding technology exists in the form of a Smart Speaker. We have a wide variety of Smart Speaker products available to us. This paper aims to decode its creation and explain the technology that makes these Speakers, "Smart."
Submitted 30 May, 2021;
originally announced June 2021.
-
Improved Approximate Rips Filtrations with Shifted Integer Lattices and Cubical Complexes
Authors:
Aruni Choudhary,
Michael Kerber,
Sharath Raghvendra
Abstract:
Rips complexes are important structures for analyzing topological features of metric spaces. Unfortunately, generating these complexes is expensive because of a combinatorial explosion in the complex size. For $n$ points in $\mathbb{R}^d$, we present a scheme to construct a $2$-approximation of the filtration of the Rips complex in the $L_\infty$-norm, which extends to a $2d^{0.25}$-approximation in the Euclidean case. The $k$-skeleton of the resulting approximation has a total size of $n2^{O(d\log k +d)}$. The scheme is based on the integer lattice and simplicial complexes based on the barycentric subdivision of the $d$-cube.
We extend our result to use cubical complexes in place of simplicial complexes by introducing cubical maps between complexes. We get the same approximation guarantee as the simplicial case, while reducing the total size of the approximation to only $n2^{O(d)}$ (cubical) cells.
There are two novel techniques that we use in this paper. The first is the use of acyclic carriers for proving our approximation result. In our application, these are maps which relate the Rips complex and the approximation in a relatively simple manner and greatly reduce the complexity of showing the approximation guarantee. The second technique is what we refer to as scale balancing, which is a simple trick to improve the approximation ratio under certain conditions.
Submitted 11 May, 2021;
originally announced May 2021.
-
Domain Adaptive Egocentric Person Re-identification
Authors:
Ankit Choudhary,
Deepak Mishra,
Arnab Karmakar
Abstract:
Person re-identification (re-ID) in first-person (egocentric) vision is a fairly new and unexplored problem. With the increase of wearable video recording devices, egocentric data becomes readily available, and person re-identification has the potential to benefit greatly from this. However, there is a significant lack of large scale structured egocentric datasets for person re-identification, due to the poor video quality and lack of individuals in most of the recorded content. Although a lot of research has been done on person re-identification based on fixed surveillance cameras, it does not directly benefit egocentric re-ID. Machine learning models trained on the publicly available large scale re-ID datasets cannot be applied to egocentric re-ID due to the dataset bias problem. The proposed algorithm makes use of neural style transfer (NST) that incorporates a variant of Convolutional Neural Network (CNN) to utilize the benefits of both fixed camera vision and first-person vision. NST generates images having features from both egocentric datasets and fixed camera datasets, which are fed through a VGG-16 network trained on a fixed-camera dataset for feature extraction. These extracted features are then used to re-identify individuals. The fixed camera dataset Market-1501 and the first-person dataset EGO Re-ID are used in this work, and the results are on par with existing re-identification models in the egocentric domain.
Submitted 8 March, 2021;
originally announced March 2021.
-
A General Framework Combining Generative Adversarial Networks and Mixture Density Networks for Inverse Modeling in Microstructural Materials Design
Authors:
Zijiang Yang,
Dipendra Jha,
Arindam Paul,
Wei-keng Liao,
Alok Choudhary,
Ankit Agrawal
Abstract:
Microstructural materials design is one of the most important applications of inverse modeling in materials science. Generally speaking, there are two broad modeling paradigms in scientific applications: forward and inverse. While forward modeling estimates the observations from known parameters, inverse modeling attempts to infer the parameters given the observations. Inverse problems are usually both more critical and more difficult in scientific applications, as they seek to explore parameters that cannot be directly observed. Inverse problems are used extensively in various scientific fields, such as geophysics, healthcare and materials science. However, it is challenging to solve inverse problems, because they usually need to learn a one-to-many non-linear mapping, and also require significant computing time, especially for high-dimensional parameter spaces. Further, inverse problems become even more difficult to solve when the dimension of the input (i.e. observation) is much lower than that of the output (i.e. parameters). In this work, we propose a framework consisting of generative adversarial networks and mixture density networks for inverse modeling, and it is evaluated on a materials science dataset for microstructural materials design. Compared with baseline methods, the results demonstrate that the proposed framework can overcome the above-mentioned challenges and produce multiple promising solutions in an efficient manner.
Submitted 25 January, 2021;
originally announced January 2021.
-
Forecasting Hamiltonian dynamics without canonical coordinates
Authors:
Anshul Choudhary,
John F. Lindner,
Elliott G. Holliday,
Scott T. Miller,
Sudeshna Sinha,
William L. Ditto
Abstract:
Conventional neural networks are universal function approximators, but because they are unaware of underlying symmetries or physical laws, they may need impractically many training data to approximate nonlinear dynamics. Recently introduced Hamiltonian neural networks can efficiently learn and forecast dynamical systems that conserve energy, but they require special inputs called canonical coordinates, which may be hard to infer from data. Here we significantly expand the scope of such networks by demonstrating a simple way to train them with any set of generalised coordinates, including easily observable ones.
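The mechanism such networks exploit can be sketched without any learning machinery: given a scalar H(q, p) (here a known function standing in for the trained network, with central finite differences standing in for automatic differentiation), forecasting reduces to integrating Hamilton's equations. The semi-implicit Euler integrator below is an illustrative choice, not necessarily the paper's:

```python
def hamiltonian_flow(H, state, dt, steps, eps=1e-5):
    """Integrate Hamilton's equations dq/dt = dH/dp, dp/dt = -dH/dq
    for a scalar function H(q, p), using a semi-implicit (symplectic)
    Euler step: kick the momentum, then drift the coordinate."""
    q, p = state
    traj = [(q, p)]
    for _ in range(steps):
        dHdq = (H(q + eps, p) - H(q - eps, p)) / (2 * eps)
        p = p - dt * dHdq                     # kick
        dHdp = (H(q, p + eps) - H(q, p - eps)) / (2 * eps)
        q = q + dt * dHdp                     # drift
        traj.append((q, p))
    return traj
```

For the harmonic oscillator H = (q² + p²)/2, the integrated trajectory conserves the energy to within O(dt) over many periods, which is exactly the long-horizon stability that makes Hamiltonian models attractive for forecasting.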
Submitted 28 October, 2020;
originally announced October 2020.
-
Building Halo Merger Trees from the Q Continuum Simulation
Authors:
Esteban Rangel,
Nicholas Frontiere,
Salman Habib,
Katrin Heitmann,
Wei-keng Liao,
Ankit Agrawal,
Alok Choudhary
Abstract:
Cosmological N-body simulations rank among the most computationally intensive efforts today. A key challenge is the analysis of structure, substructure, and the merger history for many billions of compact particle clusters, called halos. Effectively representing the merging history of halos is essential for many galaxy formation models used to generate synthetic sky catalogs, an important application of modern cosmological simulations. Generating realistic mock catalogs requires computing the halo formation history from simulations with large volumes and billions of halos over many time steps, taking hundreds of terabytes of analysis data. We present fast parallel algorithms for producing halo merger trees and tracking halo substructure from a single-level, density-based clustering algorithm. Merger trees are created from analyzing the halo-particle membership function in adjacent snapshots, and substructure is identified by tracking the "cores" of merging halos -- sets of particles near the halo center. Core tracking is performed after creating merger trees and uses the relationships found during tree construction to associate substructures with hosts. The algorithms are implemented with MPI and evaluated on a Cray XK7 supercomputer using up to 16,384 processes on data from HACC, a modern cosmological simulation framework. We present results for creating merger trees from 101 analysis snapshots taken from the Q Continuum, a large volume, high mass resolution, cosmological simulation evolving half a trillion particles.
Submitted 19 August, 2020;
originally announced August 2020.
-
Mastering high-dimensional dynamics with Hamiltonian neural networks
Authors:
Scott T. Miller,
John F. Lindner,
Anshul Choudhary,
Sudeshna Sinha,
William L. Ditto
Abstract:
We detail how incorporating physics into neural network design can significantly improve the learning and forecasting of dynamical systems, even nonlinear systems of many dimensions. A map building perspective elucidates the superiority of Hamiltonian neural networks over conventional neural networks. The results clarify the critical relation between data, dimension, and neural network learning performance.
Submitted 28 July, 2020;
originally announced August 2020.
-
Computational Complexity of the $α$-Ham-Sandwich Problem
Authors:
Man-Kwun Chiu,
Aruni Choudhary,
Wolfgang Mulzer
Abstract:
The classic Ham-Sandwich theorem states that for any $d$ measurable sets in $\mathbb{R}^d$, there is a hyperplane that bisects them simultaneously. An extension by Bárány, Hubard, and Jerónimo [DCG 2008] states that if the sets are convex and \emph{well-separated}, then for any given $α_1, \dots, α_d \in [0, 1]$, there is a unique oriented hyperplane that cuts off a respective fraction $α_1, \dots, α_d$ from each set. Steiger and Zhao [DCG 2010] proved a discrete analogue of this theorem, which we call the \emph{$α$-Ham-Sandwich theorem}. They gave an algorithm to find the hyperplane in time $O(n (\log n)^{d-3})$, where $n$ is the total number of input points. The computational complexity of this search problem in high dimensions is open, quite unlike the complexity of the Ham-Sandwich problem, which is now known to be PPA-complete (Filos-Ratsikas and Goldberg [STOC 2019]).
Recently, Fearnley, Gordon, Mehta, and Savani [ICALP 2019] introduced a new sub-class of CLS (Continuous Local Search) called \emph{Unique End-of-Potential Line} (UEOPL). This class captures problems in CLS that have unique solutions. We show that for the $α$-Ham-Sandwich theorem, the search problem of finding the dividing hyperplane lies in UEOPL. This gives the first non-trivial containment of the problem in a complexity class and places it in the company of classic search problems such as finding the fixed point of a contraction map, the unique sink orientation problem and the $P$-matrix linear complementarity problem.
Submitted 20 March, 2020;
originally announced March 2020.
-
Data Augmentation using Pre-trained Transformer Models
Authors:
Varun Kumar,
Ashutosh Choudhary,
Eunah Cho
Abstract:
Language model based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of transformer based pre-trained models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. Additionally, on three classification benchmarks, the pre-trained Seq2Seq model outperforms other data augmentation methods in a low-resource setting. Further, we explore how data augmentation based on different pre-trained models differs in terms of data diversity, and how well such methods preserve the class-label information.
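The label-prepending conditioning is simple enough to sketch directly; the separator and formatting below are illustrative, not the paper's exact scheme:

```python
def prepend_label(label, text, sep=" "):
    """Condition a pre-trained language model by prepending the class
    label to the input sequence, as in label-conditioned augmentation."""
    return f"{label}{sep}{text}"

def build_augmentation_prompts(examples):
    """Turn (label, text) pairs into conditioned sequences. A generator
    fine-tuned on such sequences can then be prompted with a label
    (plus an optional text prefix) to synthesize new labeled examples."""
    return [prepend_label(label, text) for label, text in examples]
```

At generation time, prompting with just `prepend_label("positive", "")` asks the model to produce a new example of the "positive" class, so the synthesized text inherits its label for free.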
Submitted 31 January, 2021; v1 submitted 4 March, 2020;
originally announced March 2020.
-
A real-time iterative machine learning approach for temperature profile prediction in additive manufacturing processes
Authors:
Arindam Paul,
Mojtaba Mozaffar,
Zijiang Yang,
Wei-keng Liao,
Alok Choudhary,
Jian Cao,
Ankit Agrawal
Abstract:
Additive Manufacturing (AM) is a manufacturing paradigm that builds three-dimensional objects from a computer-aided design model by successively adding material layer by layer. AM has become very popular in the past decade due to its utility for fast prototyping such as 3D printing as well as manufacturing functional parts with complex geometries using processes such as laser metal deposition that would be difficult to create using traditional machining. As the process of creating an intricate part from an expensive metal such as titanium is cost-prohibitive, computational models are used to simulate the behavior of AM processes before the experimental run. However, as the simulations are computationally costly and time-consuming for predicting multiscale multi-physics phenomena in AM, physics-informed data-driven machine-learning systems for predicting the behavior of AM processes are immensely beneficial. Such models accelerate not only multiscale simulation tools but also empower real-time control systems using in-situ data. In this paper, we design and develop essential components of a scientific framework for developing a data-driven model-based real-time control system. Finite element methods are employed for solving time-dependent heat equations and developing the database. The proposed framework uses extremely randomized trees, an ensemble of bagged decision trees, as the regression algorithm, iteratively using the temperatures of prior voxels and laser information as inputs to predict the temperatures of subsequent voxels. The models achieve mean absolute percentage errors below 1% for predicting temperature profiles for AM processes. The code is made available for the research community at https://github.com/paularindam/ml-iter-additive.
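The iterative prediction loop described above can be sketched independently of the regressor; here the model is any callable (in the paper, an extremely-randomized-trees ensemble), and the feature layout, history length, and ambient-temperature padding are illustrative assumptions:

```python
def predict_profile(model, laser_features, n_prior=3, t0=25.0):
    """Iteratively predict a temperature profile voxel by voxel.

    model: callable mapping a feature vector
           [T(i-n_prior), ..., T(i-1), *laser_features[i]] -> temperature.
    Predicted temperatures of prior voxels are fed back as inputs for
    subsequent voxels, mirroring the iterative scheme described above;
    the history is padded with the ambient temperature t0.
    """
    temps = [t0] * n_prior
    for feats in laser_features:
        history = temps[-n_prior:]           # most recent predictions
        temps.append(model(history + list(feats)))
    return temps[n_prior:]
```

Because each prediction becomes an input to the next one, the scheme runs in a single cheap pass over the voxels, which is what makes it usable for real-time control where a full finite-element simulation would be far too slow.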
Submitted 9 August, 2019; v1 submitted 28 July, 2019;
originally announced July 2019.
-
Improving MPI Collective I/O Performance With Intra-node Request Aggregation
Authors:
Qiao Kang,
Sunwoo Lee,
Kai-yuan Hou,
Robert Ross,
Ankit Agrawal,
Alok Choudhary,
Wei-keng Liao
Abstract:
Two-phase I/O is a well-known strategy for implementing collective MPI-IO functions. It redistributes I/O requests among the calling processes into a form that minimizes the file access costs. As modern parallel computers continue to grow into the exascale era, the communication cost of such request redistribution can quickly overwhelm collective I/O performance. This effect has been observed from parallel jobs that run on multiple compute nodes with a high count of MPI processes on each node. To reduce the communication cost, we present a new design for collective I/O by adding an extra communication layer that performs request aggregation among processes within the same compute nodes. This approach can significantly reduce inter-node communication congestion when redistributing the I/O requests. We evaluate the performance and compare with the original two-phase I/O on a Cray XC40 parallel computer with Intel KNL processors. Using I/O patterns from two large-scale production applications and an I/O benchmark, we show the performance improvement of up to 29 times when running 16384 MPI processes on 256 compute nodes.
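The intra-node aggregation layer can be illustrated by its grouping step: ranks are grouped by compute node and a few per-node aggregators are elected, so only aggregators take part in the inter-node request redistribution. The stand-alone sketch below mirrors that logic; the real implementation lives inside the MPI-IO library in C and exchanges actual request data between ranks:

```python
def intra_node_aggregators(rank_to_node, aggregators_per_node=1):
    """Group MPI ranks by compute node and elect aggregator ranks.

    Non-aggregator ranks send their I/O requests to an aggregator on
    the same node first; only aggregators then join the inter-node
    redistribution phase of two-phase I/O. Returns
    (node -> sorted ranks, rank -> its assigned aggregator rank).
    """
    groups = {}
    for rank, node in sorted(rank_to_node.items()):
        groups.setdefault(node, []).append(rank)
    assignment = {}
    for node, ranks in groups.items():
        aggs = ranks[:aggregators_per_node]   # lowest ranks act as aggregators
        for i, r in enumerate(ranks):
            assignment[r] = aggs[i % len(aggs)]
    return groups, assignment
```

With, say, 64 ranks per node and one aggregator each, the number of participants in the all-to-many redistribution drops 64-fold, which is the source of the reduced inter-node congestion.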
Submitted 29 July, 2019;
originally announced July 2019.
-
No-dimensional Tverberg Theorems and Algorithms
Authors:
Aruni Choudhary,
Wolfgang Mulzer
Abstract:
Tverberg's theorem states that for any $k \ge 2$ and any set $P \subset \mathbb{R}^d$ of at least $(d + 1)(k - 1) + 1$ points in $d$ dimensions, we can partition $P$ into $k$ subsets whose convex hulls have a non-empty intersection. The associated search problem of finding the partition lies in the complexity class $\text{CLS} = \text{PPAD} \cap \text{PLS}$, but no hardness results are known. In the colorful Tverberg theorem, the points in $P$ have colors, and under certain conditions, $P$ can be partitioned into colorful sets, in which each color appears exactly once and whose convex hulls intersect. To date, the complexity of the associated search problem is unresolved. Recently, Adiprasito, Barany, and Mustafa gave a no-dimensional Tverberg theorem, in which the convex hulls may intersect in an approximate fashion. This relaxes the requirement on the cardinality of $P$. The argument is constructive, but does not result in a polynomial-time algorithm.
We present a deterministic algorithm that finds for any $n$-point set $P \subset \mathbb{R}^d$ and any $k \in \{2, \dots, n\}$ in $O(nd \lceil{\log k}\rceil)$ time a $k$-partition of $P$ such that there is a ball of radius $O\left((k/\sqrt{n})\mathrm{diam(P)}\right)$ that intersects the convex hull of each set. Given that this problem is not known to be solvable exactly in polynomial time, our result provides a remarkably efficient and simple new notion of approximation.
Our main contribution is to generalize Sarkaria's method to reduce the Tverberg problem to the Colorful Caratheodory problem (in the simplified tensor product interpretation of Barany and Onn) and to apply it algorithmically. It turns out that this not only leads to an alternative algorithmic proof of a no-dimensional Tverberg theorem, but it also generalizes to other settings such as the colorful variant of the problem.
Submitted 5 July, 2023; v1 submitted 9 July, 2019;
originally announced July 2019.
-
IRNet: A General Purpose Deep Residual Regression Framework for Materials Discovery
Authors:
Dipendra Jha,
Logan Ward,
Zijiang Yang,
Christopher Wolverton,
Ian Foster,
Wei-keng Liao,
Alok Choudhary,
Ankit Agrawal
Abstract:
Materials discovery is crucial for making scientific advances in many domains. Collections of data from experiments and first-principle computations have spurred interest in applying machine learning methods to create predictive models capable of mapping from composition and crystal structures to materials properties. Generally, these are regression problems with the input being a 1D vector composed of numerical attributes representing the material composition and/or crystal structure. While neural networks consisting of fully connected layers have been applied to such problems, their performance often suffers from the vanishing gradient problem when network depth is increased. In this paper, we study and propose design principles for building deep regression networks composed of fully connected layers with numerical vectors as input. We introduce a novel deep regression network with individual residual learning, IRNet, that places shortcut connections after each layer so that each layer learns the residual mapping between its output and input. We use the problem of learning properties of inorganic materials from numerical attributes derived from material composition and/or crystal structure to compare IRNet's performance against that of other machine learning techniques. Using multiple datasets from the Open Quantum Materials Database (OQMD) and Materials Project for training and evaluation, we show that IRNet provides significantly better prediction performance than the state-of-the-art machine learning approaches currently used by domain scientists. We also show that IRNet's use of individual residual learning leads to better convergence during the training phase than when shortcut connections are between multi-layer stacks while maintaining the same number of parameters.
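The individual-residual idea, a shortcut added after every single layer rather than after a multi-layer stack, can be sketched with a toy forward pass; the random initialization, plain-Python representation, and square layers are illustrative simplifications of IRNet, which also handles dimension changes, normalization, and training:

```python
import random

def dense_layer(n_in, n_out, rng):
    """Random weights for one fully connected layer (illustrative init)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def irnet_forward(x, layers):
    """Forward pass with individual residual learning: a shortcut is
    added after every layer (out = relu(W x) + x), so each layer only
    has to fit the residual between its output and its input."""
    for w in layers:
        h = [max(0.0, sum(wij * xj for wij, xj in zip(row, x)))
             for row in w]
        x = [hi + xi for hi, xi in zip(h, x)]   # per-layer shortcut
    return x
```

Because every layer's input reaches its output unchanged through the shortcut, gradients flow through the identity path even in very deep stacks, which is how the design sidesteps the vanishing-gradient problem described above.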
Submitted 7 July, 2019;
originally announced July 2019.
-
Transfer Learning Using Ensemble Neural Networks for Organic Solar Cell Screening
Authors:
Arindam Paul,
Dipendra Jha,
Reda Al-Bahrani,
Wei-keng Liao,
Alok Choudhary,
Ankit Agrawal
Abstract:
Organic solar cells are a promising technology for addressing the world's clean energy crisis. However, generating candidate chemical compounds for solar cells is a time-consuming process requiring thousands of hours of laboratory analysis. For a solar cell, the most important property is the power conversion efficiency, which depends on the highest occupied molecular orbital (HOMO) values of the donor molecules. Recently, machine learning techniques have proved to be very useful in building predictive models for the HOMO values of donor structures of organic photovoltaic cells (OPVs). Since experimental datasets are limited in size, current machine learning models are trained on data derived from calculations based on density functional theory (DFT). Molecular line notations such as SMILES and InChI are popular input representations for describing the molecular structure of donor molecules. The two types of line representations encode different information: for example, SMILES defines bond types, while InChI encodes protonation. In this work, we present an ensemble deep neural network architecture, called SINet, which harnesses both the SMILES and InChI molecular representations to predict HOMO values, and which leverages transfer learning from a sizeable DFT-computed dataset, Harvard CEP, to build more robust predictive models for the relatively smaller HOPV dataset. The Harvard CEP dataset contains molecular structures and properties for 2.3 million candidate donor structures for OPVs, while HOPV contains DFT-computed values for 350 molecules and experimental values for 243 molecules. Our results demonstrate significant performance improvements from the use of transfer learning and from leveraging both molecular representations.
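The core ensembling idea described above, combining a SMILES-based branch and an InChI-based branch into a single HOMO prediction, can be sketched in a few lines. The sketch below uses random stand-in features and plain least-squares branches purely for illustration; the feature dimensions, branch models, and equal-weight averaging are assumptions, not the actual SINet architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fixed-length featurizations of the SAME set of molecules:
# one derived from SMILES strings, one from InChI strings.
X_smiles = rng.standard_normal((200, 32))
X_inchi = rng.standard_normal((200, 24))
y = rng.standard_normal(200)  # stand-in for HOMO values (eV)

def fit_linear(X, y):
    """Least-squares linear branch with a bias column (sketch only)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w

# Train one branch per representation.
w_smiles = fit_linear(X_smiles, y)
w_inchi = fit_linear(X_inchi, y)

# Ensemble: average the two branches' predictions.
pred = 0.5 * (predict(X_smiles, w_smiles) + predict(X_inchi, w_inchi))
```

In the transfer-learning setting the abstract describes, each branch would first be fit on the large Harvard CEP data and then fine-tuned on the smaller HOPV data before the predictions are combined.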
Submitted 28 July, 2019; v1 submitted 7 March, 2019;
originally announced March 2019.