-
Dilaton-Flattened Axion Inflation
Authors:
Pirzada,
Ali Muhammad,
Tianjun Li,
Imtiaz Khan,
Mussawir Khan
Abstract:
We present a solvable same-sector effective theory for anomaly-inspired axion inflation, in which a heavy trace-anomaly mode dynamically backreacts on the axion potential. The tree-level elimination of the radial field resums the backreaction into a closed-form Lambert-$W$ potential, naturally flattening the hilltop potential without external plateau operators. By deriving the exact trough metric, we evaluate all the observables on the fully reduced one-field action, bypassing uncontrolled kinetic approximations. Calibrated at $N_\star=56$, reheating-compatible branches yield $r\simeq0.033$--$0.036$ and $α_s\simeq-(4.6$--$4.7)\times10^{-4}$, comfortably satisfying the current ACT/SPT/BICEP constraints. The evolution remains strictly adiabatic ($m_\perp^2/H^2\gtrsim6.1$, $Ω/H\lesssim7.6\times10^{-4}$) with negligible sound-speed and metric corrections. We provide analytic control over the constant-$w_{\rm eff}$ reheating map, the $N_{\rm re}=0$ boundary, and robustness against vacuum-offset deformations. This Lambert-$W$ backbone establishes a precise, deformable benchmark for confining axion inflation, with microscopic matching and reheating microphysics accessible as systematic EFT refinements.
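The closed-form Lambert-$W$ potential itself is not given in the abstract, so the following Python sketch only illustrates the generic single-field slow-roll workflow (potential, then $\epsilon_V$ and $\eta_V$, then $n_s$ and $r$) with a purely hypothetical Lambert-$W$-shaped placeholder; the paper's actual potential, trough metric, and reheating map are not reproduced here.

```python
# Illustrative only: standard single-field slow-roll observables for a generic
# potential V(phi). The Lambert-W form below is a placeholder stand-in, NOT the
# closed-form potential derived in the paper.
import numpy as np
from scipy.special import lambertw

M_P = 1.0  # reduced Planck mass (working units)

def V(phi, V0=1.0, mu=1.0):
    # Hypothetical Lambert-W-shaped plateau, for illustrating the workflow only.
    w = np.real(lambertw(np.exp(phi / mu)))
    return V0 * w / (1.0 + w)

def slow_roll(phi, h=1e-4):
    # Numerical first and second derivatives of V give epsilon_V and eta_V.
    v = V(phi)
    vp = (V(phi + h) - V(phi - h)) / (2 * h)
    vpp = (V(phi + h) - 2 * v + V(phi - h)) / h**2
    eps = 0.5 * M_P**2 * (vp / v) ** 2
    eta = M_P**2 * vpp / v
    return eps, eta

eps, eta = slow_roll(5.0)
print("n_s =", 1 - 6 * eps + 2 * eta, " r =", 16 * eps)
```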
Submitted 16 April, 2026;
originally announced April 2026.
-
EviSearch: A Human in the Loop System for Extracting and Auditing Clinical Evidence for Systematic Reviews
Authors:
Naman Ahuja,
Saniya Mulla,
Muhammad Ali Khan,
Zaryab Bin Riaz,
Kaneez Zahra Rubab Khakwani,
Mohamad Bassam Sonbol,
Irbaz Bin Riaz,
Vivek Gupta
Abstract:
We present EviSearch, a multi-agent extraction system that automates the creation of ontology-aligned clinical evidence tables directly from native trial PDFs while guaranteeing per-cell provenance for audit and human verification. EviSearch pairs a PDF-query agent (which preserves rendered layout and figures) with a retrieval-guided search agent and a reconciliation module that forces page-level verification when agents disagree. The pipeline is designed for high-precision extraction across multimodal evidence sources (text, tables, figures) and for generating reviewer-actionable provenance that clinicians can inspect and correct. On a clinician-curated benchmark of oncology trial papers, EviSearch substantially improves extraction accuracy relative to strong parsed-text baselines while providing comprehensive attribution coverage. By logging reconciler decisions and reviewer edits, the system produces structured preference and supervision signals that bootstrap iterative model improvement. EviSearch is intended to accelerate living systematic review workflows, reduce manual curation burden, and provide a safe, auditable path for integrating LLM-based extraction into evidence synthesis pipelines.
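As a rough sketch of the reconciliation step described above (agents that disagree trigger page-level verification), the following Python snippet shows one plausible cell-level decision rule; the data structures and function names are hypothetical, not EviSearch's actual interfaces.

```python
# Hypothetical sketch: two agents each return a value plus page-level provenance
# for one evidence-table cell; disagreement forces page-level verification.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CellExtraction:
    value: Optional[str]   # extracted value, e.g. "12.4 months"
    page: Optional[int]    # page supporting the value (provenance)

def reconcile(pdf_agent: CellExtraction, search_agent: CellExtraction):
    """Return (value, provenance_pages, needs_review) for one table cell."""
    if pdf_agent.value is not None and pdf_agent.value == search_agent.value:
        # Agents agree: accept, keep both provenance pages for the audit trail.
        return pdf_agent.value, {pdf_agent.page, search_agent.page}, False
    # Disagreement (or a missing value): escalate to page-level verification,
    # surfacing both candidates and their pages to a human reviewer.
    return None, {pdf_agent.page, search_agent.page}, True

value, pages, needs_review = reconcile(CellExtraction("12.4 months", 7),
                                       CellExtraction("12.4 months", 7))
print(value, pages, needs_review)
```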
Submitted 23 March, 2026;
originally announced April 2026.
-
ReConText3D: Replay-based Continual Text-to-3D Generation
Authors:
Muhammad Ahmed Ullah Khan,
Muhammad Haris Bin Amir,
Didier Stricker,
Muhammad Zeshan Afzal
Abstract:
Continual learning enables models to acquire new knowledge over time while retaining previously learned capabilities. However, its application to text-to-3D generation remains unexplored. We present ReConText3D, the first framework for continual text-to-3D generation. We first demonstrate that existing text-to-3D models suffer from catastrophic forgetting under incremental training. ReConText3D enables generative models to incrementally learn new 3D categories from textual descriptions while preserving the ability to synthesize previously seen assets. Our method constructs a compact and diverse replay memory through text-embedding k-Center selection, allowing representative rehearsal of prior knowledge without modifying the underlying architecture. To systematically evaluate continual text-to-3D learning, we introduce Toys4K-CL, a benchmark derived from the Toys4K dataset that provides balanced and semantically diverse class-incremental splits. Extensive experiments on the Toys4K-CL benchmark show that ReConText3D consistently outperforms all baselines across different generative backbones, maintaining high-quality generation for both old and new classes. To the best of our knowledge, this work establishes the first continual learning framework and benchmark for text-to-3D generation, opening a new direction for incremental 3D generative modeling. Project page is available at: https://mauk95.github.io/ReConText3D/.
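As a minimal illustration of the text-embedding k-Center selection used for the replay memory, the following Python sketch greedily picks k representatives from a set of embeddings; the embedding model, seeding, and value of k are assumptions, not details from the ReConText3D implementation.

```python
# Greedy k-Center selection over text embeddings: repeatedly add the point
# farthest from the current set of centers.
import numpy as np

def k_center_greedy(embeddings: np.ndarray, k: int) -> list[int]:
    """Pick k indices whose embeddings cover the set with small max distance."""
    selected = [0]  # seed with an arbitrary first point
    dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))               # farthest point from current centers
        selected.append(idx)
        new_dist = np.linalg.norm(embeddings - embeddings[idx], axis=1)
        dist = np.minimum(dist, new_dist)        # distance to nearest chosen center
    return selected

prompt_embeddings = np.random.randn(500, 768)    # stand-in for text embeddings
replay_ids = k_center_greedy(prompt_embeddings, k=32)
print(replay_ids[:5])
```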
Submitted 15 April, 2026;
originally announced April 2026.
-
GCA Framework: A Gulf-Grounded Dataset and Agentic Pipeline for Climate Decision Support
Authors:
Muhammad Umer Sheikh,
Khawar Shehzad,
Salman Khan,
Fahad Shahbaz Khan,
Muhammad Haris Khan
Abstract:
Climate decision-making in the Gulf increasingly demands systems that can translate heterogeneous scientific and policy evidence into actionable guidance, yet general-purpose large language models (LLMs) remain weak both in region-specific climate knowledge and grounded interaction with geospatial and forecasting tools. We present the GCA framework, which unifies (i) GCA-DS, a curated Gulf-focused multimodal dataset, and (ii) Gulf Climate Agent (GCA), a tool-augmented agent for climate analysis. GCA-DS comprises ~200k question-answer pairs spanning governmental policies and adaptation plans, NGO and international frameworks, academic literature, and event-driven reporting on heatwaves, dust storms, and floods, complemented with remote-sensing inputs that couple imagery with textual evidence. Building on this foundation, the GCA agent orchestrates a modular tool pipeline grounded in real-time and historical signals and geospatial processing that produces derived indices and interpretable visualizations. Finally, we benchmark open and proprietary LLMs on Gulf climate tasks and show that domain fine-tuning and tool integration substantially improve reliability over general-purpose baselines.
Submitted 14 April, 2026;
originally announced April 2026.
-
GeoMeld: Toward Semantically Grounded Foundation Models for Remote Sensing
Authors:
Maram Hasan,
Md Aminur Hossain,
Savitra Roy,
Souparna Bhowmik,
Ayush V. Patel,
Mainak Singha,
Subhasis Chaudhuri,
Muhammad Haris Khan,
Biplab Banerjee
Abstract:
Effective foundation modeling in remote sensing requires spatially aligned heterogeneous modalities coupled with semantically grounded supervision, yet such resources remain limited at scale. We present GeoMeld, a large-scale multimodal dataset with approximately 2.5 million spatially aligned samples. The dataset spans diverse modalities and resolutions and is constructed under a unified alignment protocol for modality-aware representation learning. GeoMeld provides semantically grounded language supervision through an agentic captioning framework that synthesizes and verifies annotations from spectral signals, terrain statistics, and structured geographic metadata, encoding measurable cross-modality relationships within textual descriptions. To leverage this dataset, we introduce GeoMeld-FM, a pretraining framework that combines multi-pretext masked autoencoding over aligned modalities, JEPA representation learning, and caption-vision contrastive alignment. This joint objective enables the learned representation space to capture both reliable cross-sensor physical consistency and grounded semantics. Experiments demonstrate consistent gains in downstream transfer and cross-sensor robustness. Together, GeoMeld and GeoMeld-FM establish a scalable reference framework for semantically grounded multi-modal foundation modeling in remote sensing.
Submitted 12 April, 2026;
originally announced April 2026.
-
SWE-Shepherd: Advancing PRMs for Reinforcing Code Agents
Authors:
Mahir Labib Dihan,
Md Ashrafur Rahman Khan
Abstract:
Automating real-world software engineering tasks remains challenging for large language model (LLM)-based agents due to the need for long-horizon reasoning over large, evolving codebases and making consistent decisions across interdependent actions. Existing approaches typically rely on static prompting strategies or handcrafted heuristics to select actions such as code editing, file navigation, and test execution, but they lack fine-grained feedback on intermediate decisions. This leads to inefficient exploration, error propagation, and brittle solution trajectories. To address this limitation, we propose SWE-Shepherd, a framework that introduces Process Reward Models (PRMs) to provide dense, step-level supervision for repository-level code agents. Using trajectories from SWE-Bench, we construct an action-level reward dataset and train a lightweight reward model on a base LLM to estimate the usefulness of intermediate actions. During inference, the PRM evaluates candidate actions and guides the agent toward higher-reward decisions without requiring full reinforcement learning. Experiments on SWE-Bench Verified demonstrate improved interaction efficiency and action quality, while also highlighting challenges in aligning intermediate rewards with final task success.
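A minimal sketch of how a process reward model can guide action selection at inference time, as described above; the scoring interface and the toy reward function are hypothetical stand-ins for SWE-Shepherd's trained PRM.

```python
# PRM-guided action selection: score each candidate intermediate action with the
# reward model and pick the highest-scoring one.
from typing import Callable

def choose_action(state: str, candidates: list[str],
                  prm_score: Callable[[str, str], float]) -> str:
    """Score each candidate action with the PRM and return the best one."""
    scored = [(prm_score(state, action), action) for action in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[0][1]

# Toy stand-in for a trained process reward model (NOT the paper's model).
def toy_prm(state: str, action: str) -> float:
    return float(len(set(state.split()) & set(action.split())))

best = choose_action("failing test in parser.py line 42",
                     ["open parser.py", "run full test suite", "edit README"],
                     toy_prm)
print(best)
```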
Submitted 12 April, 2026;
originally announced April 2026.
-
Hall transports from Taub-NUT AdS black holes
Authors:
Mohd Aariyan Khan,
Hemant Rathi,
Dibakar Roychowdhury
Abstract:
We compute Hall transport coefficients associated with Taub-NUT AdS black holes in four space-time dimensions using the probe D-brane approach. In particular, we examine the effects of the NUT parameter ($n$), or equivalently of the novel frame-dragging, on the holographic charge transport properties. In our analysis, we treat the external electric field as a constant background, while varying the magnetic field ($B$) from small to finite. Within this framework, we analyze conductivities in both low and high temperature regions, focusing on locations that are both near and far from the Misner string. Our calculations show that frame-dragging effects are significant primarily at lower temperatures and near the Misner string, while a small magnetic field is maintained. However, these effects become negligibly small at a "finite" magnetic field, even at lower temperatures. Our analysis reveals the existence of finite Hall transport that has its origin in the novel frame-dragging.
Submitted 11 April, 2026;
originally announced April 2026.
-
DSVTLA: Deep Swin Vision Transformer-Based Transfer Learning Architecture for Multi-Type Cancer Histopathological Cancer Image Classification
Authors:
Muazzem Hussain Khan,
Tasdid Hasnain,
Md. Jamil khan,
Ruhul Amin,
Md. Shamim Reza,
Md. Al Mehedi Hasan,
Md Ashad Alam
Abstract:
In this study, we propose a deep Swin Vision Transformer-based transfer learning architecture for robust multi-cancer histopathological image classification. The proposed framework integrates a hierarchical Swin Transformer with ResNet50-based convolutional feature extraction, enabling the model to capture both long-range contextual dependencies and fine-grained local morphological patterns within histopathological images. To validate the efficiency of the proposed architecture, extensive experiments were conducted on a comprehensive multi-cancer dataset covering Breast Cancer, Oral Cancer, Lung and Colon Cancer, Kidney Cancer, and Acute Lymphocytic Leukemia (ALL); both original and segmented images were analyzed to assess model robustness across heterogeneous clinical imaging conditions. Our approach is benchmarked against several state-of-the-art CNN and transfer learning models, including DenseNet121, DenseNet201, InceptionV3, ResNet50, EfficientNetB3, multiple ViT variants, and Swin Transformer models. All models were trained and validated using a unified pipeline incorporating balanced data preprocessing, transfer learning, and fine-tuning strategies. The experimental results demonstrate that our proposed architecture consistently achieves superior performance, reaching 100% test accuracy on the lung-colon cancer and segmented leukemia datasets and up to 99.23% accuracy for breast cancer classification. The model also achieves near-perfect precision, recall, and F1 scores, indicating highly stable results across diverse cancer types. Overall, the proposed model establishes a highly accurate, interpretable, and robust multi-cancer classification system, setting a strong benchmark for future research and providing a unified comparative assessment useful for designing reliable AI-assisted histopathological diagnosis and clinical decision-making.
Submitted 10 April, 2026;
originally announced April 2026.
-
A Multi-Stage Drop-the-Loser Design with Superiority Boundaries
Authors:
Peter Greenstreet,
Manel Khan,
Salmaan Kanji,
Pouya Motazedian,
Andrew Seely,
Stephanie Sibley,
Tim Ramsay
Abstract:
Multi-arm multi-stage (MAMS) trials have gained popularity due to their improved efficiency in evaluating multiple treatments. A traditional MAMS trial often decreases the expected sample size of the trial compared to just running a multi-arm approach, but with the drawback of an increase in maximum sample size. For academic-led trials this poses a particular challenge, as funding is typically based on the maximum required sample size. To address this, drop-the-loser designs were introduced, where a fixed number of treatments are dropped at each interim stage, thereby reducing the maximum sample size. In this work, we propose an enhanced multi-stage drop-the-loser design that also allows for early stopping of the entire trial for superiority. This approach aims to retain the benefits of a reduced maximum sample size while also lowering the expected sample size. The proposed design is motivated by a trial in atrial fibrillation. We derive analytical expressions for the type I error rate, power, and expected sample size, and compare the proposed design's performance to alternative methods. We outline the key requirements for implementing the proposed design and discuss the contexts in which it should be considered. For the motivating example, the results show that the proposed design substantially reduces the expected sample size compared to a standard drop-the-loser design, while lowering the maximum sample size relative to running a traditional MAMS trial or multiple separate trials.
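As an illustration only, the following Monte Carlo sketch estimates the type I error of a drop-the-loser design with an early superiority stop under the global null; the stage structure, dropping rule, and boundary value are assumptions for demonstration and not the analytical expressions derived in the paper.

```python
# Toy simulation: K experimental arms vs. a control, normal outcomes, drop the
# worst arm at each interim, stop early if any z-statistic exceeds the boundary.
import numpy as np

rng = np.random.default_rng(0)

def one_trial(n_arms=3, n_per_stage=50, n_stages=3, boundary=2.4, effect=0.0):
    arms = list(range(n_arms))
    sums, ns = np.zeros(n_arms), np.zeros(n_arms)
    c_sum, c_n = 0.0, 0
    for _ in range(n_stages):
        for a in arms:
            sums[a] += rng.normal(effect, 1.0, n_per_stage).sum()
            ns[a] += n_per_stage
        c_sum += rng.normal(0.0, 1.0, n_per_stage).sum()
        c_n += n_per_stage
        z = {a: (sums[a] / ns[a] - c_sum / c_n) / np.sqrt(1 / ns[a] + 1 / c_n)
             for a in arms}
        if max(z.values()) > boundary:          # early stop for superiority
            return True
        if len(arms) > 1:                       # drop the current "loser"
            arms.remove(min(arms, key=lambda a: z[a]))
    return False

rejections = sum(one_trial() for _ in range(5000))
print("estimated type I error under the global null:", rejections / 5000)
```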
Submitted 10 April, 2026;
originally announced April 2026.
-
Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction
Authors:
Diana Romero,
Mutahar Ali,
Momin Ahmad Khan,
Habiba Farrukh,
Fatima Anwar,
Salma Elmalaki
Abstract:
Scanpath prediction models forecast the sequence and timing of human fixations during visual search, driving foveated rendering and attention-based interaction in mobile systems where their integrity is a first-class security concern. We present the first study of backdoor attacks against VLM-based scanpath prediction, evaluated on GazeFormer and COCO-Search18. We show that naive fixed-path attacks, while effective, create detectable clustering in the continuous output space. To overcome this, we design two variable-output attacks: an input-aware spatial attack that redirects predicted fixations toward an attacker-chosen target object, and a scanpath duration attack that inflates fixation durations to delay visual search completion. Both attacks condition their output on the input scene, producing diverse and plausible scanpaths that evade cluster-based detection. We evaluate across three trigger modalities (visual, textual, and multimodal), multiple poisoning ratios, and five post-training defenses, finding that no defense simultaneously suppresses the attacks and preserves clean performance across all configurations. We further demonstrate that backdoor behavior survives quantization and deployment on both flagship and legacy commodity smartphones, confirming practical threat viability for edge-deployed gaze-driven systems.
Submitted 9 April, 2026;
originally announced April 2026.
-
A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset
Authors:
Tashreef Muhammad,
Tahsin Ahmed,
Meherun Farzana,
Md. Mahmudul Hasan,
Abrar Eyasir,
Md. Emon Khan,
Mahafuzul Islam Shawon,
Ferdous Mondol,
Mahmudul Hasan,
Muhammad Ibrahim
Abstract:
Accurate short-term forecasting of agricultural commodity prices is critical for food security planning and smallholder income stabilisation in developing economies, yet machine-learning-ready datasets for this purpose remain scarce in South Asia. This paper makes two contributions. First, we introduce AgriPriceBD, a benchmark dataset of 1,779 daily retail mid-prices for five Bangladeshi commodities - garlic, chickpea, green chilli, cucumber, and sweet pumpkin - spanning July 2020 to June 2025, extracted from government reports via an LLM-assisted digitisation pipeline. Second, we evaluate seven forecasting approaches spanning classical models - naïve persistence, SARIMA, and Prophet - and deep learning architectures - BiLSTM, Transformer, Time2Vec-enhanced Transformer, and Informer - with Diebold-Mariano statistical significance tests. Commodity price forecastability is fundamentally heterogeneous: naïve persistence dominates on near-random-walk commodities. Time2Vec temporal encoding provides no statistically significant advantage over fixed sinusoidal encoding and causes catastrophic degradation on green chilli (+146.1% MAE, p<0.001). Prophet fails systematically, attributable to discrete step-function price dynamics incompatible with its smooth decomposition assumptions. Informer produces erratic predictions (variance up to 50x ground-truth), confirming sparse-attention Transformers require substantially larger training sets than small agricultural datasets provide. All code, models, and data are released publicly to support replication and future forecasting research on agricultural commodity markets in Bangladesh and similar developing economies.
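For readers unfamiliar with the Diebold-Mariano comparison used in the benchmark, a generic implementation for squared-error loss is sketched below; the paper's exact loss function, forecast horizon, and any small-sample corrections are not specified in the abstract and may differ.

```python
# Diebold-Mariano test: compare two forecast error series via their loss
# differential, using a normal approximation for the standardized mean.
import numpy as np
from scipy import stats

def diebold_mariano(e1: np.ndarray, e2: np.ndarray, h: int = 1) -> tuple[float, float]:
    """DM statistic and two-sided p-value for squared-error loss differentials."""
    d = e1**2 - e2**2                      # loss differential series
    n = len(d)
    d_bar = d.mean()
    # Long-run variance from autocovariances up to lag h-1.
    gamma = [np.sum((d[k:] - d_bar) * (d[:n - k] - d_bar)) / n for k in range(h)]
    lrv = gamma[0] + 2.0 * sum(gamma[1:])
    dm = d_bar / np.sqrt(lrv / n)
    p = 2.0 * (1.0 - stats.norm.cdf(abs(dm)))
    return dm, p

errs_naive = np.random.randn(200)          # stand-ins for out-of-sample errors
errs_model = 0.9 * np.random.randn(200)
print(diebold_mariano(errs_naive, errs_model))
```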
Submitted 27 March, 2026;
originally announced April 2026.
-
Agents for Agents: An Interrogator-Based Secure Framework for Autonomous Internet of Underwater Things
Authors:
Ali Akarma,
Toqeer Ali Syed,
Abdul Khadar Jilani,
Salman Jan,
Hammad Muneer,
Muazzam A. Khan,
Changli Yu
Abstract:
Autonomous underwater vehicles (AUVs) and sensor nodes increasingly support decentralized sensing and coordination in the Internet of Underwater Things (IoUT), yet most deployments rely on static trust once authentication is established, leaving long-duration missions vulnerable to compromised or behaviorally deviating agents. In this paper, an interrogator-based architecture is presented that incorporates behavioral trust monitoring into underwater multi-agent operation without interfering with agent autonomy. The privileged interrogator module is a passive analyzer of communication metadata that uses a lightweight transformer model to compute dynamic trust scores, which are then used to authorize the forwarding of mission-critical data. Suspicious agents trigger proportional monitoring and conditional restrictions, enabling fast containment while maintaining network continuity. Trust evidence is stored in a permissioned blockchain consortium, which offers tamper-resistant, decentralized identity management without the overhead of public consensus mechanisms. Simulation-based evaluation shows a relative improvement of 21.7% in detection accuracy over static-trust baselines, with limited energy overhead. These findings suggest that behavior-driven validation can reinforce underwater coordination without compromising scalability or deployability.
Submitted 5 April, 2026;
originally announced April 2026.
-
Dymnikova-Schwinger quantum-corrected slowly rotating wormholes: Photon and spinning particle dynamics
Authors:
A. Errehymy,
Y. Khedif,
M. Daoud,
B. Turimov,
M. A. Khan,
S. Usanov
Abstract:
This work studies light propagation near slowly rotating traversable wormholes supported by a quantum-inspired matter source. The model is based on the Dymnikova density profile, viewed as a gravitational analogue of the Schwinger mechanism, which yields a smooth, non-singular core. Quantum effects are included through the generalized uncertainty principle (GUP), introducing a minimal length scale while preserving regularity. Within a stationary and axisymmetric framework, we construct rotating wormhole solutions sustained by the GUP-corrected Dymnikova-Schwinger profile. The geometry satisfies key conditions such as asymptotic flatness and the flare-out requirement, and incorporates rotational features like frame dragging. We then examine photon motion via null geodesics. Both rotation and quantum corrections modify the photon sphere structure, with rotation producing a splitting between co-rotating and counter-rotating trajectories. This results in small asymmetries in photon paths and the shadow. These results provide a novel and consistent framework to probe quantum-gravity imprints in strong-field optics.
Submitted 4 April, 2026;
originally announced April 2026.
-
Topolons: Stable Particle-Like Remnants of Collapsed Vacuum Bubbles
Authors:
Muhammad Ghulam Khuwajah Khan
Abstract:
We study a three-form gauge sector in four spacetime dimensions coupled to electrically charged spherical membranes whose worldvolume dynamics are governed by a Dirac--Born--Infeld action. The associated four-form field strength has no local propagating degrees of freedom and contributes a branch-dependent vacuum energy. Motivated by the Hartle--Hawking--Wu selection argument, we restrict attention to the semiclassically admissible four-form flux window for which the Hartle--Hawking wave function has support. We then endow the bubble wall with a worldvolume $U(1)$ gauge field carrying quantized monopole flux $n \in \mathbb{Z}$ and evaluate the full DBI energy of the resulting spherical configurations. We show that the energetically preferred branch collapses toward a microscopic core rather than stabilizing at finite radius, but for nonzero monopole flux the energy does not vanish in the collapsed limit. Instead, the bubble relaxes to a finite-energy remnant whose mass is set by the wall scale and the conserved flux. We interpret these objects as stable flux-supported particle-like states, which we call topolons. Within the admissible sector, the effective energy analysis distinguishes stable collapsed remnants from the contrasting runaway vacuum-decay channel, thereby isolating the sector relevant for cosmological relic formation. At macroscopic distances, topolons behave as heavy localized states and provide a concrete microphysical realization of a dark relic candidate. The detailed cosmological abundance and phenomenology are left for future work.
Submitted 10 April, 2026; v1 submitted 3 April, 2026;
originally announced April 2026.
-
ECG Foundation Models and Medical LLMs for Agentic Cardiovascular Intelligence at the Edge: A Review and Outlook
Authors:
Mudassir Hasan Khan,
Ahmad Nayfeh,
Mudassir Masood,
Ali Ahmad Al-Shaikhi,
Muhammad Mahboob Ur Rahman,
Tareq Y. Al-Naffouri
Abstract:
Electrocardiogram (ECG) foundation models represent a paradigm shift from task-specific pipelines to generalizable architectures pre-trained on large-scale unlabeled waveform data. This survey presents a unified and deployment-aware review of foundation models and medical large language models (LLMs) for ECG intelligence in cardiovascular disease (CVD) diagnosis, monitoring, and clinical decision support. The central thesis of this survey paper is that next-generation cardiovascular AI systems will be inherently agentic, requiring the synergistic integration of two complementary model classes: (i) ECG foundation models that act as signal-level interpreters, learning rich electrophysiological representations via self-supervised and multimodal pretraining, and (ii) medical LLMs, trained on biomedical text corpora, that function as knowledge-based reasoning backbones for contextual inference, guideline alignment, and clinical decision support. Thus, the survey systematically reviews the existing pool of generalist medical LLMs, as well as ECG foundation models that utilize techniques such as self-supervised learning, multimodal ECG-language alignment, and vision transformer architectures, and that possess capabilities such as zero-shot classification, automated report generation, and longitudinal risk modeling. Recognizing the constraints of consumer-grade wearable edge devices, we further examine model optimization techniques such as quantization, pruning, and knowledge distillation, as well as the role of small language models in enabling low-latency, energy-efficient, and privacy-preserving ECG intelligence on edge platforms such as smartwatches. Finally, we outline future directions in multimodal ECG foundation models, agent-driven monitoring, and explainable, secure edge intelligence, with particular emphasis on real-time, on-device cardiovascular analytics in consumer electronics ecosystems.
Submitted 2 April, 2026;
originally announced April 2026.
-
HandVQA: Diagnosing and Improving Fine-Grained Spatial Reasoning about Hands in Vision-Language Models
Authors:
MD Khalequzzaman Chowdhury Sayem,
Mubarrat Tajoar Chowdhury,
Yihalem Yimolal Tiruneh,
Muneeb A. Khan,
Muhammad Salman Ali,
Binod Bhattarai,
Seungryul Baek
Abstract:
Understanding the fine-grained articulation of human hands is critical in high-stakes settings such as robot-assisted surgery, chip manufacturing, and AR/VR-based human-AI interaction. Despite achieving near-human performance on general vision-language benchmarks, current vision-language models (VLMs) struggle with fine-grained spatial reasoning, especially in interpreting complex and articulated hand poses. We introduce HandVQA, a large-scale diagnostic benchmark designed to evaluate VLMs' understanding of detailed hand anatomy through visual question answering. Built upon high-quality 3D hand datasets (FreiHAND, InterHand2.6M, FPHA), our benchmark includes over 1.6M controlled multiple-choice questions that probe spatial relationships between hand joints, such as angles, distances, and relative positions. We evaluate several state-of-the-art VLMs (LLaVA, DeepSeek, and Qwen-VL) in both base and fine-tuned settings, using lightweight fine-tuning via LoRA. Our findings reveal systematic limitations in current models, including hallucinated finger parts, incorrect geometric interpretations, and poor generalization. HandVQA not only exposes these critical reasoning gaps but provides a validated path to improvement. We demonstrate that the 3D-grounded spatial knowledge learned from our benchmark transfers in a zero-shot setting, significantly improving model accuracy on novel downstream tasks like hand gesture recognition (+10.33%) and hand-object interaction (+2.63%).
Submitted 27 March, 2026;
originally announced March 2026.
-
LHC Run-3, Dark Matter and Supersymmetric Spectra in the Supersymmetric Pati-Salam Model
Authors:
Ali Muhammad,
Imtiaz Khan,
Tianjun Li,
Shabbar Raza,
Mussawir Khan,
and Pirzada
Abstract:
Driven by the growing agreement between the experimentally measured muon anomalous magnetic moment and its SM prediction, we reexamine the phenomenological consequences of the MSSM, which is embedded in the supersymmetric $SU(4)_C \times SU(2)_L \times SU(2)_R$ Pati-Salam model. In contrast to earlier studies that predominantly favored a specific sign for the Higgsino mass parameter, our analysis systematically explores both the $μ>0$ and $μ<0$ scenarios in light of current collider, cosmological, and DM constraints. Within this framework, we identify viable parameter space regions where the observed DM relic density is reproduced through multiple mechanisms: sbottom-neutralino, gluino-neutralino, stop-neutralino, stau-neutralino, and chargino-neutralino coannihilation, as well as resonant $s$-channel annihilation via the pseudoscalar Higgs boson. We demonstrate that all such scenarios are consistent with present bounds from LHC supersymmetry searches, the Planck 2018 DM relic density bound, and current limits from direct-detection dark matter searches. Our results reveal characteristic mass spectra associated with these mechanisms. In particular, sbottom-neutralino coannihilation typically requires sbottom masses near $2.8~\text{TeV}$, while gluino-neutralino and stop-neutralino coannihilation scenarios allow gluino masses in the range $1$--$3~\text{TeV}$ and stop masses between $1$ and $3.5~\text{TeV}$. In coannihilation-dominated regions, the stau and chargino masses may reach values as high as $3.8~\text{TeV}$, whereas viable $A$ resonance solutions are realized for pseudoscalar Higgs masses spanning approximately $1.6$--$3.8~\text{TeV}$. We anticipate that a portion of the parameter space will be accessible to supersymmetry searches in LHC Run-3 and future runs.
Submitted 25 March, 2026;
originally announced March 2026.
-
MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage
Authors:
Ufaq Khan,
Umair Nawaz,
L D M S S Teja,
Numaan Saeed,
Muhammad Bilal,
Yutong Xie,
Mohammad Yaqub,
Muhammad Haris Khan
Abstract:
Vision Language Models (VLMs) are increasingly used for tasks like medical report generation and visual question answering. However, fluent diagnostic text does not guarantee safe visual understanding. In clinical practice, interpretation begins with pre-diagnostic sanity checks: verifying that the input is valid to read (correct modality and anatomy, plausible viewpoint and orientation, and no obvious integrity violations). Existing benchmarks largely assume this step is solved, and therefore miss a critical failure mode: a model can produce plausible narratives even when the input is inconsistent or invalid. We introduce MedObvious, a 1,880-task benchmark that isolates input validation as a set-level consistency capability over small multi-panel image sets: the model must identify whether any panel violates expected coherence. MedObvious spans five progressive tiers, from basic orientation/modality mismatches to clinically motivated anatomy/viewpoint verification and triage-style cues, and includes five evaluation formats to test robustness across interfaces. Evaluating 17 different VLMs, we find that sanity checking remains unreliable: several models hallucinate anomalies on normal (negative-control) inputs, performance degrades when scaling to larger image sets, and measured accuracy varies substantially between multiple-choice and open-ended settings. These results show that pre-diagnostic verification remains unsolved for medical VLMs and should be treated as a distinct, safety-critical capability before deployment.
Submitted 24 March, 2026;
originally announced March 2026.
-
Bridging the numerical-physical gap in acoustic holography via end-to-end differentiable structural optimization
Authors:
Moon Hwan Lee,
Mohd. Afzal Khan,
Akm Ashiquzzaman,
Eunbin Lee,
Jonghun Lee,
Euiheon Chung,
Hyuk-Sang Kwon,
Jae Youn Hwang
Abstract:
Acoustic holography provides a practical means of flexibly controlling acoustic wavefronts. However, high-fidelity shaping of acoustic fields remains constrained by the numerical-physical gap inherent in conventional phase-only designs. These approaches realize a two-dimensional phase-delay profile as a three-dimensional thickness-varying lens, while neglecting wave-matter interactions arising from the lens structure. Here, we introduce an end-to-end, physics-aware differentiable structural optimization framework that directly incorporates three-dimensional lens geometries into the acoustic simulation and optimization loop. Using a novel differentiable relaxation, termed Differentiable Hologram Lens Approximation (DHLA), the lens geometry is treated as a differentiable design variable, ensuring intrinsic consistency between numerical design and physical realization. The resulting Thickness-Only Acoustic Holograms (TOAHs) significantly outperform state-of-the-art phase-only acoustic holograms (POAHs) in field reconstruction fidelity and precision under complex conditions. We further demonstrate the application of the framework to spatially selective neuromodulation in a neuropathic pain mouse model, highlighting its potential for non-invasive transcranial neuromodulation. In summary, by reconciling numerical design with physical realization, this work establishes a robust strategy for high-fidelity acoustic wavefront shaping in complex environments.
Submitted 24 March, 2026;
originally announced March 2026.
-
Evaluating LLM-Based Test Generation Under Software Evolution
Authors:
Sabaat Haroon,
Mohammad Taha Khan,
Muhammad Ali Gulzar
Abstract:
Large Language Models (LLMs) are increasingly used for automated unit test generation. However, it remains unclear whether these tests reflect genuine reasoning about program behavior or simply reproduce superficial patterns learned during training. If the latter dominates, LLM-generated tests may exhibit weaknesses such as reduced coverage, missed regressions, and undetected faults. Understanding how LLMs generate tests and how those tests respond to code evolution is therefore essential. We present a large-scale empirical study of LLM-based test generation under program changes. Using an automated mutation-driven framework, we analyze how generated tests react to semantic-altering changes (SAC) and semantic-preserving changes (SPC) across eight LLMs and 22,374 program variants.
LLMs achieve strong baseline results, reaching 79% line coverage and 76% branch coverage with fully passing test suites on the original programs. However, performance degrades as programs evolve. Under SACs, the pass rate of newly generated tests drops to 66%, and branch coverage declines to 60%. More than 99% of failing SAC tests pass on the original program while executing the modified region, indicating residual alignment with the original behavior rather than adaptation to updated semantics. Performance also declines under SPCs despite unchanged functionality: pass rates fall to 79% and branch coverage to 69%. Although SPC edits preserve semantics, they often introduce larger syntactic changes, leading to instability in generated test suites. Models generate more new tests while discarding many baseline tests, suggesting sensitivity to lexical changes rather than true semantic impact. Overall, our results indicate that current LLM-based test generation relies heavily on surface-level cues and struggles to maintain regression awareness as programs evolve.
Submitted 24 March, 2026;
originally announced March 2026.
-
Uncertainty quantification of holographic transport and energy loss for the hot and baryon-dense QGP
Authors:
Musa R. Khan,
Ayrton Nascimento,
Yumu Yang,
Joaquin Grefa,
Mauricio Hippert,
Jorge Noronha,
Claudia Ratti,
Romulo Rougemont
Abstract:
We investigate several transport coefficients across the phase diagram of a holographic Einstein-Maxwell-Dilaton (EMD) model of hot and dense QCD with $N_f=2+1$ flavors. Our results are obtained from an open-source implementation of this model in C++, publicly available as a module within the MUSES Framework. This code includes a new numerical method to extract thermodynamic quantities from near-boundary asymptotics in holographic models, introduced here for the first time, which greatly improves numerical stability and performance in comparison to earlier implementations. Thanks to this improved technique, we are able to compute results for many realizations of our holographic model, sampled from a Bayesian posterior distribution constrained by lattice QCD results at zero chemical potential. This allows us to propagate lattice QCD error bars to predictions of transport coefficients in a wide window of temperature and baryon chemical potential, covering the crossover region, the neighborhood of the predicted critical point, and the line of first-order phase transition. The physical observables include baryon and thermal conductivities, baryon diffusion, shear and bulk viscosities, the jet-quenching parameter, the heavy-quark drag force, and Langevin diffusion coefficients. At vanishing baryon density, we compare our results to estimates extracted by the JETSCAPE Collaboration from heavy-ion data, with which we find good agreement.
Submitted 20 March, 2026;
originally announced March 2026.
-
AURORA: Adaptive Unified Representation for Robust Ultrasound Analysis
Authors:
Ufaq Khan,
L. D. M. S. Sai Teja,
Ayuba Shakiru,
Mai A. Shaaban,
Yutong Xie,
Muhammad Bilal,
Muhammad Haris Khan
Abstract:
Ultrasound images vary widely across scanners, operators, and anatomical targets, which often causes models trained in one setting to generalize poorly to new hospitals and clinical conditions. The Foundation Model Challenge for Ultrasound Image Analysis (FMC-UIA) reflects this difficulty by requiring a single model to handle multiple tasks, including segmentation, detection, classification, and landmark regression across diverse organs and datasets. We propose a unified multi-task framework based on a transformer visual encoder from the Qwen3-VL family. Intermediate token features are projected into spatial feature maps and fused using a lightweight multi-scale feature pyramid, enabling both pixel-level predictions and global reasoning within a shared representation. Each task is handled by a small task-specific prediction head, while training uses task-aware sampling and selective loss balancing to manage heterogeneous supervision and reduce task imbalance. Our method is designed to be simple to optimize and adaptable across a wide range of ultrasound analysis tasks. Performance improved from 67% to 85% on the validation set, and the model achieved an average score of 81.84% on the official test set across all tasks. The code is publicly available at: https://github.com/saitejalekkala33/FMCUIA-ISBI.git
Submitted 19 March, 2026;
originally announced March 2026.
-
Impact of Differentials in SIMON32 Algorithm for Lightweight Security of Internet of Things
Authors:
Jonathan Cook,
Sabih ur Rehman,
M. Arif Khan
Abstract:
SIMON and SPECK were among the first efficient encryption algorithms introduced for resource-constrained applications. SIMON is suitable for Internet of Things (IoT) devices and has rapidly attracted the attention of the research community to understand its structure and analyse its security. To analyse the security of an encryption algorithm, researchers often employ cryptanalysis techniques. However, cryptanalysis is a resource and time-intensive task. To improve cryptanalysis efficiency, state-of-the-art research has proposed implementing heuristic search and sampling methods. Despite recent advances, the cryptanalysis of the SIMON cypher remains inefficient. Contributing factors are the large size of the difference distribution tables utilised in cryptanalysis and the scarcity of differentials with a high transition probability. To address these limitations, we introduce an analysis of differential properties of the SIMON32 cypher, revealing differential characteristics that pave the way for future efficiency enhancements. Our analysis has further increased the number of targeted rounds by identifying high probability differentials within a partial difference distribution table of the SIMON cypher, exceeding existing state-of-the-art benchmarks. The code designed for this work is available at https://github.com/johncook1979/simon32-analysis.
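To make the object of study concrete, the sketch below enumerates one column of a difference distribution table for the SIMON32 non-linear round function f(x) = ((x <<< 1) & (x <<< 8)) ^ (x <<< 2) on 16-bit words; it only illustrates how such transition probabilities are counted and does not reproduce the paper's partial-DDT construction or search strategy.

```python
# Count output differences of the SIMON32 non-linear function for one chosen
# input difference by exhausting all 2^16 inputs.
from collections import Counter

N = 16
MASK = (1 << N) - 1

def rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (N - r))) & MASK

def f(x: int) -> int:
    # SIMON non-linear round function on one 16-bit word.
    return (rotl(x, 1) & rotl(x, 8)) ^ rotl(x, 2)

def ddt_column(delta_in: int) -> Counter:
    """Count f(x) ^ f(x ^ delta_in) over all 2^16 inputs x."""
    counts = Counter()
    for x in range(1 << N):
        counts[f(x) ^ f(x ^ delta_in)] += 1
    return counts

col = ddt_column(0x0001)
best_out, best_count = col.most_common(1)[0]
print(f"best transition 0x0001 -> {best_out:#06x} "
      f"with probability {best_count / (1 << N):.4f}")
```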
Submitted 18 March, 2026;
originally announced March 2026.
-
Integrating Explainable Machine Learning and Mixed-Integer Optimization for Personalized Sleep Quality Intervention
Authors:
Mahfuz Ahmed Anik,
Mohsin Mahmud Topu,
Azmine Toushik Wasi,
Md Isfar Khan,
MD Manjurul Ahsan
Abstract:
Sleep quality is influenced by a complex interplay of behavioral, environmental, and psychosocial factors, yet most computational studies focus mainly on predictive risk identification rather than actionable intervention design. Although machine learning models can accurately predict subjective sleep outcomes, they rarely translate predictive insights into practical intervention strategies. To address this gap, we propose a personalized predictive-prescriptive framework that integrates interpretable machine learning with mixed-integer optimization. A supervised classifier trained on survey data predicts sleep quality, while SHAP-based feature attribution quantifies the influence of modifiable factors. These importance measures are incorporated into a mixed-integer optimization model that identifies minimal and feasible behavioral adjustments, while modelling resistance to change through a penalty mechanism. The framework achieves strong predictive performance, with a test F1-score of 0.9544 and an accuracy of 0.9366. Sensitivity and Pareto analyses reveal a clear trade-off between expected improvement and intervention intensity, with diminishing returns as additional changes are introduced. At the individual level, the model generates concise recommendations, often suggesting one or two high-impact behavioral adjustments and sometimes recommending no change when expected gains are minimal. By integrating prediction, explanation, and constrained optimization, this framework demonstrates how data-driven insights can be translated into structured and personalized decision support for sleep improvement.
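A minimal PuLP sketch of the kind of prescriptive step described above: binary change variables, SHAP-style benefit weights, a resistance-to-change penalty, and a budget on the number of changes. The feature names, weights, and budget are invented for illustration and do not reflect the paper's exact formulation.

```python
# Toy mixed-integer model: pick at most `max_changes` behavioral adjustments
# that maximize SHAP-weighted benefit minus a resistance penalty.
import pulp

features = ["screen_time", "caffeine", "exercise", "late_meals"]
benefit = {"screen_time": 0.31, "caffeine": 0.22, "exercise": 0.18, "late_meals": 0.09}  # e.g. |SHAP|
penalty = {"screen_time": 0.20, "caffeine": 0.05, "exercise": 0.15, "late_meals": 0.04}  # resistance
max_changes = 2

prob = pulp.LpProblem("sleep_intervention", pulp.LpMaximize)
change = {f: pulp.LpVariable(f"change_{f}", cat="Binary") for f in features}

prob += pulp.lpSum((benefit[f] - penalty[f]) * change[f] for f in features)   # net benefit
prob += pulp.lpSum(change[f] for f in features) <= max_changes                # intervention budget

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([f for f in features if change[f].value() > 0.5])
```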
Submitted 14 March, 2026;
originally announced March 2026.
-
Quantum Pattern Matching in Generalised Degenerate Strings
Authors:
Massimo Equi,
Md Rabiul Islam Khan,
Veli Mäkinen
Abstract:
A degenerate string is a sequence of sets of characters. A generalized degenerate (GD) string extends this notion to a sequence of sets of strings, where strings of the same set are of equal length. Finding an exact match for a pattern string inside a GD string can be done in $O(mn+N)$ time (Ascone et al., WABI 2024), where $m$ is the pattern length, $n$ is the number of strings, and $N$ is the total length of the strings constituting the GD string. We modify this algorithm to work under a quantum model of computation, achieving running time $\tilde{O}(\sqrt{mnN})$.
Submitted 17 March, 2026;
originally announced March 2026.
-
Physics-Constrained Neural Closure for Lattice Boltzmann Large-Eddy Simulation
Authors:
Muhammad Idrees Khan,
Sauro Succi,
Hua-Dong Yao,
Giacomo Falcucci
Abstract:
We present a physics-constrained, data-driven subgrid-scale (SGS) stress closure for large-eddy simulation (LES) in the lattice Boltzmann method (LBM). Trained on filtered-downsampled (FD) data from LBM direct numerical simulation (DNS) of forced homogeneous isotropic turbulence (FHIT) spanning multiple filter widths, a compact neural network maps nine macroscopic derivative inputs - six strain-rate and three vorticity components - to the six independent components of the SGS stress tensor; a deviatoric projection is applied post-inference to obtain the traceless stress used in the solver. Training combines a stress data loss with physics terms for SGS energy-transfer (Pi) matching, rotational equivariance under cube rotations, and compatibility of the implied SGS forcing with the divergence-based coupling.
The predicted stress is coupled to the solver through a split strategy: a dissipative, strain-aligned contribution is represented through an effective-viscosity projection, while the remaining anisotropic residual is applied through a forcing term. This construction is intended to retain both backscatter (via the effective viscosity) and non-dissipative anisotropic effects (via the residual forcing), while remaining compatible with LBM deployment. In the cases considered here, a priori results show good agreement with FD references across stress components and SGS-transfer statistics, and a posteriori rollouts improve several energetic and statistical measures relative to static and dynamic Smagorinsky baselines. A preliminary transfer test in turbulent channel flow is also reported without retraining. Finally, we demonstrate production deployment via ONNX Runtime, with throughput comparable to a dynamic Smagorinsky baseline in the tested configuration.
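The following PyTorch sketch mirrors the stated input/output structure of the closure (nine derivative inputs to six stress components, followed by a deviatoric projection); the layer sizes, activation, and training losses are assumptions rather than the paper's trained network.

```python
# Small MLP closure: 9 inputs (6 strain-rate + 3 vorticity components) -> 6 SGS
# stress components, with the trace removed after inference (deviatoric part).
import torch
import torch.nn as nn

class SGSClosure(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(9, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, 6),   # tau_xx, tau_yy, tau_zz, tau_xy, tau_xz, tau_yz
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        tau = self.net(feats)
        trace = tau[..., :3].mean(dim=-1, keepdim=True)  # (tau_xx + tau_yy + tau_zz) / 3
        # Deviatoric projection: subtract the trace from the diagonal components.
        return torch.cat([tau[..., :3] - trace, tau[..., 3:]], dim=-1)

model = SGSClosure()
inputs = torch.randn(1024, 9)                            # stand-in derivative features
tau_dev = model(inputs)
print(tau_dev.shape, tau_dev[..., :3].sum(dim=-1).abs().max())  # diagonal sums ~ 0
```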
Submitted 16 March, 2026;
originally announced March 2026.
-
Structure-preserving preconditioning of discrete space-fractional diffusion equations with variable coefficient and θ-Method
Authors:
Muhammad Faisal Khan,
Asim Ilyas,
Rolf Krause,
Stefano Serra-Capizzano,
Cristina Tablino-Possio
Abstract:
This paper studies the spectral properties of large matrices and the preconditioning of linear systems, arising from the finite difference discretization of a time-dependent space-fractional diffusion equation with a variable coefficient $a(x)$ defined on $Ω\subset \mathbb{R}^d$, $d=1,2$. The model involves a one-sided Riemann-Liouville fractional derivative multiplied by the function $a(x)$, discretized by the shifted Grünwald formula in space and the $θ$-method in time. The resulting all-at-once linear systems exhibit a $(d+1)$-level Toeplitz-like matrix structure, with $d=1,2$ denoting the space dimension, while the additional level is due to the time variable.
A preconditioning strategy is developed based on the structural properties of the discretized operator. Using the generalized locally Toeplitz (GLT) theory, we analyze the spectral distribution of the unpreconditioned and preconditioned matrix sequences. The main novelty is that the analysis fully covers the case where the variable coefficient $a$ is nonconstant. Numerical results are provided to support the GLT based theoretical findings, and some possible extensions are briefly discussed.
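As a small illustration of the spatial discretization named above, the following NumPy sketch builds the shifted Grünwald weights for order $α\in(1,2)$ and assembles the resulting Toeplitz factor scaled by a variable coefficient; the grid, the sample coefficient, and the boundary handling are illustrative choices, and the $θ$-method time stepping and the preconditioner are not reproduced here.

```python
# Shifted Grünwald weights g_k = (-1)^k * binom(alpha, k), assembled into the
# Toeplitz spatial operator and scaled by diag(a(x_i)).
import numpy as np
from scipy.linalg import toeplitz

def grunwald_weights(alpha: float, n: int) -> np.ndarray:
    """Compute g_0, ..., g_n by the standard recurrence."""
    g = np.empty(n + 1)
    g[0] = 1.0
    for k in range(1, n + 1):
        g[k] = g[k - 1] * (k - 1.0 - alpha) / k
    return g

alpha, n = 1.5, 8
h = 1.0 / (n + 1)
g = grunwald_weights(alpha, n)

# Shifted (p = 1) Grünwald matrix: Toeplitz with first column (g_1, ..., g_n)
# and first row (g_1, g_0, 0, ..., 0).
col = g[1:n + 1]
row = np.concatenate(([g[1], g[0]], np.zeros(n - 2)))
G = toeplitz(col, row) / h**alpha

a = 1.0 + np.linspace(h, 1.0 - h, n)       # sample variable coefficient a(x) > 0
A = np.diag(a) @ G                         # variable-coefficient space operator
print(A.shape)
```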
Submitted 16 March, 2026;
originally announced March 2026.
-
Balancing Multimodal Domain Generalization via Gradient Modulation and Projection
Authors:
Hongzhao Li,
Guohao Shen,
Shupan Li,
Mingliang Xu,
Muhammad Haris Khan
Abstract:
Multimodal Domain Generalization (MMDG) leverages the complementary strengths of multiple modalities to enhance model generalization on unseen domains. A central challenge in multimodal learning is optimization imbalance, where modalities converge at different speeds during training. This imbalance leads to unequal gradient contributions, allowing some modalities to dominate the learning process while others lag behind. Existing balancing strategies typically regulate each modality's gradient contribution based on its classification performance on the source domain to alleviate this issue. However, relying solely on source-domain accuracy neglects a key insight in MMDG: modalities that excel on the source domain may generalize poorly to unseen domains, limiting cross-domain gains. To overcome this limitation, we propose Gradient Modulation Projection (GMP), a unified strategy that promotes balanced optimization in MMDG. GMP first decouples gradients associated with classification and domain-invariance objectives. It then modulates each modality's gradient based on semantic and domain confidence. Moreover, GMP dynamically adjusts gradient projections by tracking the relative strength of each task, mitigating conflicts between classification and domain-invariant learning within modality-specific encoders. Extensive experiments demonstrate that GMP achieves state-of-the-art performance and integrates flexibly with diverse MMDG methods, significantly improving generalization across multiple benchmarks.
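The projection step the abstract alludes to can be illustrated with a generic conflict-aware rule (PCGrad-style); this is only the bare projection operation under our own naming, not the paper's full GMP modulation by semantic and domain confidence.

import numpy as np

def project_if_conflicting(g_task, g_ref, eps=1e-12):
    """Generic conflict-aware gradient projection: if g_task opposes g_ref (negative
    inner product), remove its component along g_ref. Illustrative only."""
    dot = float(g_task @ g_ref)
    if dot < 0.0:
        g_task = g_task - dot / (float(g_ref @ g_ref) + eps) * g_ref
    return g_task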
Submitted 14 March, 2026;
originally announced March 2026.
-
Deep Learning Based Estimation of Blood Glucose Levels from Multidirectional Scleral Blood Vessel Imaging
Authors:
Muhammad Ahmed Khan,
Manqiang Peng,
Ding Lin,
Saif Ur Rehman Khan
Abstract:
Regular monitoring of glycemic status is essential for diabetes management, yet conventional blood-based testing can be burdensome for frequent assessment. The sclera contains superficial microvasculature that may exhibit diabetes-related alterations and is readily visible on the ocular surface. We propose ScleraGluNet, a multiview deep-learning framework for three-class metabolic status classification (normal, controlled diabetes, and high-glucose diabetes) and continuous fasting plasma glucose (FPG) estimation from multidirectional scleral vessel images. The dataset comprised 445 participants (150/140/155) and 2,225 anterior-segment images acquired from five gaze directions per participant. After vascular enhancement, features were extracted using parallel convolutional branches, refined with Manta Ray Foraging Optimization (MRFO), and fused via transformer-based cross-view attention. Performance was evaluated using subject-wise five-fold cross-validation, with all images from each participant assigned to the same fold. ScleraGluNet achieved 93.8% overall accuracy, with one-vs-rest AUCs of 0.971, 0.956, and 0.982 for normal, controlled diabetes, and high-glucose diabetes, respectively. For FPG estimation, the model achieved MAE = 6.42 mg/dL and RMSE = 7.91 mg/dL, with strong correlation to laboratory measurements (r = 0.983; R² = 0.966). Bland-Altman analysis showed a mean bias of +1.45 mg/dL with 95% limits of agreement from -8.33 to +11.23 mg/dL. These results support multidirectional scleral vessel imaging with multiview learning as a promising noninvasive approach for glycemic assessment, warranting multicenter validation before clinical deployment.
Submitted 13 March, 2026;
originally announced March 2026.
-
Fractional Rotation, Full Potential? Investigating Performance and Convergence of Partial RoPE
Authors:
Mohammad Aflah Khan,
Krishna P. Gummadi,
Manish Gupta,
Abhilasha Ravichander
Abstract:
Rotary Positional Embedding (RoPE) is a common choice in transformer architectures for encoding relative positional information. Although earlier work has examined omitting RoPE in specific layers, the effect of varying the fraction of hidden dimensions that receive rotary transformations remains largely unexplored. This design choice can yield substantial memory savings, which becomes especially significant at long context lengths. We find up to 10x memory savings over the standard RoPE cache, while achieving comparable final loss. In this work, we present a systematic study examining the impact of partial RoPE on training dynamics and convergence across architectures and datasets. Our findings uncover several notable patterns: (1) applying RoPE to only a small fraction of dimensions (around 10%) achieves convergence comparable to using full RoPE; (2) these trends hold consistently across model sizes, sequence lengths, architectures, and datasets of varying quality, with higher-quality data resulting in lower overall loss and similar benchmark performance; and (3) some models trained with NoPE (No Positional Encoding) showcase unstable learning trajectories, which can be alleviated through minimal RoPE application or QK-Norm, which converges to a higher loss. Together, these results offer practical guidance for model designers aiming to balance efficiency and training stability, while emphasizing the previously overlooked importance of partial RoPE.
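A minimal NumPy sketch of what "partial RoPE" means operationally: rotate only the first fraction of head dimensions and pass the rest through unchanged. The shapes, the rotation pairing convention, and the frequency base are illustrative assumptions, not the paper's exact configuration.

import numpy as np

def partial_rope(x, positions, rope_fraction=0.1, base=10000.0):
    """Apply rotary position embedding to only the first `rope_fraction` of the head
    dimension, leaving the remaining dimensions untouched (illustrative sketch)."""
    d = x.shape[-1]
    d_rot = int(round(rope_fraction * d)) // 2 * 2       # even number of rotated dims
    if d_rot == 0:
        return x
    half = d_rot // 2
    inv_freq = base ** (-np.arange(half) / half)
    angles = positions[:, None] * inv_freq[None, :]      # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:d_rot]
    rotated = np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, x[..., d_rot:]], axis=-1)

# Only the rotated slice of the cached keys depends on position, which is where the
# memory savings at long context lengths come from.
q = np.random.randn(128, 64)                              # (sequence length, head dim)
q_rope = partial_rope(q, np.arange(128), rope_fraction=0.1)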
Submitted 12 March, 2026;
originally announced March 2026.
-
STM32-Based Smart Waste Bin for Hygienic Disposal Using Embedded Sensing and Automated Control
Authors:
Mohammed Aman Bhuiyan,
Aritra Islam Saswato,
Md. Misbah Khan,
Anish Paul,
Ahmed Faizul Haque Dhrubo,
Mohammad Abdul Qayum
Abstract:
The increasing demand for hygienic and contactless solutions in public and private environments has encouraged the development of automated systems for everyday applications. This paper presents the design and implementation of a motion-sensing automatic waste bin using an STM32 microcontroller, ultrasonic sensors, and a servo motor. The system detects user presence through ultrasonic sensing and automatically opens the bin lid using a servo motor controlled by the microcontroller. An additional ultrasonic sensor is used to monitor the internal waste level of the bin, while an OLED display provides real-time feedback regarding system status. The proposed system offers a low-cost, reliable, and easily deployable solution for touch-free waste disposal. Experimental evaluation demonstrates fast response time, stable sensing performance, and smooth mechanical operation. The system can be effectively deployed in homes, educational institutions, hospitals, and public facilities to improve hygiene and user convenience.
Submitted 11 March, 2026;
originally announced March 2026.
-
Enhanced Random Subspace Local Projections for High-Dimensional Time Series Analysis
Authors:
Eman Khalid,
Moimma Ali Khan,
Zarmeena Ali,
Abdullah Illyas,
Muhammad Usman,
Saoud Ahmed
Abstract:
High-dimensional time series forecasting suffers from severe overfitting when the number of predictors exceeds available observations, making standard local projection methods unstable and unreliable. We propose an enhanced Random Subspace Local Projection (RSLP) framework designed to deliver robust impulse response estimation in the presence of hundreds of correlated predictors. The method introduces weighted subspace aggregation, category-aware subspace sampling, adaptive subspace size selection, and a bootstrap inference procedure tailored to dependent data. These enhancements substantially improve estimator stability at longer forecast horizons while providing more reliable finite-sample inference. Experiments on synthetic data, macroeconomic indicators, and the FRED-MD dataset demonstrate a 33 percent reduction in estimator variability at horizons h >= 3 through adaptive subspace size selection. The bootstrap inference procedure produces conservative confidence intervals that are 14 percent narrower at policy-relevant horizons in very high-dimensional settings (FRED-MD with 126 predictors) while maintaining proper coverage. The framework provides practitioners with a principled approach for incorporating rich information sets into impulse response analysis without the instability of traditional high-dimensional methods.
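A stripped-down version of the random-subspace idea is sketched below: for each horizon, average OLS local-projection estimates of the shock coefficient over many random subsets of the control predictors. Equal weights are used and the paper's weighted aggregation, category-aware sampling, and dependent-data bootstrap are omitted; all names and defaults are our own.

import numpy as np

def rslp_irf(y, shock, X, horizons, n_subspaces=200, k=8, rng=None):
    """Minimal random-subspace local projection: y, shock are length-T series and
    X is a (T, p) matrix of controls; returns one averaged impulse response per horizon."""
    rng = np.random.default_rng(rng)
    T, p = X.shape
    irf = np.zeros(len(horizons))
    for hi, h in enumerate(horizons):
        yh, s, Xh = y[h:], shock[: T - h], X[: T - h]
        coefs = []
        for _ in range(n_subspaces):
            cols = rng.choice(p, size=min(k, p), replace=False)
            Z = np.column_stack([np.ones(len(s)), s, Xh[:, cols]])
            beta, *_ = np.linalg.lstsq(Z, yh, rcond=None)
            coefs.append(beta[1])                 # coefficient on the shock
        irf[hi] = np.mean(coefs)
    return irf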
Submitted 8 March, 2026;
originally announced March 2026.
-
Realizing microrheological response of configurable viscoelastic media with a dynamic optical trap
Authors:
Sanatan Halder,
Manas Khan
Abstract:
The local viscoelastic (VE) environment governs the motion of an embedded microsphere and, consequently, pertinent dynamical phenomena. However, studying such phenomena with varying VE properties remains challenging for various reasons, including the strong coupling among the VE parameters and their dependence on experimental conditions, such as temperature. Here, we demonstrate the experimental realization of configurable VE media with broad variations, wherein the VE properties can be systematically and independently tuned, employing a dynamic optical trap. Specifically, the dynamics of a particle in a slowly diffusing optical trap provides the linear microrheological response of single-relaxation VE fluids, namely, Jeffreys or Maxwell-Voigt (MV) fluids, where the trap strength and its diffusion coefficient regulate the elastic response and the low-frequency viscosity, respectively. We validate this approach by comparing the experimentally observed dynamics of the trapped bead with those of a probe particle in real single-relaxation complex fluids, analytical predictions, and simulation results following the harmonically bound Brownian particle with long-time diffusion model that describes MV fluids. We extend the applicability of this scheme for realizing the microrheological response of double-relaxation VE media by incorporating appropriately correlated noise in the trap trajectory, signifying its validity for any linear VE media with multiple relaxations. Our scheme can be further extended to realize probe particle dynamics in an active VE environment, e.g., an entangled network of active polymers, by translating the trap along an active Brownian trajectory. Therefore, our scheme enables systematic microrheological studies in VE regimes that are otherwise challenging to realize or not readily accessible with real materials.
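The central mechanism, a bead in a harmonic trap whose centre itself diffuses slowly, can be mimicked with a short overdamped Langevin sketch. The parameters, time step, and function name are illustrative; the correlated-noise extension for double-relaxation media is not included.

import numpy as np

def trapped_bead_with_diffusing_trap(k, gamma, D_trap, kBT, dt, n_steps, rng=None):
    """Overdamped Langevin sketch: the trap stiffness k sets the elastic response,
    while the trap-centre diffusion coefficient D_trap sets the effective
    low-frequency viscosity of the emulated Jeffreys-like medium."""
    rng = np.random.default_rng(rng)
    x = np.zeros(n_steps)            # bead position
    c = np.zeros(n_steps)            # trap-centre position
    for i in range(1, n_steps):
        c[i] = c[i - 1] + np.sqrt(2.0 * D_trap * dt) * rng.standard_normal()
        x[i] = (x[i - 1]
                - (k / gamma) * (x[i - 1] - c[i - 1]) * dt
                + np.sqrt(2.0 * kBT * dt / gamma) * rng.standard_normal())
    return x, c

x, c = trapped_bead_with_diffusing_trap(k=1.0, gamma=1.0, D_trap=0.01, kBT=1.0,
                                        dt=1e-3, n_steps=100_000)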
Submitted 7 March, 2026;
originally announced March 2026.
-
TaPD: Temporal-adaptive Progressive Distillation for Observation-Adaptive Trajectory Forecasting in Autonomous Driving
Authors:
Mingyu Fan,
Yi Liu,
Hao Zhou,
Deheng Qian,
Mohammad Haziq Khan,
Matthias Raetsch
Abstract:
Trajectory prediction is essential for autonomous driving, enabling vehicles to anticipate the motion of surrounding agents to support safe planning. However, most existing predictors assume fixed-length histories and suffer substantial performance degradation when observations are variable or extremely short in real-world settings (e.g., due to occlusion or a limited sensing range). We propose TaPD (Temporal-adaptive Progressive Distillation), a unified plug-and-play framework for observation-adaptive trajectory forecasting under variable history lengths. TaPD comprises two cooperative modules: an Observation-Adaptive Forecaster (OAF) for future prediction and a Temporal Backfilling Module (TBM) for explicit reconstruction of the past. OAF is built on progressive knowledge distillation (PKD), which transfers motion pattern knowledge from long-horizon "teachers" to short-horizon "students" via hierarchical feature regression, enabling short observations to recover richer motion context. We further introduce a cosine-annealed distillation weighting scheme to balance forecasting supervision and feature alignment, improving optimization stability and cross-length consistency. For extremely short histories where implicit alignment is insufficient, TBM backfills missing historical segments conditioned on scene evolution, producing context-rich trajectories that strengthen PKD and thereby improve OAF. We employ a decoupled pretrain-reconstruct-finetune protocol to preserve real-motion priors while adapting to backfilled inputs. Extensive experiments on Argoverse 1 and Argoverse 2 show that TaPD consistently outperforms strong baselines across all observation lengths, delivers especially large gains under very short inputs, and improves other predictors (e.g., HiVT) in a plug-and-play manner. Code will be available at https://github.com/zhouhao94/TaPD.
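The cosine-annealed distillation weighting can be written as a one-line schedule; the endpoints and the way it multiplies the feature-alignment loss are illustrative assumptions rather than the paper's exact choice.

import math

def cosine_annealed_distill_weight(step, total_steps, w_max=1.0, w_min=0.0):
    """Cosine schedule that decays the weight on the distillation (feature-alignment)
    term relative to the forecasting loss over training."""
    t = min(step, total_steps) / max(total_steps, 1)
    return w_min + 0.5 * (w_max - w_min) * (1.0 + math.cos(math.pi * t))

# total loss at one training step (illustrative):
# loss = forecast_loss + cosine_annealed_distill_weight(step, total_steps) * distill_loss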
Submitted 6 March, 2026;
originally announced March 2026.
-
DreamCAD: Scaling Multi-modal CAD Generation using Differentiable Parametric Surfaces
Authors:
Mohammad Sadil Khan,
Muhammad Usama,
Rolandos Alexandros Potamias,
Didier Stricker,
Muhammad Zeshan Afzal,
Jiankang Deng,
Ismail Elezi
Abstract:
Computer-Aided Design (CAD) relies on structured and editable geometric representations, yet existing generative methods are constrained by small annotated datasets with explicit design histories or boundary representation (BRep) labels. Meanwhile, millions of unannotated 3D meshes remain untapped, limiting progress in scalable CAD generation. To address this, we propose DreamCAD, a multi-modal generative framework that directly produces editable BReps from point-level supervision, without CAD-specific annotations. DreamCAD represents each BRep as a set of parametric patches (e.g., Bézier surfaces) and uses a differentiable tessellation method to generate meshes. This enables large-scale training on 3D datasets while reconstructing connected and editable surfaces. Furthermore, we introduce CADCap-1M, the largest CAD captioning dataset to date, with 1M+ descriptions generated using GPT-5 for advancing text-to-CAD research. DreamCAD achieves state-of-the-art performance on ABC and Objaverse benchmarks across text, image, and point modalities, improving geometric fidelity and surpassing 75% user preference. Code and dataset will be publicly available.
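The "parametric patches to mesh" step can be illustrated by evaluating a Bézier control grid on a uniform (u, v) lattice. This is a generic tessellation sketch with our own names and sizes, not DreamCAD's differentiable pipeline.

import numpy as np
from math import comb

def bernstein(n, i, t):
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def tessellate_bezier_patch(control_points, resolution=16):
    """Evaluate an (n+1)x(m+1) Bezier control grid on a uniform (u, v) lattice,
    producing a grid of mesh vertices."""
    n, m = control_points.shape[0] - 1, control_points.shape[1] - 1
    u = np.linspace(0.0, 1.0, resolution)
    v = np.linspace(0.0, 1.0, resolution)
    Bu = np.stack([bernstein(n, i, u) for i in range(n + 1)])   # (n+1, resolution)
    Bv = np.stack([bernstein(m, j, v) for j in range(m + 1)])   # (m+1, resolution)
    # vertices[a, b] = sum_{i,j} Bu[i, a] * Bv[j, b] * P[i, j]
    return np.einsum("ia,jb,ijk->abk", Bu, Bv, control_points)

patch = np.random.rand(4, 4, 3)            # bicubic patch with 3D control points
verts = tessellate_bezier_patch(patch)     # (16, 16, 3) mesh vertices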
Submitted 5 March, 2026;
originally announced March 2026.
-
Data-Driven Optimization of Multi-Generational Cellular Networks: A Performance Classification Framework for Strategic Infrastructure Management
Authors:
Maryam Sabahat,
M. Umar Khan
Abstract:
The exponential growth in mobile data demand necessitates intelligent management of telecommunications infrastructure to ensure Quality of Service (QoS) and operational efficiency. This paper presents a comprehensive analysis of a multigenerational cellular network dataset, sourced from the OpenCelliD project, to identify patterns in network deployment, utilization, and infrastructure gaps. The methodology involves geographical, temporal, and performance analysis of 1,818 cell tower entries, predominantly Long Term Evolution (LTE), across three countries with a significant concentration in Pakistan. Key findings reveal the long-term persistence of legacy 2G/3G infrastructure in major urban centers, the existence of a substantial number of under-utilized towers representing opportunities for cost savings, and the identification of specific "non-4G demand zones" where active user bases are served by outdated technologies. By introducing a signal-density metric, we distinguish between absolute over-utilization and localized congestion. The results provide actionable intelligence for Mobile Network Operators (MNOs) to guide strategic LTE upgrades, optimize resource allocation, and bridge the digital divide in underserved regions.
Submitted 16 February, 2026;
originally announced March 2026.
-
Kinematic budget of quantum correlations
Authors:
Maaz Khan,
Subhadip Mitra
Abstract:
The diversity of quantum correlations -- discord, entanglement, steering, and Bell nonlocality -- disappears at the observable second-moment kinematic level. By treating state purity as a finite resource, we introduce a local-unitary-invariant budget split of symmetrised second moments into local and nonlocal sectors that maps quantum systems onto compact, two-dimensional, hole-free manifolds. The topology of these manifolds is governed by state purity and time-reversal symmetry. This dimensional reduction reveals a deep structural link: exceeding classical capacity limits forces the activation of time-asymmetric generators, guaranteeing non-positive partial transpose entanglement. For two qubits, the geometry is analytically solvable. A single boundary elegantly isolates classical correlations, while nested regions physically dictate entanglement, steering, Bell nonlocality, and bounds on non-stabiliser magic. Beyond two qubits, dimensional capacity bottlenecks enforce these universal kinematic limits on correlation structures. Because this macroscopic representation is completely determined by global and marginal purities, it bypasses the exponential scaling of full-state tomography. By coarse-graining over gauge-like first moments, the budget geometry acts as a thermodynamic phase diagram, exposing both the static hierarchy of quantum resources and their dynamic redistribution under decoherence.
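For concreteness, the raw moment data such a budget decomposition reorganizes, the local Bloch vectors, the correlation matrix, and the global purity, can be computed directly for a two-qubit state; the budget split itself is not reproduced in this sketch.

import numpy as np

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

def second_moment_data(rho):
    """Local Bloch vectors a, b, correlation matrix T_ij = Tr[rho (sigma_i x sigma_j)],
    and global purity Tr[rho^2] of a two-qubit density matrix rho."""
    a = np.array([np.trace(rho @ np.kron(s, I2)).real for s in paulis])
    b = np.array([np.trace(rho @ np.kron(I2, s)).real for s in paulis])
    T = np.array([[np.trace(rho @ np.kron(si, sj)).real for sj in paulis] for si in paulis])
    return a, b, T, np.trace(rho @ rho).real

# Example: a Bell state has vanishing local Bloch vectors and |T_ii| = 1.
bell = np.zeros((4, 4), dtype=complex)
bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5
print(second_moment_data(bell))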
Submitted 4 March, 2026;
originally announced March 2026.
-
A Stein Identity for q-Gaussians with Bounded Support
Authors:
Sophia Sklaviadis,
Thomas Moellenhoff,
Andre F. T. Martins,
Mario A. T. Figueiredo,
Mohammad Emtiyaz Khan
Abstract:
Stein's identity is a fundamental tool in machine learning with applications in generative models, stochastic optimization, and other problems involving gradients of expectations under Gaussian distributions. Less attention has been paid to problems with non-Gaussian expectations. Here, we consider the class of bounded-support $q$-Gaussians and derive a new Stein identity leading to gradient estimators which have nearly identical forms to the Gaussian ones, and which are similarly easy to implement. We do this by extending the previous results of Landsman, Vanduffel, and Yao (2013) to prove new Bonnet- and Price-type theorems for q-Gaussians. We also simplify their forms by using escort distributions. Our experiments show that bounded-support distributions can reduce the variance of gradient estimators, which can potentially be useful for Bayesian deep learning and sharpness-aware minimization. Overall, our work simplifies the application of Stein's identity for an important class of non-Gaussian distributions.
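As background for the generalization, the classical Gaussian Stein identity E[x f(x)] = E[f'(x)] is easy to verify numerically. The paper's contribution is the analogous identity for bounded-support q-Gaussians via escort distributions, which this Monte Carlo check does not cover.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

f = lambda t: np.tanh(t)              # any smooth test function
df = lambda t: 1.0 - np.tanh(t) ** 2

# Classical Stein identity for a standard Gaussian: the two estimates should nearly agree.
print(np.mean(x * f(x)), np.mean(df(x)))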
Submitted 3 March, 2026;
originally announced March 2026.
-
Ethical and Explainable AI in Reusable MLOps Pipelines
Authors:
Rakib Hossain,
Mahmood Menon Khan,
Lisan Al Amin,
Dhruv Parikh,
Farhana Afroz,
Bestoun S. Ahmed
Abstract:
This paper introduces a unified machine learning operations (MLOps) framework that brings ethical artificial intelligence principles into practical use by enforcing fairness, explainability, and governance throughout the machine learning lifecycle. The proposed method reduces bias by lowering the demographic parity difference (DPD) from 0.31 to 0.04 without model retuning, and cross-dataset validation achieves an area under the curve (AUC) of 0.89 on the Statlog Heart dataset.
The framework maintains fairness metrics within operational limits across all deployments. Model deployment is blocked if the DPD exceeds 0.05 or if equalized odds (EO) exceeds 0.05 on the validation set. After deployment, retraining is automatically triggered if the 30-day Kolmogorov-Smirnov drift statistic exceeds 0.20. In production, the system consistently achieved DPD <= 0.05 and EO <= 0.03, while the KS statistic remained <= 0.20.
Decision-curve analysis indicates a positive net benefit in the 10 to 20 percent operating range, showing that the mitigated model preserves predictive utility while satisfying fairness constraints. These results demonstrate that automated fairness gates and explainability artefacts can be successfully deployed in production without disrupting operational flow, providing organizations with a practical and credible approach to implementing ethical, transparent, and trustworthy AI across diverse datasets and operational settings.
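The deployment and retraining thresholds quoted above translate directly into a simple gate; the function below is our own illustrative wrapper around those numbers, not the authors' implementation.

def deployment_gate(dpd, eo, ks_30d=None, dpd_max=0.05, eo_max=0.05, ks_max=0.20):
    """Fairness/drift gate using the thresholds quoted in the abstract: block deployment
    if DPD or EO exceeds 0.05 on validation; after deployment, trigger retraining if the
    30-day Kolmogorov-Smirnov drift statistic exceeds 0.20."""
    if dpd > dpd_max or eo > eo_max:
        return "block_deployment"
    if ks_30d is not None and ks_30d > ks_max:
        return "trigger_retraining"
    return "ok"

print(deployment_gate(dpd=0.04, eo=0.03, ks_30d=0.22))   # -> trigger_retraining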
Submitted 15 February, 2026;
originally announced March 2026.
-
Non-Minimal Dilaton Inflation from the Effective Gluodynamics
Authors:
Pirzada,
Imtiaz Khan,
Mussawair Khan,
Tianjun Li,
Ali Muhammad
Abstract:
We study single-field inflation in which the inflaton is identified with the lightest scalar (dilaton) excitation of a confining gauge theory. The inflaton potential is not postulated: it follows from the pure effective Gluodynamics Lagrangian tightly constrained by the trace anomaly and the associated infinite tower of Ward identities, yielding a Coleman--Weinberg form with a logarithmic term fixed by nonperturbative condensates. After coupling to gravity via a non-minimal interaction $ξ\,\varphi^2 R$, the Einstein-frame potential develops a plateau consistent with current CMB observables. In the large-$ξ$ limit the model approaches the standard plateau attractor, while the Migdal--Shifman (MS) logarithmic structure induces a controlled, testable deformation governed by $A/λ$ across the CMB window. We quantify the resulting shifts in $(n_s,r)$ and the running analytically and confirm them with numerical scans over $(ξ,λ,A,μ)$, making the departure from the attractor both microphysically motivated and observationally predictive.
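The quoted observables $n_s$ and $r$ follow from standard single-field slow-roll formulas once an Einstein-frame potential is specified. The sketch below evaluates them for a generic plateau potential chosen only for illustration, not the paper's dilaton potential.

import numpy as np

def slow_roll_observables(V, phi, Mp=1.0, h=1e-4):
    """Standard slow-roll estimates from a potential V(phi):
    eps = (Mp^2/2)(V'/V)^2, eta = Mp^2 V''/V, n_s = 1 - 6 eps + 2 eta, r = 16 eps."""
    V0 = V(phi)
    dV = (V(phi + h) - V(phi - h)) / (2.0 * h)
    d2V = (V(phi + h) - 2.0 * V0 + V(phi - h)) / h**2
    eps = 0.5 * Mp**2 * (dV / V0) ** 2
    eta = Mp**2 * d2V / V0
    return {"n_s": 1.0 - 6.0 * eps + 2.0 * eta, "r": 16.0 * eps}

# Illustrative plateau (Starobinsky-like) potential evaluated at an example field value.
V = lambda p: (1.0 - np.exp(-np.sqrt(2.0 / 3.0) * p)) ** 2
print(slow_roll_observables(V, phi=5.5))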
Submitted 28 February, 2026;
originally announced March 2026.
-
BiCLIP: Bidirectional and Consistent Language-Image Processing for Robust Medical Image Segmentation
Authors:
Saivan Talaei,
Fatemeh Daneshfar,
Abdulhady Abas Abdullah,
Mustaqeem Khan
Abstract:
Medical image segmentation is a cornerstone of computer-assisted diagnosis and treatment planning. While recent multimodal vision-language models have shown promise in enhancing semantic understanding through textual descriptions, their resilience in "in-the-wild" clinical settings, characterized by scarce annotations and hardware-induced image degradations, remains under-explored.
We introduce BiCLIP (Bidirectional and Consistent Language-Image Processing), a framework engineered to bolster robustness in medical segmentation. BiCLIP features a bidirectional multimodal fusion mechanism that enables visual features to iteratively refine textual representations, ensuring superior semantic alignment. To further stabilize learning, we implement an augmentation consistency objective that regularizes intermediate representations against perturbed input views.
Evaluation on the QaTa-COV19 and MosMedData+ benchmarks demonstrates that BiCLIP consistently surpasses state-of-the-art image-only and multimodal baselines. Notably, BiCLIP maintains high performance when trained on as little as 1% of labeled data and exhibits significant resistance to clinical artifacts, including motion blur and low-dose CT noise.
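One way to realize an augmentation-consistency objective of the kind described is to penalize the distance between normalized intermediate features of a clean view and a perturbed view. The layer choice, the distance, and the function name below are our assumptions, not BiCLIP's exact objective.

import numpy as np

def consistency_loss(feat_clean, feat_aug):
    """Mean squared distance between L2-normalized features of clean and augmented views."""
    f1 = feat_clean / (np.linalg.norm(feat_clean, axis=-1, keepdims=True) + 1e-8)
    f2 = feat_aug / (np.linalg.norm(feat_aug, axis=-1, keepdims=True) + 1e-8)
    return float(np.mean(np.sum((f1 - f2) ** 2, axis=-1)))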
Submitted 25 February, 2026;
originally announced March 2026.
-
Comb-locked cavity ring-down spectroscopy of CO2 at 2-micron wavelength
Authors:
Muhammad Asad Khan,
Vittorio D'Agostino,
Stefania Gravina,
Livio Gianfrani,
Antonio Castrillo
Abstract:
We report on a comb-locked cavity ring-down spectrometer developed for high-precision, SI-traceable molecular spectroscopy of air-broadened CO2 gas samples. The experimental setup relies on the use of a singly-resonant optical parametric oscillator that acts as an intermediate link between a 2 micron external-cavity diode laser and an optical frequency comb stabilized against a GPS-disciplined Rb-clock. Absorption spectra of the R(50) ro-vibrational component of the CO2 20012-00001 band have been recorded with high precision and fidelity. As a result of a refined spectral analysis, based on the implementation of the modified Hartmann-Tran profile, line center frequencies, pressure broadening and pressure shifting coefficients have been determined. Finally, we demonstrate the measurement of CO2 mole fractions with a sub-promille statistical uncertainty.
Submitted 27 February, 2026;
originally announced February 2026.
-
Towards Multimodal Domain Generalization with Few Labels
Authors:
Hongzhao Li,
Hao Dong,
Hualei Wan,
Shupan Li,
Mingliang Xu,
Muhammad Haris Khan
Abstract:
Multimodal models ideally should generalize to unseen domains while remaining data-efficient to reduce annotation costs. To this end, we introduce and study a new problem, Semi-Supervised Multimodal Domain Generalization (SSMDG), which aims to learn robust multimodal models from multi-source data with few labeled samples. We observe that existing approaches fail to address this setting effectively: multimodal domain generalization methods cannot exploit unlabeled data, semi-supervised multimodal learning methods ignore domain shifts, and semi-supervised domain generalization methods are confined to single-modality inputs. To overcome these limitations, we propose a unified framework featuring three key components: Consensus-Driven Consistency Regularization, which obtains reliable pseudo-labels through confident fused-unimodal consensus; Disagreement-Aware Regularization, which effectively utilizes ambiguous non-consensus samples; and Cross-Modal Prototype Alignment, which enforces domain- and modality-invariant representations while promoting robustness under missing modalities via cross-modal translation. We further establish the first SSMDG benchmarks, on which our method consistently outperforms strong baselines in both standard and missing-modality scenarios. Our benchmarks and code are available at https://github.com/lihongzhao99/SSMDG.
Submitted 26 February, 2026;
originally announced February 2026.
-
IndicIFEval: A Benchmark for Verifiable Instruction-Following Evaluation in 14 Indic Languages
Authors:
Thanmay Jayakumar,
Mohammed Safi Ur Rahman Khan,
Raj Dabre,
Ratish Puduppully,
Anoop Kunchukuttan
Abstract:
Instruction-following benchmarks remain predominantly English-centric, leaving a critical evaluation gap for the hundreds of millions of Indic language speakers. We introduce IndicIFEval, a benchmark evaluating constrained generation of LLMs across 14 Indic languages using automatically verifiable, rule-based instructions. It comprises around 800 human-verified examples per language spread across two complementary subsets: IndicIFEval-Ground, translated prompts from IFEval (Zhou et al., 2023) carefully localized for Indic contexts, and IndicIFEval-Ground, synthetically generated instructions grounded in native Indic content. We conduct a comprehensive evaluation of major open-weight and proprietary models spanning both reasoning and non-reasoning models. While models maintain strong adherence to formatting constraints, they struggle significantly with lexical and cross-lingual tasks -- and despite progress in high-resource languages, instruction-following across the broader Indic family lags significantly behind English. We release IndicIFEval and its evaluation scripts to support progress on multilingual constrained generation (http://github.com/ai4bharat/IndicIFEval).
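"Automatically verifiable, rule-based instructions" means each instruction comes with a deterministic checker. The constraint types below are illustrative stand-ins in the style of IFEval-type benchmarks, not IndicIFEval's actual instruction set.

import re

def verify_instruction(response, constraint):
    """Map a constraint specification to a deterministic pass/fail check on the response."""
    kind = constraint["type"]
    if kind == "max_words":
        return len(response.split()) <= constraint["value"]
    if kind == "must_include":
        return constraint["value"] in response
    if kind == "ends_with_period":
        return response.rstrip().endswith(".")
    if kind == "num_bullets":
        return len(re.findall(r"^\s*[-*]", response, flags=re.M)) == constraint["value"]
    raise ValueError(f"unknown constraint type: {kind}")

print(verify_instruction("A short answer.", {"type": "max_words", "value": 5}))   # True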
Submitted 25 February, 2026;
originally announced February 2026.
-
SIMSPINE: A Biomechanics-Aware Simulation Framework for 3D Spine Motion Annotation and Benchmarking
Authors:
Muhammad Saif Ullah Khan,
Didier Stricker
Abstract:
Modeling spinal motion is fundamental to understanding human biomechanics, yet remains underexplored in computer vision due to the spine's complex multi-joint kinematics and the lack of large-scale 3D annotations. We present a biomechanics-aware keypoint simulation framework that augments existing human pose datasets with anatomically consistent 3D spinal keypoints derived from musculoskeletal modeling. Using this framework, we create the first open dataset, named SIMSPINE, which provides sparse vertebra-level 3D spinal annotations for natural full-body motions in indoor multi-camera capture without external restraints. With 2.14 million frames, this enables data-driven learning of vertebral kinematics from subtle posture variations and bridges the gap between musculoskeletal simulation and computer vision. In addition, we release pretrained baselines covering fine-tuned 2D detectors, monocular 3D pose lifting models, and multi-view reconstruction pipelines, establishing a unified benchmark for biomechanically valid spine motion estimation. Specifically, our 2D spine baselines improve the state-of-the-art from 0.63 to 0.80 AUC in controlled environments, and from 0.91 to 0.93 AP for in-the-wild spine tracking. Together, the simulation framework and SIMSPINE dataset advance research in vision-based biomechanics, motion analysis, and digital human modeling by enabling reproducible, anatomically grounded 3D spine estimation under natural conditions.
Submitted 12 March, 2026; v1 submitted 24 February, 2026;
originally announced February 2026.
-
A variance reduced framework for (non)smooth nonconvex-nonconcave stochastic minimax problems with extended Kurdyka-Lojasiewicz property
Authors:
Muhammad Khan,
Yangyang Xu
Abstract:
In this paper, we study stochastic constrained minimax optimization problems with nonconvex-nonconcave structure, a central problem in modern machine learning, for which reliable and efficient algorithms remain largely unexplored due to its inherent challenges. Prior approaches for nonconvex minimax optimization often require (strong) concavity on the maximization part, or certain restrictive geometric assumptions on the joint objective to have guaranteed convergence. In contrast, our method only assumes weak convexity in the primal variable and the extended Kurdyka-Lojasiewicz (KL) property, with exponent $θ\in [0,1]$, in the dual variable, significantly broadening the class of tractable problems. To this end, we propose a variance reduced algorithm that provably handles this general setting and achieves an $\varepsilon$-stationary solution with state-of-the-art sample complexity: in the smooth finite-sum setting, the sample complexity is $\mathcal{O}\left(\sqrt{N}\,\varepsilon^{-\max\{4θ,2\}}\right)$, where $N$ is the number of total samples, and in the online smooth setting, it is $\mathcal{O}\Big(\varepsilon^{-\max\{6θ,3\}}\Big)$. For the structured nonsmooth problem, the sample complexity is $\mathcal{O}\left(\sqrt{N}\,\max\Big\{\varepsilon^{-3}, \varepsilon^{-5θ}, \varepsilon^{-\frac{11θ-3}{2θ}}\Big\}\right)$ and $\mathcal{O}\left(\max\left\{\varepsilon^{-4}, \varepsilon^{-\frac{15θ-1}{2}}, \varepsilon^{-\frac{31θ-9}{4θ}}\right\}\right)$ respectively for the two settings. To the best of our knowledge, this is the first unified framework that jointly accommodates weak convexity, the extended KL property, and variance-reduced stochastic updates, making it highly suitable for large-scale applications.
Submitted 23 February, 2026;
originally announced February 2026.
-
Pixels Don't Lie (But Your Detector Might): Bootstrapping MLLM-as-a-Judge for Trustworthy Deepfake Detection and Reasoning Supervision
Authors:
Kartik Kuckreja,
Parul Gupta,
Muhammad Haris Khan,
Abhinav Dhall
Abstract:
Deepfake detection models often generate natural-language explanations, yet their reasoning is frequently ungrounded in visual evidence, limiting reliability. Existing evaluations measure classification accuracy but overlook reasoning fidelity. We propose DeepfakeJudge, a framework for scalable reasoning supervision and evaluation that integrates an out-of-distribution benchmark containing recent generative and editing forgeries, a human-annotated subset with visual reasoning labels, and a suite of evaluation models that specialize in evaluating reasoning rationales without the need for explicit ground-truth reasoning rationales. The Judge is optimized through a bootstrapped generator-evaluator process that scales human feedback into structured reasoning supervision and supports both pointwise and pairwise evaluation. On the proposed meta-evaluation benchmark, our reasoning-bootstrapped model achieves an accuracy of 96.2%, outperforming 30x larger baselines. The reasoning judge attains very high correlation with human ratings and 98.9% pairwise agreement on the human-annotated meta-evaluation subset. These results establish reasoning fidelity as a quantifiable dimension of deepfake detection and demonstrate scalable supervision for interpretable deepfake reasoning. Our user study shows that participants preferred the reasonings generated by our framework 70% of the time, in terms of faithfulness, groundedness, and usefulness, compared to those produced by other models and datasets. All of our datasets, models, and codebase are open-sourced at https://github.com/KjAeRsTuIsK/DeepfakeJudge.
Submitted 23 February, 2026;
originally announced February 2026.
-
CORVET: A CORDIC-Powered, Resource-Frugal Mixed-Precision Vector Processing Engine for High-Throughput AIoT applications
Authors:
Sonu Kumar,
Mohd Faisal Khan,
Mukul Lokhande,
Santosh Kumar Vishvakarma
Abstract:
This brief presents a runtime-adaptive, performance-enhanced vector engine featuring a low-resource, iterative CORDIC-based MAC unit for edge AI acceleration. The proposed design enables dynamic reconfiguration between approximate and accurate modes, exploiting the latency-accuracy trade-off for a wide range of workloads. Its resource-efficient approach further enables up to 4x throughput improvement within the same hardware resources by leveraging vectorised, time-multiplexed execution and flexible precision scaling. With a time-multiplexed multi-AF block and a lightweight pooling and normalisation unit, the proposed vector engine supports flexible precision (4/8/16-bit) and high MAC density. The ASIC implementation results show that each MAC stage can save up to 33% of time and 21% of power, with a 256-PE configuration that achieves higher compute density (4.83 TOPS/mm²) and energy efficiency (11.67 TOPS/W) than previous state-of-the-art work. A detailed hardware-software co-design methodology for object detection and classification tasks on Pynq-Z2 is discussed to assess the proposed architecture, demonstrating a scalable, energy-efficient solution for edge AI applications.
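The iterative principle behind a CORDIC MAC, multiplication by linear-mode shift-and-add iterations, can be sketched in a few lines. Fixed-point word lengths, the approximate/accurate mode switch, and the vectorisation of the actual design are not modelled; the floating-point form below only illustrates the iteration.

def cordic_mac(acc, a, b, n_iter=16):
    """Linear-mode CORDIC multiply-accumulate: returns approximately acc + a*b using only
    sign tests, scaled additions, and halving steps. Requires |b| < 2 for convergence."""
    x, y, z = a, acc, b
    for i in range(n_iter):
        d = 1.0 if z >= 0 else -1.0
        y += d * x * 2.0 ** (-i)
        z -= d * 2.0 ** (-i)
    return y

print(cordic_mac(0.5, 3.0, 0.75), 0.5 + 3.0 * 0.75)   # the two values should be close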
Submitted 22 February, 2026;
originally announced February 2026.
-
Towards Calibrating Prompt Tuning of Vision-Language Models
Authors:
Ashshak Sharifdeen,
Fahad Shamshad,
Muhammad Akhtar Munir,
Abhishek Basu,
Mohamed Insaf Ismithdeen,
Jeyapriyan Jeyamohan,
Chathurika Sewwandi Silva,
Karthik Nandakumar,
Muhammad Haris Khan
Abstract:
Prompt tuning of large-scale vision-language models such as CLIP enables efficient task adaptation without updating model weights. However, it often leads to poor confidence calibration and unreliable predictive uncertainty. We address this problem by proposing a calibration framework that enhances predictive reliability while preserving the geometry of the pretrained CLIP embedding space, which is required for robust generalization. Our approach extends the standard cross-entropy loss with two complementary regularizers: (1) a mean-variance margin penalty that stabilizes inter-class logit margins by maximizing their average while minimizing dispersion, mitigating underconfidence and overconfidence spikes; and (2) a text moment-matching loss that aligns the first and second moments of tuned text embeddings with their frozen CLIP counterparts, preserving semantic dispersion crucial for generalization. Through extensive experiments across 7 prompt-tuning methods and 11 diverse datasets, we demonstrate that our approach significantly reduces the Expected Calibration Error (ECE) compared to competitive calibration techniques on both base and novel classes.
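The two regularizers can be sketched directly from their description above; the signs, weights, and the use of covariances as the "second moment" are our reading, not the paper's exact formulation.

import numpy as np

def margin_mean_variance_penalty(logits, labels, lam=1.0):
    """Margins are the gap between the true-class logit and the best competing logit;
    the penalty rewards a large mean margin and a small margin variance."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    n = logits.shape[0]
    true = logits[np.arange(n), labels]
    masked = logits.copy()
    masked[np.arange(n), labels] = -np.inf
    margins = true - masked.max(axis=1)
    return -margins.mean() + lam * margins.var()

def text_moment_matching(tuned_emb, frozen_emb):
    """Match first and second moments of tuned text embeddings to their frozen counterparts."""
    mean_term = np.sum((tuned_emb.mean(0) - frozen_emb.mean(0)) ** 2)
    cov_term = np.sum((np.cov(tuned_emb, rowvar=False) - np.cov(frozen_emb, rowvar=False)) ** 2)
    return mean_term + cov_term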
Submitted 21 February, 2026;
originally announced February 2026.
-
ArabicNumBench: Evaluating Arabic Number Reading in Large Language Models
Authors:
Anas Alhumud,
Abdulaziz Alhammadi,
Muhammad Badruddin Khan
Abstract:
We present ArabicNumBench, a comprehensive benchmark for evaluating large language models on Arabic number reading tasks across Eastern Arabic-Indic numerals (0-9 in Arabic script) and Western Arabic numerals (0-9). We evaluate 71 models from 10 providers using four prompting strategies (zero-shot, zero-shot CoT, few-shot, few-shot CoT) on 210 number reading tasks spanning six contextual categories: pure numerals, addresses, dates, quantities, and prices. Our evaluation comprises 59,010 individual test cases and tracks extraction methods to measure structured output generation. Evaluation reveals substantial performance variation, with accuracy ranging from 14.29% to 99.05% across models and strategies. Few-shot Chain-of-Thought prompting achieves 2.8x higher accuracy than zero-shot approaches (80.06% vs 28.76%). A striking finding emerges: models achieving elite accuracy (98-99%) often produce predominantly unstructured output, with most responses lacking Arabic CoT markers. Only 6 models consistently generate structured output across all test cases, while the majority require fallback extraction methods despite high numerical accuracy. Comprehensive evaluation of 281 model-strategy combinations demonstrates that numerical accuracy and instruction-following represent distinct capabilities, establishing baselines for Arabic number comprehension and providing actionable guidance for model selection in production Arabic NLP systems.
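The two numeral systems the benchmark covers differ only in code points; a tiny normalization helper of the kind an evaluation script might use is shown below (the benchmark's own extraction logic may differ).

# Eastern Arabic-Indic digits (U+0660-U+0669) mapped to their Western counterparts.
EASTERN_TO_WESTERN = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")

def normalize_digits(text: str) -> str:
    return text.translate(EASTERN_TO_WESTERN)

print(normalize_digits("السعر ٤٢٥ ريال"))   # -> "السعر 425 ريال"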
Submitted 21 February, 2026;
originally announced February 2026.