-
Accelerating CRONet on AMD Versal AIE-ML Engines
Authors:
Kaustubh Mhatre,
Vedant Tewari,
Aditya Ray,
Farhan Khan,
Ridwan Olabiyi,
Ashif Iquebal,
Aman Arora
Abstract:
Topology optimization is a computational method used to determine the optimal material distribution within a prescribed design domain, aiming to minimize structural weight while satisfying load and boundary conditions. For critical infrastructure applications, such as structural health monitoring of bridges and buildings, particularly in digital twin contexts, low-latency energy-efficient topology…
▽ More
Topology optimization is a computational method used to determine the optimal material distribution within a prescribed design domain, aiming to minimize structural weight while satisfying load and boundary conditions. For critical infrastructure applications, such as structural health monitoring of bridges and buildings, particularly in digital twin contexts, low-latency energy-efficient topology optimization is essential. Traditionally, topology optimization relies on finite element analysis (FEA), a computationally intensive process. Recent advances in deep neural networks (DNNs) have introduced data driven alternatives to FEA, substantially reducing computation time while maintaining solution quality. These DNNs have complex architectures and implementing them on inference-class GPUs results in high latency and poor energy efficiency. To address this challenge, we present a hardware accelerated implementation of a topology optimization neural network (CRONet) on the AMD Versal AI Engine-ML (AIE-ML) architecture. Our approach efficiently exploits the parallelism and memory hierarchy of AIE-ML engines to optimize the execution of various neural network operators. We are the first to implement an end-to-end neural network fully realized on the AIE-ML array, where all intermediate activations and network weights reside on-chip throughout inference, eliminating any reliance on DRAM for intermediate data movement. Experimental results demonstrate that our implementation achieves up to 2.49x improvement in latency and up to 4.18x improvement in energy efficiency compared to an inference-class ML-optimized GPU in the same power budget (Nvidia T4) after scaling for technology node. These results highlight the potential of Versal AIE-ML based acceleration for enabling low-latency energy-efficient topology optimization.
△ Less
Submitted 16 April, 2026;
originally announced April 2026.
-
GCA Framework: A Gulf-Grounded Dataset and Agentic Pipeline for Climate Decision Support
Authors:
Muhammad Umer Sheikh,
Khawar Shehzad,
Salman Khan,
Fahad Shahbaz Khan,
Muhammad Haris Khan
Abstract:
Climate decision-making in the Gulf increasingly demands systems that can translate heterogeneous scientific and policy evidence into actionable guidance, yet general-purpose large language models (LLMs) remain weak both in region-specific climate knowledge and grounded interaction with geospatial and forecasting tools. We present the GCA framework, which unifies (i) GCA-DS, a curated Gulf-focused…
▽ More
Climate decision-making in the Gulf increasingly demands systems that can translate heterogeneous scientific and policy evidence into actionable guidance, yet general-purpose large language models (LLMs) remain weak both in region-specific climate knowledge and grounded interaction with geospatial and forecasting tools. We present the GCA framework, which unifies (i) GCA-DS, a curated Gulf-focused multimodal dataset, and (ii) Gulf Climate Agent (GCA), a tool-augmented agent for climate analysis. GCA-DS comprises ~200k question-answer pairs spanning governmental policies and adaptation plans, NGO and international frameworks, academic literature, and event-driven reporting on heatwaves, dust storms, and floods, complemented with remote-sensing inputs that couple imagery with textual evidence. Building on this foundation, the GCA agent orchestrates a modular tool pipeline grounded in real-time and historical signals and geospatial processing that produces derived indices and interpretable visualizations. Finally, we benchmark open and proprietary LLMs on Gulf climate tasks and show that domain fine-tuning and tool integration substantially improve reliability over general-purpose baselines.
△ Less
Submitted 14 April, 2026;
originally announced April 2026.
-
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
Authors:
Komal Kumar,
Aman Chadha,
Salman Khan,
Fahad Shahbaz Khan,
Hisham Cholakkal
Abstract:
The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) have demonstrated strong potential for understanding user intent and are being trained to utilize various tools. In this paper, we introduce Paper Circle, a multi-agent research disc…
▽ More
The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) have demonstrated strong potential for understanding user intent and are being trained to utilize various tools. In this paper, we introduce Paper Circle, a multi-agent research discovery and analysis system designed to reduce the effort required to find, assess, organize, and understand academic literature. The system comprises two complementary pipelines: (1) a Discovery Pipeline that integrates offline and online retrieval from multiple sources, multi-criteria scoring, diversity-aware ranking, and structured outputs; and (2) an Analysis Pipeline that transforms individual papers into structured knowledge graphs with typed nodes such as concepts, methods, experiments, and figures, enabling graph-aware question answering and coverage verification. Both pipelines are implemented within a coder LLM-based multi-agent orchestration framework and produce fully reproducible, synchronized outputs including JSON, CSV, BibTeX, Markdown, and HTML at each agent step. This paper describes the system architecture, agent roles, retrieval and scoring methods, knowledge graph schema, and evaluation interfaces that together form the Paper Circle research workflow. We benchmark Paper Circle on both paper retrieval and paper review generation, reporting hit rate, MRR, and Recall at K. Results show consistent improvements with stronger agent models. We have publicly released the website at https://papercircle.vercel.app/ and the code at https://github.com/MAXNORM8650/papercircle.
△ Less
Submitted 7 April, 2026;
originally announced April 2026.
-
CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning
Authors:
Ankan Deria,
Komal Kumar,
Xilin He,
Imran Razzak,
Hisham Cholakkal,
Fahad Shahbaz Khan,
Salman Khan
Abstract:
Recent vision-language models (VLMs) typically rely on a single vision encoder trained with contrastive image-text objectives, such as CLIP-style pretraining. While contrastive encoders are effective for cross-modal alignment and retrieval, self-supervised visual encoders often capture richer dense semantics and exhibit stronger robustness on recognition and understanding tasks. In this work, we i…
▽ More
Recent vision-language models (VLMs) typically rely on a single vision encoder trained with contrastive image-text objectives, such as CLIP-style pretraining. While contrastive encoders are effective for cross-modal alignment and retrieval, self-supervised visual encoders often capture richer dense semantics and exhibit stronger robustness on recognition and understanding tasks. In this work, we investigate how to scale the fusion of these complementary visual representations for vision-language modeling. We propose CoME-VL: Complementary Multi-Encoder Vision-Language, a modular fusion framework that integrates a contrastively trained vision encoder with a self-supervised DINO encoder. Our approach performs representation-level fusion by (i) entropy-guided multi-layer aggregation with orthogonality-constrained projections to reduce redundancy, and (ii) RoPE-enhanced cross-attention to align heterogeneous token grids and produce compact fused visual tokens. The fused tokens can be injected into a decoder-only LLM with minimal changes to standard VLM pipelines. Extensive experiments across diverse vision-language benchmarks demonstrate that CoME-VL consistently outperforms single-encoder baselines. In particular, we observe an average improvement of 4.9% on visual understanding tasks and 5.4% on grounding tasks. Our method achieves state-of-the-art performance on RefCOCO for detection while improving over the baseline by a large margin. Finally, we conduct ablation studies on layer merging, non-redundant feature mixing, and fusion capacity to evaluate how complementary contrastive and self-supervised signals affect VLM performance.
△ Less
Submitted 3 April, 2026;
originally announced April 2026.
-
The Eleventh NTIRE 2026 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Hang Guo,
Yan Shu,
Jiaqi Ma,
Ziteng Cui,
Shuhong Liu,
Guofeng Mei,
Lei Sun,
Zongwei Wu,
Fahad Shahbaz Khan,
Salman Khan,
Radu Timofte,
Yawei Li,
Hongyuan Yu,
Pufan Xu,
Chen Wu,
Long Peng,
Jiaojiao Yi,
Siyang Yi,
Yuning Cui,
Jingyuan Xia,
Xing Mou,
Keji He,
Jinlin Wu,
Zongang Gao
, et al. (38 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2026 challenge on efficient single-image super-resolution with a focus on the proposed solutions and results. The aim of this challenge is to devise a network that reduces one or several aspects, such as runtime, parameters, and FLOPs, while maintaining PSNR of around 26.90 dB on the DIV2K_LSDIR_valid dataset, and 26.99 dB on the DIV2K_LSDIR_test dataset. The challenge…
▽ More
This paper reviews the NTIRE 2026 challenge on efficient single-image super-resolution with a focus on the proposed solutions and results. The aim of this challenge is to devise a network that reduces one or several aspects, such as runtime, parameters, and FLOPs, while maintaining PSNR of around 26.90 dB on the DIV2K_LSDIR_valid dataset, and 26.99 dB on the DIV2K_LSDIR_test dataset. The challenge had 95 registered participants, and 15 teams made valid submissions. They gauge the state-of-the-art results for efficient single-image super-resolution.
△ Less
Submitted 3 April, 2026;
originally announced April 2026.
-
Narrowband searches for continuous gravitational waves from known pulsars in the first two parts of the fourth LIGO--Virgo--KAGRA observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
A. Adam,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith
, et al. (1831 additional authors not shown)
Abstract:
Rotating non-axisymmetric neutron stars (NSs) are promising sources for continuous gravitational waves (CWs). Such CWs can, if detected, inform us about the internal structure and equation of state of NSs. Here, we present a narrowband search for CWs from known pulsars, for which an efficient and sensitive matched-filter search can be applied. Narrowband searches are designed to be robust to misma…
▽ More
Rotating non-axisymmetric neutron stars (NSs) are promising sources for continuous gravitational waves (CWs). Such CWs can, if detected, inform us about the internal structure and equation of state of NSs. Here, we present a narrowband search for CWs from known pulsars, for which an efficient and sensitive matched-filter search can be applied. Narrowband searches are designed to be robust to mismatches between the electromagnetic (EM) and gravitational emissions, in contrast to fully targeted searches where the CW emission is assumed to be phase-locked to the EM one. In this work, we search for the CW counterparts emitted by 34 pulsars using data from the first and second parts of the fourth LIGO--Virgo--KAGRA observing run. This is the largest number of pulsars so far targeted for narrowband searches in the advanced detector era. We use the 5n-vector narrowband pipeline, which applies frequency-domain matched filtering. In previous searches, it covered a narrow range in the frequency -- frequency time derivative ($f$ -- $\dot{f}$) space. Here, we also explore a range in the second time derivative of the frequency $\ddot{f}$ around the value indicated by EM observations. Additionally, for the first time, we target sources in a binary system with this kind of search. We find no evidence for CWs and therefore set upper limits on the strain amplitude emitted by each pulsar, using simulated signals added in real data. For 20 analyses, we report an upper limit below the theoretical spin-down limit. The tightest constraint is for pulsar PSR J0534+2200 (the Crab pulsar), for which our strain upper limit on the CW amplitude is $\lesssim 2\%$ of its spin-down limit, corresponding to less than $0.04\%$ of the spin-down power being radiated in the CW channel.
△ Less
Submitted 26 March, 2026;
originally announced March 2026.
-
Searches for Continuous Gravitational Waves from Supernova Remnants in the first part of the LIGO-Virgo-KAGRA Fourth Observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1742 additional authors not shown)
Abstract:
We present results from directed searches for continuous gravitational waves from a sample of 15 nearby supernova remnants, likely hosting young neutron star candidates, using data from the first eight months of the fourth observing run (O4) of the LIGO-Virgo-KAGRA Collaboration. The analysis employs five pipelines: four semi-coherent methods -- the Band-Sampled-Data directed pipeline, Weave and t…
▽ More
We present results from directed searches for continuous gravitational waves from a sample of 15 nearby supernova remnants, likely hosting young neutron star candidates, using data from the first eight months of the fourth observing run (O4) of the LIGO-Virgo-KAGRA Collaboration. The analysis employs five pipelines: four semi-coherent methods -- the Band-Sampled-Data directed pipeline, Weave and two Viterbi pipelines (single- and dual-harmonic) -- and PyStoch, a cross-correlation-based pipeline. These searches cover wide frequency bands and do not assume prior knowledge of the targets' ephemerides. No evidence of a signal is found from any of the 15 sources. We set 95\% confidence-level upper limits on the intrinsic strain amplitude, with the most stringent constraints reaching $\sim 4 \times 10^{-26}$ near 300 Hz for the nearby source G266.2$-$1.2 (Vela Jr.). We also derive limits on neutron star ellipticity and $r$-mode amplitudes for the same source, with the best constraints reaching $\lesssim 10^{-7}$ and $\lesssim 10^{-5}$, respectively, at frequencies above 400 Hz. These results represent the most sensitive wide-band directed searches for continuous gravitational waves from supernova remnants to date.
△ Less
Submitted 2 April, 2026; v1 submitted 26 March, 2026;
originally announced March 2026.
-
Shape-Dependent, Deep-Learning-Assisted Metamaterial Solid Immersion Lens (mSIL) Super-Resolution Imaging
Authors:
Baidong Wu,
Fiza Khan,
Lingya Yu,
Zengbo Wang
Abstract:
We present the first systematic comparison of three TiO2 metamaterial solid immersion lens geometries - sub-hemispherical, super-hemispherical, and full-spherical - for label-free super-resolution imaging. Using SEM, we characterised both the cap profiles and the nanoparticle-fluid immersion at the lens-sample interface, revealing that super-hemispherical lenses achieve the deepest immersion and c…
▽ More
We present the first systematic comparison of three TiO2 metamaterial solid immersion lens geometries - sub-hemispherical, super-hemispherical, and full-spherical - for label-free super-resolution imaging. Using SEM, we characterised both the cap profiles and the nanoparticle-fluid immersion at the lens-sample interface, revealing that super-hemispherical lenses achieve the deepest immersion and closest contact with sample features. Imaging experiments under wide-field and laser confocal microscopes show that this enhanced immersion drives superior resolution and contrast. In addition, we introduce a deep learning approach based on a SinCUT image translation model to establish a cross-modal mapping between SEM morphology and optical imaging response, enabling virtual optical predictions and providing a first step toward a digital twin representation of mSIL imaging behaviour. Electromagnetic simulations further confirm a direct correlation between immersion depth and far-field main lobe intensity. Our findings demonstrate that careful control of lens shape and nanoparticle-fluid penetration, together with data-driven modelling, is essential to maximise super-resolution performance in TiO2 mSILs.
△ Less
Submitted 25 March, 2026;
originally announced March 2026.
-
WorldCache: Content-Aware Caching for Accelerated Video World Models
Authors:
Umair Nawaz,
Ahmed Heakl,
Ufaq Khan,
Abdelrahman Shaker,
Salman Khan,
Fahad Shahbaz Khan
Abstract:
Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existing methods largely rely on a Zero-Order Hold assumption i.e., reusing cached features as static snaps…
▽ More
Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existing methods largely rely on a Zero-Order Hold assumption i.e., reusing cached features as static snapshots when global drift is small. This often leads to ghosting artifacts, blur, and motion inconsistencies in dynamic scenes. We propose \textbf{WorldCache}, a Perception-Constrained Dynamical Caching framework that improves both when and how to reuse features. WorldCache introduces motion-adaptive thresholds, saliency-weighted drift estimation, optimal approximation via blending and warping, and phase-aware threshold scheduling across diffusion steps. Our cohesive approach enables adaptive, motion-consistent feature reuse without retraining. On Cosmos-Predict2.5-2B evaluated on PAI-Bench, WorldCache achieves \textbf{2.3$\times$} inference speedup while preserving \textbf{99.4\%} of baseline quality, substantially outperforming prior training-free caching approaches. Our code can be accessed on \href{https://umair1221.github.io/World-Cache/}{World-Cache}.
△ Less
Submitted 23 March, 2026;
originally announced March 2026.
-
CoVR-R:Reason-Aware Composed Video Retrieval
Authors:
Omkar Thawakar,
Dmitry Demidov,
Vaishnav Potlapalli,
Sai Prasanna Teja Reddy Bogireddy,
Viswanatha Reddy Gajjala,
Alaa Mostafa Lasheen,
Rao Muhammad Anwer,
Fahad Khan
Abstract:
Composed Video Retrieval (CoVR) aims to find a target video given a reference video and a textual modification. Prior work assumes the modification text fully specifies the visual changes, overlooking after-effects and implicit consequences (e.g., motion, state transitions, viewpoint or duration cues) that emerge from the edit. We argue that successful CoVR requires reasoning about these after-eff…
▽ More
Composed Video Retrieval (CoVR) aims to find a target video given a reference video and a textual modification. Prior work assumes the modification text fully specifies the visual changes, overlooking after-effects and implicit consequences (e.g., motion, state transitions, viewpoint or duration cues) that emerge from the edit. We argue that successful CoVR requires reasoning about these after-effects. We introduce a reasoning-first, zero-shot approach that leverages large multimodal models to (i) infer causal and temporal consequences implied by the edit, and (ii) align the resulting reasoned queries to candidate videos without task-specific finetuning. To evaluate reasoning in CoVR, we also propose CoVR-Reason, a benchmark that pairs each (reference, edit, target) triplet with structured internal reasoning traces and challenging distractors that require predicting after-effects rather than keyword matching. Experiments show that our zero-shot method outperforms strong retrieval baselines on recall at K and particularly excels on implicit-effect subsets. Our automatic and human analysis confirm higher step consistency and effect factuality in our retrieved results. Our findings show that incorporating reasoning into general-purpose multimodal models enables effective CoVR by explicitly accounting for causal and temporal after-effects. This reduces dependence on task-specific supervision, improves generalization to challenging implicit-effect cases, and enhances interpretability of retrieval outcomes. These results point toward a scalable and principled framework for explainable video search. The model, code, and benchmark are available at https://github.com/mbzuai-oryx/CoVR-R.
△ Less
Submitted 20 March, 2026;
originally announced March 2026.
-
GWTC-4.0: Tests of General Relativity. III. Tests of the Remnants
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1757 additional authors not shown)
Abstract:
This is the third paper of the set recording the results of the suite of tests of general relativity (GR) performed on the signals from the fourth Gravitational-Wave Transient Catalog (GWTC-4.0), where we focus on the remnants of the binary mergers. We examine for the first time 42 events from the first part of the fourth observing run of the LIGO-Virgo-KAGRA detectors, alongside events from the p…
▽ More
This is the third paper of the set recording the results of the suite of tests of general relativity (GR) performed on the signals from the fourth Gravitational-Wave Transient Catalog (GWTC-4.0), where we focus on the remnants of the binary mergers. We examine for the first time 42 events from the first part of the fourth observing run of the LIGO-Virgo-KAGRA detectors, alongside events from the previous observation runs, restricting our analysis to the confident signals, which were measured in at least two detectors and that have false alarm rates $\le 10^{-3} \mathrm{yr}^{-1}$. This paper focuses on seven tests of the coalescence remnants. Three of these are tests of the ringdown and its consistency with the expected quasinormal mode spectrum of a Kerr black hole. Specifically, two tests analyze just the ringdown in the time domain, and the third test analyzes the entire signal in the frequency domain. Four tests allow for the existence of possible echoes arriving after the end of the ringdown, which are not expected in GR. We find overall consistency of the remnants with GR. When combining events by multiplying likelihoods (hierarchically), one analysis finds that the GR prediction lies at the boundary of the $98.6^{+1.4}_{-9.4}\%$ ($99.3^{+0.7}_{-4.5}\%$) credible region, an increase from $93.8^{+6.1}_{-20.0}\%$ ($94.9^{+4.4}_{-18.2}\%$) for GWTC-3.0. Here the ranges of values comes from bootstrapping to account for the finite number of events analyzed and suggest that some of the apparently significant deviation could be attributed to variance due to the finite catalog. Since the significance also decreases to 92.2% (96.2%) when including the more recent very loud event GW250114, there is no strong evidence for a GR deviation. We find no evidence for post-merger echoes in the events that were analyzed. (Abridged)
△ Less
Submitted 19 March, 2026;
originally announced March 2026.
-
GWTC-4.0: Tests of General Relativity. II. Parameterized Tests
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1761 additional authors not shown)
Abstract:
In this second of three papers on tests of general relativity (GR) applied to the compact binary coalescence signals in the fourth Gravitational-Wave Transient Catalog (GWTC-4.0), we present the results of the parameterized tests of GR and constraints on line-of-sight acceleration. We include events up to and including the first part of the fourth observing run (O4a) of the LIGO Virgo KAGRA detect…
▽ More
In this second of three papers on tests of general relativity (GR) applied to the compact binary coalescence signals in the fourth Gravitational-Wave Transient Catalog (GWTC-4.0), we present the results of the parameterized tests of GR and constraints on line-of-sight acceleration. We include events up to and including the first part of the fourth observing run (O4a) of the LIGO Virgo KAGRA detectors. As in the other two papers in this series, we restrict our analysis to the 42 confident signals, measured by at least two detectors, that have false alarm rates $\le 10^{-3} \mathrm{yr}^{-1}$ from O4a, in addition to the 49 such events from previous observing runs. This paper focuses on the eight tests that constrain parameterized deviations from the expected GR (or unaccelerated) values. These include modifications of post-Newtonian (PN) parameters, spin-induced quadrupole moments different from those of a binary black hole, and possible dispersive or birefringent propagation effects. Overall, we find no evidence for physics beyond GR, for spin-induced quadrupole moments different from those of a Kerr black hole in GR, or for line of sight acceleration, with more than 90% of the events including the null result (no deviation) within their 90% credible intervals. We discuss possible systematics affecting the other events and tests, even though they are statistically not surprising, given noise. We improve the bounds on deviations from the GR PN coefficients by factors of 1.2-5.5 and provide illustrative translations to constraints on some modified theories. Also, we update the bound on the mass of the graviton, at 90% credibility, to $m_g \leq 1.92\times 10^{-23} \mathrm{eV}/c^2$. Thus, we see that GR holds, and many of the bounds on possible deviations derived from our events are the best to date.
△ Less
Submitted 19 March, 2026;
originally announced March 2026.
-
GWTC-4.0: Tests of General Relativity. I. Overview and General Tests
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1759 additional authors not shown)
Abstract:
The worldwide LIGO-Virgo-KAGRA network of gravitational-wave (GW) detectors continues to increase in sensitivity, thus increasing the quantity and quality of the detected GW signals from compact binary coalescences. These signals allow us to perform ever-more sensitive tests of general relativity (GR) in the dynamical and strong-field regime of gravity. This paper is the first of three, where we p…
▽ More
The worldwide LIGO-Virgo-KAGRA network of gravitational-wave (GW) detectors continues to increase in sensitivity, thus increasing the quantity and quality of the detected GW signals from compact binary coalescences. These signals allow us to perform ever-more sensitive tests of general relativity (GR) in the dynamical and strong-field regime of gravity. This paper is the first of three, where we present the results of a suite of tests of GR using the binary signals included in the fourth GW Transient Catalog (GWTC-4.0), i.e., up to and including the first part of the fourth observing run of the detectors (O4a). We restrict our analysis to the 91 confident signals, henceforth called events, that were measured by at least two detectors, and have false alarm rates $\le 10^{-3} \mathrm{yr}^{-1}$. These include 42 events from O4a. This first paper presents an overview of the methods, selection of events and GR tests, and serves as a guidemap for all three papers. Here we focus on the four general tests of consistency, where we find no evidence for deviations from our models. Specifically, for all the events considered, we find consistency of the residuals with noise. The final mass and final spin as inferred from the low- and high-frequency parts of the waveform are consistent with each other. We also find no evidence for deviations from the GR predictions for the amplitudes of subdominant GW multipole moments, or for non-GR modes of polarization. We thus find that GR, without new physics beyond it, is still consistent with these GW events. The results of the two additional papers in this trio also find overall consistency with vacuum GR, with more than 90% of the events being consistent with GR at the 90% credible level. While one of the ringdown analyses finds the GR value in the tails for its combined results, this may be due in part to catalog variance.
△ Less
Submitted 19 March, 2026;
originally announced March 2026.
-
Structure-preserving preconditioning of discrete space-fractional diffusion equations with variable coefficient and θ-Method
Authors:
Muhammad Faisal Khan,
Asim Ilyas,
Rolf Krause,
Stefano Serra-Capizzano,
Cristina Tablino-Possio
Abstract:
This paper studies the spectral properties of large matrices and the preconditioning of linear systems, arising from the finite difference discretization of a time-dependent space-fractional diffusion equation with a variable coefficient $a(x)$ defined on $Ω\subset \mathbb{R}^d$, $d=1,2$. The model involves a one-sided Riemann-Liouville fractional derivative multiplied by the function $a(x)$, disc…
▽ More
This paper studies the spectral properties of large matrices and the preconditioning of linear systems, arising from the finite difference discretization of a time-dependent space-fractional diffusion equation with a variable coefficient $a(x)$ defined on $Ω\subset \mathbb{R}^d$, $d=1,2$. The model involves a one-sided Riemann-Liouville fractional derivative multiplied by the function $a(x)$, discretized by the shifted Gr"unwald formula in space and the $θ$-method in time. The resulting all-at-once linear systems exhibit a $(d+1)$-level Toeplitz-like matrix structure, with $d=1,2$ denoting the space dimension, while the additional level is due to the time variable.
A preconditioning strategy is developed based on the structural properties of the discretized operator. Using the generalized locally Toeplitz (GLT) theory, we analyze the spectral distribution of the unpreconditioned and preconditioned matrix sequences. The main novelty is that the analysis fully covers the case where the variable coefficient $a$ is nonconstant. Numerical results are provided to support the GLT based theoretical findings, and some possible extensions are briefly discussed.
△ Less
Submitted 16 March, 2026;
originally announced March 2026.
-
All-sky Searches for Continuous Gravitational Waves from Isolated Neutron Stars in the Data from the First Part of the Fourth LIGO-Virgo-KAGRA Observing Run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
A. Adam,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith
, et al. (1804 additional authors not shown)
Abstract:
We present results from an all-sky search for continuous gravitational waves, using three different methods applied to the first eight months of LIGO data from the fourth LIGO-Virgo-KAGRA Collaboration s observing run. We aim at signals potentially emitted by rotating, non-axisymmetric isolated neutron star in the Milky Way. The analysis spans a frequency range from 20 Hz to 2000 Hz and accommodat…
▽ More
We present results from an all-sky search for continuous gravitational waves, using three different methods applied to the first eight months of LIGO data from the fourth LIGO-Virgo-KAGRA Collaboration s observing run. We aim at signals potentially emitted by rotating, non-axisymmetric isolated neutron star in the Milky Way. The analysis spans a frequency range from 20 Hz to 2000 Hz and accommodates frequency derivative magnitudes up to $10^{-8}$ Hz/s. No statistically significant periodic gravitational wave signals were detected. We establish 95% confidence-level (CL) frequentist upper limits on the dimensionless strain amplitudes. The most stringent population-averaged strain upper limits reach 9.7 $\times$ $10^{-26}$ near 290 Hz, matching the best previous constraints from 250 to $\sim$1700 Hz while extending coverage to a much broader spin-down range. At higher frequencies, the new limits improve upon previous results by factors of approximately $\sim$1.6. These constraints are applied to three astrophysical scenarios: 1) the distribution of galactic neutron stars as a function of spin frequency and ellipticity; 2) the contribution of millisecond pulsars to the GeV excess near the galactic center; and 3) the possible dark matter fraction composed of nearby inspiraling primordial binary black holes with asteroid-scale masses.
△ Less
Submitted 14 March, 2026;
originally announced March 2026.
-
MAviS: A Multimodal Conversational Assistant For Avian Species
Authors:
Yevheniia Kryklyvets,
Mohammed Irfan Kurpath,
Sahal Shaji Mullappilly,
Jinxing Zhou,
Fahad Shabzan Khan,
Rao Anwer,
Salman Khan,
Hisham Cholakkal
Abstract:
Fine-grained understanding and species-specific multimodal question answering are vital for advancing biodiversity conservation and ecological monitoring. However, existing multimodal large language models face challenges when it comes to specialized topics like avian species, making it harder to provide accurate and contextually relevant information in these areas. To address this limitation, we…
▽ More
Fine-grained understanding and species-specific multimodal question answering are vital for advancing biodiversity conservation and ecological monitoring. However, existing multimodal large language models face challenges when it comes to specialized topics like avian species, making it harder to provide accurate and contextually relevant information in these areas. To address this limitation, we introduce the MAviS-Dataset, a large-scale multimodal avian species dataset that integrates image, audio, and text modalities for over 1,000 bird species, comprising both pretraining and instruction-tuning subsets enriched with structured question-answer pairs. Building on the MAviS-Dataset, we introduce MAviS-Chat, a multimodal LLM that supports audio, vision, and text and is designed for fine-grained species understanding, multimodal question answering, and scene-specific description generation. Finally, for quantitative evaluation, we present MAviS-Bench, a benchmark of over 25,000 QA pairs designed to assess avian species-specific perceptual and reasoning abilities across modalities. Experimental results show that MAviS-Chat outperforms the baseline MiniCPM-o-2.6 by a large margin, achieving state-of-the-art open-source results and demonstrating the effectiveness of our instruction-tuned MAviS-Dataset. Our findings highlight the necessity of domain-adaptive multimodal LLMs for ecological applications.
△ Less
Submitted 7 March, 2026;
originally announced March 2026.
-
MediX-R1: Open Ended Medical Reinforcement Learning
Authors:
Sahal Shaji Mullappilly,
Mohammed Irfan Kurpath,
Omair Mohamed,
Mohamed Zidan,
Fahad Khan,
Salman Khan,
Rao Anwer,
Hisham Cholakkal
Abstract:
We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes a baseline vision-language backbone with Group Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward that judges semantic correct…
▽ More
We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes a baseline vision-language backbone with Group Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward that judges semantic correctness with a strict YES/NO decision, a medical embedding-based semantic reward to capture paraphrases and terminology variants, and lightweight format and modality rewards that enforce interpretable reasoning and modality recognition. This multi-signal design provides stable, informative feedback for open-ended outputs where traditional verifiable or MCQ-only rewards fall short. To measure progress, we propose a unified evaluation framework for both text-only and image+text tasks that uses a Reference-based LLM-as-judge in place of brittle string-overlap metrics, capturing semantic correctness, reasoning, and contextual alignment. Despite using only $\sim51$K instruction examples, MediX-R1 achieves excellent results across standard medical LLM (text-only) and VLM (image + text) benchmarks, outperforming strong open-source baselines and delivering particularly large gains on open-ended clinical tasks. Our results demonstrate that open-ended RL with comprehensive reward signals and LLM-based evaluation is a practical path toward reliable medical reasoning in multimodal models. Our trained models, curated datasets and source code are available at https://medix.cvmbzuai.com
△ Less
Submitted 26 February, 2026;
originally announced February 2026.
-
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
Authors:
Abdelrahman Shaker,
Ahmed Heakl,
Jaseel Muhammad,
Ritesh Thawkar,
Omkar Thawakar,
Senmao Li,
Hisham Cholakkal,
Ian Reid,
Eric P. Xing,
Salman Khan,
Fahad Shahbaz Khan
Abstract:
Unified multimodal models can both understand and generate visual content within a single architecture. Existing models, however, remain data-hungry and too heavy for deployment on edge devices. We present Mobile-O, a compact vision-language-diffusion model that brings unified multimodal intelligence to a mobile device. Its core module, the Mobile Conditioning Projector (MCP), fuses vision-languag…
▽ More
Unified multimodal models can both understand and generate visual content within a single architecture. Existing models, however, remain data-hungry and too heavy for deployment on edge devices. We present Mobile-O, a compact vision-language-diffusion model that brings unified multimodal intelligence to a mobile device. Its core module, the Mobile Conditioning Projector (MCP), fuses vision-language features with a diffusion generator using depthwise-separable convolutions and layerwise alignment. This design enables efficient cross-modal conditioning with minimal computational cost. Trained on only a few million samples and post-trained in a novel quadruplet format (generation prompt, image, question, answer), Mobile-O jointly enhances both visual understanding and generation capabilities. Despite its efficiency, Mobile-O attains competitive or superior performance compared to other unified models, achieving 74% on GenEval and outperforming Show-O and JanusFlow by 5% and 11%, while running 6x and 11x faster, respectively. For visual understanding, Mobile-O surpasses them by 15.3% and 5.1% averaged across seven benchmarks. Running in only ~3s per 512x512 image on an iPhone, Mobile-O establishes the first practical framework for real-time unified multimodal understanding and generation on edge devices. We hope Mobile-O will ease future research in real-time unified multimodal intelligence running entirely on-device with no cloud dependency. Our code, models, datasets, and mobile application are publicly available at https://amshaker.github.io/Mobile-O/
△ Less
Submitted 24 February, 2026; v1 submitted 23 February, 2026;
originally announced February 2026.
-
CORVET: A CORDIC-Powered, Resource-Frugal Mixed-Precision Vector Processing Engine for High-Throughput AIoT applications
Authors:
Sonu Kumar,
Mohd Faisal Khan,
Mukul Lokhande,
Santosh Kumar Vishvakarma
Abstract:
This brief presents a runtime-adaptive, performance-enhanced vector engine featuring a low-resource, iterative CORDIC-based MAC unit for edge AI acceleration. The proposed design enables dynamic reconfiguration between approximate and accurate modes, exploiting the latency-accuracy trade-off for a wide range of workloads. Its resource-efficient approach further enables up to 4x throughput improvem…
▽ More
This brief presents a runtime-adaptive, performance-enhanced vector engine featuring a low-resource, iterative CORDIC-based MAC unit for edge AI acceleration. The proposed design enables dynamic reconfiguration between approximate and accurate modes, exploiting the latency-accuracy trade-off for a wide range of workloads. Its resource-efficient approach further enables up to 4x throughput improvement within the same hardware resources by leveraging vectorised, time-multiplexed execution and flexible precision scaling. With a time-multiplexed multi-AF block and a lightweight pooling and normalisation unit, the proposed vector engine supports flexible precision (4/8/16-bit) and high MAC density. The ASIC implementation results show that each MAC stage can save up to 33% of time and 21% of power, with a 256-PE configuration that achieves higher compute density (4.83 TOPS/mm2 ) and energy efficiency (11.67 TOPS/W) than previous state-of-the-art work. A detailed hardware-software co-design methodology for object detection and classification tasks on Pynq-Z2 is discussed to assess the proposed architecture, demonstrating a scalable, energy-efficient solution for edge AI applications.
△ Less
Submitted 22 February, 2026;
originally announced February 2026.
-
OpenEarthAgent: A Unified Framework for Tool-Augmented Geospatial Agents
Authors:
Akashah Shabbir,
Muhammad Umer Sheikh,
Muhammad Akhtar Munir,
Hiyam Debary,
Mustansar Fiaz,
Muhammad Zaigham Zaheer,
Paolo Fraccaro,
Fahad Shahbaz Khan,
Muhammad Haris Khan,
Xiao Xiang Zhu,
Salman Khan
Abstract:
Recent progress in multimodal reasoning has enabled agents that interpret imagery, connect it with language, and execute structured analytical tasks. Extending these capabilities to remote sensing remains challenging, as models must reason over spatial scale, geographic structures, and multispectral indices while maintaining coherent multi-step logic. To address this gap, we introduce \textit{Open…
▽ More
Recent progress in multimodal reasoning has enabled agents that interpret imagery, connect it with language, and execute structured analytical tasks. Extending these capabilities to remote sensing remains challenging, as models must reason over spatial scale, geographic structures, and multispectral indices while maintaining coherent multi-step logic. To address this gap, we introduce \textit{OpenEarthAgent}, a unified framework for tool-augmented geospatial reasoning trained on satellite imagery, natural-language queries, and structured reasoning traces. Beyond serving as a benchmark, OpenEarthAgent establishes a cohesive agentic architecture built around a unified executable tool registry and trajectory-based policy learning. The framework standardizes heterogeneous visual, spectral, GIS, and georeferenced raster operations under a consistent callable schema, enabling modular orchestration and deterministic execution. Training is performed via supervised fine-tuning on structured reasoning trajectories with deterministic replay validation to ensure executability and spatial correctness. The accompanying corpus comprises 14,538 training and 1,169 evaluation instances with over 107K reasoning steps, spanning urban, environmental, disaster, and infrastructure domains and incorporating GIS operations alongside index analyses such as NDVI, NBR, and NDBI. Grounded in explicit reasoning traces, the learned agent demonstrates structured reasoning, stable spatial understanding, and interpretable tool-driven behaviour across diverse EO scenarios. We report consistent improvements over a strong baseline and competitive performance against recent open and closed-source models. Our code and trained models will be publicly available.
△ Less
Submitted 25 March, 2026; v1 submitted 19 February, 2026;
originally announced February 2026.
-
Flow Matching for Offline Reinforcement Learning with Discrete Actions
Authors:
Fairoz Nower Khan,
Nabuat Zaman Nahim,
Ruiquan Huang,
Haibo Yang,
Peizhong Ju
Abstract:
Generative policies based on diffusion models and flow matching have shown strong promise for offline reinforcement learning (RL), but their applicability remains largely confined to continuous action spaces. To address a broader range of offline RL settings, we extend flow matching to a general framework that supports discrete action spaces with multiple objectives. Specifically, we replace conti…
▽ More
Generative policies based on diffusion models and flow matching have shown strong promise for offline reinforcement learning (RL), but their applicability remains largely confined to continuous action spaces. To address a broader range of offline RL settings, we extend flow matching to a general framework that supports discrete action spaces with multiple objectives. Specifically, we replace continuous flows with continuous-time Markov chains, trained using a Q-weighted flow matching objective. We then extend our design to multi-agent settings, mitigating the exponential growth of joint action spaces via a factorized conditional path. We theoretically show that, under idealized conditions, optimizing this objective recovers the optimal policy. Extensive experiments further demonstrate that our method performs robustly in practical scenarios, including high-dimensional control, multi-modal decision-making, and dynamically changing preferences over multiple objectives. Our discrete framework can also be applied to continuous-control problems through action quantization, providing a flexible trade-off between representational complexity and performance.
△ Less
Submitted 5 February, 2026;
originally announced February 2026.
-
EoCD: Encoder only Remote Sensing Change Detection
Authors:
Mubashir Noman,
Mustansar Fiaz,
Hiyam Debary,
Abdul Hannan,
Shah Nawaz,
Fahad Shahbaz Khan,
Salman Khan
Abstract:
Being a cornerstone of temporal analysis, change detection has been playing a pivotal role in modern earth observation. Existing change detection methods rely on the Siamese encoder to individually extract temporal features followed by temporal fusion. Subsequently, these methods design sophisticated decoders to improve the change detection performance without taking into consideration the complex…
▽ More
Being a cornerstone of temporal analysis, change detection has been playing a pivotal role in modern earth observation. Existing change detection methods rely on the Siamese encoder to individually extract temporal features followed by temporal fusion. Subsequently, these methods design sophisticated decoders to improve the change detection performance without taking into consideration the complexity of the model. These aforementioned issues intensify the overall computational cost as well as the network's complexity which is undesirable. Alternatively, few methods utilize the early fusion scheme to combine the temporal images. These methods prevent the extra overhead of Siamese encoder, however, they also rely on sophisticated decoders for better performance. In addition, these methods demonstrate inferior performance as compared to late fusion based methods. To bridge these gaps, we introduce encoder only change detection (EoCD) that is a simple and effective method for the change detection task. The proposed method performs the early fusion of the temporal data and replaces the decoder with a parameter-free multiscale feature fusion module thereby significantly reducing the overall complexity of the model. EoCD demonstrate the optimal balance between the change detection performance and the prediction speed across a variety of encoder architectures. Additionally, EoCD demonstrate that the performance of the model is predominantly dependent on the encoder network, making the decoder an additional component. Extensive experimentation on four challenging change detection datasets reveals the effectiveness of the proposed method.
△ Less
Submitted 5 February, 2026;
originally announced February 2026.
-
Assessing the informative value of macroeconomic indicators for public health forecasting
Authors:
Shome Chakraborty,
Fardil Khan,
Soutik Ghosal
Abstract:
Macroeconomic conditions influence the environments in which health systems operate, yet their value as leading signals of health system capacity has not been systematically evaluated. In this study, we examine whether selected macroeconomic indicators contain predictive information for several capacity-related public health targets, including employment in the health and social assistance workfor…
▽ More
Macroeconomic conditions influence the environments in which health systems operate, yet their value as leading signals of health system capacity has not been systematically evaluated. In this study, we examine whether selected macroeconomic indicators contain predictive information for several capacity-related public health targets, including employment in the health and social assistance workforce, new business applications in the sector, and health care construction spending. Using monthly U.S. time series data, we evaluate multiple forecasting approaches, including neural network models with different optimization strategies, generalized additive models, random forests, and time series models with exogenous macroeconomic indicators, under alternative model fitting designs. Across evaluation settings, we find that macroeconomic indicators provide a consistent and reproducible predictive signal for some public health targets, particularly workforce and infrastructure measures, while other targets exhibit weaker or less stable predictability. Models emphasizing stability and implicit regularization tend to perform more reliably during periods of economic volatility. These findings suggest that macroeconomic indicators may serve as useful upstream signals for digital public health monitoring, while underscoring the need for careful model selection and validation when translating economic trends into health system forecasting tools.
△ Less
Submitted 21 January, 2026;
originally announced January 2026.
-
Bio-RV: Low-Power Resource-Efficient RISC-V Processor for Biomedical Applications
Authors:
Vijay Pratap Sharma,
Annu Kumar,
Mohd Faisal Khan,
Mukul Lokhande,
Santosh Kumar Vishvakarma
Abstract:
This work presents Bio-RV, a compact and resource-efficient RISC-V processor intended for biomedical control applications, such as accelerator-based biomedical SoCs and implantable pacemaker systems. The proposed Bio-RV is a multi-cycle RV32I core that provides explicit execution control and external instruction loading with capabilities that enable controlled firmware deployment, ASIC bring-up, a…
▽ More
This work presents Bio-RV, a compact and resource-efficient RISC-V processor intended for biomedical control applications, such as accelerator-based biomedical SoCs and implantable pacemaker systems. The proposed Bio-RV is a multi-cycle RV32I core that provides explicit execution control and external instruction loading with capabilities that enable controlled firmware deployment, ASIC bring-up, and post-silicon testing. In addition to coordinating accelerator configuration and data transmission in heterogeneous systems, Bio-RV is designed to function as a lightweight host controller, handling interfaces with pacing, sensing, electrogram (EGM), telemetry, and battery management modules. With 708 LUTs and 235 flip-flops on FPGA prototypes, Bio-RV, implemented in a 180 nm CMOS technology, operate at 50 MHz and feature a compact hardware footprint. According to post-layout results, the proposed architectural decisions align with minimal energy use. Ultimately, Bio-RV prioritises deterministic execution, minimal hardware complexity, and integration flexibility over peak computing speed to meet the demands of ultra-low-power, safety-critical biomedical systems.
△ Less
Submitted 13 January, 2026;
originally announced January 2026.
-
Deep Search for Joint Sources of Gravitational Waves and High-Energy Neutrinos with IceCube During the Third Observing Run of LIGO and Virgo
Authors:
The IceCube Collaboration,
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
S. Ali,
N. M. Amin,
K. Andeen,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
S. N. Axani,
R. Babu,
X. Bai,
J. Baines-Holmes,
A. Balagopal V.,
S. W. Barwick,
S. Bash,
V. Basu,
R. Bay,
J. J. Beatty,
J. Becker Tjus
, et al. (2193 additional authors not shown)
Abstract:
The discovery of joint sources of high-energy neutrinos and gravitational waves has been a primary target for the LIGO, Virgo, KAGRA, and IceCube observatories. The joint detection of high-energy neutrinos and gravitational waves would provide insight into cosmic processes, from the dynamics of compact object mergers and stellar collapses to the mechanisms driving relativistic outflows. The joint…
▽ More
The discovery of joint sources of high-energy neutrinos and gravitational waves has been a primary target for the LIGO, Virgo, KAGRA, and IceCube observatories. The joint detection of high-energy neutrinos and gravitational waves would provide insight into cosmic processes, from the dynamics of compact object mergers and stellar collapses to the mechanisms driving relativistic outflows. The joint detection of multiple cosmic messengers can also elevate the significance of the common observation even when some or all of the constituent messengers are sub-threshold, i.e. not significant enough to declare their detection individually. Using data from the LIGO, Virgo, and IceCube observatories, including sub-threshold events, we searched for common sources of gravitational waves and high-energy neutrinos during the third observing run of Advanced LIGO and Advanced Virgo detectors. Our search did not identify significant joint sources. We derive constraints on the rate densities of joint sources. Our results constrain the isotropic neutrino emission from gravitational-wave sources for very high values of the total energy emitted in neutrinos (> $10^{52} - 10^{54}$ erg).
△ Less
Submitted 28 January, 2026; v1 submitted 12 January, 2026;
originally announced January 2026.
-
A Dataset of Low-Rated Applications from the Amazon Appstore for User Feedback Analysis
Authors:
Nek Dil Khan,
Javed Ali Khan,
Darvesh Khan,
Jianqiang Li,
Mumrez Khan,
Shah Fahad Khan
Abstract:
In todays digital landscape, end-user feedback plays a crucial role in the evolution of software applications, particularly in addressing issues that hinder user experience. While much research has focused on high-rated applications, low-rated applications often remain unexplored, despite their potential to reveal valuable insights. This study introduces a novel dataset curated from 64 low-rated a…
▽ More
In todays digital landscape, end-user feedback plays a crucial role in the evolution of software applications, particularly in addressing issues that hinder user experience. While much research has focused on high-rated applications, low-rated applications often remain unexplored, despite their potential to reveal valuable insights. This study introduces a novel dataset curated from 64 low-rated applications sourced from the Amazon Software Appstore (ASA), containing 79,821 user reviews. The dataset is designed to capture the most frequent issues identified by users, which are critical for improving software quality. To further enhance the dataset utility, a subset of 6000 reviews was manually annotated to classify them into six district issue categories: user interface (UI) and user experience (UX), functionality and features, compatibility and device specificity, performance and stability, customer support and responsiveness, and security and privacy issues. This annotated dataset is a valuable resource for developing machine learning-based approaches aiming to automate the classification of user feedback into various issue types. Making both the annotated and raw datasets publicly available provides researchers and developers with a crucial tool to understand common issues in low-rated apps and inform software improvements. The comprehensive analysis and availability of this dataset lay the groundwork for data-derived solutions to improve software quality based on user feedback. Additionally, the dataset can provide opportunities for software vendors and researchers to explore various software evolution-related activities, including frequently missing features, sarcasm, and associated emotions, which will help better understand the reasons for comparatively low app ratings.
△ Less
Submitted 6 January, 2026;
originally announced January 2026.
-
SwinTF3D: A Lightweight Multimodal Fusion Approach for Text-Guided 3D Medical Image Segmentation
Authors:
Hasan Faraz Khan,
Noor Fatima,
Muzammil Behzad
Abstract:
The recent integration of artificial intelligence into medical imaging has driven remarkable advances in automated organ segmentation. However, most existing 3D segmentation frameworks rely exclusively on visual learning from large annotated datasets restricting their adaptability to new domains and clinical tasks. The lack of semantic understanding in these models makes them ineffective in addres…
▽ More
The recent integration of artificial intelligence into medical imaging has driven remarkable advances in automated organ segmentation. However, most existing 3D segmentation frameworks rely exclusively on visual learning from large annotated datasets restricting their adaptability to new domains and clinical tasks. The lack of semantic understanding in these models makes them ineffective in addressing flexible, user-defined segmentation objectives. To overcome these limitations, we propose SwinTF3D, a lightweight multimodal fusion approach that unifies visual and linguistic representations for text-guided 3D medical image segmentation. The model employs a transformer-based visual encoder to extract volumetric features and integrates them with a compact text encoder via an efficient fusion mechanism. This design allows the system to understand natural-language prompts and correctly align semantic cues with their corresponding spatial structures in medical volumes, while producing accurate, context-aware segmentation results with low computational overhead. Extensive experiments on the BTCV dataset demonstrate that SwinTF3D achieves competitive Dice and IoU scores across multiple organs, despite its compact architecture. The model generalizes well to unseen data and offers significant efficiency gains compared to conventional transformer-based segmentation networks. Bridging visual perception with linguistic understanding, SwinTF3D establishes a practical and interpretable paradigm for interactive, text-driven 3D medical image segmentation, opening perspectives for more adaptive and resource-efficient solutions in clinical imaging.
△ Less
Submitted 28 December, 2025;
originally announced December 2025.
-
Attack-Aware Deepfake Detection under Counter-Forensic Manipulations
Authors:
Noor Fatima,
Hasan Faraz Khan,
Muzammil Behzad
Abstract:
This work presents an attack-aware deepfake and image-forensics detector designed for robustness, well-calibrated probabilities, and transparent evidence under realistic deployment conditions. The method combines red-team training with randomized test-time defense in a two-stream architecture, where one stream encodes semantic content using a pretrained backbone and the other extracts forensic res…
▽ More
This work presents an attack-aware deepfake and image-forensics detector designed for robustness, well-calibrated probabilities, and transparent evidence under realistic deployment conditions. The method combines red-team training with randomized test-time defense in a two-stream architecture, where one stream encodes semantic content using a pretrained backbone and the other extracts forensic residuals, fused via a lightweight residual adapter for classification, while a shallow Feature Pyramid Network style head produces tamper heatmaps under weak supervision. Red-team training applies worst-of-K counter-forensics per batch, including JPEG realign and recompress, resampling warps, denoise-to-regrain operations, seam smoothing, small color and gamma shifts, and social-app transcodes, while test-time defense injects low-cost jitters such as resize and crop phase changes, mild gamma variation, and JPEG phase shifts with aggregated predictions. Heatmaps are guided to concentrate within face regions using face-box masks without strict pixel-level annotations. Evaluation on existing benchmarks, including standard deepfake datasets and a surveillance-style split with low light and heavy compression, reports clean and attacked performance, AUC, worst-case accuracy, reliability, abstention quality, and weak-localization scores. Results demonstrate near-perfect ranking across attacks, low calibration error, minimal abstention risk, and controlled degradation under regrain, establishing a modular, data-efficient, and practically deployable baseline for attack-aware detection with calibrated probabilities and actionable heatmaps.
△ Less
Submitted 25 December, 2025;
originally announced December 2025.
-
Constraints on gravitational waves from the 2024 Vela pulsar glitch
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1752 additional authors not shown)
Abstract:
Among known neutron stars, the Vela pulsar is one of the best targets for gravitational-wave searches. It is also one of the most prolific in terms of glitches, sudden frequency changes in a pulsar's rotation. Such glitches could cause a variety of transient gravitational-wave signals. Here we search for signals associated with a Vela glitch on 29 April 2024 in data of the two LIGO detectors from…
▽ More
Among known neutron stars, the Vela pulsar is one of the best targets for gravitational-wave searches. It is also one of the most prolific in terms of glitches, sudden frequency changes in a pulsar's rotation. Such glitches could cause a variety of transient gravitational-wave signals. Here we search for signals associated with a Vela glitch on 29 April 2024 in data of the two LIGO detectors from the fourth LIGO--Virgo--KAGRA observing run. We search both for seconds-scale burst-like emission, primarily from fundamental (f-)mode oscillations, and for longer quasi-monochromatic transients up to four months in duration, primarily from quasi-static quadrupolar deformations. We find no significant detection candidates, but for the first time we set direct observational upper limits on gravitational strain amplitude that are stricter than what can be indirectly inferred from the overall glitch energy scale. We discuss the short- and long-duration observational constraints in the context of specific emission models. These results demonstrate the potential of gravitational-wave probes of glitching pulsars as detector sensitivity continues to improve.
△ Less
Submitted 21 January, 2026; v1 submitted 19 December, 2025;
originally announced December 2025.
-
Bangla MedER: Multi-BERT Ensemble Approach for the Recognition of Bangla Medical Entity
Authors:
Tanjim Taharat Aurpa,
Farzana Akter,
Md. Mehedi Hasan,
Shakil Ahmed,
Shifat Ara Rafiq,
Fatema Khan
Abstract:
Medical Entity Recognition (MedER) is an essential NLP task for extracting meaningful entities from the medical corpus. Nowadays, MedER-based research outcomes can remarkably contribute to the development of automated systems in the medical sector, ultimately enhancing patient care and outcomes. While extensive research has been conducted on MedER in English, low-resource languages like Bangla rem…
▽ More
Medical Entity Recognition (MedER) is an essential NLP task for extracting meaningful entities from the medical corpus. Nowadays, MedER-based research outcomes can remarkably contribute to the development of automated systems in the medical sector, ultimately enhancing patient care and outcomes. While extensive research has been conducted on MedER in English, low-resource languages like Bangla remain underexplored. Our work aims to bridge this gap. For Bangla medical entity recognition, this study first examined a number of transformer models, including BERT, DistilBERT, ELECTRA, and RoBERTa. We also propose a novel Multi-BERT Ensemble approach that outperformed all baseline models with the highest accuracy of 89.58%. Notably, it provides an 11.80% accuracy improvement over the single-layer BERT model, demonstrating its effectiveness for this task. A major challenge in MedER for low-resource languages is the lack of annotated datasets. To address this issue, we developed a high-quality dataset tailored for the Bangla MedER task. The dataset was used to evaluate the effectiveness of our model through multiple performance metrics, demonstrating its robustness and applicability. Our findings highlight the potential of Multi-BERT Ensemble models in improving MedER for Bangla and set the foundation for further advancements in low-resource medical NLP.
△ Less
Submitted 19 December, 2025;
originally announced December 2025.
-
A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos
Authors:
Mohammed Irfan Kurpath,
Jaseel Muhammad Kaithakkodan,
Jinxing Zhou,
Sahal Shaji Mullappilly,
Mohammad Almansoori,
Noor Ahsan,
Beknur Kalmakhanbet,
Sambal Shikhar,
Rishabh Lalla,
Jean Lahoud,
Mariette Awad,
Fahad Shahbaz Khan,
Salman Khan,
Rao Muhammad Anwer,
Hisham Cholakkal
Abstract:
Long-form multimodal video understanding requires integrating vision, speech, and ambient audio with coherent long-range reasoning. Existing benchmarks emphasize either temporal length or multimodal richness, but rarely both and while some incorporate open-ended questions and advanced metrics, they mostly rely on single-score accuracy, obscuring failure modes. We introduce LongShOTBench, a diagnos…
▽ More
Long-form multimodal video understanding requires integrating vision, speech, and ambient audio with coherent long-range reasoning. Existing benchmarks emphasize either temporal length or multimodal richness, but rarely both and while some incorporate open-ended questions and advanced metrics, they mostly rely on single-score accuracy, obscuring failure modes. We introduce LongShOTBench, a diagnostic benchmark with open-ended, intent-driven questions; single- and multi-turn dialogues; and tasks requiring multimodal reasoning and agentic tool use across video, audio, and speech. Each item includes a reference answer and graded rubric for interpretable, and traceable evaluation. LongShOTBench is produced via a scalable, human-validated pipeline to ensure coverage and reproducibility. All samples in our LongShOTBench are human-verified and corrected. Furthermore, we present LongShOTAgent, an agentic system that analyzes long videos via preprocessing, search, and iterative refinement. On LongShOTBench, state-of-the-art MLLMs show large gaps: Gemini-2.5-Flash achieves 52.95%, open-source models remain below 30%, and LongShOTAgent attains 44.66%. These results underscore the difficulty of real-world long-form video understanding. LongShOTBench provides a practical, reproducible foundation for evaluating and improving MLLMs. All resources are available on GitHub: https://github.com/mbzuai-oryx/longshot.
△ Less
Submitted 18 December, 2025;
originally announced December 2025.
-
StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models
Authors:
Senmao Li,
Kai Wang,
Salman Khan,
Fahad Shahbaz Khan,
Jian Yang,
Yaxing Wang
Abstract:
Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction, enabling high-quality image generation. However, the VAR paradigm suffers from sharply increased computational complexity and running time at large-scale steps. Although existing acceleration methods reduce runtime for large-scale steps, but…
▽ More
Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction, enabling high-quality image generation. However, the VAR paradigm suffers from sharply increased computational complexity and running time at large-scale steps. Although existing acceleration methods reduce runtime for large-scale steps, but rely on manual step selection and overlook the varying importance of different stages in the generation process. To address this challenge, we present StageVAR, a systematic study and stage-aware acceleration framework for VAR models. Our analysis shows that early steps are critical for preserving semantic and structural consistency and should remain intact, while later steps mainly refine details and can be pruned or approximated for acceleration. Building on these insights, StageVAR introduces a plug-and-play acceleration strategy that exploits semantic irrelevance and low-rank properties in late-stage computations, without requiring additional training. Our proposed StageVAR achieves up to 3.4x speedup with only a 0.01 drop on GenEval and a 0.26 decrease on DPG, consistently outperforming existing acceleration baselines. These results highlight stage-aware design as a powerful principle for efficient visual autoregressive image generation.
△ Less
Submitted 18 December, 2025;
originally announced December 2025.
-
GWTC-4.0: Searches for Gravitational-Wave Lensing Signatures
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1744 additional authors not shown)
Abstract:
Gravitational waves can be gravitationally lensed by massive objects along their path. Depending on the lens mass and the lens--source geometry, this can lead to the observation of a single distorted signal or multiple repeated events with the same frequency evolution. We present the results for gravitational-wave lensing searches on the data from the first part of the fourth LIGO--Virgo--KAGRA ob…
▽ More
Gravitational waves can be gravitationally lensed by massive objects along their path. Depending on the lens mass and the lens--source geometry, this can lead to the observation of a single distorted signal or multiple repeated events with the same frequency evolution. We present the results for gravitational-wave lensing searches on the data from the first part of the fourth LIGO--Virgo--KAGRA observing run (O4a). We search for strongly lensed events in the newly acquired data by (1) searching for an overall phase shift present in an image formed at a saddle point of the lens potential, (2) looking for pairs of detected candidates with consistent frequency evolution, and (3) identifying sub-threshold counterpart candidates to the detected signals. Beyond strong lensing, we also look for lensing-induced distortions in all detected signals using an isolated point-mass model. We do not find evidence for strongly lensed gravitational-wave signals and use this result to constrain the rate of detectable strongly lensed events and the merger rate density of binary black holes at high redshift. In the search for single distorted lensed signals, we find one outlier: GW231123_135430, for which we report more detailed investigations. While this event is interesting, the associated waveform uncertainties make its interpretation complicated, and future observations of the populations of binary black holes and of gravitational lenses will help determine the probability that this event could be lensed.
△ Less
Submitted 4 February, 2026; v1 submitted 18 December, 2025;
originally announced December 2025.
-
A Bayesian latent class reinforcement learning framework to capture adaptive, feedback-driven travel behaviour
Authors:
Georges Sfeir,
Stephane Hess,
Thomas O. Hancock,
Filipe Rodrigues,
Jamal Amani Rad,
Michiel Bliemer,
Matthew Beck,
Fayyaz Khan
Abstract:
Many travel decisions involve a degree of experience formation, where individuals learn their preferences over time. At the same time, there is extensive scope for heterogeneity across individual travellers, both in their underlying preferences and in how these evolve. The present paper puts forward a Latent Class Reinforcement Learning (LCRL) model that allows analysts to capture both of these ph…
▽ More
Many travel decisions involve a degree of experience formation, where individuals learn their preferences over time. At the same time, there is extensive scope for heterogeneity across individual travellers, both in their underlying preferences and in how these evolve. The present paper puts forward a Latent Class Reinforcement Learning (LCRL) model that allows analysts to capture both of these phenomena. We apply the model to a driving simulator dataset and estimate the parameters through Variational Bayes. We identify three distinct classes of individuals that differ markedly in how they adapt their preferences: the first displays context-dependent preferences with context-specific exploitative tendencies; the second follows a persistent exploitative strategy regardless of context; and the third engages in an exploratory strategy combined with context-specific preferences.
△ Less
Submitted 8 December, 2025;
originally announced December 2025.
-
JWST Observations of the Double Nucleus in NGC 4486B: Possible Evidence for a Recent Binary SMBH Merger and Recoil
Authors:
Behzad Tahmasebzadeh,
Monica Valluri,
Shashank Dattathri,
Tatsuya Akiba,
Fazeel Mahmood Khan,
Matthew A. Taylor,
Haruka Yoshino,
Solveig Thompson,
Ann-Marie Madigan,
Frank C. van den Bosch,
Kelly holley-bockelmann,
Patrick Côté,
Laura Ferrarese,
Michael J. Drinkwater,
Holger Baumgardt,
Misty C. Bentz,
Kristen Dage,
Eric W. Peng,
Somya Jha,
Andrea V. Macciò,
Chengze Liu,
Tyrone E. Woods
Abstract:
A recent study of the compact elliptical galaxy NGC 4486B using JWST-NIRSpec IFU kinematics confirmed a supermassive black hole (SMBH) of mass $M_{BH}=3.6\pm0.7\times10^8$ (~8% of the stellar mass). In addition to its double nucleus, the nuclear kinematics show pronounced asymmetries: a velocity-dispersion peak displaced by 6 pc from the galaxy center and a ~16 km/s offset in the mean stellar line…
▽ More
A recent study of the compact elliptical galaxy NGC 4486B using JWST-NIRSpec IFU kinematics confirmed a supermassive black hole (SMBH) of mass $M_{BH}=3.6\pm0.7\times10^8$ (~8% of the stellar mass). In addition to its double nucleus, the nuclear kinematics show pronounced asymmetries: a velocity-dispersion peak displaced by 6 pc from the galaxy center and a ~16 km/s offset in the mean stellar line-of-sight velocity near the SMBH. We examine the origin of the 12 pc double nucleus and these asymmetries and show that the observations favor an SMBH surrounded by an eccentric nuclear disk (END). END formation models require the SMBH to experience a gravitational wave (GW) recoil following a binary SMBH merger. Our orbit-superposition models contain ~50% retrograde stars at the edge of the nuclear region, in striking agreement with END-formation simulations. We infer a pre-merger mass ratio q>0.15 and a recoil kick of ~340 km/s. Our N-body simulations show that with such a kick, the SMBH returns to the center within ~30 Myr. Its flat central core is also consistent with earlier binary black hole scouring. We test two alternative mechanisms-buoyancy-driven oscillations and a pre-merger SMBH binary-but neither reproduces the observed offsets, favoring the GW-kick scenario. Our direct N-body simulations further show that a prograde SMBH binary in a rotating host can stall in a corotation resonance, delaying coalescence. Thus, although NGC 4486B is an old, relaxed galaxy near the Virgo cluster center, its SMBH appears to have merged only recently, making its nucleus a rare nearby laboratory for studying post-merger SMBH dynamics.
△ Less
Submitted 13 March, 2026; v1 submitted 16 December, 2025;
originally announced December 2025.
-
PRIVEE: Privacy-Preserving Vertical Federated Learning Against Feature Inference Attacks
Authors:
Sindhuja Madabushi,
Ahmad Faraz Khan,
Haider Ali,
Ananthram Swami,
Rui Ning,
Hongyi Wu,
Jin-Hee Cho
Abstract:
Vertical Federated Learning (VFL) enables collaborative model training across organizations that share common user samples but hold disjoint feature spaces. Despite its potential, VFL is susceptible to feature inference attacks, in which adversarial parties exploit shared confidence scores (i.e., prediction probabilities) during inference to reconstruct private input features of other participants…
▽ More
Vertical Federated Learning (VFL) enables collaborative model training across organizations that share common user samples but hold disjoint feature spaces. Despite its potential, VFL is susceptible to feature inference attacks, in which adversarial parties exploit shared confidence scores (i.e., prediction probabilities) during inference to reconstruct private input features of other participants. To counter this threat, we propose PRIVEE (PRIvacy-preserving Vertical fEderated lEarning), a novel defense mechanism named after the French word privée, meaning "private." PRIVEE obfuscates confidence scores while preserving critical properties such as relative ranking and inter-score distances. Rather than exposing raw scores, PRIVEE shares only the transformed representations, mitigating the risk of reconstruction attacks without degrading model prediction accuracy. Extensive experiments show that PRIVEE achieves a threefold improvement in privacy protection compared to state-of-the-art defenses, while preserving full predictive performance against advanced feature inference attacks.
△ Less
Submitted 14 December, 2025;
originally announced December 2025.
-
Detecting Prompt Injection Attacks Against Application Using Classifiers
Authors:
Safwan Shaheer,
G. M. Refatul Islam,
Mohammad Rafid Hamid,
Md. Abrar Faiaz Khan,
Md. Omar Faruk,
Yaseen Nur
Abstract:
Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions corpus and trains several classifiers, including LSTM, feed forward neural networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integra…
▽ More
Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions corpus and trains several classifiers, including LSTM, feed forward neural networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems.
△ Less
Submitted 14 December, 2025;
originally announced December 2025.
-
VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote Sensing
Authors:
Emanuel Sánchez Aimar,
Gulnaz Zhambulova,
Fahad Shahbaz Khan,
Yonghao Xu,
Michael Felsberg
Abstract:
Satellite imagery differs fundamentally from natural images: its aerial viewpoint, very high resolution, diverse scale variations, and abundance of small objects demand both region-level spatial reasoning and holistic scene understanding. Current remote-sensing approaches remain fragmented between dual-encoder retrieval models, which excel at large-scale cross-modal search but cannot interleave mo…
▽ More
Satellite imagery differs fundamentally from natural images: its aerial viewpoint, very high resolution, diverse scale variations, and abundance of small objects demand both region-level spatial reasoning and holistic scene understanding. Current remote-sensing approaches remain fragmented between dual-encoder retrieval models, which excel at large-scale cross-modal search but cannot interleave modalities, and generative assistants, which support region-level interpretation but lack scalable retrieval capabilities. We propose $\textbf{VLM2GeoVec}$, an instruction-following, single-encoder vision-language model trained contrastively to embed interleaved inputs (images, text, bounding boxes, and geographic coordinates) in a unified vector space. Our single encoder interleaves all inputs into one joint embedding trained with a contrastive loss, eliminating multi-stage pipelines and task-specific modules. To evaluate its versatility, we introduce $\textbf{RSMEB}$, a novel benchmark covering key remote-sensing embedding applications: scene classification; cross-modal search; compositional retrieval; visual-question answering; visual grounding and region-level reasoning; and semantic geospatial retrieval. On RSMEB, it achieves $\textbf{26.6%}$ P@1 on region-caption retrieval (+25 pp vs. dual-encoder baselines), $\textbf{32.5%}$ P@1 on referring-expression retrieval (+19 pp), and $\textbf{17.8%}$ P@1 on semantic geo-localization retrieval (over $3\times$ prior best), while matching or exceeding specialized baselines on conventional tasks such as scene classification and cross-modal retrieval. VLM2GeoVec unifies scalable retrieval with region-level spatial reasoning, enabling cohesive multimodal analysis in remote sensing. We will publicly release the code, checkpoints, and data upon acceptance.
△ Less
Submitted 12 December, 2025;
originally announced December 2025.
-
Intermediate Mass Black Hole Binary Evolution in Nuclear Star Clusters: the effect of the stellar mass black hole population
Authors:
Fazeel Mahmood Khan,
Peter Berczik,
Margarita Sobolenko,
Andreas Just,
Rainer Spurzem,
Kelly Holley-Bockelmann,
Andrea Valerio Macciò
Abstract:
In this study, we investigate the dynamics of Intermediate-Mass Black Hole (IMBH) binaries within Nuclear Star Clusters (NSCs) that contain a population of stellar-mass black holes (BHs). We examine how these stellar and BH populations influence the dynamics of the IMBH binary and, in turn, how the evolving IMBH binary affects the surrounding stellar and BH populations. We conduct high-resolution…
▽ More
In this study, we investigate the dynamics of Intermediate-Mass Black Hole (IMBH) binaries within Nuclear Star Clusters (NSCs) that contain a population of stellar-mass black holes (BHs). We examine how these stellar and BH populations influence the dynamics of the IMBH binary and, in turn, how the evolving IMBH binary affects the surrounding stellar and BH populations. We conduct high-resolution $N$-body simulations of NSCs constructed based on observational parameters from two local dwarf galaxies: NGC205 and NGC404. For the first time, we achieve a star particle mass resolution of $1\rm\;M_{\odot}$ and a BH mass resolution of $10\rm\;M_{\odot}$. This level of resolution is crucial for accurately modeling the collisional dynamics of these dense systems. Including stellar-mass BHs within the stellar population significantly influences the IMBH binary dynamics, nearly doubling the sinking rate and halving the merger time. During the initial phase of the inspiral, the IMBH binary disrupts both the stellar and BH cusps. However, the BH cusp quickly regains its steep slope due to its shorter relaxation time and continues to dominate the evolution of the IMBH binary, despite being much less massive compared to the stellar component. We uncover an interesting mechanism in which BHs first efficiently extract energy from the IMBH binary and then transfer this energy to the surrounding stars, allowing the BHs to spiral back toward the center of the system and restart the process. Our results imply that, although stellar mass BHs are a minor component of a stellar population, they can significantly facilitate IMBH growth within NSCs via mergers. We also notice that these dense systems can potentially boost Intermediate Mass Ratio Inspirals (IMRIs) predominantly on radial orbits.
△ Less
Submitted 11 December, 2025;
originally announced December 2025.
-
Bring Your Dreams to Life: Continual Text-to-Video Customization
Authors:
Jiahua Dong,
Xudong Wang,
Wenqi Liang,
Zongyan Han,
Meng Cao,
Duzhen Zhang,
Hanbin Zhao,
Zhi Han,
Salman Khan,
Fahad Shahbaz Khan
Abstract:
Customized text-to-video generation (CTVG) has recently witnessed great progress in generating tailored videos from user-specific text. However, most CTVG methods assume that personalized concepts remain static and do not expand incrementally over time. Additionally, they struggle with forgetting and concept neglect when continuously learning new concepts, including subjects and motions. To resolv…
▽ More
Customized text-to-video generation (CTVG) has recently witnessed great progress in generating tailored videos from user-specific text. However, most CTVG methods assume that personalized concepts remain static and do not expand incrementally over time. Additionally, they struggle with forgetting and concept neglect when continuously learning new concepts, including subjects and motions. To resolve the above challenges, we develop a novel Continual Customized Video Diffusion (CCVD) model, which can continuously learn new concepts to generate videos across various text-to-video generation tasks by tackling forgetting and concept neglect. To address catastrophic forgetting, we introduce a concept-specific attribute retention module and a task-aware concept aggregation strategy. They can capture the unique characteristics and identities of old concepts during training, while combining all subject and motion adapters of old concepts based on their relevance during testing. Besides, to tackle concept neglect, we develop a controllable conditional synthesis to enhance regional features and align video contexts with user conditions, by incorporating layer-specific region attention-guided noise estimation. Extensive experimental comparisons demonstrate that our CCVD outperforms existing CTVG baselines on both the DreamVideo and Wan 2.1 backbones. The code is available at https://github.com/JiahuaDong/CCVD.
△ Less
Submitted 10 December, 2025; v1 submitted 5 December, 2025;
originally announced December 2025.
-
Step-by-step Layered Design Generation
Authors:
Faizan Farooq Khan,
K J Joseph,
Koustava Goswami,
Mohamed Elhoseiny,
Balaji Vasan Srinivasan
Abstract:
Design generation, in its essence, is a step-by-step process where designers progressively refine and enhance their work through careful modifications. Despite this fundamental characteristic, existing approaches mainly treat design synthesis as a single-step generation problem, significantly underestimating the inherent complexity of the creative process. To bridge this gap, we propose a novel pr…
▽ More
Design generation, in its essence, is a step-by-step process where designers progressively refine and enhance their work through careful modifications. Despite this fundamental characteristic, existing approaches mainly treat design synthesis as a single-step generation problem, significantly underestimating the inherent complexity of the creative process. To bridge this gap, we propose a novel problem setting called Step-by-Step Layered Design Generation, which tasks a machine learning model with generating a design that adheres to a sequence of instructions from a designer. Leveraging recent advancements in multi-modal LLMs, we propose SLEDGE: Step-by-step LayEred Design GEnerator to model each update to a design as an atomic, layered change over its previous state, while being grounded in the instruction. To complement our new problem setting, we introduce a new evaluation suite, including a dataset and a benchmark. Our exhaustive experimental analysis and comparison with state-of-the-art approaches tailored to our new setup demonstrate the efficacy of our approach. We hope our work will attract attention to this pragmatic and under-explored research area.
△ Less
Submitted 2 December, 2025;
originally announced December 2025.
-
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
Authors:
Muhammad Maaz,
Hanoona Rasheed,
Fahad Shahbaz Khan,
Salman Khan
Abstract:
Reasoning over dynamic visual content remains a central challenge for multimodal large language models. Recent thinking models generate explicit reasoning traces for interpretability; however, their reasoning often appears convincing while being logically inconsistent or weakly grounded in visual evidence. We identify and formalize these issues through two diagnostic metrics: Think Answer Consiste…
▽ More
Reasoning over dynamic visual content remains a central challenge for multimodal large language models. Recent thinking models generate explicit reasoning traces for interpretability; however, their reasoning often appears convincing while being logically inconsistent or weakly grounded in visual evidence. We identify and formalize these issues through two diagnostic metrics: Think Answer Consistency (TAC), which measures the alignment between reasoning and answers, and Video Attention Score (VAS), which captures the extent to which reasoning depends on visual versus textual cues. Analysis across 11 video reasoning benchmarks shows that current models rely heavily on linguistic priors rather than visual content. To address this, we propose a reinforcement learning approach that enhances both temporal precision and reasoning consistency. Our approach combines timestamp aware supervised fine tuning with Group Relative Policy Optimization (GRPO) guided by a novel Temporal Alignment Reward (TAR). This dual step post training stage encourages temporally aligned and causally coherent video reasoning. The resulting model, Video R2, achieves consistently higher TAC, VAS, and accuracy across multiple benchmarks, demonstrating that improvements in temporal alignment and reasoning coherence lead to more accurate and trustworthy video understanding. Code: https://github.com/mbzuai-oryx/Video-R2
△ Less
Submitted 8 December, 2025; v1 submitted 28 November, 2025;
originally announced November 2025.
-
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
Authors:
Hanoona Rasheed,
Mohammed Zumri,
Muhammad Maaz,
Ming-Hsuan Yang,
Fahad Shahbaz Khan,
Salman Khan
Abstract:
Recent multimodal large language models (MLLMs) have advanced video understanding, yet most still "think about videos" ie once a video is encoded, reasoning unfolds entirely in text, treating visual input as a static context. This passive paradigm creates a semantic bottleneck: models cannot rewatch, refocus, or verify evidence, leading to shallow visual reasoning on tasks requiring fine grained s…
▽ More
Recent multimodal large language models (MLLMs) have advanced video understanding, yet most still "think about videos" ie once a video is encoded, reasoning unfolds entirely in text, treating visual input as a static context. This passive paradigm creates a semantic bottleneck: models cannot rewatch, refocus, or verify evidence, leading to shallow visual reasoning on tasks requiring fine grained spatio temporal understanding. In this work, we introduce Interactive Video Reasoning, a new paradigm that transforms video into an active cognitive workspace, enabling models to "think with videos". Our model, Video CoM, reasons through a Chain of Manipulations (CoM), performing iterative visual actions to gather and refine evidence. To support this behavior, we construct Video CoM Instruct, an 18K instruction tuning dataset curated for multi step manipulation reasoning. Beyond supervised learning, we further optimize the manipulation policy via reinforcement learning with reasoning aware Group Relative Policy Optimization (GRPO). Unlike prior work that relies solely on sparse answer rewards, our method introduces step level reasoning rewards, guiding the model toward grounded and consistent reasoning. Video CoM achieves strong results across nine video reasoning benchmarks, improving average performance by 3.6 percent over recent state of the art models, while training on only 25K SFT and 3K GRPO video samples, significantly fewer than comparable large scale models. Ablation studies demonstrate that reasoning aware rewards improve both accuracy and interpretability. Code: https://github.com/mbzuai-oryx/Video-CoM
△ Less
Submitted 28 November, 2025;
originally announced November 2025.
-
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
Authors:
Tooba Tehreem Sheikh,
Jean Lahoud,
Rao Muhammad Anwer,
Fahad Shahbaz Khan,
Salman Khan,
Hisham Cholakkal
Abstract:
Traditional object detection models in medical imaging operate within a closed-set paradigm, limiting their ability to detect objects of novel labels. Open-vocabulary object detection (OVOD) addresses this limitation but remains underexplored in medical imaging due to dataset scarcity and weak text-image alignment. To bridge this gap, we introduce MedROV, the first Real-time Open Vocabulary detect…
▽ More
Traditional object detection models in medical imaging operate within a closed-set paradigm, limiting their ability to detect objects of novel labels. Open-vocabulary object detection (OVOD) addresses this limitation but remains underexplored in medical imaging due to dataset scarcity and weak text-image alignment. To bridge this gap, we introduce MedROV, the first Real-time Open Vocabulary detection model for medical imaging. To enable open-vocabulary learning, we curate a large-scale dataset, Omnis, with 600K detection samples across nine imaging modalities and introduce a pseudo-labeling strategy to handle missing annotations from multi-source datasets. Additionally, we enhance generalization by incorporating knowledge from a large pre-trained foundation model. By leveraging contrastive learning and cross-modal representations, MedROV effectively detects both known and novel structures. Experimental results demonstrate that MedROV outperforms the previous state-of-the-art foundation model for medical image detection with an average absolute improvement of 40 mAP50, and surpasses closed-set detectors by more than 3 mAP50, while running at 70 FPS, setting a new benchmark in medical detection. Our source code, dataset, and trained model are available at https://github.com/toobatehreem/MedROV.
△ Less
Submitted 25 November, 2025;
originally announced November 2025.
-
Search for planetary-mass ultra-compact binaries using data from the first part of the LIGO--Virgo--KAGRA fourth observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1743 additional authors not shown)
Abstract:
We present a search for gravitational waves from inspiraling, planetary-mass ultra-compact binaries using data from the first part of the fourth observing run of LIGO, Virgo and KAGRA. Finding no evidence of such systems, we determine the maximum distance reach for such objects and their merger rate densities, independently of how they could have formed. Then, we identify classes of primordial bla…
▽ More
We present a search for gravitational waves from inspiraling, planetary-mass ultra-compact binaries using data from the first part of the fourth observing run of LIGO, Virgo and KAGRA. Finding no evidence of such systems, we determine the maximum distance reach for such objects and their merger rate densities, independently of how they could have formed. Then, we identify classes of primordial black-hole mass distributions for which these rate limits can be translated into relevant constraints on the mass distribution of primordial black holes, assuming that they compose all of dark matter, in the mass range $[10^{-6},10^{-3}]M_\odot$. Our constraints are consistent with existing microlensing results in the planetary-mass range, and provide a complementary probe to sub-solar mass objects.
△ Less
Submitted 5 December, 2025; v1 submitted 24 November, 2025;
originally announced November 2025.
-
Diversity Has Always Been There in Your Visual Autoregressive Models
Authors:
Tong Wang,
Guanyu Yang,
Nian Liu,
Kai Wang,
Yaxing Wang,
Abdelrahman M Shaker,
Salman Khan,
Fahad Shahbaz Khan,
Senmao Li
Abstract:
Visual Autoregressive (VAR) models have recently garnered significant attention for their innovative next-scale prediction paradigm, offering notable advantages in both inference efficiency and image quality compared to traditional multi-step autoregressive (AR) and diffusion models. However, despite their efficiency, VAR models often suffer from the diversity collapse i.e., a reduction in output…
▽ More
Visual Autoregressive (VAR) models have recently garnered significant attention for their innovative next-scale prediction paradigm, offering notable advantages in both inference efficiency and image quality compared to traditional multi-step autoregressive (AR) and diffusion models. However, despite their efficiency, VAR models often suffer from the diversity collapse i.e., a reduction in output variability, analogous to that observed in few-step distilled diffusion models. In this paper, we introduce DiverseVAR, a simple yet effective approach that restores the generative diversity of VAR models without requiring any additional training. Our analysis reveals the pivotal component of the feature map as a key factor governing diversity formation at early scales. By suppressing the pivotal component in the model input and amplifying it in the model output, DiverseVAR effectively unlocks the inherent generative potential of VAR models while preserving high-fidelity synthesis. Empirical results demonstrate that our approach substantially enhances generative diversity with only neglectable performance influences. Our code will be publicly released at https://github.com/wangtong627/DiverseVAR.
△ Less
Submitted 21 November, 2025;
originally announced November 2025.
-
All-sky search for continuous gravitational-wave signals from unknown neutron stars in binary systems in the first part of the fourth LIGO-Virgo-KAGRA observing run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
D. Adhikari,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
S. Afroz,
A. Agapito,
D. Agarwal,
M. Agathos,
N. Aggarwal,
S. Aggarwal,
O. D. Aguiar,
I. -L. Ahrend,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu
, et al. (1743 additional authors not shown)
Abstract:
We present the results of a blind all-sky search for continuous gravitational-wave signals from neutron stars in binary systems using data from the first part of the fourth observing run (O4a) using LIGO detectors data. Rapidly rotating, non-axisymmetric neutron stars are expected to emit continuous gravitational waves, whose detection would significantly improve our understanding of the galactic…
▽ More
We present the results of a blind all-sky search for continuous gravitational-wave signals from neutron stars in binary systems using data from the first part of the fourth observing run (O4a) using LIGO detectors data. Rapidly rotating, non-axisymmetric neutron stars are expected to emit continuous gravitational waves, whose detection would significantly improve our understanding of the galactic neutron star population and matter under extreme conditions, while also providing valuable tests of general relativity. Neutron stars in binary systems likely constitute a substantial fraction of the unobserved galactic population and, due to potential mass accretion, may emit stronger gravitational-wave signals than their isolated counterparts. This search targets signals from neutron stars with frequencies in the 100-350 Hz range, with orbital periods between 7 and 15 days and projected semi-major axes between 5 and 15 light-seconds. The analysis employs the GPU-accelerated fasttracks pipeline. No credible astrophysical signals were identified, and, in the absence of a detection, we report search sensitivity estimates on the population of neutron stars in binary systems in the Milky Way.
△ Less
Submitted 4 December, 2025; v1 submitted 20 November, 2025;
originally announced November 2025.
-
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
Authors:
Omkar Thawakar,
Shravan Venkatraman,
Ritesh Thawkar,
Abdelrahman Shaker,
Hisham Cholakkal,
Rao Muhammad Anwer,
Salman Khan,
Fahad Khan
Abstract:
Recent advances in large multimodal models (LMMs) have enabled impressive reasoning and perception abilities, yet most existing training pipelines still depend on human-curated data or externally verified reward models, limiting their autonomy and scalability. In this work, we strive to improve LMM reasoning capabilities in a purely unsupervised fashion (without any annotated data or reward distil…
▽ More
Recent advances in large multimodal models (LMMs) have enabled impressive reasoning and perception abilities, yet most existing training pipelines still depend on human-curated data or externally verified reward models, limiting their autonomy and scalability. In this work, we strive to improve LMM reasoning capabilities in a purely unsupervised fashion (without any annotated data or reward distillation). To this end, we propose a self-evolving framework, named EvoLMM, that instantiates two cooperative agents from a single backbone model: a Proposer, which generates diverse, image-grounded questions, and a Solver, which solves them through internal consistency, where learning proceeds through a continuous self-rewarding process. This dynamic feedback encourages both the generation of informative queries and the refinement of structured reasoning without relying on ground-truth or human judgments. When using the popular Qwen2.5-VL as the base model, our EvoLMM yields consistent gains upto $\sim$3\% on multimodal math-reasoning benchmarks, including ChartQA, MathVista, and MathVision, using only raw training images. We hope our simple yet effective approach will serve as a solid baseline easing future research in self-improving LMMs in a fully-unsupervised fashion. Our code and models are available at https://github.com/mbzuai-oryx/EvoLMM.
△ Less
Submitted 13 March, 2026; v1 submitted 20 November, 2025;
originally announced November 2025.
-
GLT matrix-sequences and few emblematic applications
Authors:
Muhammad Faisal Khan
Abstract:
This thesis advances the spectral theory of structured matrix-sequences within the framework of Generalized Locally Toeplitz (GLT) $*$-algebras, focusing on the geometric mean of Hermitian positive definite (HPD) GLT sequences and its applications in mathematical physics. For two HPD sequences $\{A_n\}_n \sim_{\mathrm{GLT}} κ$ and $\{B_n\}_n \sim_{\mathrm{GLT}} ξ$ in the same $d$-level, $r$-block…
▽ More
This thesis advances the spectral theory of structured matrix-sequences within the framework of Generalized Locally Toeplitz (GLT) $*$-algebras, focusing on the geometric mean of Hermitian positive definite (HPD) GLT sequences and its applications in mathematical physics. For two HPD sequences $\{A_n\}_n \sim_{\mathrm{GLT}} κ$ and $\{B_n\}_n \sim_{\mathrm{GLT}} ξ$ in the same $d$-level, $r$-block GLT $*$-algebra, we prove that when $κ$ and $ξ$ commute, the geometric mean sequence $\{G(A_n,B_n)\}_n$ is GLT with symbol $(κξ)^{1/2}$, without requiring invertibility of either symbol, settling \cite[Conjecture 10.1]{garoni2017} for $r=1$, $d\ge1$. In degenerate cases, we identify conditions ensuring $\{G(A_n,B_n)\}_n \sim_{\mathrm{GLT}} G(κ,ξ)$. For $r>1$ and non-commuting symbols, numerical evidence shows the sequence still admits a spectral symbol, indicating maximality of the commuting result. Numerical experiments in scalar and block settings confirm the theory and illustrate spectral behaviour. We also sketch the extension to $k\ge2$ sequences via the Karcher mean, obtaining $\{G(A_n^{(1)},\ldots,A_n^{(k)})\}_n \sim_{\mathrm{GLT}} G(κ_1,\ldots,κ_k)$. Finally, we apply the GLT framework to mean-field quantum spin systems, showing that matrices from the quantum Curie--Weiss model form GLT sequences with explicitly computable spectral distributions.
△ Less
Submitted 9 November, 2025;
originally announced November 2025.
-
Mitigating effects of nonlinearities in homodyne quadrature interferometers
Authors:
Johannes Lehmann,
Artem Basalaev,
Jonathan J. Carter,
Matteo Carlassara,
Harald Lück,
Gabriella Chiarini,
Pritam Sarkar,
Firoz Khan,
Satoru Takano,
Sara Al-Kershi,
Sina M. Koehlenbeck,
Pascal Birckigt,
Sarah L. Kranzhoff,
Juliane von Wrangel,
David S. Wu
Abstract:
Homodyne Quadrature interferometers (HoQI) are an interferometric displacement sensing scheme proven to have excellent noise performance, making them a strong candidate for sensing and control schemes in gravitational wave detector seismic isolation. Like many interferometric schemes, HoQIs are prone to nonlinear effects when measuring displacements. These nonlinearities, if left unsuppressed, wou…
▽ More
Homodyne Quadrature interferometers (HoQI) are an interferometric displacement sensing scheme proven to have excellent noise performance, making them a strong candidate for sensing and control schemes in gravitational wave detector seismic isolation. Like many interferometric schemes, HoQIs are prone to nonlinear effects when measuring displacements. These nonlinearities, if left unsuppressed, would substantially limit the use cases of HoQIs. This paper first shows a means of measuring and quantifying nonlinearities using a working HoQI and a mechanical resonator. We then demonstrate a method for real-time correction of these nonlinearities and several approaches for accurately calibrating the correction technique. By correcting in real time, we remove one of the biggest obstacles to including HoQIs in upgrades to future gravitational wave detectors. Finally, we discuss how to post correct data from HoQIs, suppressing even further the nonlinearity-induced errors, broadening the appeal of such sensors to other applications where measurement data can be reconstructed after the fact. We demonstrate all of this on a working HoQI system and show the measured suppression of nonlinear effects from each of these methods. Our work makes HoQIs a more broadly applicable tool for displacement sensing.
△ Less
Submitted 6 November, 2025;
originally announced November 2025.