-
NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Professional Image Quality Assessment (Track 1)
Authors:
Guanyi Qin,
Jie Liang,
Bingbing Zhang,
Lishen Qu,
Ya-nan Guan,
Hui Zeng,
Lei Zhang,
Radu Timofte,
Jianhui Sun,
Xinli Yue,
Tao Shao,
Huan Hou,
Wenjie Liao,
Shuhao Han,
Jieyu Yuan,
Chunle Guo,
Chongyi Li,
Zewen Chen,
Yunze Liu,
Jian Guo,
Juan Wang,
Yun Zeng,
Bing Li,
Weiming Hu,
Hesong Li
, et al. (28 additional authors not shown)
Abstract:
In this paper, we present an overview of the NTIRE 2026 challenge on the 3rd Restore Any Image Model in the Wild, specifically focusing on Track 1: Professional Image Quality Assessment. Conventional Image Quality Assessment (IQA) typically relies on scalar scores. By compressing complex visual characteristics into a single number, these methods fundamentally struggle to distinguish subtle differences among uniformly high-quality images. Furthermore, they fail to articulate why one image is superior, lacking the reasoning capabilities required to provide guidance for vision tasks. To bridge this gap, recent advancements in Multimodal Large Language Models (MLLMs) offer a promising paradigm. Inspired by this potential, our challenge establishes a novel benchmark exploring the ability of MLLMs to mimic human expert cognition in evaluating high-quality image pairs. Participants were tasked with overcoming critical bottlenecks in professional scenarios, centering on two primary objectives: (1) Comparative Quality Selection: reliably identifying the visually superior image within a high-quality pair; and (2) Interpretative Reasoning: generating grounded, expert-level explanations that detail the rationale behind the selection. In total, the challenge attracted nearly 200 registrations and over 2,500 submissions. The top-performing methods significantly advanced the state of the art in professional IQA. The challenge dataset is available at https://github.com/narthchin/RAIM-PIQA, and the official homepage is accessible at https://www.codabench.org/competitions/12789/.
Submitted 14 April, 2026;
originally announced April 2026.
-
Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning
Authors:
Houxing Ren,
Mingjie Zhan,
Zimu Lu,
Ke Wang,
Yunqiao Yang,
Haotian Hou,
Hongsheng Li
Abstract:
Spreadsheets are central to real-world applications such as enterprise reporting, auditing, and scientific data management. Despite their ubiquity, existing large language model (LLM)-based approaches typically treat tables as plain text, overlooking critical layout cues and visual semantics. Moreover, real-world spreadsheets are often massive in scale, exceeding the input length that LLMs can efficiently process. To address these challenges, we propose SpreadsheetAgent, a two-stage multi-agent framework for spreadsheet understanding that adopts a step-by-step reading and reasoning paradigm. Instead of loading the entire spreadsheet at once, SpreadsheetAgent incrementally interprets localized regions through multiple modalities, including code execution results, images, and LaTeX tables. The method first constructs a structural sketch and row/column summaries, and then performs task-driven reasoning over this intermediate representation in the Solving Stage. To further enhance reliability, we design a verification module that validates extracted structures via targeted inspections, reducing error propagation and ensuring trustworthy inputs for downstream reasoning. Extensive experiments on two spreadsheet datasets demonstrate the effectiveness of our approach. With GPT-OSS-120B, SpreadsheetAgent achieves 38.16% on Spreadsheet Bench, outperforming the ChatGPT Agent baseline (35.27%) by 2.89 absolute points. These results highlight the potential of SpreadsheetAgent to advance robust and scalable spreadsheet understanding in real-world applications. Code is available at https://github.com/renhouxing/SpreadsheetAgent.git.
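The step-by-step reading idea can be illustrated with a toy sketch (the function name `build_sketch` and its output layout are our own, not the paper's API): rather than handing the whole table to a model, we scan it in small chunks and build a compact structural sketch plus per-column summaries for downstream reasoning.

```python
import csv
import io

def build_sketch(csv_text, chunk_rows=2):
    """Incrementally scan a CSV and build a compact structural sketch:
    header, row count, and per-column summaries (type guess, min/max)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    cols = {h: [] for h in header}
    # read in small chunks to mimic step-by-step region interpretation
    for start in range(0, len(body), chunk_rows):
        for row in body[start:start + chunk_rows]:
            for h, v in zip(header, row):
                cols[h].append(v)
    summary = {}
    for h, vals in cols.items():
        try:
            nums = [float(v) for v in vals]
            summary[h] = {"type": "number", "min": min(nums), "max": max(nums)}
        except ValueError:
            summary[h] = {"type": "text", "distinct": len(set(vals))}
    return {"header": header, "n_rows": len(body), "columns": summary}

sketch = build_sketch("region,q1,q2\nEMEA,10,12\nAPAC,7,9\nAMER,15,11\n")
```

A task-driven reasoner would then consume `sketch` instead of the raw table, keeping the prompt length independent of the spreadsheet's size.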
Submitted 14 April, 2026;
originally announced April 2026.
-
Reliable Online Resource Allocation for Multi-User Semantic Communications: A Constraint Bayesian Optimization Approach
Authors:
Huawei Hou,
Suzhi Bi,
Xian Li,
Haixia Zhang,
Zhi Quan
Abstract:
Semantic communication has been increasingly integrated into edge computing systems for reconstruction tasks, owing to its advantages in source compression, robustness to channel noise, and task execution efficiency. However, the black-box nature of neural-network (NN)-based semantic codecs, together with the noisy transmission of semantic features, makes it difficult to allocate transmission resources and guarantee reconstruction quality for multiple users. In this paper, we propose a reliable online resource allocation framework for a semantic-driven multi-user edge computing system, where multiple users encode source information into semantic features and offload reconstruction to an edge server. We formulate a multi-user resource optimization problem whose objective jointly accounts for system-wide reconstruction performance and transmission latency, under constraints that guarantee each user's minimum reconstruction quality. To solve this problem, we develop a Bayesian optimization (BO)-based online algorithm that enables flexible control of the user-side semantic compression ratio (CR) and allocation of transmission rates. The edge server jointly determines each user's CR and transmission rate by exploiting Gaussian-process (GP) models that capture the relationship between reconstruction performance, signal-to-noise ratio (SNR), and CR, and by employing an acquisition function to select CRs that satisfy the performance quality constraints while maximizing the objective. Simulation results on high-resolution video-frame reconstruction datasets demonstrate that the proposed method selects near-optimal CRs via the GP surrogate and acquisition function, achieving a 98.03% constraint-satisfaction rate and reducing transmission latency by more than 45% compared with fixed-CR schemes.
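The GP-surrogate selection step can be sketched in minimal NumPy (an illustration, not the paper's algorithm: `gp_posterior`, `select_cr`, and the UCB-style feasibility rule are our own simplifications): fit a GP to observed (CR, quality) pairs, keep candidates whose conservative lower bound clears the quality floor, and among those pick the most aggressive compression ratio, since more compression means less transmission latency.

```python
import numpy as np

def gp_posterior(X, y, Xs, length=0.2, noise=1e-4):
    """GP regression posterior (mean, std) with an RBF kernel."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks, Kss = k(X, Xs), k(Xs, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.clip(np.diag(Kss - Ks.T @ Kinv @ Ks), 1e-12, None)
    return mu, np.sqrt(var)

def select_cr(X_obs, y_obs, candidates, q_min, kappa=1.0):
    """Among CRs whose conservative quality estimate (mu - kappa*sd) meets
    the floor q_min, pick the largest (most compression, least latency)."""
    mu, sd = gp_posterior(np.asarray(X_obs, float), np.asarray(y_obs, float), candidates)
    feasible = mu - kappa * sd >= q_min
    if not feasible.any():  # no safe choice: fall back to best predicted quality
        return candidates[int(np.argmax(mu))]
    return candidates[int(np.argmax(np.where(feasible, candidates, -np.inf)))]

# toy data: observed reconstruction quality decreases as CR grows
cand = np.linspace(0.1, 0.9, 17)
cr = select_cr([0.1, 0.5, 0.9], [0.95, 0.80, 0.55], cand, q_min=0.75)
```

In the paper's setting the surrogate additionally conditions on SNR and the objective trades off multi-user quality against latency; this sketch keeps only the constraint-aware acquisition idea.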
Submitted 12 April, 2026;
originally announced April 2026.
-
A Channel Knowledge Map-Driven Two-Stage Coordinated User Scheduling in Multi-Cell Massive MIMO Systems
Authors:
Jiayang Wan,
Hongwei Hou,
Jiawei Zhuang,
Wenjin Wang,
Shi Jin
Abstract:
This paper investigates narrowband coordinated user scheduling in multi-cell massive multiple-input multiple-output (MIMO) systems. We formulate the problem under a spectral-efficiency maximization criterion, revealing inherent challenges in computational complexity and signaling overhead. To address these, we develop a user-scheduling-oriented channel knowledge map (US-CKM) and a US-CKM-driven two-stage coordinated scheduling framework. By exploiting the mapping between location information and statistical channel state information (SCSI), the system enables rapid SCSI retrieval and persistent reuse, substantially reducing CSI acquisition overhead. Embedding statistical channel correlation into the CKM further characterizes inter-user interference patterns. The framework designs an intra-cell active-user selection scheme for the first stage and an inter-cell coordinated scheduling scheme for the second, both based on US-CKM entries. The first stage identifies users with favorable channel gains and low intra-cell interference, reducing the candidate set with marginal sum-rate loss. The second stage suppresses inter-cell interference (ICI) by exploiting cross-cell channel correlations. To enhance robustness against imperfect SCSI in dynamic scattering environments, we augment the framework with a reliability-guided mechanism. Instead of uniform treatment, we evaluate entry stability using a grid reliability metric quantifying channel measurement variance at sampling locations. Low-reliability grids are identified, and their instantaneous CSI is acquired in real time to integrate with existing SCSI. This process refines channel gain and spatial correlation characteristics, ensuring robust performance under imperfect conditions.
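The reliability-guided mechanism can be sketched directly (a simplified stand-in: `flag_low_reliability` and the threshold are illustrative, not the paper's exact metric): the sample variance of repeated channel-gain measurements in each grid serves as the reliability score, and grids whose variance exceeds a threshold are flagged for real-time CSI acquisition.

```python
import numpy as np

def flag_low_reliability(measurements, var_threshold):
    """measurements: dict grid_id -> list of channel-gain samples (dB).
    Returns grid ids whose sample variance exceeds the threshold, i.e.
    grids where the stored statistical CSI is deemed unreliable."""
    return sorted(g for g, s in measurements.items()
                  if np.var(np.asarray(s, dtype=float), ddof=1) > var_threshold)

meas = {
    "g1": [-70.1, -70.3, -69.9, -70.0],   # stable grid: low variance
    "g2": [-65.0, -80.0, -62.0, -75.0],   # dynamic scattering: high variance
}
needs_csi = flag_low_reliability(meas, var_threshold=4.0)
```

Only the flagged grids would then trigger instantaneous CSI measurement; all others keep reusing the stored SCSI.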
Submitted 20 March, 2026;
originally announced March 2026.
-
SEMAG: Self-Evolutionary Multi-Agent Code Generation
Authors:
Yulin Peng,
Haowen Hou,
Xinxin Zhu,
Ying Tiffany He,
F. Richard Yu
Abstract:
Large Language Models (LLMs) have made significant progress in handling complex programming tasks. However, current methods rely on manual model selection and fixed workflows, which limit their ability to adapt to changing task complexities. To address this, we propose SEMAG, a Self-Evolutionary Multi-Agent code Generation framework that mimics human coding practices. It decomposes programming tasks into stages, including planning, coding, debugging, and discussion, while adapting workflows to task difficulty. Its self-evolutionary agents can access the latest models in real time and automatically upgrade the backbone model. SEMAG sets new state-of-the-art Pass@1 accuracy across benchmarks. Using identical backbone models, SEMAG outperforms prior methods by 3.3% on CodeContests. When augmented with self-evolutionary model selection that automatically identifies optimal backbones, SEMAG reaches 52.6%, showcasing both framework effectiveness and adaptability to evolving LLM capabilities.
Submitted 16 March, 2026;
originally announced March 2026.
-
Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning
Authors:
Tiehua Mei,
Minxuan Lv,
Leiyu Pan,
Zhenpeng Su,
Hongru Hou,
Hengrui Chen,
Ao Xu,
Deqing Yang
Abstract:
Reinforcement Learning with Verifiable Rewards (RLVR) improves reasoning in large language models but treats all correct solutions equally, potentially reinforcing flawed traces that get correct answers by chance. We observe that better reasoning traces are better teachers: high-quality solutions serve as more effective demonstrations than low-quality ones. We term this teaching ability Demonstration Utility, and show that the policy model's own in-context learning ability provides an efficient way to measure it, yielding a quality signal termed Evidence Gain. To employ this signal during training, we introduce In-Context RLVR. By Bayesian analysis, we show that this objective implicitly reweights rewards by Evidence Gain, assigning higher weights to high-quality traces and lower weights to low-quality ones, without requiring costly computation or external evaluators. Experiments on mathematical benchmarks show improvements in both accuracy and reasoning quality over standard RLVR.
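The claimed reweighting effect can be sketched numerically (an illustration of the effect only, not the paper's objective; `reweight_rewards` and the softmax form are our own): correct traces receive reward weights proportional to a softmax over their Evidence Gain scores, so the highest-quality trace dominates the learning signal while incorrect traces stay at zero.

```python
import numpy as np

def reweight_rewards(rewards, evidence_gain, temp=1.0):
    """Scale each trace's reward by a softmax over its quality signal.
    Incorrect traces (reward 0) stay at 0; among correct ones, higher
    Evidence Gain yields a larger effective reward."""
    g = np.asarray(evidence_gain, dtype=float) / temp
    w = np.exp(g - g.max())
    w /= w.sum()
    # rescale by the number of traces so weights average to roughly 1
    return np.asarray(rewards, dtype=float) * w * len(rewards)

rewards = [1.0, 1.0, 0.0, 1.0]    # three correct traces, one wrong
gain    = [2.0, 0.5, 1.0, -1.0]   # trace 0 is the best teacher
shaped  = reweight_rewards(rewards, gain)
```

Under a standard policy-gradient update, the shaped rewards push probability mass toward the high-gain trace rather than treating all three correct traces equally.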
Submitted 10 March, 2026;
originally announced March 2026.
-
On non-uniqueness of mild solutions and stationary singular solutions to the Navier-Stokes equations
Authors:
Alexey Cheskidov,
Hedong Hou
Abstract:
We prove that the unconditional uniqueness of mild solutions to the Navier-Stokes equations fails in all the Besov spaces with negative regularity index, by constructing non-trivial stationary singular solutions via convex integration. We also establish uniqueness of stationary weak solutions in an endpoint critical space. Similar results are proved for the fractional Navier-Stokes equations with arbitrarily large power of the Laplacian in both Lebesgue and Besov spaces.
Submitted 3 March, 2026;
originally announced March 2026.
-
Susceptible-Infected Epidemics on Evolving Graphs at Critical Infection Rate
Authors:
Wenze Chen,
Haojie Hou,
Ruibo Ma,
Dong Yao
Abstract:
Consider an SI process on a graph $G$ where each S--I connection becomes I--I at rate $\lambda$. Here S and I stand for "susceptible" and "infected" respectively. The evoSI model is a modification of the SI model in which S--I edges are broken at rate $\rho$ and the "S" connects to a randomly chosen vertex. It is proven in Durrett and Yao [2022, Electron. J. Probab.] that, for the supercritical evoSI process on the configuration model, there exists a quantity $\Delta$ depending on the first three moments of the degree distribution such that the sign of $\Delta$ governs the continuity of the phase transition of the final epidemic size near the critical infection rate $\lambda_c$.
In this paper, we consider the critical evoSI model on the configuration model, i.e., $\lambda=\lambda_c$. We show that, if $\Delta>0$, then the probability of a major outbreak starting from a single infected individual is $Cn^{-1/3}(1+o(1))$ for some explicit constant $C>0$, where $n$ is the size of the graph. On the contrary, if $\Delta<0$, then this probability is $o(n^{-1/3})$. The case $\Delta<0$ is reminiscent of the critical Erdős–Rényi graphs, where the probability for the size of the largest component to be of order $n$ decays exponentially in $n$.
Submitted 3 March, 2026;
originally announced March 2026.
-
Redundancy-Optimal Constructions of $(1,1)$-Criss-Cross Deletion Correcting Codes with Efficient Encoding/Decoding Algorithms
Authors:
Wenhao Liu,
Zhengyi Jiang,
Zhongyi Huang,
Hanxu Hou
Abstract:
Two-dimensional error-correcting codes, where codewords are represented as $n \times n$ arrays over a $q$-ary alphabet, find important applications in areas such as QR codes, DNA-based storage, and racetrack memories. Among the possible error patterns, $(t_r,t_c)$-criss-cross deletions, where $t_r$ rows and $t_c$ columns are simultaneously deleted, are of particular significance. In this paper, we focus on $q$-ary $(1,1)$-criss-cross deletion correcting codes. We present a novel code construction and develop complete encoding, decoding, and data recovery algorithms for parameters $n \ge 11$ and $q \ge 3$. The complexity of the proposed encoding, decoding, and data recovery algorithms is $\mathcal{O}(n^2)$. Furthermore, we show that for $n \ge 11$ and $q = \Omega(n)$ (i.e., there exists a constant $c>0$ such that $q \ge cn$), both the code redundancy and the encoder redundancy of the constructed codes are $2n + 2\log_q n + \mathcal{O}(1)$, which attain the lower bound ($2n + 2\log_q n - 3$) within an $\mathcal{O}(1)$ gap. To the best of our knowledge, this is the first construction that can achieve the optimal redundancy with only an $\mathcal{O}(1)$ gap, while simultaneously featuring explicit encoding and decoding algorithms.
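The paper's construction is not reproduced here, but the flavor of deletion-locating side information can be shown with a much simpler toy: store one checksum per row and identify a single deleted row as the first checksum mismatch. This assumes consecutive rows have distinct checksums and ignores the simultaneous column deletion and all redundancy analysis; it is purely illustrative.

```python
def locate_deleted_row(received_rows, row_checksums):
    """received_rows: the (n-1) surviving rows after one row deletion.
    row_checksums: the n stored per-row sums. Returns the index of the
    deleted row, assuming consecutive rows have distinct checksums."""
    for k, row in enumerate(received_rows):
        if sum(row) != row_checksums[k]:
            return k                       # first mismatch: row k was deleted
    return len(row_checksums) - 1          # all matched: the last row was deleted

array = [[1, 2, 0], [2, 2, 2], [0, 1, 1]]  # 3x3 ternary array
sums = [sum(r) for r in array]             # side information: [3, 6, 2]
received = [array[0], array[2]]            # row 1 deleted in transit
idx = locate_deleted_row(received, sums)
```

A real $(1,1)$-criss-cross code must locate and correct both a row and a column deletion with only $2n + 2\log_q n + \mathcal{O}(1)$ redundant symbols, which is what makes the construction in the paper nontrivial.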
Submitted 13 February, 2026;
originally announced February 2026.
-
Beyond Benchmarks of IUGC: Rethinking Requirements of Deep Learning Methods for Intrapartum Ultrasound Biometry from Fetal Ultrasound Videos
Authors:
Jieyun Bai,
Zihao Zhou,
Yitong Tang,
Jie Gan,
Zhuonan Liang,
Jianan Fan,
Lisa B. Mcguire,
Jillian L. Clarke,
Weidong Cai,
Jacaueline Spurway,
Yubo Tang,
Shiye Wang,
Wenda Shen,
Wangwang Yu,
Yihao Li,
Philippe Zhang,
Weili Jiang,
Yongjie Li,
Salem Muhsin Ali Binqahal Al Nasim,
Arsen Abzhanov,
Numan Saeed,
Mohammad Yaqub,
Zunhui Xian,
Hongxing Lin,
Libin Lan
, et al. (38 additional authors not shown)
Abstract:
A substantial proportion (45%) of maternal deaths, neonatal deaths, and stillbirths occur during the intrapartum phase, with a particularly high burden in low- and middle-income countries. Intrapartum biometry plays a critical role in monitoring labor progression; however, the routine use of ultrasound in resource-limited settings is hindered by a shortage of trained sonographers. To address this challenge, the Intrapartum Ultrasound Grand Challenge (IUGC), co-hosted with MICCAI 2024, was launched. The IUGC introduces a clinically oriented multi-task automatic measurement framework that integrates standard plane classification, fetal head-pubic symphysis segmentation, and biometry, enabling algorithms to exploit complementary task information for more accurate estimation. Furthermore, the challenge releases the largest multi-center intrapartum ultrasound video dataset to date, comprising 774 videos (68,106 frames) collected from three hospitals, providing a robust foundation for model training and evaluation. In this study, we present a comprehensive overview of the challenge design, review the submissions from eight participating teams, and analyze their methods from five perspectives: preprocessing, data augmentation, learning strategy, model architecture, and post-processing. In addition, we perform a systematic analysis of the benchmark results to identify key bottlenecks, explore potential solutions, and highlight open challenges for future research. Although encouraging performance has been achieved, our findings indicate that the field remains at an early stage, and further in-depth investigation is required before large-scale clinical deployment. All benchmark solutions and the complete dataset have been publicly released to facilitate reproducible research and promote continued advances in automatic intrapartum ultrasound biometry.
Submitted 13 February, 2026;
originally announced February 2026.
-
Stone duality of Lawson compact algebraic L-domain
Authors:
Huijun Hou,
Ao Shen
Abstract:
In this paper, we introduce finitely disjunctive distributive lattices (FDD-lattices), a subclass of bounded distributive lattices, and apply them to establish a Stone duality for Lawson compact algebraic L-domains. Furthermore, we develop a dual equivalence between the category of FDD-lattices with lattice homomorphisms and that of Lawson compact algebraic L-domains with spectral maps.
Submitted 13 February, 2026;
originally announced February 2026.
-
Discovering Semantic Latent Structures in Psychological Scales: A Response-Free Pathway to Efficient Simplification
Authors:
Bo Wang,
Yuxuan Zhang,
Yueqin Hu,
Hanchao Hou,
Kaiping Peng,
Shiguang Ni
Abstract:
Psychological scale refinement traditionally relies on response-based methods such as factor analysis, item response theory, and network psychometrics to optimize item composition. Although rigorous, these approaches require large samples and may be constrained by data availability and cross-cultural comparability. Recent advances in natural language processing suggest that the semantic structure of questionnaire items may encode latent construct organization, offering a complementary response-free perspective. We introduce a topic-modeling framework that operationalizes semantic latent structure for scale simplification. Items are encoded using contextual sentence embeddings and grouped via density-based clustering to discover latent semantic factors without predefining their number. Class-based term weighting derives interpretable topic representations that approximate constructs and enable merging of semantically adjacent clusters. Representative items are selected using membership criteria within an integrated reduction pipeline. We benchmarked the framework across DASS, IPIP, and EPOCH, evaluating structural recovery, internal consistency, factor congruence, correlation preservation, and reduction efficiency. The proposed method recovered coherent factor-like groupings aligned with established constructs. Selected items reduced scale length by 60.5% on average while maintaining psychometric adequacy. Simplified scales showed high concordance with original factor structures and preserved inter-factor correlations, indicating that semantic latent organization provides a response-free approximation of measurement structure. Our framework formalizes semantic structure as an inspectable front-end for scale construction and reduction. To facilitate adoption, we provide a visualization-supported tool enabling one-click semantic analysis and structured simplification.
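The pipeline can be compressed into a dependency-free sketch with stand-ins (all names and the toy "embeddings" are illustrative): cosine-threshold connected components replace the density-based clusterer, and a c-TF-IDF-style score provides the class-based term weighting that labels each discovered semantic factor.

```python
import math
from collections import Counter

def cluster_items(vecs, thresh=0.8):
    """Group items whose cosine similarity exceeds `thresh` into connected
    components (a crude stand-in for density-based clustering)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    labels = list(range(len(vecs)))
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if cos(vecs[i], vecs[j]) > thresh:
                old, new = labels[j], labels[i]
                labels = [new if l == old else l for l in labels]
    return labels

def class_tfidf(items, labels):
    """Class-based term weighting: term frequency within a cluster, damped
    by the number of clusters the term appears in."""
    clusters = {}
    for text, l in zip(items, labels):
        clusters.setdefault(l, []).append(text)
    df = Counter(t for docs in clusters.values()
                 for t in set(w for d in docs for w in d.split()))
    n = len(clusters)
    return {l: {t: c * math.log(1 + n / df[t])
                for t, c in Counter(w for d in docs for w in d.split()).items()}
            for l, docs in clusters.items()}

items = ["feel sad down", "feel hopeless sad", "cannot relax", "feel tense cannot relax"]
vecs = [[1, 0.9, 0], [0.9, 1, 0.1], [0, 0.1, 1], [0.1, 0, 0.95]]
labels = cluster_items(vecs)
weights = class_tfidf(items, labels)
```

In the full framework, contextual sentence embeddings replace the hand-made vectors, the number of clusters is discovered rather than fixed, and representative items are then selected per cluster to shorten the scale.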
Submitted 8 March, 2026; v1 submitted 12 February, 2026;
originally announced February 2026.
-
EmbeddingRWKV: State-Centric Retrieval with Reusable States
Authors:
Haowen Hou,
Jie Yang
Abstract:
Current Retrieval-Augmented Generation (RAG) systems typically employ a traditional two-stage pipeline: an embedding model for initial retrieval followed by a reranker for refinement. However, this paradigm suffers from significant inefficiency due to the lack of shared information between stages, leading to substantial redundant computation. To address this limitation, we propose State-Centric Retrieval, a unified retrieval paradigm that utilizes "states" as a bridge to connect embedding models and rerankers. First, we perform state representation learning by fine-tuning an RWKV-based LLM, transforming it into EmbeddingRWKV, a unified model that serves as both an embedding model and a state backbone for extracting compact, reusable states. Building upon these reusable states, we further design a state-based reranker to fully leverage precomputed information. During reranking, the model processes only query tokens, decoupling inference cost from document length and yielding a 5.4x to 44.8x speedup. Furthermore, we observe that retaining all intermediate layer states is unnecessary; with a uniform layer selection strategy, our model maintains 98.62% of full-model performance using only 25% of the layers. Extensive experiments demonstrate that State-Centric Retrieval achieves high-quality retrieval and reranking results while significantly enhancing overall system efficiency. Code is available at https://github.com/howard-hou/EmbeddingRWKV.
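Two of the ideas can be sketched abstractly (illustrative names only; real cached states are per-layer RWKV states, not single vectors): uniform layer selection keeps an evenly spaced subset of layer ids, and query-only reranking scores cached document states without re-encoding any document.

```python
import numpy as np

def uniform_layer_indices(n_layers, keep_ratio=0.25):
    """Keep an evenly spaced subset of layers, e.g. 8 of 32."""
    keep = max(1, round(n_layers * keep_ratio))
    return [round(i * (n_layers - 1) / max(1, keep - 1)) for i in range(keep)]

def rerank(query_vec, doc_states):
    """Query-only reranking over precomputed document states: documents are
    never re-encoded, so the cost is independent of document length."""
    scores = {d: float(np.dot(query_vec, s) /
                       (np.linalg.norm(query_vec) * np.linalg.norm(s)))
              for d, s in doc_states.items()}
    return sorted(scores, key=scores.get, reverse=True)

layers = uniform_layer_indices(32, 0.25)          # evenly spaced layer ids
cache = {"doc_a": np.array([0.9, 0.1]),           # states computed once, offline
         "doc_b": np.array([0.1, 0.9])}
order = rerank(np.array([1.0, 0.0]), cache)       # only the query is processed
```

The speedup claim in the abstract follows from this structure: reranking cost scales with query length alone once the document states are cached.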
Submitted 9 January, 2026;
originally announced January 2026.
-
Minimum and extremal process for a branching random walk outside the boundary case
Authors:
Xinxin Chen,
Haojie Hou
Abstract:
This work extends the studies on the minimum and extremal process of a supercritical branching random walk outside the boundary case, in settings that cannot be reduced to the boundary case. We study here the situation where the log-generating function explodes at $1$ and the random walk associated to the spine possesses a stretched exponential tail with exponent $b\in(0,\frac12)$. Under suitable conditions, we confirm the conjecture of Barral, Hu and Madaule [Bernoulli 24(2) 2018 801-841], and obtain the weak convergence for the minimum and the extremal process. We also establish an a.s. infimum result over all infinite rays of this system.
Submitted 13 January, 2026; v1 submitted 11 January, 2026;
originally announced January 2026.
-
LEAPS: An LLM-Empowered Adaptive Plugin for Taobao AI Search
Authors:
Lei Wang,
Jinhang Wu,
Zhibin Wang,
Biye Li,
Haiping Hou
Abstract:
The rapid advancement of large language models has reshaped user search cognition, driving a paradigm shift from discrete keyword-based search to high-dimensional conversational interaction. However, existing e-commerce search architectures face a critical capability deficit in adapting to this change. Users are often caught in a dilemma: precise natural language descriptions frequently trigger zero-result scenarios, while the forced simplification of queries leads to decision overload from noisy, generic results. To tackle this challenge, we propose LEAPS (LLM-Empowered Adaptive Plugin for Taobao AI Search), which seamlessly upgrades traditional search systems via a "Broaden-and-Refine" paradigm. Specifically, it attaches plugins to both ends of the search pipeline: (1) Upstream, a Query Expander acts as an intent translator. It employs a novel three-stage training strategy (inverse data augmentation, posterior-knowledge supervised fine-tuning, and diversity-aware reinforcement learning) to generate adaptive and complementary query combinations that maximize the candidate product set. (2) Downstream, a Relevance Verifier serves as a semantic gatekeeper. By synthesizing multi-source data (e.g., OCR text, reviews) and leveraging chain-of-thought reasoning, it precisely filters noise to resolve selection overload. Extensive offline experiments and online A/B testing demonstrate that LEAPS significantly enhances conversational search experiences. Crucially, its non-invasive architecture preserves established retrieval performance optimized for short-text queries, while simultaneously allowing for low-cost integration into diverse back-ends. Fully deployed on Taobao AI Search since August 2025, LEAPS currently serves hundreds of millions of users monthly.
Submitted 8 January, 2026;
originally announced January 2026.
-
BioPIE: A Biomedical Protocol Information Extraction Dataset for High-Reasoning-Complexity Experiment Question Answer
Authors:
Haofei Hou,
Shunyi Zhao,
Fanxu Meng,
Kairui Yang,
Lecheng Ruan,
Qining Wang
Abstract:
Question Answer (QA) systems for biomedical experiments facilitate cross-disciplinary communication and serve as a foundation for downstream tasks, e.g., laboratory automation. High Information Density (HID) and Multi-Step Reasoning (MSR) pose unique challenges for biomedical experimental QA. While extracting structured knowledge, e.g., Knowledge Graphs (KGs), can substantially benefit biomedical experimental QA, existing biomedical datasets focus on general or coarse-grained knowledge and thus fail to support the fine-grained experimental reasoning demanded by HID and MSR. To address this gap, we introduce the Biomedical Protocol Information Extraction Dataset (BioPIE), a dataset that provides procedure-centric KGs of experimental entities, actions, and relations at a scale that supports reasoning over biomedical experiments across protocols. We evaluate information extraction methods on BioPIE and implement a QA system that leverages BioPIE, showcasing performance gains on test, HID, and MSR question sets and showing that the structured experimental knowledge in BioPIE underpins both AI-assisted and more autonomous biomedical experimentation.
Submitted 7 January, 2026;
originally announced January 2026.
-
Interleaved Latent Visual Reasoning with Selective Perceptual Modeling
Authors:
Shuai Dong,
Siyuan Wang,
Xingyu Liu,
Chenglin Li,
Haowen Hou,
Zhongyu Wei
Abstract:
Interleaved reasoning paradigms enhance Multimodal Large Language Models (MLLMs) with visual feedback but are hindered by the prohibitive computational cost of re-encoding pixel-dense images. A promising alternative, latent visual reasoning, circumvents this bottleneck yet faces limitations: methods either fail to capture intermediate state evolution due to single-step, non-interleaved structures, or sacrifice precise perceptual modeling by over-compressing features. We introduce Interleaved Latent Visual Reasoning (ILVR), a framework that unifies dynamic state evolution with precise perceptual modeling. ILVR interleaves textual generation with latent visual representations that act as specific, evolving cues for subsequent reasoning. Specifically, we employ a self-supervision strategy where a momentum teacher model selectively distills relevant features from ground-truth intermediate images into sparse supervision targets. This adaptive selection mechanism guides the model to autonomously generate context-aware visual signals. Extensive experiments on multimodal reasoning benchmarks demonstrate that ILVR outperforms existing approaches, effectively bridging the gap between fine-grained perception and sequential multimodal reasoning. The code is available at https://github.com/XD111ds/ILVR.
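The self-supervised distillation step can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than the paper's implementation: `ema_update` is the standard momentum-teacher update, and `select_sparse_targets` stands in for the relevance-based selection with a simple top-k-by-magnitude rule.

```python
import numpy as np

def ema_update(teacher, student, m=0.99):
    """Momentum-teacher update: teacher parameters track an exponential
    moving average of the student's parameters."""
    return {k: m * teacher[k] + (1 - m) * student[k] for k in teacher}

def select_sparse_targets(feats, k=2):
    """Keep only the k largest-magnitude channels per position as sparse
    supervision targets; zero out everything else."""
    idx = np.argsort(-np.abs(feats), axis=-1)[..., :k]
    targets = np.zeros_like(feats)
    np.put_along_axis(targets, idx, np.take_along_axis(feats, idx, -1), -1)
    return targets

teacher = {"w": np.ones(3)}
student = {"w": np.zeros(3)}
teacher = ema_update(teacher, student, m=0.9)  # w -> [0.9, 0.9, 0.9]

feats = np.array([[3.0, -1.0, 0.5, 2.0]])
targets = select_sparse_targets(feats, k=2)    # keeps only 3.0 and 2.0
```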
Submitted 21 January, 2026; v1 submitted 5 December, 2025;
originally announced December 2025.
-
Deep Learning-Based Joint Uplink-Downlink CSI Acquisition for Next-Generation Upper Mid-Band Systems
Authors:
Xuan He,
Hongwei Hou,
Yafei Wang,
Wenjin Wang,
Shi Jin,
Symeon Chatzinotas,
Björn Ottersten
Abstract:
In next-generation wireless communication systems, the newly designated upper mid-band, also called frequency range 3 (FR3), has attracted considerable attention, highlighting the need for downlink (DL) transmission design, which fundamentally relies on accurate CSI. However, CSI acquisition in FR3 systems faces significant challenges: the increased number of antennas and wider transmission bandwidth introduce prohibitive training overhead with traditional estimation approaches, as each probing captures only an incomplete spatial-frequency observation, while higher carrier frequencies lead to faster temporal channel variation. To address these challenges, we propose a novel CSI acquisition framework that integrates CSI feedback, uplink (UL) and DL channel estimation, as well as channel prediction in FR3 TDD massive MIMO systems. Specifically, we first develop the Joint UL and DL Channel Estimation Network (JUDCEN) to fuse incomplete observations based on the SRSs and CSI-RSs. By exploiting the complementary characteristics of preliminary UL and DL estimation features, obtained through initial UL estimation and quantized-feedback-assisted DL estimation, it enables full CSI reconstruction in the spatial domain. To mitigate the performance degradation in the feedback process, we propose the Transformer-MLP CSI Feedback Network (TMCFN), employing an MLP-based module to jointly exploit angle- and delay-domain features. Building upon the reconstructed full CSI, we further develop the Mamba-based Channel Prediction Network (MCPN), which exploits a selective state-space model (SSM) mechanism to capture long-range temporal dynamics in the angle-delay domain for future CSI prediction. Simulation results demonstrate that the proposed framework consistently outperforms benchmarks in both CSI acquisition accuracy and transmission spectral efficiency with lower computational complexity.
Submitted 2 December, 2025;
originally announced December 2025.
-
VibOmni: Towards Scalable Bone-conduction Speech Enhancement on Earables
Authors:
Lixing He,
Yunqi Guo,
Haozheng Hou,
Zhenyu Yan
Abstract:
Earables, such as True Wireless Stereo earphones and VR/AR headsets, are increasingly popular, yet their compact design poses challenges for robust voice-related applications like telecommunication and voice assistant interactions in noisy environments. Existing speech enhancement systems, reliant solely on omnidirectional microphones, struggle with ambient noise like competing speakers. To address these issues, we propose VibOmni, a lightweight, end-to-end multi-modal speech enhancement system for earables that leverages bone-conducted vibrations captured by widely available Inertial Measurement Units (IMUs). VibOmni integrates a two-branch encoder-decoder deep neural network to fuse audio and vibration features. To overcome the scarcity of paired audio-vibration datasets, we introduce a novel data augmentation technique that models Bone Conduction Functions (BCFs) from limited recordings, enabling synthetic vibration data generation with only 4.5% spectrogram similarity error. Additionally, a multi-modal SNR estimator facilitates continual learning and adaptive inference, optimizing performance in dynamic, noisy settings without on-device back-propagation. Evaluated on real-world datasets from 32 volunteers with different devices, VibOmni achieves up to 21% improvement in Perceptual Evaluation of Speech Quality (PESQ), 26% in Signal-to-Noise Ratio (SNR), and about 40% WER reduction with much less latency on mobile devices. A user study with 35 participants showed 87% preferred VibOmni over baselines, demonstrating its effectiveness for deployment in diverse acoustic environments.
Submitted 2 December, 2025;
originally announced December 2025.
-
R3A: Reliable RTL Repair Framework with Multi-Agent Fault Localization and Stochastic Tree-of-Thoughts Patch Generation
Authors:
Zizhang Luo,
Fan Cui,
Kexing Zhou,
Runlin Guo,
Mile Xia,
Hongyuan Hou,
Yun Liang
Abstract:
Repairing RTL bugs is crucial for hardware design and verification. Traditional automatic program repair (APR) methods define dedicated search spaces to locate and fix bugs with program synthesis. However, they heavily rely on fixed templates and can only deal with limited classes of bugs. As an alternative, Large Language Models, with their ability to understand code semantics, can be explored for RTL repair. However, they suffer from unreliable outcomes due to inherent randomness and the long input contexts of RTL code and waveforms. To address these challenges, we propose R3A, an LLM-based automatic RTL program repair framework built on top of a base model to improve reliability. R3A proposes a stochastic Tree-of-Thoughts method to control a patch generation agent that explores validated solutions for the bug. The algorithm samples search states according to a heuristic function to balance exploration and exploitation for a reliable outcome. Besides, R3A proposes a multi-agent fault localization method to find fault candidates as starting points for the patch generation agent, further increasing reliability. Experiments show R3A can fix 90.6% of bugs in the RTL-repair dataset within a given time limit, covering 45% more bugs than traditional methods and other LLM-based approaches, while achieving an 86.7% pass@5 rate on average, demonstrating high reliability.
Submitted 25 November, 2025; v1 submitted 25 November, 2025;
originally announced November 2025.
-
Evaluating Parametric Car-Following Models in Naturalistic Congestion: Insights in Driver Behavior and Model Limitations
Authors:
Huaidian Hou,
Arpan Kusari,
Brian T. W. Lin
Abstract:
Car-Following is a broadly studied state of driving, and many modeling approaches based on various heuristics and engineering methods have been proposed. Congestion is a common traffic phenomenon that is also widely investigated, from both macroscopic and microscopic perspectives. Yet, the current literature lacks a unified evaluation of Car-Following models on naturalistic congestion data. This paper compares the performance of five parametric Car-Following models: IDM, ACC, Gipps, OVM, and FVDM, using a rich naturalistic congestion dataset. The five models are found to perform similarly when optimized over the same RMSNE metric. Sub-sequences of Car-Following where the models noticeably disagree with driver behavior are identified and separately investigated. A review of the corresponding front-facing and cabin video data reveals distraction and driving with momentum as potential reasons for the model-reality difference. We further show that drivers often employ coasting and idle creep during Car-Following in different speed ranges, which existing parametric models fail to capture. Finally, time-series clustering is performed, and analysis of the resulting clusters aligns with the empirical findings.
Our findings highlight the necessity of considering vehicle dynamical properties, including coasting and idle creep abilities, which drivers make extensive use of in low-speed congestion. Future research could integrate such parameters with traditional parametric models to improve congestion modeling performance. We also suggest investigating temporal correlations between clustered blocks to reveal behavioral transition patterns exhibited by drivers in congestion. Source code for this study can be found on GitHub.
Submitted 22 November, 2025;
originally announced November 2025.
-
Experimental realization of a full-band wave antireflection based on temporal taper metamaterials
Authors:
Haonan Hou,
Kai Peng,
Yangkai Wang,
Jiarui Wang,
Xudong Zhang,
Ren Wang,
Hao Hu,
Jiang Xiong
Abstract:
As time can be introduced as an additional degree of freedom, temporal metamaterials open up new avenues for wave control and manipulation. Among these advancements, temporal metamaterial-based antireflection coatings have recently emerged as an innovative method that inherently avoids additional spatial insertions. However, prior temporal antireflection models with finite inserted temporal transition sections, which rely on a destructive interference mechanism, exhibit residual periodic strong reflections at high frequencies, fundamentally limiting the achievable bandwidth. In this work, the concept of the "temporal taper", the temporal counterpart of a conventional spatial taper with a nearly full-band antireflection feature and good compatibility with gradually time-varying components, has been experimentally realized. A 1D temporal metamaterial based on voltage-controlled varactors has been designed and experimentally validated. The temporal-taper-based broadband antireflection exempts the system from spatial matching insertions and enables agile impedance matching for various terminal loads, positioning it as a promising approach for future photonic systems.
Submitted 18 November, 2025;
originally announced November 2025.
-
Multimodal Diffusion Forcing for Forceful Manipulation
Authors:
Zixuan Huang,
Huaidian Hou,
Dmitry Berenson
Abstract:
Given a dataset of expert trajectories, standard imitation learning approaches typically learn a direct mapping from observations (e.g., RGB images) to actions. However, such methods often overlook the rich interplay between different modalities, i.e., sensory inputs, actions, and rewards, which is crucial for modeling robot behavior and understanding task outcomes. In this work, we propose Multimodal Diffusion Forcing (MDF), a unified framework for learning from multimodal robot trajectories that extends beyond action generation. Rather than modeling a fixed distribution, MDF applies random partial masking and trains a diffusion model to reconstruct the trajectory. This training objective encourages the model to learn temporal and cross-modal dependencies, such as predicting the effects of actions on force signals or inferring states from partial observations. We evaluate MDF on contact-rich, forceful manipulation tasks in simulated and real-world environments. Our results show that MDF not only delivers versatile functionalities, but also achieves strong performance and robustness under noisy observations. More visualizations can be found on our $\href{https://unified-df.github.io}{website}$.
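The random partial masking can be sketched as follows; the per-timestep masking granularity, the zeroed-out mask token, and the modality names are illustrative assumptions rather than details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_partial_mask(traj, p_mask=0.5):
    """Independently mask timesteps of each modality; a model would then
    be trained to reconstruct the masked slots from the visible ones."""
    masked, mask = {}, {}
    for name, arr in traj.items():
        m = rng.random(arr.shape[0]) < p_mask  # per-timestep Bernoulli mask
        out = arr.copy()
        out[m] = 0.0                           # zero stands in for a mask token
        masked[name], mask[name] = out, m
    return masked, mask

# toy trajectory: 6 timesteps of observations, actions, and force signals
traj = {"obs": rng.standard_normal((6, 3)),
        "action": rng.standard_normal((6, 2)),
        "force": rng.standard_normal((6, 1))}
masked, mask = random_partial_mask(traj)
```

Because each (modality, timestep) slot is masked independently, the same trajectory yields many different reconstruction problems, e.g. actions visible but force hidden, which is what forces the model to learn cross-modal dependencies.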
Submitted 13 April, 2026; v1 submitted 6 November, 2025;
originally announced November 2025.
-
Achieving Constant-Envelope Waveform in CP-OFDMA Framework
Authors:
Yiming Zhu,
Zhuhong Zhu,
Xiaodong Xu,
Hongwei Hou,
Wenjin Wang,
Rui Ding
Abstract:
OFDM is widely adopted in modern wireless communication systems, but its power efficiency is limited by high envelope fluctuations. Although various high power-efficiency waveforms have been proposed, most are incompatible with the CP-OFDMA framework and remain ineffective in multi-user downlink transmissions. To address this issue, we propose a constant-envelope (CE) waveform design, which enables low-complexity transceiver architectures while maintaining full compatibility with the prevailing CP-OFDMA framework. Specifically, we start from a general CE FDMA signal model and develop a CP-OFDMA-compatible waveform implementation structure, followed by the design of an optimized CE-constrained pulse-shaping filter to suppress out-of-band emissions. To tackle channel estimation challenge under non-flat frequency-domain pilots induced by CE modulation, we optimize the time-domain binary pilot sequence to achieve frequency-domain CE properties, and then propose a multi-stage method combining delay-domain denoising with power delay profile estimation to facilitate reduced-dimension LMMSE estimation. Subsequently, we design a low-complexity maximum ratio combining-aided LMMSE equalizer by exploiting the periodicity and conjugate symmetry of the CE received signals. To mitigate the downlink peak-to-average power ratio increase caused by FDMA, we further develop a multi-user downlink CE transmission scheme including multiple access mechanism, downlink control information design, and corresponding system-level implementation, which ensures compatibility with the New Radio standard. Numerical results demonstrate that the proposed scheme achieves bit error rate performance close to the ideal case while significantly reducing transceiver complexity compared to existing CE waveform solutions.
Submitted 28 October, 2025;
originally announced October 2025.
-
Predictive Coding Enhances Meta-RL To Achieve Interpretable Bayes-Optimal Belief Representation Under Partial Observability
Authors:
Po-Chen Kuo,
Han Hou,
Will Dabney,
Edgar Y. Walker
Abstract:
Learning a compact representation of history is critical for planning and generalization in partially observable environments. While meta-reinforcement learning (RL) agents can attain near Bayes-optimal policies, they often fail to learn the compact, interpretable Bayes-optimal belief states. This representational inefficiency potentially limits the agent's adaptability and generalization capacity. Inspired by predictive coding in neuroscience--which suggests that the brain predicts sensory inputs as a neural implementation of Bayesian inference--and by auxiliary predictive objectives in deep RL, we investigate whether integrating self-supervised predictive coding modules into meta-RL can facilitate learning of Bayes-optimal representations. Through state machine simulation, we show that meta-RL with predictive modules consistently generates more interpretable representations that better approximate Bayes-optimal belief states compared to conventional meta-RL across a wide variety of tasks, even when both achieve optimal policies. In challenging tasks requiring active information seeking, only meta-RL with predictive modules successfully learns optimal representations and policies, whereas conventional meta-RL struggles with inadequate representation learning. Finally, we demonstrate that better representation learning leads to improved generalization. Our results strongly suggest the role of predictive learning as a guiding principle for effective representation learning in agents navigating partial observability.
Submitted 24 October, 2025;
originally announced October 2025.
-
In-situ Autoguidance: Eliciting Self-Correction in Diffusion Models
Authors:
Enhao Gu,
Haolin Hou
Abstract:
The generation of high-quality, diverse, and prompt-aligned images is a central goal in image-generating diffusion models. The popular classifier-free guidance (CFG) approach improves quality and alignment at the cost of reduced variation, creating an inherent entanglement of these effects. Recent work has successfully disentangled these properties by guiding a model with a separately trained, inferior counterpart; however, this solution introduces the considerable overhead of requiring an auxiliary model. We challenge this prerequisite by introducing In-situ Autoguidance, a method that elicits guidance from the model itself without any auxiliary components. Our approach dynamically generates an inferior prediction on the fly using a stochastic forward pass, reframing guidance as a form of inference-time self-correction. We demonstrate that this zero-cost approach is not only viable but also establishes a powerful new baseline for cost-efficient guidance, proving that the benefits of self-guidance can be achieved without external models.
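The core idea, extrapolating away from an on-the-fly inferior prediction of the same model, can be sketched as below. The toy denoiser, the dropout-style stochastic pass, and the guidance weight are illustrative assumptions; only the extrapolation formula `weak + w * (strong - weak)` follows the usual guidance form.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(x, *, stochastic=False):
    """Toy stand-in for a diffusion model's denoising prediction. With
    stochastic=True, dropout-like noise yields a deliberately degraded
    ("inferior") prediction from the very same model."""
    pred = 0.9 * x
    if stochastic:
        keep = rng.random(x.shape) > 0.3  # randomly drop ~30% of features
        pred = pred * keep / 0.7          # inverted-dropout rescaling
    return pred

def in_situ_autoguidance(x, w=2.0):
    strong = denoise(x)                   # normal deterministic pass
    weak = denoise(x, stochastic=True)    # inferior prediction, no extra model
    return weak + w * (strong - weak)     # extrapolate away from the weak one

x = rng.standard_normal((4, 8))
guided = in_situ_autoguidance(x)
```

Setting `w = 1` recovers the plain prediction, while `w > 1` pushes the output further from whatever the degraded pass got wrong, which is the self-correction effect the abstract describes.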
Submitted 20 October, 2025;
originally announced October 2025.
-
Benchmarking and Improving LLM Robustness for Personalized Generation
Authors:
Chimaobi Okite,
Naihao Deng,
Kiran Bodipati,
Huaidian Hou,
Joyce Chai,
Rada Mihalcea
Abstract:
Recent years have witnessed a growing interest in personalizing the responses of large language models (LLMs). While existing evaluations primarily focus on whether a response aligns with a user's preferences, we argue that factuality is an equally important yet often overlooked dimension. In the context of personalization, we define a model as robust if its responses are both factually accurate and align with the user preferences. To assess this, we introduce PERG, a scalable framework for evaluating robustness in LLMs, along with a new dataset, PERGData. We evaluate fourteen models from five different model families using different prompting methods. Our findings show that current LLMs struggle with robust personalization: even the strongest models (GPT-4.1, LLaMA3-70B) fail to maintain correctness in 5% of previously successful cases without personalization, while smaller models (e.g., 7B-scale) can fail more than 20% of the time. Further analysis reveals that robustness is significantly affected by the nature of the query and the type of user preference. To mitigate these failures, we propose Pref-Aligner, a two-stage approach that improves robustness by an average of 25% across models. Our work highlights critical gaps in current evaluation practices and introduces tools and metrics to support more reliable, user-aligned LLM deployments.
Submitted 18 September, 2025;
originally announced September 2025.
-
VideoPro: Adaptive Program Reasoning for Long Video Understanding
Authors:
Chenglin Li,
Feng Han,
Yikun Wang,
Ruilin Li,
Shuai Dong,
Haowen Hou,
Haitao Li,
Qianglong Chen,
Feng Tao,
Jingqi Tong,
Yin Zhang,
Jiaqi Wang
Abstract:
Large language models (LLMs) have shown promise in generating program workflows for visual tasks. However, previous approaches often rely on closed-source models, lack systematic reasoning, and struggle with long-form video question answering (videoQA). To address these challenges, we introduce the FS-VisPR framework, an adaptive visual program reasoning approach that balances fast reasoning for simple queries with slow reasoning for difficult ones. First, we design efficient visual modules (e.g., key clip retrieval and subtitle retrieval) to support long-form video tasks. Then, we construct a diverse and high-quality fast-slow reasoning dataset with a strong LLM to align open-source language models' ability to generate visual program workflows, yielding FS-LLM. Next, we design a fast-slow reasoning framework with FS-LLM: simple queries are directly solved by VideoLLMs, while difficult ones invoke visual program reasoning, motivated by human-like reasoning processes. During this process, low-confidence fast-thinking answers trigger a second-stage slow-reasoning process, and a fallback mechanism to fast reasoning is activated if the program execution fails. Moreover, we improve visual programs through parameter search during both training and inference. By adjusting the parameters of the visual modules within a program, multiple variants are generated: during training, programs that yield correct answers are selected, while during inference, the program with the highest-confidence result is applied. Experiments show that FS-VisPR improves both efficiency and reliability in visual program workflows. It achieves 50.4% accuracy on LVBench, surpassing GPT-4o and matching the performance of Qwen2.5VL-72B on VideoMME.
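The fast-slow routing with fallback can be sketched as a simple control flow; the function names, confidence threshold, and toy stand-ins below are hypothetical, not the paper's implementation.

```python
def answer(query, fast_fn, slow_fn, conf_threshold=0.8):
    """Route a query: accept a confident fast answer; otherwise invoke
    program reasoning; if program execution fails, fall back to the
    fast answer."""
    ans, conf = fast_fn(query)
    if conf >= conf_threshold:
        return ans             # fast path: direct VideoLLM answer
    try:
        return slow_fn(query)  # slow path: visual program reasoning
    except Exception:
        return ans             # fallback mechanism to fast reasoning

# Toy stand-ins: long queries get low fast-path confidence, and queries
# containing "fail" make the program path raise.
fast = lambda q: ("fast:" + q, 0.9 if len(q) < 10 else 0.3)
def slow(q):
    if "fail" in q:
        raise RuntimeError("program execution failed")
    return "slow:" + q
```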
Submitted 25 January, 2026; v1 submitted 22 September, 2025;
originally announced September 2025.
-
EG-MLA: Embedding-Gated Multi-head Latent Attention for Scalable and Efficient LLMs
Authors:
Zhengge Cai,
Haowen Hou
Abstract:
Reducing the key-value (KV) cache size is a crucial step toward enabling efficient inference in large language models (LLMs), especially under latency and memory constraints. While Multi-Head Attention (MHA) offers strong representational power, it incurs significant memory overhead. Recent work on Multi-head Latent Attention (MLA) mitigates this by compressing KV representations into a shared latent space, achieving a better trade-off between performance and cache efficiency. While MLA already achieves significant KV cache reduction, the scope for further compression remains limited without performance loss. In this paper, we propose \textbf{Embedding-Gated Multi-head Latent Attention (EG-MLA)}, a novel extension of MLA that further reduces KV cache size while enhancing representational expressiveness. EG-MLA introduces a token-specific embedding gating mechanism applied in the latent space, enabling fine-grained modulation of compressed KV vectors with minimal additional computation. Compared to MHA, EG-MLA achieves over 91.6\% reduction in KV cache size with negligible performance degradation. Relative to MLA, EG-MLA consistently improves task accuracy across diverse reasoning benchmarks while achieving up to 59.9\% additional memory savings. Our theoretical analysis highlights how embedding gating induces implicit high-order interactions, and empirical evaluations demonstrate robust generalization across model scales and compression regimes. Notably, we successfully scale EG-MLA to over 1 billion parameters, demonstrating its practical viability for large-scale LLM deployment. These results establish EG-MLA as a memory- and compute-efficient attention mechanism that enables scalable, high-performance inference in modern LLMs.
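A minimal sketch of the gating idea, assuming a sigmoid gate computed from each token's hidden state and applied elementwise to its compressed latent KV vector; the dimensions and projections here are illustrative, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq_len = 16, 4, 5

W_down = rng.standard_normal((d_model, d_latent)) * 0.1  # KV down-projection
W_gate = rng.standard_normal((d_model, d_latent)) * 0.1  # token -> latent gate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_latent_kv(h):
    """Compress hidden states into a small shared latent KV cache, then
    modulate each token's compressed vector with a token-specific gate."""
    c = h @ W_down           # (seq_len, d_latent): what MLA would cache
    g = sigmoid(h @ W_gate)  # (seq_len, d_latent): per-token gate in (0, 1)
    return g * c             # fine-grained modulation, little extra compute

h = rng.standard_normal((seq_len, d_model))
kv = gated_latent_kv(h)
```

The cache still stores only `d_latent` values per token rather than the full `d_model`-sized keys and values, which is where the memory saving over standard multi-head attention comes from.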
Submitted 20 September, 2025;
originally announced September 2025.
-
Low-Resource Fine-Tuning for Multi-Task Structured Information Extraction with a Billion-Parameter Instruction-Tuned Model
Authors:
Yu Cheng Chih,
Yong Hao Hou
Abstract:
Deploying large language models (LLMs) for structured data extraction in domains such as financial compliance reporting, legal document analytics, and multilingual knowledge base construction is often impractical for smaller teams due to the high cost of running large architectures and the difficulty of preparing large, high-quality datasets. Most recent instruction-tuning studies focus on seven-billion-parameter or larger models, leaving limited evidence on whether much smaller models can work reliably under low-resource, multi-task conditions. This work presents ETLCH, a billion-parameter LLaMA-based model fine-tuned with low-rank adaptation on only a few hundred to one thousand samples per task for JSON extraction, knowledge graph extraction, and named entity recognition. Despite its small scale, ETLCH outperforms strong baselines across most evaluation metrics, with substantial gains observed even at the lowest data scale. These findings demonstrate that well-tuned small models can deliver stable and accurate structured outputs at a fraction of the computational cost, enabling cost-effective and reliable information extraction pipelines in resource-constrained environments.
Submitted 10 September, 2025;
originally announced September 2025.
-
The changing role of cited papers over time: An analysis of highly cited papers based on a large full-text dataset
Authors:
Gege Lin,
Nees Jan van Eck,
Haiyan Hou,
Zhigang Hu
Abstract:
This paper examines how the role of cited papers evolves over time by analyzing nearly 900 highly cited papers (HCPs) published between 2000 and 2016 and the full text of over 220,000 papers citing them. We investigate multiple citation characteristics, including citation location within the full text, reference and in-text citation types, citation sentiment, and textual and bibliographic relatedness between citing and cited papers. Our findings reveal that as HCPs age, they tend to be cited earlier in papers citing them, mentioned fewer times in the full text, and more often cited alongside other references. Citation sentiment remains predominantly neutral, while both textual and bibliographic similarity between HCPs and their citing papers decline over time. These patterns indicate a shift from direct topical and methodological engagement toward more general, background, and symbolic referencing. The findings highlight the importance of considering citation context rather than relying solely on simple citation counts. Large-scale full-text analyses such as ours can help refine measures of scientific impact and advance scholarly search and science mapping by uncovering more nuanced connections between papers.
Submitted 4 September, 2025;
originally announced September 2025.
-
Alignment with Fill-In-the-Middle for Enhancing Code Generation
Authors:
Houxing Ren,
Zimu Lu,
Weikang Shi,
Haotian Hou,
Yunqiao Yang,
Ke Wang,
Aojun Zhou,
Junting Pan,
Mingjie Zhan,
Hongsheng Li
Abstract:
The code generation capabilities of Large Language Models (LLMs) have advanced applications like tool invocation and problem-solving. However, improving performance in code-related tasks remains challenging due to limited training data that is verifiable with accurate test cases. While Direct Preference Optimization (DPO) has shown promise, existing methods for generating test cases still face limitations. In this paper, we propose a novel approach that splits code snippets into smaller, granular blocks, creating more diverse DPO pairs from the same test cases. Additionally, we introduce the Abstract Syntax Tree (AST) splitting and curriculum training method to enhance the DPO training. Our approach demonstrates significant improvements in code generation tasks, as validated by experiments on benchmark datasets such as HumanEval (+), MBPP (+), APPS, LiveCodeBench, and BigCodeBench. Code and data are available at https://github.com/SenseLLM/StructureCoder.
Submitted 26 August, 2025;
originally announced August 2025.
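The block-level splitting idea can be illustrated with Python's `ast` module: cut a function's body after each top-level statement, so every prefix becomes a shared context from which preferred and rejected completions can diverge, yielding more DPO pairs per test case (a minimal sketch under our own simplifying assumptions, not the authors' implementation):

```python
import ast
import copy

def split_into_blocks(source: str) -> list[str]:
    """Return cumulative prefixes of a function, one per top-level body statement."""
    func = ast.parse(source).body[0]  # assume the source holds a single function
    blocks = []
    for i in range(1, len(func.body) + 1):
        partial = copy.deepcopy(func)
        partial.body = partial.body[:i]  # keep only the first i statements
        blocks.append(ast.unparse(partial))
    return blocks

src = '''
def add_abs(a, b):
    a = abs(a)
    b = abs(b)
    return a + b
'''
blocks = split_into_blocks(src)
print(len(blocks))  # 3: one cumulative prefix per body statement
```

Each prefix can then be paired with a passing and a failing continuation verified by the same test cases, which is the granularity the paper's approach exploits.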
-
On the maximal displacement of subcritical branching random walks with or without killing
Authors:
Haojie Hou,
Shuxiong Zhang
Abstract:
Consider a subcritical branching random walk $\{Z_k\}_{k\geq 0}$ with offspring distribution $\{p_k\}_{k\geq 0}$ and step size $X$. Let $M_n$ denote the rightmost position reached by $\{Z_k\}_{k\geq 0}$ up to generation $n$, and define $M := \sup_{n\geq 0} M_n$. In this paper we give the asymptotics of the tail probability of $M$ under the optimal assumptions $\sum^{\infty}_{k=1}(k\log k) p_k<\infty$ and $\mathbb{E}[Xe^{γX}]<\infty$, where $γ>0$ is a constant such that $\mathbb{E}[e^{γX}]=\frac{1}{m}$ and $m=\sum_{k=0}^\infty kp_k\in (0,1)$. Moreover, we confirm the conjecture of Neuman and Zheng [Probab. Theory Related Fields. 167 (2017) 1137--1164] by establishing the existence of a critical value $m\mathbb{E}[X e^{γX}]$ such that
\begin{align*}
\lim_{n\to\infty}e^{γcn}\mathbb{P}(M_n\geq cn)= \left\{
\begin{aligned}
&κ\in(0,1], \quad &&c\in\big(0,m\mathbb{E}[Xe^{γX}]\big);\\
&0, &&c\in\big(m\mathbb{E}[Xe^{γX}],\infty\big),
\end{aligned}
\right.
\end{align*}
where $κ$ represents the non-zero limit. Finally, we extend these results to the maximal displacement of branching random walks with killing. Interestingly, this limit can be characterized through both the global minimum of a random walk with positive drift and the maximal displacement of the branching random walk without killing.
Submitted 20 August, 2025;
originally announced August 2025.
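The tail behaviour of the all-time maximum $M$ can be illustrated numerically. The toy Monte Carlo below (our own illustration, with an arbitrary offspring law of mean m = 0.5 < 1 and standard Gaussian steps, not the paper's setting) estimates P(M ≥ x) by simulating subcritical populations until extinction:

```python
import random

def max_displacement(p_offspring, step, max_gen=50):
    """All-time maximum position of one subcritical BRW run (dies out a.s.)."""
    positions, best = [0.0], 0.0
    for _ in range(max_gen):
        nxt = []
        for x in positions:
            # Number of children drawn from the offspring distribution
            k = random.choices(range(len(p_offspring)), weights=p_offspring)[0]
            for _ in range(k):
                y = x + step()  # child position = parent position + step
                best = max(best, y)
                nxt.append(y)
        if not nxt:  # population extinct
            break
        positions = nxt
    return best

random.seed(0)
p = [0.6, 0.3, 0.1]  # mean offspring m = 0.3 + 2 * 0.1 = 0.5 < 1 (subcritical)
runs = [max_displacement(p, lambda: random.gauss(0.0, 1.0)) for _ in range(2000)]
tail = sum(r >= 2.0 for r in runs) / len(runs)
print(tail)
```

Plotting log P(M ≥ x) against x for several thresholds would exhibit the exponential decay rate γ that the theorem makes precise.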
-
Heterogeneous Influence Maximization in User Recommendation
Authors:
Hongru Hou,
Jiachen Sun,
Wenqing Lin,
Wendong Bi,
Xiangrong Wang,
Deqing Yang
Abstract:
User recommendation systems enhance user engagement by encouraging users to act as inviters who interact with other users (invitees), potentially fostering information propagation. Conventional recommendation methods typically focus on modeling interaction willingness, whereas influence maximization (IM) methods focus on identifying a set of users that maximizes information propagation. However, existing methods face two significant challenges: recommendation methods fail to unleash candidates' spread capability, and IM methods fail to account for the willingness to interact. To solve these issues, we propose two models, HeteroIR and HeteroIM. HeteroIR provides an intuitive solution to unleash the dissemination potential of user recommendation systems, introducing a two-stage framework to estimate spread profits. HeteroIM fills the gap between IM methods and the recommendation task, improving interaction willingness while maximizing spread coverage: it incrementally selects the most influential invitees to recommend and reranks them based on the number of reverse reachable (RR) sets containing both inviters and invitees, where an RR set is a set of nodes that can reach a target via propagation. Extensive experiments show that HeteroIR and HeteroIM significantly outperform state-of-the-art baselines (p-value < 0.05). Furthermore, we have deployed HeteroIR and HeteroIM on Tencent's online gaming platforms, obtaining 8.5% and 10% improvements in online A/B tests, respectively. Implementation code is available at https://github.com/socialalgo/HIM.
Submitted 19 August, 2025;
originally announced August 2025.
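The RR-set machinery referenced above is standard in influence maximization and can be sketched as follows (the toy graph and parameters are our own; HeteroIM additionally conditions the reranking on inviter/invitee membership, which is omitted here):

```python
import random

def sample_rr_set(nodes, in_edges, p):
    """One reverse reachable (RR) set: reverse BFS from a uniformly random
    target, keeping each incoming edge with probability p (IC model)."""
    target = random.choice(nodes)
    rr, frontier = {target}, [target]
    while frontier:
        v = frontier.pop()
        for u in in_edges.get(v, []):
            if u not in rr and random.random() < p:
                rr.add(u)
                frontier.append(u)
    return rr

def greedy_seeds(rr_sets, k):
    """Greedily pick k nodes covering the most RR sets; the covered
    fraction is the standard surrogate for expected influence spread."""
    covered, seeds = set(), []
    for _ in range(k):
        counts = {}
        for i, rr in enumerate(rr_sets):
            if i not in covered:
                for u in rr:
                    counts[u] = counts.get(u, 0) + 1
        if not counts:
            break
        best = max(counts, key=counts.get)
        seeds.append(best)
        covered |= {i for i, rr in enumerate(rr_sets) if best in rr}
    return seeds

random.seed(0)
nodes = list(range(6))
in_edges = {1: [0], 2: [0, 1], 3: [2], 4: [2], 5: [4]}  # reverse adjacency list
rr_sets = [sample_rr_set(nodes, in_edges, p=0.5) for _ in range(300)]
seeds = greedy_seeds(rr_sets, k=2)
print(seeds)
```

A node appearing in many RR sets is likely to influence many random targets, which is why counting RR-set memberships is a sound ranking signal.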
-
Law of the iterated logarithm for supercritical non-local spatial branching processes
Authors:
Haojie Hou,
Ting Yang
Abstract:
Suppose that $X=(X_{t})_{t\ge 0}$ is either a general supercritical non-local branching Markov process, or a general supercritical non-local superprocess, on a Luzin space. Here, by ``supercritical" we mean that the mean semigroup of $X$ exhibits a Perron-Frobenius type behaviour with a positive principal eigenvalue. In this paper, we study the almost sure behaviour of a family of martingales naturally associated with the real or complex-valued eigenpairs of the mean semigroup. Under a fourth-moment condition, we establish limit theorems of the iterated logarithm type for these martingales. In particular, we discover three regimes, each resulting in different scaling factors and limits. Furthermore, we obtain a law of the iterated logarithm for the linear functional $\langle \mathrm{Re}(f),X_{t}\rangle$ where $f$ is a sum of finite terms of eigenfunctions and $\mathrm{Re}(f)$ denotes its real part. In the context of branching Markov processes, our results improve on existing literature by complementing the known results for multitype branching processes in Asmussen [Trans. Amer. Math. Soc. 231 (1) (1977) 233--248] and generalizing the recent work of Hou, Ren and Song [arXiv: 2505.12691] to allow for non-local branching mechanism and non-symmetric spatial motion. For superprocesses, as far as we know, our results are new.
Submitted 16 September, 2025; v1 submitted 18 August, 2025;
originally announced August 2025.
-
Tensor-Structured Bayesian Channel Prediction for Upper Mid-Band XL-MIMO Systems
Authors:
Hongwei Hou,
Yafei Wang,
Xinping Yi,
Wenjin Wang,
Dirk T. M. Slock,
Shi Jin
Abstract:
The upper mid-band balances coverage and capacity for future cellular systems and also embraces XL-MIMO systems, offering enhanced spectral and energy efficiency. However, these benefits are significantly degraded under mobility due to channel aging, and further exacerbated by the unique near-field (NF) and spatial non-stationarity (SnS) propagation in such systems. To address this challenge, we propose a novel channel prediction approach that incorporates dedicated channel modeling, probabilistic representations, and Bayesian inference algorithms for this emerging scenario. Specifically, we develop tensor-structured channel models in both the spatial-frequency-temporal (SFT) and beam-delay-Doppler (BDD) domains, which leverage temporal correlations among multiple pilot symbols for channel prediction. The factor matrices of multi-linear transformations are parameterized by BDD domain grids and SnS factors, where beam domain grids are jointly determined by angles and slopes under spatial-chirp based NF representations. To enable tractable inference, we replace environment-dependent BDD domain grids with uniformly sampled ones, and introduce perturbation parameters in each domain to mitigate grid mismatch. We further propose a hybrid beam domain strategy that integrates angle-only sampling with slope hyperparameterization to avoid the computational burden of explicit slope sampling. Based on the probabilistic models, we develop a tensor-structured bi-layer inference (TS-BLI) algorithm under the expectation-maximization (EM) framework, which reduces computational complexity via tensor operations by leveraging the bi-layer factor graph for approximate E-step inference and an alternating strategy with closed-form updates in the M-step. Numerical simulations based on a near-practical channel simulator demonstrate the superior channel prediction performance of the proposed algorithm.
Submitted 11 August, 2025;
originally announced August 2025.
-
Grid-like Error-Correcting Codes for Matrix Multiplication with Better Correcting Capability
Authors:
Hao Shi,
Zhengyi Jiang,
Zhongyi Huang,
Bo Bai,
Gong Zhang,
Hanxu Hou
Abstract:
Matrix multiplication over the real field constitutes a foundational operation in the training of deep learning models, serving as a computational cornerstone for both forward and backward propagation processes. However, the presence of silent data corruption (SDC) in large-scale distributed training environments poses a significant threat to model convergence and predictive accuracy, particularly when such errors manifest during matrix multiplication. Due to their transient and non-intrusive nature, these errors often evade detection, allowing them to propagate and accumulate over time, ultimately leading to substantial degradation in model performance. In this paper, we introduce a novel error-correcting coding framework specifically tailored for matrix multiplication operations. Our proposed framework is designed to detect and correct multiple computational errors that may arise during the execution of matrix products. By leveraging a grid-based structural encoding scheme, our approach enhances error localization and correction capabilities across all participating matrices, thereby significantly improving the fault tolerance of the computation. Experimental results demonstrate that our method achieves deterministic correction of up to two erroneous symbols distributed across three matrices with 100% reliability, while incurring only a 24% overhead in computational time on GPU architectures. Furthermore, we provide a rigorous theoretical analysis of the error-correction properties inherent to our coding scheme, establishing its correctness and robustness under well-defined fault models.
Submitted 6 August, 2025;
originally announced August 2025.
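The classical algorithm-based fault tolerance (ABFT) construction underlying such schemes augments A with a column-checksum row and B with a row-checksum column; checksums are preserved by multiplication, so a single corrupted product entry can be located at the intersection of the mismatching row and column residuals. The sketch below shows this textbook single-error case, not the grid code proposed in the paper:

```python
import numpy as np

def checksum_encode(A, B):
    """Append a column-sum row to A and a row-sum column to B (ABFT-style)."""
    Ac = np.vstack([A, A.sum(axis=0)])                 # extra checksum row
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])  # extra checksum column
    return Ac, Br

def detect_and_correct(C_full):
    """Locate and fix a single corrupted entry in the data block of C_full.

    Row and column checksum mismatches intersect at the faulty position;
    the residual magnitude gives the correction.
    """
    C = C_full[:-1, :-1]
    row_err = C_full[:-1, -1] - C.sum(axis=1)  # per-row residual
    col_err = C_full[-1, :-1] - C.sum(axis=0)  # per-column residual
    rows = np.nonzero(np.abs(row_err) > 1e-8)[0]
    cols = np.nonzero(np.abs(col_err) > 1e-8)[0]
    if len(rows) == 1 and len(cols) == 1:
        C_full[rows[0], cols[0]] += row_err[rows[0]]
    return C_full[:-1, :-1]

rng = np.random.default_rng(0)
A = rng.integers(0, 5, (3, 4)).astype(float)
B = rng.integers(0, 5, (4, 3)).astype(float)
Ac, Br = checksum_encode(A, B)
C_full = Ac @ Br       # checksum rows/columns carry through the product
C_full[1, 2] += 7.0    # inject a silent error into the data block
C = detect_and_correct(C_full)
print(np.allclose(C, A @ B))  # True
```

Correcting two errors across three matrices, as the paper claims, requires the richer grid structure; this sketch only conveys why checksum residuals localize faults.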
-
Active IRS-Enabled Integrated Sensing and Communications with Extended Targets
Authors:
Yuan Fang,
Xianxin Song,
Huazhou Hou,
Ziguo Zhong,
Xianghao Yu,
Jie Xu,
Yongming Huang
Abstract:
This paper studies the active intelligent reflecting surface (IRS)-enabled integrated sensing and communications (ISAC), in which an active IRS is deployed to assist the base station (BS) in serving multiple communication users (CUs) and simultaneously sensing an extended target at the non-line-of-sight (NLoS) area of the BS. The active IRS has the capability of amplifying the reflected signals so as to overcome significant reflection path loss in NLoS communication and sensing. In particular, we derive the sensing Cramér-Rao bound (CRB) for estimating the target response matrix. Accordingly, we jointly optimize the transmit beamforming at the BS and the reflective beamforming at the active IRS to minimize the sensing CRB, subject to the signal-to-interference-plus-noise ratio (SINR) requirements at the CUs, the transmit power budgets at the BS and active IRS, as well as the power amplification gain constraints at the active IRS. The CRB minimization problem is highly non-convex and thus difficult to solve in general. To address this challenge, we first focus on two specified conditions by considering the sensing-only scenario via ignoring the SINR constraints for communications, for which the closed-form optimal transmit beamforming is derived. Then, we propose two efficient alternating optimization (AO)-based algorithms to obtain high-quality solutions for the general ISAC scenarios. Next, we analyze the inherent relationship between the power scaling at the BS and the amplification scaling at the active IRS. It is shown that the active IRS always amplifies the signal using the maximum amplification gain under practical system settings. Finally, numerical results are provided to verify the effectiveness of the proposed AO-based algorithms and the benefits of active IRS-enabled ISAC compared to its passive IRS counterparts.
Submitted 1 August, 2025;
originally announced August 2025.
-
FROSS: Faster-than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images
Authors:
Hao-Yu Hou,
Chun-Yi Lee,
Motoharu Sonogashira,
Yasutomo Kawanishi
Abstract:
The ability to abstract complex 3D environments into simplified and structured representations is crucial across various domains. 3D semantic scene graphs (SSGs) achieve this by representing objects as nodes and their interrelationships as edges, facilitating high-level scene understanding. Existing methods for 3D SSG generation, however, face significant challenges, including high computational demands and non-incremental processing that hinder their suitability for real-time open-world applications. To address this issue, we propose FROSS (Faster-than-Real-Time Online 3D Semantic Scene Graph Generation), an innovative approach for online and faster-than-real-time 3D SSG generation that leverages the direct lifting of 2D scene graphs to 3D space and represents objects as 3D Gaussian distributions. This framework eliminates the dependency on precise and computationally-intensive point cloud processing. Furthermore, we extend the Replica dataset with inter-object relationship annotations, creating the ReplicaSSG dataset for comprehensive evaluation of FROSS. The experimental results from evaluations on ReplicaSSG and 3DSSG datasets show that FROSS can achieve superior performance while operating significantly faster than prior 3D SSG generation methods. Our implementation and dataset are publicly available at https://github.com/Howardkhh/FROSS.
Submitted 10 August, 2025; v1 submitted 26 July, 2025;
originally announced July 2025.
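Representing each object as a 3D Gaussian, as in the approach above, reduces to tracking the mean and covariance of its back-projected points; two observations of the same object can then be fused incrementally by moment matching, avoiding any dense point-cloud processing. The two-frame sketch below is our own illustration (synthetic points, approximate moment matching), not the released implementation:

```python
import numpy as np

def fit_gaussian(points):
    """Summarize an object's 3D points by a Gaussian (mean, covariance)."""
    return points.mean(axis=0), np.cov(points.T)

def fuse(mu1, cov1, n1, mu2, cov2, n2):
    """Approximate moment-matched merge of two Gaussian summaries,
    weighted by their sample counts."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    d1, d2 = (mu1 - mu)[:, None], (mu2 - mu)[:, None]
    cov = (n1 * (cov1 + d1 @ d1.T) + n2 * (cov2 + d2 @ d2.T)) / n
    return mu, cov

rng = np.random.default_rng(0)
frame1 = rng.normal([1.0, 0.0, 2.0], 0.1, size=(50, 3))  # object seen in frame 1
frame2 = rng.normal([1.0, 0.0, 2.0], 0.1, size=(40, 3))  # same object, frame 2
mu, cov = fuse(*fit_gaussian(frame1), 50, *fit_gaussian(frame2), 40)
print(mu.shape, cov.shape)
```

Because each update touches only a 3-vector and a 3x3 matrix per object, this kind of incremental summary is what makes online, faster-than-real-time operation plausible.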
-
All-in-One Medical Image Restoration with Latent Diffusion-Enhanced Vector-Quantized Codebook Prior
Authors:
Haowei Chen,
Zhiwen Yang,
Haotian Hou,
Hui Zhang,
Bingzheng Wei,
Gang Zhou,
Yan Xu
Abstract:
All-in-one medical image restoration (MedIR) aims to address multiple MedIR tasks using a unified model, concurrently recovering various high-quality (HQ) medical images (e.g., MRI, CT, and PET) from low-quality (LQ) counterparts. However, all-in-one MedIR presents significant challenges due to the heterogeneity across different tasks. Each task involves distinct degradations, leading to diverse information losses in LQ images. Existing methods struggle to handle these diverse information losses associated with different tasks. To address these challenges, we propose a latent diffusion-enhanced vector-quantized codebook prior and develop DiffCode, a novel framework leveraging this prior for all-in-one MedIR. Specifically, to compensate for diverse information losses associated with different tasks, DiffCode constructs a task-adaptive codebook bank to integrate task-specific HQ prior features across tasks, capturing a comprehensive prior. Furthermore, to enhance prior retrieval from the codebook bank, DiffCode introduces a latent diffusion strategy that utilizes the diffusion model's powerful mapping capabilities to iteratively refine the latent feature distribution, estimating more accurate HQ prior features during restoration. With the help of the task-adaptive codebook bank and latent diffusion strategy, DiffCode achieves superior performance in both quantitative metrics and visual quality across three MedIR tasks: MRI super-resolution, CT denoising, and PET synthesis.
Submitted 26 July, 2025;
originally announced July 2025.
-
Multicolor interband solitons in microcombs
Authors:
Qing-Xin Ji,
Hanfei Hou,
Jinhao Ge,
Yan Yu,
Maodong Gao,
Warren Jin,
Joel Guo,
Lue Wu,
Peng Liu,
Avi Feshali,
Mario Paniccia,
John Bowers,
Kerry Vahala
Abstract:
In microcombs, solitons can drive non-soliton-forming modes to induce optical gain. Under specific conditions, a regenerative secondary temporal pulse coinciding in time and space with the exciting soliton pulse will form at a new spectral location. A mechanism involving Kerr-induced pulse interactions has been proposed theoretically, leading to multicolor solitons containing constituent phase-locked pulses. However, the occurrence of this phenomenon requires dispersion conditions that are not naturally satisfied in conventional optical microresonators. Here, we report the experimental observation of multicolor pulses from a single optical pump in a way that is closely related to the concept of multicolor solitons. The individual soliton pulses share the same repetition rate and could potentially be fully phase-locked. They are generated using interband coupling in a compound resonator.
Submitted 23 July, 2025;
originally announced July 2025.
-
Benefit from Reference: Retrieval-Augmented Cross-modal Point Cloud Completion
Authors:
Hongye Hou,
Liu Zhan,
Yang Yang
Abstract:
Completing the whole 3D structure based on an incomplete point cloud is a challenging task, particularly when the residual point cloud lacks typical structural characteristics. Recent methods based on cross-modal learning attempt to introduce instance images to aid structure feature learning. However, they still focus on each particular input class, limiting their generation abilities. In this work, we propose a novel retrieval-augmented point cloud completion framework. The core idea is to incorporate cross-modal retrieval into the completion task to learn structural prior information from similar reference samples. Specifically, we design a Structural Shared Feature Encoder (SSFE) to jointly extract cross-modal features and reconstruct reference features as priors. Benefiting from a dual-channel control gate in the encoder, relevant structural features in the reference sample are enhanced and irrelevant information interference is suppressed. In addition, we propose a Progressive Retrieval-Augmented Generator (PRAG) that employs a hierarchical feature fusion mechanism to integrate reference prior information with input features from global to local. Through extensive evaluations on multiple datasets and real-world scenes, our method shows its effectiveness in generating fine-grained point clouds, as well as its generalization capability in handling sparse data and unseen categories.
Submitted 19 July, 2025;
originally announced July 2025.
-
Observation of wave amplification and temporal topological state in a genuine photonic time crystal
Authors:
Jiang Xiong,
Xudong Zhang,
Longji Duan,
Jiarui Wang,
Yang Long,
Haonan Hou,
Letian Yu,
Linyang Zou,
Baile Zhang
Abstract:
Photonic time crystals (PTCs) are materials whose dielectric permittivity is periodically modulated in time, giving rise to bandgaps not in energy, as in conventional photonic crystals, but in momentum, known as k-gaps. These k-gaps enable wave amplification by extracting energy from the temporal modulation, offering a mechanism for coherent light generation that bypasses traditional optical gain. PTCs also extend the concept of topological insulators to the time domain, inducing a temporal topological state at the mid-gap of the k-gap, characterized by the Zak phase, a topological invariant originally defined for spatial lattices. Here, we experimentally demonstrate the properties of a k-gap in a genuine PTC, realized in a dynamically modulated transmission-line metamaterial. Wave amplification within the k-gap is observed, with an initial power spectrum narrowing and shifting toward the gap. To probe the mid-gap topological state, we introduce a temporal interface separating two PTCs with distinct topological phases. The measured phase shift between time-reflected and time-refracted waves, together with the temporal confinement of the topological state, provides direct evidence of nontrivial temporal topology. By integrating k-gap amplification with time-domain topological features, our work opens new avenues for light generation and manipulation in time-varying photonic materials.
Submitted 2 July, 2025;
originally announced July 2025.
-
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
Authors:
Kaiyuan Chen,
Yixin Ren,
Yang Liu,
Xiaobo Hu,
Haotong Tian,
Tianbao Xie,
Fangfu Liu,
Haoye Zhang,
Hongzhang Liu,
Yuan Gong,
Chen Sun,
Han Hou,
Hui Yang,
James Pan,
Jianan Lou,
Jiayi Mao,
Jizheng Liu,
Jinpeng Li,
Kangyi Liu,
Kenkun Liu,
Rui Wang,
Run Li,
Tong Niu,
Wenlong Zhang,
Wenqi Yan
, et al. (8 additional authors not shown)
Abstract:
We introduce xbench, a dynamic, profession-aligned evaluation suite designed to bridge the gap between AI agent capabilities and real-world productivity. While existing benchmarks often focus on isolated technical skills, they may not accurately reflect the economic value agents deliver in professional settings. To address this, xbench targets commercially significant domains with evaluation tasks defined by industry professionals. Our framework creates metrics that strongly correlate with productivity value, enables prediction of Technology-Market Fit (TMF), and facilitates tracking of product capabilities over time. As our initial implementations, we present two benchmarks: Recruitment and Marketing. For Recruitment, we collect 50 tasks from real-world headhunting business scenarios to evaluate agents' abilities in company mapping, information retrieval, and talent sourcing. For Marketing, we assess agents' ability to match influencers with advertiser needs, evaluating their performance across 50 advertiser requirements using a curated pool of 836 candidate influencers. We present initial evaluation results for leading contemporary agents, establishing a baseline for these professional domains. Our continuously updated evalsets and evaluations are available at https://xbench.org.
Submitted 16 June, 2025;
originally announced June 2025.
-
DMRS-Based Uplink Channel Estimation for MU-MIMO Systems with Location-Specific SCSI Acquisition
Authors:
Jiawei Zhuang,
Hongwei Hou,
Minjie Tang,
Wenjin Wang,
Shi Jin,
Vincent K. N. Lau
Abstract:
With the growing number of users in multi-user multiple-input multiple-output (MU-MIMO) systems, demodulation reference signals (DMRSs) are efficiently multiplexed in the code domain via orthogonal cover codes (OCC) to ensure orthogonality and minimize pilot interference. In this paper, we investigate uplink DMRS-based channel estimation for MU-MIMO systems with Type II OCC pattern standardized in 3GPP Release 18, leveraging location-specific statistical channel state information (SCSI) to enhance performance. Specifically, we propose a SCSI-assisted Bayesian channel estimator (SA-BCE) based on the minimum mean square error criterion to suppress the pilot interference and noise, albeit at the cost of cubic computational complexity due to matrix inversions. To reduce this complexity while maintaining performance, we extend the scheme to a windowed version (SA-WBCE), which incorporates antenna-frequency domain windowing and beam-delay domain processing to exploit asymptotic sparsity and mitigate energy leakage in practical systems. To avoid the frequent real-time SCSI acquisition, we construct a grid-based location-specific SCSI database based on the principle of spatial consistency, and subsequently leverage the uplink received signals within each grid to extract the SCSI. Facilitated by the multilinear structure of wireless channels, we formulate the SCSI acquisition problem within each grid as a tensor decomposition problem, where the factor matrices are parameterized by the multi-path powers, delays, and angles. The computational complexity of SCSI acquisition can be significantly reduced by exploiting the Vandermonde structure of the factor matrices. Simulation results demonstrate that the proposed location-specific SCSI database construction method achieves high accuracy, while the SA-BCE and SA-WBCE significantly outperform state-of-the-art benchmarks in MU-MIMO systems.
Submitted 13 June, 2025;
originally announced June 2025.
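The SCSI-assisted MMSE idea can be grounded in the standard linear-MMSE pilot estimator, where a known channel covariance R (the statistical CSI) pulls the estimate toward the channel's actual subspace. The sketch below uses made-up dimensions and a generic pilot matrix, not the standardized Type II DMRS pattern:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 8, 16   # channel dimension and number of pilot observations (assumed)

# Statistical CSI: a known channel covariance R, here a simple exponential model
R = np.array([[0.9 ** abs(i - j) for j in range(N)] for i in range(N)])
P = (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))) / np.sqrt(2)
sigma2 = 0.05  # noise variance

# Draw a channel consistent with R and observe y = P h + n
h = np.linalg.cholesky(R) @ (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
y = P @ h + n

# Linear MMSE estimate: h_hat = R P^H (P R P^H + sigma2 I)^{-1} y
W = R @ P.conj().T @ np.linalg.inv(P @ R @ P.conj().T + sigma2 * np.eye(L))
h_hat = W @ y

mse = float(np.mean(np.abs(h - h_hat) ** 2))
print(h_hat.shape, mse)
```

The paper's SA-BCE follows this MMSE principle but replaces the toy covariance with a location-specific SCSI database entry, and its windowed variant trades the cubic-cost matrix inversion for beam-delay domain processing.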
-
SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding
Authors:
Nianbo Zeng,
Haowen Hou,
Fei Richard Yu,
Si Shi,
Ying Tiffany He
Abstract:
Despite recent advances in retrieval-augmented generation (RAG) for video understanding, effectively understanding long-form video content remains underexplored due to the vast scale and high complexity of video data. Current RAG approaches typically segment videos into fixed-length chunks, which often disrupts the continuity of contextual information and fails to capture authentic scene boundaries. Inspired by the human ability to naturally organize continuous experiences into coherent scenes, we present SceneRAG, a unified framework that leverages large language models to segment videos into narrative-consistent scenes by processing ASR transcripts alongside temporal metadata. SceneRAG further sharpens these initial boundaries through lightweight heuristics and iterative correction. For each scene, the framework fuses information from both visual and textual modalities to extract entity relations and dynamically builds a knowledge graph, enabling robust multi-hop retrieval and generation that account for long-range dependencies. Experiments on the LongerVideos benchmark, featuring over 134 hours of diverse content, confirm that SceneRAG substantially outperforms prior baselines, achieving a win rate of up to 72.5 percent on generation tasks.
Submitted 9 June, 2025;
originally announced June 2025.
-
Reasoning Meets Personalization: Unleashing the Potential of Large Reasoning Model for Personalized Generation
Authors:
Sichun Luo,
Guanzhi Deng,
Jian Xu,
Xiaojie Zhang,
Hanxu Hou,
Linqi Song
Abstract:
Personalization is a critical task in modern intelligent systems, with applications spanning diverse domains, including interactions with large language models (LLMs). Recent advances in reasoning capabilities have significantly enhanced LLMs, enabling unprecedented performance in tasks such as mathematics and coding. However, their potential for personalization tasks remains underexplored.
In this paper, we present the first systematic evaluation of large reasoning models (LRMs) for personalization tasks. Surprisingly, despite generating more tokens, LRMs do not consistently outperform general-purpose LLMs, especially in retrieval-intensive scenarios where their advantages diminish. Our analysis identifies three key limitations: divergent thinking, misalignment of response formats, and ineffective use of retrieved information. To address these challenges, we propose Reinforced Reasoning for Personalization (\model), a novel framework that incorporates a hierarchical reasoning thought template to guide LRMs in generating structured outputs. Additionally, we introduce a reasoning process intervention method to enforce adherence to designed reasoning patterns, enhancing alignment. We also propose a cross-referencing mechanism to ensure consistency. Extensive experiments demonstrate that our approach significantly outperforms existing techniques.
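A minimal sketch of what a hierarchical reasoning thought template might look like as a prompt builder (every stage name and the function signature are hypothetical illustrations, not the paper's implementation):

```python
def build_personalized_prompt(profile: str, retrieved: list, query: str) -> str:
    """Assemble a staged prompt that walks a reasoning model through fixed
    steps: profile analysis, evidence selection, a drafted answer, and a
    cross-referencing consistency check (mirroring the paper's three fixes:
    constrained thinking, format alignment, and grounded retrieval use)."""
    stages = [
        ("Profile analysis",
         f"Summarize this user's stable preferences: {profile}"),
        ("Evidence selection",
         "Keep only the retrieved items relevant to the query:\n"
         + "\n".join(f"- {item}" for item in retrieved)),
        ("Draft answer",
         f"Answer in the user's preferred format: {query}"),
        ("Consistency check",
         "Cross-reference the draft against the profile and the selected "
         "evidence; revise if they disagree."),
    ]
    return "\n\n".join(f"### {name}\n{instruction}" for name, instruction in stages)
```

The point of the structure is that each stage constrains the next, which is one plausible way to curb the divergent thinking and format misalignment the abstract identifies.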
Submitted 23 May, 2025;
originally announced May 2025.
-
NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment
Authors:
Shuhao Han,
Haotian Fan,
Fangyuan Kong,
Wenjie Liao,
Chunle Guo,
Chongyi Li,
Radu Timofte,
Liang Li,
Tao Li,
Junhui Cui,
Yunqiu Wang,
Yang Tai,
Jingwei Sun,
Jianhui Sun,
Xinli Yue,
Tianyi Wang,
Huan Hou,
Junda Lu,
Xinyang Huang,
Zitang Zhou,
Zijian Zhang,
Xuhui Zheng,
Xuecheng Wu,
Chong Peng,
Xuezhi Cao
, et al. (90 additional authors not shown)
Abstract:
This paper reports on the NTIRE 2025 challenge on Text to Image (T2I) generation model quality assessment, held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. The aim of this challenge is to address the fine-grained quality assessment of text-to-image generation models. This challenge evaluates text-to-image models from two aspects, image-text alignment and image structural distortion detection, and is divided into the alignment track and the structural track. The alignment track uses EvalMuse-40K, which contains around 40K AI-Generated Images (AIGIs) generated by 20 popular generative models. The alignment track has a total of 371 registered participants. A total of 1,883 submissions were received in the development phase, and 507 submissions were received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. The structural track uses EvalMuse-Structure, which contains 10,000 AI-Generated Images (AIGIs) with corresponding structural distortion masks. A total of 211 participants registered in the structural track. A total of 1,155 submissions were received in the development phase, and 487 submissions were received in the test phase. Finally, 8 participating teams submitted their models and fact sheets. Almost all methods achieved better results than the baseline methods, and the winning methods in both tracks demonstrated superior prediction performance on T2I model quality assessment.
Submitted 22 May, 2025;
originally announced May 2025.
-
Law of iterated logarithm for supercritical non-symmetric branching Markov process
Authors:
Haojie Hou,
Yan-Xia Ren,
Renming Song
Abstract:
Let $\{(X_t)_{t\geq 0}, \mathbb{P}_{δ_x}, x\in E\}$ be a supercritical branching Markov process (not necessarily symmetric) on a locally compact metric measure space $(E,μ)$ with spatially dependent local branching mechanism. Under some assumptions on the semigroup of the spatial motion, we first prove law of the iterated logarithm type results for $\langle f, X_t\rangle$ under a second moment condition on the branching mechanism, where $f$ is a linear combination of eigenfunctions of the mean semigroup $\{T_t, t\geq0\}$ of $X$. Then we prove law of the iterated logarithm type results for $\langle f, X_t\rangle$ under a fourth moment condition, where $f$ belongs to a larger class of functions.
Submitted 11 December, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
-
On well-posedness for non-autonomous parabolic Cauchy problems with rough initial data
Authors:
Hedong Hou
Abstract:
We establish a complete picture for existence, uniqueness, and representation of weak solutions to non-autonomous parabolic Cauchy problems of divergence type. The coefficients are only assumed to be uniformly elliptic, bounded, measurable, and complex-valued, without any additional regularity or symmetry conditions. The initial data are tempered distributions taken in homogeneous Hardy--Sobolev spaces $\dot{H}^{s,p}$, and source terms belong to certain scales of weighted tent spaces. Weak solutions are constructed with their gradients in weighted tent spaces $T^{p}_{s/2}$. Analogous results are also exhibited for initial data in homogeneous Besov spaces $\dot{B}^{s}_{p,p}$.
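For orientation, the class of problems described is the standard divergence-form parabolic Cauchy problem (a sketch of the generic setting; the precise assumptions on the data are those stated in the abstract):

```latex
\begin{aligned}
  \partial_t u - \operatorname{div}\!\big(A(t,x)\nabla u\big) &= f
    && \text{on } (0,\infty)\times\mathbb{R}^n,\\
  u(0,\cdot) &= u_0 \in \dot{H}^{s,p},
\end{aligned}
```

with $A$ only uniformly elliptic, bounded, measurable, and complex-valued, source term $f$ in weighted tent spaces, and solutions sought with $\nabla u \in T^{p}_{s/2}$.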
Submitted 14 May, 2025;
originally announced May 2025.