
Showing 1–50 of 643 results for author: Wei, J

Searching in archive cs.
  1. arXiv:2512.16969  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

    Authors: Wanghan Xu, Yuhao Zhou, Yifan Zhou, Qinglong Cao, Shuo Li, Jia Bu, Bo Liu, Yixin Chen, Xuming He, Xiangyu Zhao, Xiang Zhuang, Fengxiang Wang, Zhiwang Zhou, Qiantai Feng, Wenxuan Huang, Jiaqi Wei, Hao Wu, Yuejin Yang, Guangshuai Wang, Sheng Xu, Ziyan Huang, Xinyao Liu, Jiyao Liu, Cheng Tang, Wei Li, et al. (82 additional authors not shown)

    Abstract: Despite advances in scientific AI, a coherent framework for Scientific General Intelligence (SGI)-the ability to autonomously conceive, investigate, and reason across scientific domains-remains lacking. We present an operational SGI definition grounded in the Practical Inquiry Model (PIM: Deliberation, Conception, Action, Perception) and operationalize it via four scientist-aligned tasks: deep res…

    Submitted 18 December, 2025; originally announced December 2025.

  2. arXiv:2512.12272  [pdf, ps, other]

    q-bio.QM cs.AI

    Accurate de novo sequencing of the modified proteome with OmniNovo

    Authors: Yuhan Chen, Shang Qu, Zhiqiang Gao, Yuejin Yang, Xiang Zhang, Sheng Xu, Xinjie Mao, Liujia Qian, Jiaqi Wei, Zijie Qiu, Chenyu You, Lei Bai, Ning Ding, Tiannan Guo, Bowen Zhou, Siqi Sun

    Abstract: Post-translational modifications (PTMs) serve as a dynamic chemical language regulating protein function, yet current proteomic methods remain blind to a vast portion of the modified proteome. Standard database search algorithms suffer from a combinatorial explosion of search spaces, limiting the identification of uncharacterized or complex modifications. Here we introduce OmniNovo, a unified deep…

    Submitted 13 December, 2025; originally announced December 2025.

  3. arXiv:2512.06835  [pdf, ps, other]

    cs.AI

    Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning

    Authors: Tingyu Li, Zheng Sun, Jingxuan Wei, Siyuan Li, Conghui He, Lijun Wu, Cheng Tan

    Abstract: Recent vision-language models (VLMs) achieve remarkable reasoning through reinforcement learning (RL), which provides a feasible solution for realizing continuous self-evolving large vision-language models (LVLMs) in the era of experience. However, RL for VLMs requires abundant high-quality multimodal data, which is especially challenging to obtain in specialized domains like chemistry, earth sciences, and multimoda…

    Submitted 7 December, 2025; originally announced December 2025.

    Comments: 25 pages, 5 figures

  4. arXiv:2512.06443  [pdf, ps, other]

    cs.DC cs.AI

    Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices

    Authors: Xiangyu Li, Chengyu Yin, Weijun Wang, Jianyu Wei, Ting Cao, Yunxin Liu

    Abstract: Large language models (LLMs) are increasingly deployed on edge devices. To meet strict resource constraints, real-world deployment has pushed LLM quantization from 8-bit to 4-bit, 2-bit, and now 1.58-bit. Combined with lookup table (LUT)-based inference, CPUs run these ultra-low-bit LLMs even faster than NPUs, opening new opportunities for ubiquitous on-device intelligence. However, this paper i…

    Submitted 6 December, 2025; originally announced December 2025.

    Comments: Preprint
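
    The core trick behind LUT-based low-bit inference is to replace multiply-accumulate with table lookups: for each small group of activations, dot products against every possible low-bit weight pattern are precomputed once, so each weight group afterwards costs a single lookup. Below is a minimal NumPy sketch of that idea, assuming an illustrative 2-bit value set and group size; it is not the paper's actual kernel.

        import numpy as np

        # Sketch: LUT-based matvec for 2-bit weights drawn from an assumed codebook.
        CODEBOOK = np.array([-2.0, -1.0, 1.0, 2.0])    # illustrative 2-bit values
        g = 4                                          # activations per lookup group

        def build_lut(x_group):
            # enumerate all 4**g weight patterns once for this activation group
            idx = np.arange(4 ** g)
            digits = (idx[:, None] // 4 ** np.arange(g)) % 4      # (4**g, g)
            return (CODEBOOK[digits] * x_group).sum(axis=1)       # (4**g,)

        def lut_matvec(w_codes, x):
            # w_codes: (rows, n // g) ints encoding each g-weight group in base 4
            luts = [build_lut(x[i:i + g]) for i in range(0, x.size, g)]
            return sum(luts[j][w_codes[:, j]] for j in range(len(luts)))

        rng = np.random.default_rng(0)
        x = rng.standard_normal(16)
        w_digits = rng.integers(0, 4, size=(8, 16))               # 2-bit weight indices
        w_codes = (w_digits.reshape(8, -1, g) * 4 ** np.arange(g)).sum(axis=2)
        ref = (CODEBOOK[w_digits] * x).sum(axis=1)                # dense reference
        print(np.allclose(lut_matvec(w_codes, x), ref))           # True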

  5. arXiv:2512.06018  [pdf, ps, other]

    cs.CY cs.AI

    Uncovering Students' Inquiry Patterns in GenAI-Supported Clinical Practice: An Integration of Epistemic Network Analysis and Sequential Pattern Mining

    Authors: Jiameng Wei, Dinh Dang, Kaixun Yang, Emily Stokes, Amna Mazeh, Angelina Lim, David Wei Dai, Joel Moore, Yizhou Fan, Danijela Gasevic, Dragan Gasevic, Guanliang Chen

    Abstract: Assessment of medication history-taking has traditionally relied on human observation, limiting scalability and detailed performance data. While Generative AI (GenAI) platforms enable extensive data collection and learning analytics provide powerful methods for analyzing educational traces, these approaches remain largely underexplored in pharmacy clinical training. This study addresses this gap b…

    Submitted 3 December, 2025; originally announced December 2025.

  6. arXiv:2512.04751  [pdf, ps, other]

    cs.CE

    NAWOA-XGBoost: A Novel Model for Early Prediction of Academic Potential in Computer Science Students

    Authors: Junhao Wei, Yanzhao Gu, Ran Zhang, Mingjing Huang, Jinhong Song, Yanxiao Li, Wenxuan Zhu, Yapeng Wang, Zikun Li, Zhiwen Wang, Xu Yang, Ngai Cheong

    Abstract: The Whale Optimization Algorithm (WOA) suffers from limited global search ability, slow convergence, and a tendency to fall into local optima, restricting its effectiveness in hyperparameter optimization for machine learning models. To address these issues, this study proposes a Nonlinear Adaptive Whale Optimization Algorithm (NAWOA), which integrates strategies such as Good Nodes Set initialization, Le…

    Submitted 5 December, 2025; v1 submitted 4 December, 2025; originally announced December 2025.
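
    For context, the baseline WOA that NAWOA refines alternates three moves: encircling the best solution, exploring around a random whale, and a logarithmic spiral toward the best. A minimal sketch on a toy objective follows; the linear schedule for the coefficient a (flagged in a comment) is exactly the kind of term nonlinear adaptive variants replace, and all names here are illustrative.

        import numpy as np

        def sphere(x):                  # toy objective standing in for XGBoost CV error
            return float((x ** 2).sum())

        def woa(f, dim=5, pop=20, iters=200, lb=-5.0, ub=5.0, seed=0):
            rng = np.random.default_rng(seed)
            X = rng.uniform(lb, ub, (pop, dim))
            best = min(X, key=f).copy()
            for t in range(iters):
                a = 2.0 * (1 - t / iters)      # linear decay; NAWOA adapts this nonlinearly
                for i in range(pop):
                    A = 2 * a * rng.random(dim) - a
                    C = 2 * rng.random(dim)
                    if rng.random() < 0.5:
                        ref = best if np.abs(A).mean() < 1 else X[rng.integers(pop)]
                        X[i] = ref - A * np.abs(C * ref - X[i])   # encircle / explore
                    else:
                        l = rng.uniform(-1, 1, dim)               # spiral toward best
                        X[i] = np.abs(best - X[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
                    X[i] = np.clip(X[i], lb, ub)
                best = min([best, *X], key=f).copy()
            return best, f(best)

        print(woa(sphere))   # converges near the origin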

  7. arXiv:2512.04082  [pdf, ps, other]

    cs.CV

    PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

    Authors: Jiazhe Wei, Ken Li, Tianyu Lao, Haofan Wang, Liang Wang, Caifeng Shan, Chenyang Si

    Abstract: Graphic design forms the cornerstone of modern visual communication, serving as a vital medium for promoting cultural and commercial events. Recent advances have explored automating this process using Large Multimodal Models (LMMs), yet existing methods often produce geometrically inaccurate layouts and lack the iterative, layer-specific editing required in professional workflows. To address these…

    Submitted 3 December, 2025; originally announced December 2025.

    Comments: Project page: https://postercopilot.github.io/

  8. arXiv:2512.02794  [pdf, ps, other]

    cs.CV

    PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation

    Authors: Fan Wu, Cheng Chen, Zhoujie Fu, Jiacheng Wei, Yi Xu, Deheng Ye, Guosheng Lin

    Abstract: Recent diffusion-based text-to-image customization methods have achieved significant success in understanding concrete concepts to control generation processes, such as styles and shapes. However, few efforts dive into the realistic yet challenging customization of physical concepts. The core limitation of current methods arises from the absence of explicitly introducing physical knowledge during…

    Submitted 1 December, 2025; originally announced December 2025.

    Comments: codes:https://github.com/wufan-cse/PhyCustom

  9. arXiv:2512.02793  [pdf, ps, other]

    cs.CV

    IC-World: In-Context Generation for Shared World Modeling

    Authors: Fan Wu, Jiacheng Wei, Ruibo Li, Yi Xu, Junyou Li, Deheng Ye, Guosheng Lin

    Abstract: Video-based world models have recently garnered increasing attention for their ability to synthesize diverse and dynamic visual environments. In this paper, we focus on shared world modeling, where a model generates multiple videos from a set of input images, each representing the same underlying world in different camera poses. We propose IC-World, a novel generation framework, enabling parallel…

    Submitted 1 December, 2025; originally announced December 2025.

    Comments: codes:https://github.com/wufan-cse/IC-World

  10. arXiv:2512.01274  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    SUPERChem: A Multimodal Reasoning Benchmark in Chemistry

    Authors: Zehua Zhao, Zhixian Huang, Junren Li, Siyu Lin, Junting Zhou, Fengqi Cao, Kun Zhou, Rui Ge, Tingting Long, Yuexiang Zhu, Yan Liu, Jie Zheng, Junnian Wei, Rong Zhu, Peng Zou, Wenyu Li, Zekai Cheng, Tian Ding, Yaxuan Wang, Yizhao Yan, Tingru Wei, Haowei Ming, Weijie Mao, Chen Sun, Yiming Liu, et al. (6 additional authors not shown)

    Abstract: Current benchmarks for evaluating the chemical reasoning capabilities of Large Language Models (LLMs) are limited by oversimplified tasks, lack of process-level evaluation, and misalignment with expert-level chemistry skills. To address these issues, we introduce SUPERChem, a benchmark of 500 expert-curated reasoning-intensive chemistry problems, covering diverse subfields and provided in both mul…

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: 35 pages, 11 figures, 5 tables

  11. arXiv:2511.22853  [pdf, ps, other]

    cs.LG

    TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE

    Authors: Jiawen Wei, Lan Jiang, Pengbo Wei, Ziwen Ye, Teng Song, Chen Chen, Guangrui Ma

    Abstract: Time series data is ubiquitous, with forecasting applications spanning from finance to healthcare. Beyond popular deterministic methods, generative models are gaining attention due to advancements in areas like image synthesis and video generation, as well as their inherent ability to provide probabilistic predictions. However, existing generative approaches mostly involve recurrent generative ope…

    Submitted 27 November, 2025; originally announced November 2025.

  12. arXiv:2511.22017  [pdf, ps, other]

    cs.CR

    POLARIS: Cross-Domain Access Control via Verifiable Identity and Policy-Based Authorization

    Authors: Aiyao Zhang, Xiaodong Lee, Zhixian Zhuang, Jiuqi Wei, Yufan Fu, Botao Peng

    Abstract: Access control is a security mechanism designed to ensure that only authorized users can access specific resources. Cross-domain access control involves access to resources across different organizations, institutions, or applications. Traditional access control, however, which handles authentication and authorization separately in centralized environments, faces challenges in identity dispersion,…

    Submitted 26 November, 2025; originally announced November 2025.

  13. arXiv:2511.20635  [pdf, ps, other]

    cs.CV

    iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation

    Authors: Zhoujie Fu, Xianfang Zeng, Jinghong Lan, Xinyao Liao, Cheng Chen, Junyi Chen, Jiacheng Wei, Wei Cheng, Shiyu Liu, Yunuo Chen, Gang Yu, Guosheng Lin

    Abstract: Pre-trained video models learn powerful priors for generating high-quality, temporally coherent content. While these models excel at temporal coherence, their dynamics are often constrained by the continuous nature of their training data. We hypothesize that by injecting the rich and unconstrained content diversity from image data into this coherent temporal framework, we can generate image sets t…

    Submitted 1 December, 2025; v1 submitted 25 November, 2025; originally announced November 2025.

    Comments: Our homepage: https://kr1sjfu.github.io/iMontage-web/

  14. arXiv:2511.19913  [pdf]

    cs.CV

    Coupled Physics-Gated Adaptation: Spatially Decoding Volumetric Photochemical Conversion in Complex 3D-Printed Objects

    Authors: Maryam Eftekharifar, Churun Zhang, Jialiang Wei, Xudong Cao, Hossein Heidari

    Abstract: We present a framework that pioneers the prediction of photochemical conversion in complex three-dimensionally printed objects, introducing a challenging new computer vision task: predicting dense, non-visual volumetric physical properties from 3D visual data. This approach leverages the largest-ever optically printed 3D specimen dataset, comprising a large family of parametrically designed comple…

    Submitted 24 November, 2025; originally announced November 2025.

  15. arXiv:2511.19261  [pdf, ps, other]

    cs.CV

    LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models

    Authors: Shuai Wang, Daoan Zhang, Tianyi Bai, Shitong Shao, Jiebo Luo, Jiaheng Wei

    Abstract: Humans can perceive and understand 3D space and long videos from sequential visual observations. But can vision-language models (VLMs)? Recent work demonstrates that even state-of-the-art VLMs still struggle to understand 3D space and long videos, although they are powerful in typical vision-language tasks. Current methods often rely on specialized architectural designs to improve performance f…

    Submitted 24 November, 2025; originally announced November 2025.

  16. arXiv:2511.18977  [pdf, ps, other]

    cs.LG cs.AI

    FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning

    Authors: Xin Yuan, Siqi Li, Jiateng Wei, Chengrui Zhu, Yanming Wu, Qingpeng Li, Jiajun Lv, Xiaoke Lan, Jun Chen, Yong Liu

    Abstract: Pruning is an effective method for compressing Large Language Models, but finding an optimal, non-uniform layer-wise sparsity allocation remains a key challenge. Heuristic methods are fast but yield suboptimal performance, while more powerful search-based approaches like Reinforcement Learning are often hindered by prohibitive computational costs on large-scale models. To overcome this efficiency…

    Submitted 24 November, 2025; originally announced November 2025.

    Comments: 5 pages, 2 figures, 4 tables

    ACM Class: I.2.7; I.2.6
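
    The object being searched here is a per-layer sparsity allocation. The sketch below shows what a single candidate allocation (the action in an RL formulation) does to a toy model via magnitude pruning; the policy and reward loop are omitted, and the shapes and names are illustrative rather than the paper's setup.

        import numpy as np

        def prune_layer(w, sparsity):
            # zero out the smallest-magnitude fraction of weights in one layer
            k = int(sparsity * w.size)
            if k == 0:
                return w.copy()
            thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
            return np.where(np.abs(w) <= thresh, 0.0, w)

        rng = np.random.default_rng(0)
        layers = [rng.standard_normal((64, 64)) for _ in range(4)]
        allocation = [0.3, 0.5, 0.7, 0.5]    # one candidate non-uniform allocation
        pruned = [prune_layer(w, s) for w, s in zip(layers, allocation)]
        print([round(float((w == 0).mean()), 2) for w in pruned])
        # a quality proxy of the pruned model would then serve as the reward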

  17. arXiv:2511.18723  [pdf, ps, other]

    cs.AI cs.DC math.OC

    N2N: A Parallel Framework for Large-Scale MILP under Distributed Memory

    Authors: Longfei Wang, Junyan Liu, Fan Zhang, Jiangwen Wei, Yuanhua Tang, Jie Sun, Xiaodong Luo

    Abstract: Parallelization has emerged as a promising approach for accelerating MILP solving. However, the complexity of the branch-and-bound (B&B) framework and the numerous effective algorithm components in MILP solvers make it difficult to parallelize. In this study, a scalable parallel framework, N2N (a node-to-node framework that maps the B&B nodes to distributed computing nodes), was proposed to solve…

    Submitted 18 December, 2025; v1 submitted 23 November, 2025; originally announced November 2025.

    Comments: 18 pages, 2 figures; the affiliation of some authors is updated in this version

    ACM Class: I.2.8; D.1.3

  18. arXiv:2511.17392  [pdf, ps, other]

    cs.CV

    MorphSeek: Fine-grained Latent Representation-Level Policy Optimization for Deformable Image Registration

    Authors: Runxun Zhang, Yizhou Liu, Li Dongrui, Bo Xu, Jingwei Wei

    Abstract: Deformable image registration (DIR) remains a fundamental yet challenging problem in medical image analysis, largely due to the prohibitively high-dimensional deformation space of dense displacement fields and the scarcity of voxel-level supervision. Existing reinforcement learning frameworks often project this space into coarse, low-dimensional representations, limiting their ability to capture s…

    Submitted 21 November, 2025; originally announced November 2025.

  19. arXiv:2511.15970  [pdf, ps, other]

    cs.CE

    An Enhanced Whale Optimization Algorithm with Log-Normal Distribution for Optimizing Coverage of Wireless Sensor Networks

    Authors: Junhao Wei, Yanzhao Gu, Ran Zhang, Yanxiao Li, Wenxuan Zhu, Jinhong Song, Yapeng Wang, Xu Yang, Ngai Cheong

    Abstract: Wireless Sensor Networks (WSNs) are essential for monitoring and communication in complex environments, where coverage optimization directly affects performance and energy efficiency. However, traditional algorithms such as the Whale Optimization Algorithm (WOA) often suffer from limited exploration and premature convergence. To overcome these issues, this paper proposes an enhanced WOA which is c…

    Submitted 2 December, 2025; v1 submitted 19 November, 2025; originally announced November 2025.
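
    The quantity such coverage optimizers maximize is typically the fraction of the field covered by at least one sensor disk. A minimal grid-based sketch of that objective follows, assuming a unit-square field and a fixed sensing radius; an optimizer like the enhanced WOA would then search over the flattened sensor coordinates.

        import numpy as np

        def coverage_ratio(sensors, radius=0.15, grid=100):
            # fraction of a unit square covered by at least one sensor disk
            xs = (np.arange(grid) + 0.5) / grid
            px, py = np.meshgrid(xs, xs)
            pts = np.stack([px.ravel(), py.ravel()], axis=1)        # (grid*grid, 2)
            d2 = ((pts[:, None, :] - sensors[None, :, :]) ** 2).sum(-1)
            return float((d2.min(axis=1) <= radius ** 2).mean())

        rng = np.random.default_rng(0)
        random_layout = rng.random((20, 2))       # 20 sensors dropped at random
        print(coverage_ratio(random_layout))      # random layouts waste coverage on overlap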

  20. arXiv:2511.13712  [pdf, ps, other]

    cs.LG cs.AI

    From Black Box to Insight: Explainable AI for Extreme Event Preparedness

    Authors: Kiana Vu, İsmet Selçuk Özer, Phung Lai, Zheng Wu, Thilanka Munasinghe, Jennifer Wei

    Abstract: As climate change accelerates the frequency and severity of extreme events such as wildfires, the need for accurate, explainable, and actionable forecasting becomes increasingly urgent. While artificial intelligence (AI) models have shown promise in predicting such events, their adoption in real-world decision-making remains limited due to their black-box nature, which limits trust, explainability…

    Submitted 17 November, 2025; originally announced November 2025.

  21. arXiv:2511.12034  [pdf, ps, other]

    cs.CV cs.LG cs.MM

    Calibrated Multimodal Representation Learning with Missing Modalities

    Authors: Xiaohao Liu, Xiaobo Xia, Jiaheng Wei, Shuo Yang, Xiu Su, See-Kiong Ng, Tat-Seng Chua

    Abstract: Multimodal representation learning harmonizes distinct modalities by aligning them into a unified latent space. Recent research generalizes traditional cross-modal alignment to produce enhanced multimodal synergy but requires all modalities to be present for a common instance, making it challenging to utilize prevalent datasets with missing modalities. We provide theoretical insights into this iss…

    Submitted 15 November, 2025; originally announced November 2025.

  22. arXiv:2511.11438  [pdf, ps, other]

    cs.CV

    VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models

    Authors: Mingjie Xu, Jinpeng Chen, Yuzhi Zhao, Jason Chun Lok Li, Yue Qiu, Zekang Du, Mengyang Wu, Pingping Zhang, Kun Li, Hongzheng Yang, Wenao Ma, Jiaheng Wei, Qinbin Li, Kangcheng Liu, Wenqiang Lei

    Abstract: Multimodal large language models (MLLMs) have enabled a wide range of advanced vision-language applications, including fine-grained object recognition and contextual understanding. When querying specific regions or objects in an image, human users naturally use "visual prompts" (VPs), such as bounding boxes, to provide reference. However, no existing benchmark systematically evaluates the ability…

    Submitted 14 November, 2025; originally announced November 2025.

    Comments: This is the extended version of the paper accepted at AAAI 2026, which includes all technical appendices and additional experimental details

  23. arXiv:2511.11248  [pdf, ps, other]

    cs.AR

    T-MAN: Enabling End-to-End Low-Bit LLM Inference on NPUs via Unified Table Lookup

    Authors: Jianyu Wei, Qingtao Li, Shijie Cao, Lingxiao Ma, Zixu Hao, Yanyong Zhang, Xiaoyan Hu, Ting Cao

    Abstract: Large language models (LLMs) are increasingly deployed on customer devices. To support them, current devices are adopting SoCs (System on Chip) with NPUs (Neural Processing Unit) installed. Although high performance is expected, LLM inference on NPUs is slower than its CPU counterpart. The reason is that NPUs have poor performance on computations other than GEMM, like dequantization. Current works…

    Submitted 14 November, 2025; originally announced November 2025.
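
    To make the bottleneck concrete: dequantization is elementwise mask-and-shift work, not GEMM, which is why GEMM-centric NPUs handle it poorly. The sketch below unpacks group-quantized 4-bit weights; the packing layout, zero offset, and group size are assumptions for illustration, not the paper's format.

        import numpy as np

        def dequant_int4(packed, scales, group=32):
            # packed: two 4-bit weights per uint8 byte; scales: one scale per group
            lo = (packed & 0x0F).astype(np.int8) - 8        # low nibble  -> [-8, 7]
            hi = (packed >> 4).astype(np.int8) - 8          # high nibble -> [-8, 7]
            q = np.stack([lo, hi], axis=-1).reshape(-1)     # interleave nibbles
            return q.reshape(-1, group) * scales[:, None]   # per-group rescale

        rng = np.random.default_rng(0)
        packed = rng.integers(0, 256, size=128, dtype=np.uint8)   # 256 4-bit weights
        scales = rng.random(256 // 32).astype(np.float32)
        print(dequant_int4(packed, scales).shape)                 # (8, 32)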

  24. arXiv:2511.11134  [pdf, ps, other]

    cs.AI

    GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

    Authors: Jingxuan Wei, Caijun Jia, Xi Bai, Xinglong Xu, Siyuan Li, Linzhuang Sun, Bihui Yu, Conghui He, Lijun Wu, Cheng Tan

    Abstract: The advent of Unified Multimodal Models (UMMs) signals a paradigm shift in artificial intelligence, moving from passive perception to active, cross-modal generation. Despite their unprecedented ability to synthesize information, a critical gap persists in evaluation: existing benchmarks primarily assess discriminative understanding or unconstrained image generation separately, failing to measure t…

    Submitted 14 November, 2025; originally announced November 2025.

    Comments: 35 pages, 22 figures

  25. arXiv:2511.10088  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    eXIAA: eXplainable Injections for Adversarial Attack

    Authors: Leonardo Pesce, Jiawen Wei, Gianmarco Mengaldo

    Abstract: Post-hoc explainability methods are a subset of Machine Learning (ML) that aim to provide a reason for why a model behaves in a certain way. In this paper, we show a new black-box model-agnostic adversarial attack for post-hoc explainable Artificial Intelligence (XAI), particularly in the image domain. The goal of the attack is to modify the original explanations while being undetected by the huma…

    Submitted 13 November, 2025; originally announced November 2025.

  26. arXiv:2511.09970  [pdf, ps, other]

    cs.LG cs.AI

    MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data

    Authors: Dimitrios Sinodinos, Jack Yi Wei, Narges Armanfard

    Abstract: Tabular data is the most abundant data type in the world, powering systems in finance, healthcare, e-commerce, and beyond. As tabular datasets grow and span multiple related targets, there is an increasing need to exploit shared task information for improved multitask generalization. Multitask learning (MTL) has emerged as a powerful way to improve generalization and efficiency, yet most existing…

    Submitted 13 November, 2025; originally announced November 2025.

    Comments: Accepted for publication at AAAI 2026
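
    The common baseline this line of work starts from is hard parameter sharing: one encoder trunk over the tabular features and one head per target. A minimal PyTorch sketch follows; the architecture and the uniform task weighting are generic illustrations, not MultiTab's actual design.

        import torch
        import torch.nn as nn

        class SharedTrunkMTL(nn.Module):
            # hard parameter sharing: one tabular encoder, one head per task
            def __init__(self, n_features, n_tasks, hidden=64):
                super().__init__()
                self.trunk = nn.Sequential(
                    nn.Linear(n_features, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                )
                self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_tasks))

            def forward(self, x):
                z = self.trunk(x)                                    # shared representation
                return torch.cat([h(z) for h in self.heads], dim=1)  # (batch, n_tasks)

        model = SharedTrunkMTL(n_features=10, n_tasks=3)
        x, y = torch.randn(32, 10), torch.randn(32, 3)
        loss = nn.functional.mse_loss(model(x), y)   # naive uniform task weighting
        loss.backward()
        print(loss.item())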

  27. arXiv:2511.09961  [pdf, ps, other]

    cs.OS

    Vmem: A Lightweight Hot-Upgradable Memory Management for In-production Cloud Environment

    Authors: Hao Zheng, Qiang Wang, Longxiang Wang, Xishi Qiu, Yibin Shen, Xiaoshe Dong, Naixuan Guan, Jia Wei, Fudong Qiu, Xingjun Zhang, Yun Xu, Mao Zhao, Yisheng Xie, Shenglong Zhao, Min He, Yu Li, Xiao Zheng, Ben Luo, Jiesheng Wu

    Abstract: Traditional memory management suffers from metadata overhead, architectural complexity, and stability degradation, problems intensified in cloud environments. Existing software/hardware optimizations are insufficient for cloud computing's dual demands of flexibility and low overhead. This paper presents Vmem, a memory management architecture for in-production cloud environments that enables flexib…

    Submitted 12 November, 2025; originally announced November 2025.

  28. arXiv:2511.09936  [pdf, ps, other]

    cs.OS

    Taiji: A DPU Memory Elasticity Solution for In-production Cloud Environments

    Authors: Hao Zheng, Longxiang Wang, Yun Xu, Qiang Wang, Yibin Shen, Xiaoshe Dong, Bang Di, Jia Wei, Shenyu Dong, Xingjun Zhang, Weichen Chen, Zhao Han, Sanqian Zhao, Dongdong Huang, Jie Qi, Yifan Yang, Zhao Gao, Yi Wang, Jinhu Li, Xudong Ren, Min He, Hang Yang, Xiao Zheng, Haijiao Hao, Jiesheng Wu

    Abstract: The growth of cloud computing drives data centers toward higher density and efficiency. Data processing units (DPUs) enhance server network and storage performance but face challenges such as long hardware upgrade cycles and limited resources. To address these, we propose Taiji, a resource-elasticity architecture for DPUs. Combining hybrid virtualization with parallel memory swapping, Taiji switch…

    Submitted 14 November, 2025; v1 submitted 12 November, 2025; originally announced November 2025.

  29. arXiv:2511.04722  [pdf, ps, other]

    cs.LG

    AWEMixer: Adaptive Wavelet-Enhanced Mixer Network for Long-Term Time Series Forecasting

    Authors: Qianyang Li, Xingjun Zhang, Peng Tao, Shaoxun Wang, Yancheng Pan, Jia Wei

    Abstract: Forecasting long-term time series in IoT environments remains a significant challenge due to the non-stationary and multi-scale characteristics of sensor signals. Furthermore, error accumulation causes a decrease in forecast quality when predicting further into the future. Traditional methods are restricted to operating in the time domain, while the global frequency information achieved by Fourier trans…

    Submitted 6 November, 2025; originally announced November 2025.
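
    Unlike the global Fourier transform, a wavelet decomposition keeps frequency information localized in time, which is what makes it attractive for non-stationary, multi-scale signals. A minimal Haar-wavelet sketch of such multi-scale features follows; the choice of Haar and the toy series are illustrative assumptions, not the paper's design.

        import numpy as np

        def haar_multiscale(x, levels=3):
            # one Haar split per level: approximation (local mean) + detail (local difference)
            bands, approx = [], x.astype(float)
            for _ in range(levels):
                even, odd = approx[0::2], approx[1::2]
                bands.append((even - odd) / np.sqrt(2))   # detail band at this scale
                approx = (even + odd) / np.sqrt(2)        # passed to the next, coarser level
            bands.append(approx)
            return bands

        t = np.arange(256)
        series = np.sin(2 * np.pi * t / 64) + 0.3 * np.sin(2 * np.pi * t / 8)
        for i, b in enumerate(haar_multiscale(series)):
            print(i, b.shape, round(float(np.abs(b).mean()), 3))  # energy differs per scale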

  30. arXiv:2511.00537  [pdf, ps, other]

    cs.CL cs.LG

    Multi-refined Feature Enhanced Sentiment Analysis Using Contextual Instruction

    Authors: Peter Atandoh, Jie Zou, Weikang Guo, Jiwei Wei, Zheng Wang

    Abstract: Sentiment analysis using deep learning and pre-trained language models (PLMs) has gained significant traction due to their ability to capture rich contextual representations. However, existing approaches often underperform in scenarios involving nuanced emotional cues, domain shifts, and imbalanced sentiment distributions. We argue that these limitations stem from inadequate semantic grounding, po…

    Submitted 4 November, 2025; v1 submitted 1 November, 2025; originally announced November 2025.

  31. arXiv:2510.26495  [pdf, ps, other]

    cs.DB cs.CL

    Rethinking Text-to-SQL: Dynamic Multi-turn SQL Interaction for Real-world Database Exploration

    Authors: Linzhuang Sun, Tianyu Guo, Hao Liang, Yuying Li, Qifeng Cai, Jingxuan Wei, Bihui Yu, Wentao Zhang, Bin Cui

    Abstract: Recent advances in Text-to-SQL have achieved strong results in static, single-turn tasks, where models generate SQL queries from natural language questions. However, these systems fall short in real-world interactive scenarios, where user intents evolve and queries must be refined over multiple turns. In applications such as finance and business analytics, users iteratively adjust query constraint…

    Submitted 13 November, 2025; v1 submitted 30 October, 2025; originally announced October 2025.

  32. arXiv:2510.24345  [pdf, ps, other]

    cs.CL cs.AI

    LongWeave: A Long-Form Generation Benchmark Bridging Real-World Relevance and Verifiability

    Authors: Zikai Xiao, Fei Huang, Jianhong Tu, Jianhui Wei, Wen Ma, Yuxuan Zhou, Jian Wu, Bowen Yu, Zuozhu Liu, Junyang Lin

    Abstract: Generating long, informative, and factual outputs remains a major challenge for Large Language Models (LLMs). Existing benchmarks for long-form generation typically assess real-world queries with hard-to-verify metrics or use synthetic setups that ease evaluation but overlook real-world intricacies. In this paper, we introduce LongWeave, which balances real-world and verifiable assessment…

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: EMNLP Findings 2025

  33. arXiv:2510.23160  [pdf, ps, other]

    cs.CL

    ENTP: Enhancing Low-Quality SFT Data via Neural-Symbolic Text Purge-Mix

    Authors: Zile Yang, Ling Li, Na Di, Jinlong Pang, Yao Zhou, Hao Cheng, Bo Han, Jiaheng Wei

    Abstract: Supervised Fine-Tuning (SFT) adapts pre-trained Large Language Models (LLMs) to domain-specific instructions by training on a carefully curated subset of high-quality instruction-response pairs, typically drawn from a larger dataset that often contains many low-quality or noisy samples. However, existing quality-first paradigms often overlook valuable signals in discarded low-quality data and rely…

    Submitted 27 October, 2025; originally announced October 2025.

  34. arXiv:2510.22535  [pdf, ps, other]

    cs.AI cs.CL

    OFFSIDE: Benchmarking Unlearning Misinformation in Multimodal Large Language Models

    Authors: Hao Zheng, Zirui Pang, Ling Li, Zhijie Deng, Yuhan Pu, Zhaowei Zhu, Xiaobo Xia, Jiaheng Wei

    Abstract: Advances in Multimodal Large Language Models (MLLMs) intensify concerns about data privacy, making Machine Unlearning (MU), the selective removal of learned information, a critical necessity. However, existing MU benchmarks for MLLMs are limited by a lack of image diversity, potential inaccuracies, and insufficient evaluation scenarios, which fail to capture the complexity of real-world applicatio…

    Submitted 26 October, 2025; originally announced October 2025.

  35. arXiv:2510.22376  [pdf, ps, other]

    cs.LG cs.CL

    Label Smoothing Improves Gradient Ascent in LLM Unlearning

    Authors: Zirui Pang, Hao Zheng, Zhijie Deng, Ling Li, Zixin Zhong, Jiaheng Wei

    Abstract: LLM unlearning has emerged as a promising approach, aiming to enable models to forget hazardous/undesired knowledge at low cost while preserving as much model utility as possible. Among existing techniques, the most straightforward method is performing Gradient Ascent (GA) w.r.t. the forget data, thereby forcing the model to unlearn the forget dataset. However, GA suffers from severe instability,…

    Submitted 25 October, 2025; originally announced October 2025.
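
    Plain GA minimizes the negated cross-entropy on the forget set, an unbounded objective that is a root cause of the instability noted above; label smoothing caps how confidently wrong the model is pushed to be. A minimal PyTorch sketch follows; whether the paper folds smoothing into the loss exactly this way is an assumption.

        import torch
        import torch.nn.functional as F

        def ga_unlearn_loss(logits, targets, smoothing=0.0):
            # gradient ascent on the forget set == minimizing the negated cross-entropy;
            # label smoothing softens the otherwise unbounded ascent target
            return -F.cross_entropy(logits, targets, label_smoothing=smoothing)

        logits = torch.randn(4, 50, requires_grad=True)   # stand-in for next-token logits
        targets = torch.randint(0, 50, (4,))
        loss = ga_unlearn_loss(logits, targets, smoothing=0.1)
        loss.backward()
        print(loss.item(), logits.grad.abs().mean().item())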

  36. arXiv:2510.22115  [pdf, ps, other]

    cs.CL cs.AI

    Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

    Authors: Ling Team, Ang Li, Ben Liu, Binbin Hu, Bing Li, Bingwei Zeng, Borui Ye, Caizhi Tang, Changxin Tian, Chao Huang, Chao Zhang, Chen Qian, Chenchen Ju, Chenchen Li, Chengfu Tang, Chilin Fu, Chunshao Ren, Chunwei Wu, Cong Zhang, Cunyin Peng, Dafeng Xu, Daixin Wang, Dalong Zhang, Dingnan Jin, Dingyuan Zhu, et al. (117 additional authors not shown)

    Abstract: We introduce Ling 2.0, a series of reasoning-oriented language foundation models built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three…

    Submitted 6 November, 2025; v1 submitted 24 October, 2025; originally announced October 2025.

    Comments: Ling 2.0 Technical Report

  37. SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain

    Authors: Zixiang Wan, Guochang Zhang, Yifeng He, Jianqiang Wei

    Abstract: Neural Audio Codecs (NACs) have gained growing attention in recent years as technologies for audio compression and audio representation in speech language models. While mainstream NACs typically require G-level computation and M-level parameters, the performance of lightweight and streaming NACs remains underexplored. This paper proposes SpecTokenizer, a lightweight streaming codec that operates i…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: Accepted by Interspeech 2025; 5 pages, 1 figure, 5 tables

  38. arXiv:2510.21196  [pdf, ps, other]

    eess.AS cs.SD

    PhoenixCodec: Taming Neural Speech Coding for Extreme Low-Resource Scenarios

    Authors: Zixiang Wan, Haoran Zhao, Guochang Zhang, Runqiang Han, Jianqiang Wei, Yuexian Zou

    Abstract: This paper presents PhoenixCodec, a comprehensive neural speech coding and decoding framework designed for extremely low-resource conditions. The proposed system integrates an optimized asymmetric frequency-time architecture, a Cyclical Calibration and Refinement (CCR) training strategy, and a noise-invariant fine-tuning procedure. Under stringent constraints - computation below 700 MFLOPs, latenc…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: 5 pages, 1 figure, 4 tables

  39. arXiv:2510.20498  [pdf, ps, other]

    cs.CL

    Robust Preference Alignment via Directional Neighborhood Consensus

    Authors: Ruochen Mao, Yuling Shi, Xiaodong Gu, Jiaheng Wei

    Abstract: Aligning large language models with human preferences is critical for creating reliable and controllable AI systems. A human preference can be visualized as a high-dimensional vector where different directions represent trade-offs between desired attributes (e.g., helpfulness vs. verbosity). Yet, because the training data often reflects dominant, average preferences, LLMs tend to perform well on c…

    Submitted 23 October, 2025; v1 submitted 23 October, 2025; originally announced October 2025.

  40. arXiv:2510.20449  [pdf, ps, other]

    cs.CL

    LM-mixup: Text Data Augmentation via Language Model based Mixup

    Authors: Zhijie Deng, Zhouan Shen, Ling Li, Yao Zhou, Zhaowei Zhu, Yanji He, Wei Wang, Jiaheng Wei

    Abstract: Instruction tuning is crucial for aligning Large Language Models (LLMs), yet the quality of instruction-following data varies significantly. While high-quality data is paramount, it is often scarce; conversely, abundant low-quality data is frequently discarded, leading to substantial information loss. Existing data augmentation methods struggle to augment this low-quality data effectively, and the…

    Submitted 23 October, 2025; originally announced October 2025.
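
    For reference, the classic mixup that the title alludes to forms convex combinations of two examples and of their labels; for text it is often applied at the embedding level. A minimal sketch under those assumptions follows; the embeddings and quality-score labels are stand-ins, not the paper's pipeline.

        import numpy as np

        rng = np.random.default_rng(0)

        def mixup(x1, y1, x2, y2, alpha=0.4):
            # classic mixup: convex combination of inputs and of labels
            lam = rng.beta(alpha, alpha)
            return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

        # stand-ins: embeddings of a low- and a high-quality instruction sample
        e_low, e_high = rng.standard_normal((2, 8))
        y_low, y_high = 0.2, 0.9              # e.g., quality scores as soft labels
        x_mix, y_mix = mixup(e_low, y_low, e_high, y_high)
        print(round(float(y_mix), 3))         # a soft label between the two sources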

  41. arXiv:2510.19811  [pdf, ps, other]

    cs.CL cs.LG

    Hubble: a Model Suite to Advance the Study of LLM Memorization

    Authors: Johnny Tian-Zheng Wei, Ameya Godbole, Mohammad Aflah Khan, Ryan Wang, Xiaoyuan Zhu, James Flemings, Nitya Kashyap, Krishna P. Gummadi, Willie Neiswanger, Robin Jia

    Abstract: We present Hubble, a suite of fully open-source large language models (LLMs) for the scientific study of LLM memorization. Hubble models come in standard and perturbed variants: standard models are pretrained on a large English corpus, and perturbed models are trained in the same way but with controlled insertion of text (e.g., book passages, biographies, and test sets) designed to emulate key mem…

    Submitted 22 October, 2025; originally announced October 2025.

  42. arXiv:2510.19361  [pdf, ps, other]

    cs.CL cs.AI

    AgenticMath: Enhancing LLM Reasoning via Agentic-based Math Data Generation

    Authors: Xianyang Liu, Yilin Liu, Shuai Wang, Hao Cheng, Andrew Estornell, Yuzhi Zhao, Jiaheng Wei

    Abstract: The creation of high-quality datasets to improve Large Language Model (LLM) reasoning remains a significant challenge, as current methods often suffer from generating low-quality/incorrect answers and limited information richness from available data sources. To address this, we propose AgenticMath, a novel agentic pipeline for generating high-quality mathematical question-answer pairs to enhance t…

    Submitted 5 November, 2025; v1 submitted 22 October, 2025; originally announced October 2025.

    Comments: 9 pages

  43. arXiv:2510.18533  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification

    Authors: Bin Gu, Lipeng Dai, Huipeng Du, Haitao Zhao, Jibo Wei

    Abstract: Robust speaker verification under noisy conditions remains an open challenge. Conventional deep learning methods learn a robust unified speaker representation space against diverse background noise and achieve significant improvement. In contrast, this paper presents a noise-conditioned mixture-of-experts framework that decomposes the feature space into specialized noise-aware subspaces for speaker…

    Submitted 21 October, 2025; originally announced October 2025.
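
    A minimal sketch of the noise-conditioned mixture-of-experts idea: a gate driven by a noise descriptor mixes the outputs of per-condition experts into one speaker embedding. The dimensions, gating input, and expert design below are illustrative assumptions, not the paper's architecture.

        import torch
        import torch.nn as nn

        class NoiseConditionedMoE(nn.Module):
            # experts specialize per noise condition; a gate over a noise
            # descriptor produces the mixing weights
            def __init__(self, feat_dim=80, noise_dim=16, emb_dim=192, n_experts=4):
                super().__init__()
                self.experts = nn.ModuleList(
                    nn.Linear(feat_dim, emb_dim) for _ in range(n_experts))
                self.gate = nn.Sequential(nn.Linear(noise_dim, n_experts), nn.Softmax(dim=-1))

            def forward(self, feats, noise_desc):
                w = self.gate(noise_desc)                                # (batch, n_experts)
                outs = torch.stack([e(feats) for e in self.experts], 1) # (batch, n_experts, emb)
                return (w.unsqueeze(-1) * outs).sum(dim=1)              # noise-aware embedding

        moe = NoiseConditionedMoE()
        print(moe(torch.randn(8, 80), torch.randn(8, 16)).shape)        # torch.Size([8, 192])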

  44. arXiv:2510.18530  [pdf, ps, other]

    cs.SD eess.AS

    A Stage-Wise Learning Strategy with Fixed Anchors for Robust Speaker Verification

    Authors: Bin Gu, Lipeng Dai, Huipeng Du, Haitao Zhao, Jibo Wei

    Abstract: Learning robust speaker representations under noisy conditions presents significant challenges, requiring careful handling of both discriminative and noise-invariant properties. In this work, we propose an anchor-based stage-wise learning strategy for robust speaker representation learning. Specifically, our approach begins by training a base model to establish discriminative speaker boundar…

    Submitted 21 October, 2025; originally announced October 2025.

  45. arXiv:2510.18357  [pdf, ps, other]

    cs.CV

    Learning Human-Object Interaction as Groups

    Authors: Jiajun Hong, Jianan Wei, Wenguan Wang

    Abstract: Human-Object Interaction Detection (HOI-DET) aims to localize human-object pairs and identify their interactive relationships. To aggregate contextual cues, existing methods typically propagate information across all detected entities via self-attention mechanisms, or establish message passing between humans and objects with bipartite graphs. However, they primarily focus on pairwise relationships…

    Submitted 21 October, 2025; originally announced October 2025.

  46. arXiv:2510.16555  [pdf, ps, other]

    cs.AI cs.LG

    Urban-R1: Reinforced MLLMs Mitigate Geospatial Biases for Urban General Intelligence

    Authors: Qiongyan Wang, Xingchen Zou, Yutian Jiang, Haomin Wen, Jiaheng Wei, Qingsong Wen, Yuxuan Liang

    Abstract: Rapid urbanization intensifies the demand for Urban General Intelligence (UGI), referring to AI systems that can understand and reason about complex urban environments. Recent studies have built urban foundation models using supervised fine-tuning (SFT) of LLMs and MLLMs, yet these models exhibit persistent geospatial bias, producing regionally skewed predictions and limited generalization. To thi…

    Submitted 18 October, 2025; originally announced October 2025.

  47. arXiv:2510.13864  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    Self-Training with Dynamic Weighting for Robust Gradual Domain Adaptation

    Authors: Zixi Wang, Yushe Cao, Yubo Huang, Jinzhu Wei, Jingzehua Xu, Shuai Zhang, Xin Lai

    Abstract: In this paper, we propose a new method called Self-Training with Dynamic Weighting (STDW), which aims to enhance robustness in Gradual Domain Adaptation (GDA) by addressing the challenge of smooth knowledge migration from the source to the target domain. Traditional GDA methods mitigate domain shift through intermediate domains and self-training but often suffer from inefficient knowledge migratio…

    Submitted 13 October, 2025; originally announced October 2025.

    Comments: It had formerly appeared as arXiv:2501.19159v2 in error. Accepted by NIPS 25
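
    The underlying loop being made robust is self-training along a path of gradually shifting domains, refitting at each step on confidence-weighted pseudo-labels from the previous model. A minimal scikit-learn sketch follows; the weighting schedule here is an illustrative stand-in, not STDW's actual dynamic weighting.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)

        def domain(shift, n=300):
            # two Gaussian classes whose means drift with the domain index
            y = rng.integers(0, 2, n)
            x = rng.standard_normal((n, 2)) + np.stack([y + shift, y - shift], axis=1)
            return x, y

        xs, ys = domain(0.0)                                  # labeled source domain
        clf = LogisticRegression().fit(xs, ys)
        for t, shift in enumerate([0.5, 1.0, 1.5, 2.0], 1):   # gradual shift path
            xt, yt_true = domain(shift)
            proba = clf.predict_proba(xt)
            w = proba.max(axis=1) * (t / 4)                   # illustrative weighting only
            clf = LogisticRegression().fit(xt, proba.argmax(axis=1), sample_weight=w)
            print(f"step {t}: target acc {clf.score(xt, yt_true):.3f}")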

  48. arXiv:2510.13660  [pdf, ps, other]

    cs.CV

    OmniGaze: Reward-inspired Generalizable Gaze Estimation In The Wild

    Authors: Hongyu Qu, Jianan Wei, Xiangbo Shu, Yazhou Yao, Wenguan Wang, Jinhui Tang

    Abstract: Current 3D gaze estimation methods struggle to generalize across diverse data domains, primarily due to i) the scarcity of annotated datasets, and ii) the insufficient diversity of labeled data. In this work, we present OmniGaze, a semi-supervised framework for 3D gaze estimation, which utilizes large-scale unlabeled data collected from diverse and unconstrained real-world environments to mitigate…

    Submitted 15 October, 2025; v1 submitted 15 October, 2025; originally announced October 2025.

    Comments: Accepted to NeurIPS 2025; Project page: https://github.com/quhongyu/OmniGaze

  49. arXiv:2510.09988  [pdf, ps, other]

    cs.CL

    Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey

    Authors: Jiaqi Wei, Xiang Zhang, Yuejin Yang, Wenxuan Huang, Juntai Cao, Sheng Xu, Xiang Zhuang, Zhangyang Gao, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Chenyu You, Wanli Ouyang, Siqi Sun

    Abstract: Deliberative tree search is a cornerstone of modern Large Language Model (LLM) research, driving the pivot from brute-force scaling toward algorithmic efficiency. This single paradigm unifies two critical frontiers: Test-Time Scaling (TTS), which deploys on-demand computation to solve hard problems, and Self-Improvement, which uses search-generated data to durably enhance model p…

    Submitted 10 October, 2025; originally announced October 2025.
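
    The shared skeleton behind the surveyed methods is a search loop in which a reward signal ranks partial solutions and decides where further compute is spent. A minimal best-first sketch on a toy state space follows; in the surveyed setting, expand would be an LLM proposing next reasoning steps and reward a learned reward model.

        import heapq

        def best_first_search(root, expand, reward, budget=100):
            # deliberative search: always extend the most promising frontier node
            frontier = [(-reward(root), root)]
            best = root
            for _ in range(budget):
                if not frontier:
                    break
                _, node = heapq.heappop(frontier)
                if reward(node) > reward(best):
                    best = node
                for child in expand(node):
                    heapq.heappush(frontier, (-reward(child), child))
            return best

        # toy instance: build a digit string (max 6 digits) whose digit sum is 30
        expand = lambda s: [s + d for d in "0123456789"] if len(s) < 6 else []
        reward = lambda s: -abs(30 - sum(map(int, s or "0")))
        print(best_first_search("", expand, reward))   # e.g. "9993"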

  50. arXiv:2510.09845  [pdf]

    cs.LG cs.AI cs.CV

    Harnessing Self-Supervised Deep Learning and Geostationary Remote Sensing for Advancing Wildfire and Associated Air Quality Monitoring: Improved Smoke and Fire Front Masking using GOES and TEMPO Radiance Data

    Authors: Nicholas LaHaye, Thilanka Munashinge, Hugo Lee, Xiaohua Pan, Gonzalo Gonzalez Abad, Hazem Mahmoud, Jennifer Wei

    Abstract: This work demonstrates the possibilities for improving wildfire and air quality management in the western United States by leveraging the unprecedented hourly data from NASA's TEMPO satellite mission and advances in self-supervised deep learning. Here we show the efficacy of deep learning for mapping the near real-time hourly spread of wildfire fronts and smoke plumes using an innovative se…

    Submitted 10 October, 2025; originally announced October 2025.

    Comments: https://2025.ieeeigarss.org/view_paper.php?PaperNum=6389&SessionID=1611