
Showing 1–50 of 302 results for author: Xie, P

Searching in archive cs.
  1. arXiv:2512.20561  [pdf, ps, other]

    cs.CV

    FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

    Authors: Kaitong Cai, Jusheng Zhang, Jing Yang, Yijia Fan, Pengtao Xie, Jian Wang, Keze Wang

    Abstract: Large vision-language models (VLMs) typically process hundreds or thousands of visual tokens per image or video frame, incurring quadratic attention cost and substantial redundancy. Existing token reduction methods often ignore the textual query or rely on deep attention maps, whose instability under aggressive pruning leads to degraded semantic alignment. We propose FlashVLM, a text-guided visu…

    Submitted 23 December, 2025; originally announced December 2025.

    Comments: Under submission

  2. arXiv:2512.15000  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding

    Authors: Ruiyi Zhang, Peijia Qin, Qi Cao, Pengtao Xie

    Abstract: Process Reward Models (PRMs) have become essential for improving Large Language Models (LLMs) via test-time scaling, yet their effectiveness in coding remains limited due to the lack of meaningful step decompositions in code and the noise of Monte-Carlo-generated partial labels. We propose DreamPRM-Code, a coding-focused PRM that treats functions as reasoning steps using a Chain-of-Function prompt…

    Submitted 16 December, 2025; originally announced December 2025.

  3. arXiv:2512.13507  [pdf, ps, other]

    cs.CV

    Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

    Authors: Team Seedance, Heyi Chen, Siyan Chen, Xin Chen, Yanfei Chen, Ying Chen, Zhuo Chen, Feng Cheng, Tianheng Cheng, Xinqi Cheng, Xuyan Chi, Jian Cong, Jing Cui, Qinpeng Cui, Qide Dong, Junliang Fan, Jing Fang, Zetao Fang, Chengjian Feng, Han Feng, Mingyuan Gao, Yu Gao, Dong Guo, Qiushan Guo, Boyang Hao , et al. (172 additional authors not shown)

    Abstract: Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional au…

    Submitted 23 December, 2025; v1 submitted 15 December, 2025; originally announced December 2025.

    Comments: Seedance 1.5 pro Technical Report

  4. arXiv:2512.08868  [pdf, ps, other]

    cs.AI

    EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

    Authors: Rui Min, Zile Qiao, Ze Xu, Jiawen Zhai, Wenyu Gao, Xuanzhong Chen, Haozhen Sun, Zhen Zhang, Xinyu Wang, Hong Zhou, Wenbiao Yin, Bo Zhang, Xuan Zhou, Ming Yan, Yong Jiang, Haicheng Liu, Liang Ding, Ling Zou, Yi R. Fung, Yalong Li, Pengjun Xie

    Abstract: Foundation agents have rapidly advanced in their ability to reason and interact with real environments, making the evaluation of their core capabilities increasingly important. While many benchmarks have been developed to assess agent performance, most concentrate on academic settings or artificially designed scenarios while overlooking the challenges that arise in real applications. To address th…

    Submitted 11 December, 2025; v1 submitted 9 December, 2025; originally announced December 2025.

  5. arXiv:2512.04675  [pdf, ps, other]

    cs.CR

    Cryptanalysis of Gleeok-128

    Authors: Siwei Chen, Peipei Xie, Shengyuan Xu, Xiutao Feng, Zejun Xiang, Xiangyong Zeng

    Abstract: Gleeok is a family of low-latency keyed pseudorandom functions (PRFs) consisting of three parallel SPN-based permutations whose outputs are XORed to form the final value. Both Gleeok-128 and Gleeok-256 use a 256-bit key, with block sizes of 128 and 256 bits, respectively. Owing to its multi-branch structure, evaluating security margins and mounting effective key recovery attacks present nontrivial…

    Submitted 4 December, 2025; originally announced December 2025.

    Comments: 44 pages, 5 figures

    MSC Class: 94A60

  6. arXiv:2511.16156  [pdf, ps, other]

    cs.CV

    Pluggable Pruning with Contiguous Layer Distillation for Diffusion Transformers

    Authors: Jian Ma, Qirong Peng, Xujie Zhu, Peixing Xie, Chen Chen, Haonan Lu

    Abstract: Diffusion Transformers (DiTs) have shown exceptional performance in image generation, yet their large parameter counts incur high computational costs, impeding deployment in resource-constrained settings. To address this, we propose Pluggable Pruning with Contiguous Layer Distillation (PPCL), a flexible structured pruning framework specifically designed for DiT architectures. First, we identify re…

    Submitted 20 November, 2025; originally announced November 2025.

    Comments: https://github.com/OPPO-Mente-Lab/Qwen-Image-Pruning

  7. arXiv:2511.14301  [pdf, ps, other]

    cs.CR cs.CL cs.LG

    Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion

    Authors: Eric Xue, Ruiyi Zhang, Zijun Zhang, Pengtao Xie

    Abstract: Transformer models are foundational to natural language processing (NLP) applications, yet remain vulnerable to backdoor attacks introduced through poisoned data, which implant hidden behaviors during training. To strengthen the ability to prevent such compromises, recent research has focused on designing increasingly stealthy attacks to stress-test existing defenses, pairing backdoor behaviors wi…

    Submitted 25 November, 2025; v1 submitted 18 November, 2025; originally announced November 2025.

  8. arXiv:2511.12288  [pdf, ps, other]

    cs.SE

    Reducing Hallucinations in LLM-Generated Code via Semantic Triangulation

    Authors: Yihan Dai, Sijie Liang, Haotian Xu, Peichu Xie, Sergey Mechtaev

    Abstract: When generating code from natural language prompts, an LLM samples programs from a probability distribution, many of which might be incorrect. Sample consensus techniques - such as majority voting or validation against generated tests or specifications - aim to identify a correct program in the sample or abstain if none is valid. However, existing methods often fail to select a correct solution wh…

    Submitted 21 November, 2025; v1 submitted 15 November, 2025; originally announced November 2025.

  9. arXiv:2511.10909  [pdf, ps, other]

    cs.AR cs.LG math.NA

    MMA-Sim: Bit-Accurate Reference Model of Tensor Cores and Matrix Cores

    Authors: Peichen Xie, Yang Wang, Fan Yang, Mao Yang

    Abstract: The rapidly growing computation demands of deep neural networks (DNNs) have driven hardware vendors to integrate matrix multiplication accelerators (MMAs), such as NVIDIA Tensor Cores and AMD Matrix Cores, into modern GPUs. However, due to distinct and undocumented arithmetic specifications for floating-point matrix multiplication, some MMAs can lead to numerical imprecision and inconsistency that…

    Submitted 13 November, 2025; originally announced November 2025.

  10. arXiv:2511.07327  [pdf, ps, other]

    cs.AI cs.CL

    IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

    Authors: Guoxin Chen, Zile Qiao, Xuanzhong Chen, Donglei Yu, Haotian Xu, Wayne Xin Zhao, Ruihua Song, Wenbiao Yin, Huifeng Yin, Liwen Zhang, Kuan Li, Minpeng Liao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

    Abstract: Recent advances in deep-research agents have shown promise for autonomous knowledge construction through dynamic reasoning over external sources. However, existing approaches rely on a mono-contextual paradigm that accumulates all information in a single, expanding context window, leading to context suffocation and noise contamination that limit their effectiveness on long-horizon tasks. We introd…

    Submitted 10 November, 2025; originally announced November 2025.

    Comments: https://github.com/Alibaba-NLP/DeepResearch

  11. arXiv:2510.27571  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.IR cs.LG

    Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum

    Authors: Zhuoning Guo, Mingxin Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Xiaowen Chu

    Abstract: The prevailing video retrieval paradigm is structurally misaligned, as narrow benchmarks incentivize correspondingly limited data and single-task training. Therefore, universal capability is suppressed due to the absence of a diagnostic evaluation that defines and demands multi-dimensional generalization. To break this cycle, we introduce a framework built on the co-design of evaluation, data, and…

    Submitted 31 October, 2025; originally announced October 2025.

  12. arXiv:2510.24701  [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG cs.MA

    Tongyi DeepResearch Technical Report

    Authors: Tongyi DeepResearch Team, Baixuan Li, Bo Zhang, Dingchu Zhang, Fei Huang, Guangyu Li, Guoxin Chen, Huifeng Yin, Jialong Wu, Jingren Zhou, Kuan Li, Liangcai Su, Litu Ou, Liwen Zhang, Pengjun Xie, Rui Ye, Wenbiao Yin, Xinmiao Yu, Xinyu Wang, Xixi Wu, Xuanzhong Chen, Yida Zhao, Zhen Zhang, Zhengwei Tao, Zhongwang Zhang , et al. (32 additional authors not shown)

    Abstract: We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across co…

    Submitted 4 November, 2025; v1 submitted 28 October, 2025; originally announced October 2025.

    Comments: https://tongyi-agent.github.io/blog

  13. arXiv:2510.24699  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    AgentFold: Long-Horizon Web Agents with Proactive Context Management

    Authors: Rui Ye, Zhongwang Zhang, Kuan Li, Huifeng Yin, Zhengwei Tao, Yida Zhao, Liangcai Su, Liwen Zhang, Zile Qiao, Xinyu Wang, Pengjun Xie, Fei Huang, Siheng Chen, Jingren Zhou, Yong Jiang

    Abstract: LLM-based web agents show immense promise for information seeking, yet their effectiveness on long-horizon tasks is hindered by a fundamental trade-off in context management. Prevailing ReAct-based agents suffer from context saturation as they accumulate noisy, raw histories, while methods that fixedly summarize the full history at each step risk the irreversible loss of critical details. Addressi…

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: 26 pages, 9 figures

  14. arXiv:2510.24698  [pdf, ps, other]

    cs.CL cs.AI

    ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking

    Authors: Baixuan Li, Dingchu Zhang, Jialong Wu, Wenbiao Yin, Zhengwei Tao, Yida Zhao, Liwen Zhang, Haiyang Shen, Runnan Fang, Pengjun Xie, Jingren Zhou, Yong Jiang

    Abstract: Parallel thinking expands exploration breadth, complementing the deep exploration of information-seeking (IS) agents to further enhance problem-solving capability. However, conventional parallel thinking faces two key challenges in this setting: inefficiency from repeatedly rolling out from scratch, and difficulty in integrating long-horizon reasoning trajectories during answer generation, as limi…

    Submitted 28 October, 2025; originally announced October 2025.

  15. arXiv:2510.24697  [pdf, ps, other]

    cs.CL

    WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking

    Authors: Zhengwei Tao, Haiyang Shen, Baixuan Li, Wenbiao Yin, Jialong Wu, Kuan Li, Zhongwang Zhang, Huifeng Yin, Rui Ye, Liwen Zhang, Xinyu Wang, Pengjun Xie, Jingren Zhou, Yong Jiang

    Abstract: Large Language Model (LLM)-based agents have emerged as a transformative approach for open-ended problem solving, with information seeking (IS) being a core capability that enables autonomous reasoning and decision-making. While prior research has largely focused on improving retrieval depth, we observe that current IS agents often suffer from low search efficiency, which in turn constrains overal…

    Submitted 28 October, 2025; originally announced October 2025.

  16. arXiv:2510.24695  [pdf, ps, other]

    cs.CL

    AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis

    Authors: Xuanzhong Chen, Zile Qiao, Guoxin Chen, Liangcai Su, Zhen Zhang, Xinyu Wang, Pengjun Xie, Fei Huang, Jingren Zhou, Yong Jiang

    Abstract: Training large language model agents on tasks at the frontier of their capabilities is key to unlocking advanced reasoning. We introduce a data synthesis approach inspired by the educational theory of the Zone of Proximal Development (ZPD), which defines this frontier as tasks an LLM cannot solve alone but can master with guidance. To operationalize this, we present the AgentFrontier Engine, an au…

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/

  17. arXiv:2510.24694  [pdf, ps, other]

    cs.CL cs.AI

    Repurposing Synthetic Data for Fine-grained Search Agent Supervision

    Authors: Yida Zhao, Kuan Li, Xixi Wu, Liwen Zhang, Dingchu Zhang, Baixuan Li, Maojia Song, Zhuo Chen, Chenxi Wang, Xinyu Wang, Kewei Tu, Pengjun Xie, Jingren Zhou, Yong Jiang

    Abstract: LLM-based search agents are increasingly trained on entity-centric synthetic data to solve complex, knowledge-intensive tasks. However, prevailing training methods like Group Relative Policy Optimization (GRPO) discard this rich entity information, relying instead on sparse, outcome-based rewards. This critical limitation renders them unable to distinguish informative "near-miss" samples-those wit…

    Submitted 28 October, 2025; originally announced October 2025.

  18. arXiv:2510.23458  [pdf, ps, other]

    cs.CL cs.AI

    BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents

    Authors: Litu Ou, Kuan Li, Huifeng Yin, Liwen Zhang, Zhongwang Zhang, Xixi Wu, Rui Ye, Zile Qiao, Pengjun Xie, Jingren Zhou, Yong Jiang

    Abstract: Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work mainly focused on single-turn scenarios, while research on confidence in complex multi-turn interactions is limited. In this paper, we investigate whether LLM-based search agents have the ability to communicate their own confidence through verbalized confidence scores after long sequences of actions…

    Submitted 28 October, 2025; v1 submitted 27 October, 2025; originally announced October 2025.

    Comments: 25 pages

  19. arXiv:2510.22733  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker

    Authors: Qi Liu, Yanzhao Zhang, Mingxin Li, Dingkun Long, Pengjun Xie, Jiaxin Mao

    Abstract: Text embedding models serve as a fundamental component in real-world search applications. By mapping queries and documents into a shared embedding space, they deliver competitive retrieval performance with high efficiency. However, their ranking fidelity remains limited compared to dedicated rerankers, especially recent LLM-based listwise rerankers, which capture fine-grained query-document and do…

    Submitted 30 October, 2025; v1 submitted 26 October, 2025; originally announced October 2025.

    Comments: Code and models are available at https://alibaba-nlp.github.io/E2Rank

  20. arXiv:2510.22728  [pdf, ps, other]

    cs.LG cs.CV

    S-Chain: Structured Visual Chain-of-Thought For Medicine

    Authors: Khai Le-Duc, Duy M. H. Nguyen, Phuong T. H. Trinh, Tien-Phat Nguyen, Nghiem T. Diep, An Ngo, Tung Vu, Trinh Vuong, Anh-Tien Nguyen, Mau Nguyen, Van Trung Hoang, Khai-Nguyen Nguyen, Hy Nguyen, Chris Ngo, Anji Liu, Nhat Ho, Anne-Christin Hauschild, Khanh Xuan Nguyen, Thanh Nguyen-Tang, Pengtao Xie, Daniel Sonntag, James Zou, Mathias Niepert, Anh Totti Nguyen

    Abstract: Faithful reasoning in medical vision-language models (VLMs) requires not only accurate predictions but also transparent alignment between textual rationales and visual evidence. While Chain-of-Thought (CoT) prompting has shown promise in medical visual question answering (VQA), no large-scale expert-level dataset has captured stepwise reasoning with precise visual grounding. We introduce S-Chain,…

    Submitted 26 October, 2025; originally announced October 2025.

    Comments: First version

  21. arXiv:2510.21712  [pdf, ps, other]

    cs.IR cs.AI cs.CL

    DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling

    Authors: Hao Sun, Zile Qiao, Bo Wang, Guoxin Chen, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang

    Abstract: Retrieval-Augmented Generation (RAG) systems have emerged as a pivotal methodology for enhancing Large Language Models (LLMs) through the dynamic integration of external knowledge. To further improve RAG's flexibility, Agentic RAG introduces autonomous agents into the workflow. However, Agentic RAG faces several challenges: (1) the success of each step depends on both high-quality planning and acc…

    Submitted 7 September, 2025; originally announced October 2025.

    Comments: EMNLP 2025 Main Conference

  22. arXiv:2510.18606  [pdf, ps, other]

    cs.MM eess.IV eess.SY

    PIRA: Pan-CDN Intra-video Resource Adaptation for Short Video Streaming

    Authors: Chunyu Qiao, Tong Liu, Yucheng Zhang, Zhiwei Fan, Pengjin Xie, Zhen Wang, Liang Liu

    Abstract: In large-scale short video platforms, CDN resource selection plays a critical role in maintaining Quality of Experience (QoE) while controlling escalating traffic costs. To better understand this phenomenon, we conduct in-the-wild network measurements during video playback in a production short video system. The results reveal that CDNs delivering higher average QoE often come at greater financial…

    Submitted 21 October, 2025; originally announced October 2025.

  23. arXiv:2510.18459  [pdf, ps, other]

    cs.MM cs.AI eess.IV

    DeLoad: Demand-Driven Short-Video Preloading with Scalable Watch-Time Estimation

    Authors: Tong Liu, Zhiwei Fan, Guanyan Peng, Haodan Zhang, Yucheng Zhang, Zhen Wang, Pengjin Xie, Liang Liu

    Abstract: Short video streaming has become a dominant paradigm in digital media, characterized by rapid swiping interactions and diverse media content. A key technical challenge is designing an effective preloading strategy that dynamically selects and prioritizes download tasks from an evolving playlist, balancing Quality of Experience (QoE) and bandwidth efficiency under practical commercial constraints.…

    Submitted 21 October, 2025; originally announced October 2025.

  24. arXiv:2510.14824  [pdf, ps, other]

    cs.CL cs.CV cs.IR

    Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking

    Authors: Ziqi Dai, Xin Zhang, Mingxin Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang

    Abstract: In information retrieval, training reranking models mainly focuses on two types of objectives: metric learning (e.g. contrastive loss to increase the predicted scores on relevant query-document pairs) and classification (binary label prediction of relevance vs. irrelevance). For BERT-style encoders, various studies have shown that contrastive learning (CL) can be more effective than discriminative…

    Submitted 16 October, 2025; originally announced October 2025.

  25. arXiv:2510.14276  [pdf, ps, other]

    cs.CL

    Qwen3Guard Technical Report

    Authors: Haiquan Zhao, Chenhan Yuan, Fei Huang, Xiaomeng Hu, Yichang Zhang, An Yang, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin, Baosong Yang, Chen Cheng, Jialong Tang, Jiandong Jiang, Jianwei Zhang, Jijie Xu, Ming Yan, Minmin Sun, Pei Zhang, Pengjun Xie, Qiaoyu Tang, Qin Zhu, Rong Zhang, Shibin Wu, Shuo Zhang , et al. (18 additional authors not shown)

    Abstract: As large language models (LLMs) become more capable and widely used, ensuring the safety of their outputs is increasingly critical. Existing guardrail models, though useful in static evaluation settings, face two major limitations in real-world applications: (1) they typically output only binary "safe/unsafe" labels, which can be interpreted inconsistently across diverse safety policies, rendering…

    Submitted 16 October, 2025; originally announced October 2025.

  26. arXiv:2510.10912  [pdf, ps, other]

    cs.RO

    More than A Point: Capturing Uncertainty with Adaptive Affordance Heatmaps for Spatial Grounding in Robotic Tasks

    Authors: Xinyu Shao, Yanzhe Tang, Pengwei Xie, Kaiwen Zhou, Yuzheng Zhuang, Xingyue Quan, Jianye Hao, Long Zeng, Xiu Li

    Abstract: Many language-guided robotic systems rely on collapsing spatial reasoning into discrete points, making them brittle to perceptual noise and semantic ambiguity. To address this challenge, we propose RoboMAP, a framework that represents spatial targets as continuous, adaptive affordance heatmaps. This dense representation captures the uncertainty in spatial grounding and provides richer information…

    Submitted 15 October, 2025; v1 submitted 12 October, 2025; originally announced October 2025.

    Comments: More details and videos can be found at https://robo-map.github.io

  27. arXiv:2510.09180  [pdf, ps, other]

    cs.LG cs.SE

    RepDL: Bit-level Reproducible Deep Learning Training and Inference

    Authors: Peichen Xie, Xian Zhang, Shuo Chen

    Abstract: Non-determinism and non-reproducibility present significant challenges in deep learning, leading to inconsistent results across runs and platforms. These issues stem from two origins: random number generation and floating-point computation. While randomness can be controlled through deterministic configurations, floating-point inconsistencies remain largely unresolved. To address this, we introduc…

    Submitted 10 October, 2025; originally announced October 2025.

    Comments: Originally drafted in 2023

  28. arXiv:2510.05137  [pdf, ps, other]

    cs.CL

    Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics

    Authors: Maojia Song, Renhang Liu, Xinyu Wang, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou, Dorien Herremans, Soujanya Poria

    Abstract: RAG (Retrieval-Augmented Generation) systems and web agents are increasingly evaluated on multi-hop deep search tasks, yet current practice suffers from two major limitations. First, most benchmarks leak the reasoning path in the question text, allowing models to follow surface cues rather than discover reasoning chains autonomously. Second, evaluation is typically reduced to a single pass rate, w…

    Submitted 10 December, 2025; v1 submitted 1 October, 2025; originally announced October 2025.

  29. arXiv:2510.04935  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning

    Authors: Guoxin Chen, Zile Qiao, Wenqing Wang, Donglei Yu, Xuanzhong Chen, Hao Sun, Minpeng Liao, Kai Fan, Yong Jiang, Penguin Xie, Wayne Xin Zhao, Ruihua Song, Fei Huang

    Abstract: Large Reasoning Models (LRMs) often exhibit a tendency for overanalysis in simple tasks, where the models excessively utilize System 2-type, deliberate reasoning, leading to inefficient token generation. Furthermore, these models face challenges in adapting their reasoning capabilities to rapidly changing environments due to the static nature of their pretraining data. To address these issues, adv…

    Submitted 6 October, 2025; originally announced October 2025.

    Comments: Ongoing Work

  30. arXiv:2510.02340  [pdf, ps, other]

    cs.CL cs.LG

    Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs

    Authors: Xin Gao, Ruiyi Zhang, Daniel Du, Saurabh Mahindre, Sai Ashish Somayajula, Pengtao Xie

    Abstract: Large Language Models (LLMs) are widely used for temporal prediction, but their reliance on pretraining data raises contamination concerns, as accurate predictions on pre-cutoff test data may reflect memorization rather than reasoning, leading to an overestimation of their generalization capability. With the recent emergence of prompting-based unlearning techniques, a natural question arises: Can…

    Submitted 14 October, 2025; v1 submitted 26 September, 2025; originally announced October 2025.

    Comments: Published at EMNLP 2025; Code and data available at https://github.com/gxx27/time_unlearn

  31. arXiv:2509.25084  [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG

    Scaling Generalist Data-Analytic Agents

    Authors: Shuofei Qiao, Yanqiu Zhao, Zhisong Qiu, Xiaobin Wang, Jintian Zhang, Zhao Bin, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen

    Abstract: Data-analytic agents are emerging as a key catalyst for automated scientific discovery and for the vision of Innovating AI. Current approaches, however, rely heavily on prompt engineering over proprietary models, while open-source models struggle to face diverse-format, large-scale data files and long-horizon, multi-step reasoning that real-world analytics demands. This paper introduces DataMind,…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: Work in progress

  32. arXiv:2509.13313  [pdf, ps, other]

    cs.CL

    ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

    Authors: Xixi Wu, Kuan Li, Yida Zhao, Liwen Zhang, Litu Ou, Huifeng Yin, Zhongwang Zhang, Xinmiao Yu, Dingchu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Minhao Cheng, Shuai Wang, Hong Cheng, Jingren Zhou

    Abstract: Large Language Model (LLM)-based web agents demonstrate strong performance on knowledge-intensive tasks but are hindered by context window limitations in paradigms like ReAct. Complex queries involving multiple entities, intertwined relationships, and high uncertainty demand extensive search cycles that rapidly exhaust context budgets before reaching solutions. To overcome this challenge, we intro…

    Submitted 15 October, 2025; v1 submitted 16 September, 2025; originally announced September 2025.

    Comments: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/

  33. arXiv:2509.13312  [pdf, ps, other]

    cs.CL

    WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

    Authors: Zijian Li, Xin Guan, Bo Zhang, Shen Huang, Houquan Zhou, Shaopeng Lai, Ming Yan, Yong Jiang, Pengjun Xie, Fei Huang, Jun Zhang, Jingren Zhou

    Abstract: This paper tackles open-ended deep research (OEDR), a complex challenge where AI agents must synthesize vast web-scale information into insightful reports. Current approaches are plagued by dual-fold limitations: static research pipelines that decouple planning from evidence acquisition and monolithic generation paradigms that include redundant, irrelevant evidence, suffering from halluci…

    Submitted 7 October, 2025; v1 submitted 16 September, 2025; originally announced September 2025.

    Comments: An agent system for open-ended deep research

  34. arXiv:2509.13311  [pdf, ps, other]

    cs.CL

    Towards General Agentic Intelligence via Environment Scaling

    Authors: Runnan Fang, Shihao Cai, Baixuan Li, Jialong Wu, Guangyu Li, Wenbiao Yin, Xinyu Wang, Xiaobin Wang, Liangcai Su, Zhen Zhang, Shibin Wu, Zhengwei Tao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

    Abstract: Advanced agentic intelligence is a prerequisite for deploying Large Language Models in practical, real-world applications. Diverse real-world APIs demand precise, robust function-calling intelligence, which needs agents to develop these capabilities through interaction in varied environments. The breadth of function-calling competence is closely tied to the diversity of environments in which agent…

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/

  35. arXiv:2509.13310  [pdf, ps, other]

    cs.CL

    Scaling Agents via Continual Pre-training

    Authors: Liangcai Su, Zhen Zhang, Guangyu Li, Zhuo Chen, Chenxi Wang, Maojia Song, Xinyu Wang, Kuan Li, Jialong Wu, Xuanzhong Chen, Zile Qiao, Zhongwang Zhang, Huifeng Yin, Shihao Cai, Runnan Fang, Zhengwei Tao, Wenbiao Yin, Chenxiong Qian, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

    Abstract: Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches building upon general-purpose foundation models consistently underperform in agentic tasks, particularly in open-source implementations. We identify the root cause: the absence of robust agentic foundation models force…

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/

  36. arXiv:2509.13309  [pdf, ps, other]

    cs.CL

    WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

    Authors: Zile Qiao, Guoxin Chen, Xuanzhong Chen, Donglei Yu, Wenbiao Yin, Xinyu Wang, Zhen Zhang, Baixuan Li, Huifeng Yin, Kuan Li, Rui Min, Minpeng Liao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

    Abstract: Recent advances in deep-research systems have demonstrated the potential for AI agents to autonomously discover and synthesize knowledge from external sources. In this paper, we introduce WebResearcher, a novel framework for building such agents through two key components: (1) WebResearcher, an iterative deep-research paradigm that reformulates deep research as a Markov Decision Process, where age…

    Submitted 20 September, 2025; v1 submitted 16 September, 2025; originally announced September 2025.

    Comments: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/

  37. arXiv:2509.13305  [pdf, ps, other]

    cs.LG cs.CL

    WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

    Authors: Kuan Li, Zhongwang Zhang, Huifeng Yin, Rui Ye, Yida Zhao, Liwen Zhang, Litu Ou, Dingchu Zhang, Xixi Wu, Jialong Wu, Xinyu Wang, Zile Qiao, Zhen Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

    Abstract: Transcending human cognitive limitations represents a critical frontier in LLM training. Proprietary agentic systems like DeepResearch have demonstrated superhuman capabilities on extremely complex information-seeking benchmarks such as BrowseComp, a feat previously unattainable. We posit that their success hinges on a sophisticated reasoning pattern absent in open-source models: the ability to sy…

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/

  38. arXiv:2509.09332  [pdf, ps, other]

    cs.RO cs.AI cs.CL cs.CV

    OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning

    Authors: Yuecheng Liu, Dafeng Chi, Shiguang Wu, Zhanguang Zhang, Yuzheng Zhuang, Bowen Yang, He Zhu, Lingfeng Zhang, Pengwei Xie, David Gamaliel Arcos Bravo, Yingxue Zhang, Jianye Hao, Xingyue Quan

    Abstract: Recent advances in multimodal large language models (MLLMs) have opened new opportunities for embodied intelligence, enabling multimodal understanding, reasoning, and interaction, as well as continuous spatial decision-making. Nevertheless, current MLLM-based embodied systems face two critical limitations. First, Geometric Adaptability Gap: models trained solely on 2D inputs or with hard-coded 3D…

    Submitted 12 September, 2025; v1 submitted 11 September, 2025; originally announced September 2025.

  39. arXiv:2509.07538  [pdf, ps, other

    cs.CV

    TextlessRAG: End-to-End Visual Document RAG by Speech Without Text

    Authors: Peijin Xie, Shun Qian, Bingquan Liu, Dexin Wang, Lin Sun, Xiangzheng Zhang

    Abstract: Document images encapsulate a wealth of knowledge, while the portability of spoken queries enables broader and flexible application scenarios. Yet, no prior work has explored knowledge base question answering over visual document images with queries provided directly in speech. We propose TextlessRAG, the first end-to-end framework for speech-based question answering over large-scale document imag…

    Submitted 10 September, 2025; v1 submitted 9 September, 2025; originally announced September 2025.

    Comments: 5 pages, 4 figures

  40. arXiv:2509.07413  [pdf, ps, other

    cs.RO

    Robust Docking Maneuvers for Autonomous Trolley Collection: An Optimization-Based Visual Servoing Scheme

    Authors: Yuhan Pang, Bingyi Xia, Zhe Zhang, Zhirui Sun, Peijia Xie, Bike Zhu, Wenjun Xu, Jiankun Wang

    Abstract: Service robots have demonstrated significant potential for autonomous trolley collection and redistribution in public spaces like airports or warehouses to improve efficiency and reduce cost. Usually, a fully autonomous system for the collection and transportation of multiple trolleys is based on a Leader-Follower formation of mobile manipulators, where reliable docking maneuvers of the mobile bas…

    Submitted 17 September, 2025; v1 submitted 9 September, 2025; originally announced September 2025.

  41. arXiv:2509.06650  [pdf, ps, other

    cs.CL cs.IR

    Domain-Aware RAG: MoL-Enhanced RL for Efficient Training and Scalable Retrieval

    Authors: Hao Lin, Peitong Xie, Jingxue Chen, Jie Lin, Qingkun Tang, Qianchun Lu

    Abstract: Retrieval-Augmented Generation (RAG) systems rely heavily on the retrieval stage, particularly the coarse-ranking process. Existing coarse-ranking optimization approaches often struggle to balance domain-specific knowledge learning with query enhancement, resulting in suboptimal retrieval performance. To address this challenge, we propose MoLER, a domain-aware RAG method that uses MoL-Enhanced Rei…

    Submitted 8 September, 2025; originally announced September 2025.

  42. arXiv:2509.05542  [pdf, ps, other

    cs.LG

    DreamPRM-1.5: Unlocking the Potential of Each Instance for Multimodal Process Reward Model Training

    Authors: Qi Cao, Pengtao Xie

    Abstract: Training multimodal process reward models (PRMs) is hard due to (i) distribution shift between training set and test set and (ii) quality imbalance across training data samples. While domain-level reweighting (e.g., DreamPRM) aligns training with test-time objectives, it leaves a clear gap to an oracle upper bound (pass@N), even under a "sanity check" that uses test set data to probe headroom -- p…

    Submitted 21 October, 2025; v1 submitted 5 September, 2025; originally announced September 2025.
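
    The instance-level reweighting idea from the DreamPRM-1.5 abstract can be sketched as a weighted training loss in which each example carries its own weight. The fixed toy weights below stand in for the weights such a method would actually learn (e.g., via bilevel optimization); this is an illustrative sketch, not the paper's procedure.

```python
# Hedged sketch: instance-level reweighting of a training loss. Each
# training example gets its own weight, so the aggregate loss can
# emphasize high-quality instances and down-weight noisy ones.

def weighted_loss(losses, weights):
    # Weighted mean of per-instance losses.
    total_w = sum(weights)
    return sum(l * w for l, w in zip(losses, weights)) / total_w

per_instance = [0.2, 1.5, 0.4]   # toy per-example losses
weights = [1.0, 0.1, 1.0]        # down-weight the suspected-noisy example
loss = weighted_loss(per_instance, weights)
```

    With uniform weights this reduces to the ordinary mean loss; learning the weights is what lets training emphasize the instances that matter at test time.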

  43. arXiv:2509.00520  [pdf, ps, other

    cs.IR cs.CL

    ERank: Fusing Supervised Fine-Tuning and Reinforcement Learning for Effective and Efficient Text Reranking

    Authors: Yuzheng Cai, Yanzhao Zhang, Dingkun Long, Mingxin Li, Pengjun Xie, Weiguo Zheng

    Abstract: Text reranking models are a crucial component in modern systems like Retrieval-Augmented Generation, tasked with selecting the most relevant documents prior to generation. However, current Large Language Model (LLM)-powered rerankers often face a fundamental trade-off. On one hand, Supervised Fine-Tuning based pointwise methods that frame relevance as a binary classification task lack the necess…

    Submitted 30 August, 2025; originally announced September 2025.
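
    Pointwise reranking, as described in the ERank abstract, scores each (query, document) pair independently and sorts by score. In the sketch below, a toy term-overlap scorer stands in for an LLM-based relevance model; the function names are illustrative, not ERank's API.

```python
# Hedged sketch: pointwise reranking. Each (query, document) pair gets an
# independent relevance score; documents are then sorted by that score.

def relevance_score(query: str, doc: str) -> float:
    # Toy scorer: fraction of query terms that appear in the document.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def rerank(query: str, docs: list) -> list:
    return sorted(docs, key=lambda d: relevance_score(query, d), reverse=True)

docs = [
    "cooking pasta at home",
    "neural text reranking with large language models",
    "reranking documents for retrieval augmented generation",
]
ranked = rerank("reranking documents with language models", docs)
```

    Because each document is scored in isolation, pointwise methods parallelize trivially; the trade-off the abstract alludes to is that they never compare candidates against each other directly, unlike listwise approaches.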

  44. arXiv:2508.20304  [pdf, ps, other

    cs.AR eess.SY

    Testing and Fault Tolerance Techniques for CNT-Based FPGAs

    Authors: Siyuan Lu, Kangwei Xu, Peng Xie, Rui Wang, Yuanqing Cheng

    Abstract: As the semiconductor manufacturing process technology node shrinks into the nanometer scale, CMOS-based Field Programmable Gate Arrays (FPGAs) face significant scalability challenges in performance and power consumption. Multi-walled Carbon Nanotube (MWCNT) serves as a promising candidate for Cu interconnects thanks to its superior conductivity. Moreover, Carbon Nanotube Field-Effect Transistor (CNFET) al…

    Submitted 18 September, 2025; v1 submitted 27 August, 2025; originally announced August 2025.

    Comments: 13 pages

  45. arXiv:2508.20290  [pdf, ps, other

    cs.LG cs.AI math.NA math.OC

    Objective Value Change and Shape-Based Accelerated Optimization for the Neural Network Approximation

    Authors: Pengcheng Xie, Zihao Zhou, Zijian Zhou

    Abstract: This paper introduces a novel metric of an objective function f, termed VC (value change), to measure the difficulty and approximation effects of a neural network approximation task; it numerically supports characterizing the local performance and behavior of neural network approximation. Neural networks often suffer from unpredictable local performance, which can hinder their re…

    Submitted 27 August, 2025; originally announced August 2025.

    Comments: 27 pages

    MSC Class: 68T07; 65K05; 65D15; 90C30

  46. arXiv:2508.20210  [pdf, ps, other

    cs.CV

    InfinityHuman: Towards Long-Term Audio-Driven Human Animation

    Authors: Xiaodi Li, Pan Xie, Yi Ren, Qijun Gan, Chen Zhang, Fangyuan Kong, Xiang Yin, Bingyue Peng, Zehuan Yuan

    Abstract: Audio-driven human animation has attracted wide attention thanks to its practical applications. However, critical challenges remain in generating high-resolution, long-duration videos with consistent appearance and natural hand motions. Existing methods extend videos using overlapping motion frames but suffer from error accumulation, leading to identity drift, color shifts, and scene instability.…

    Submitted 27 August, 2025; originally announced August 2025.

    Comments: Project Page: https://infinityhuman.github.io/

  47. arXiv:2508.06433  [pdf, ps, other

    cs.CL cs.AI cs.LG cs.MA

    Memp: Exploring Agent Procedural Memory

    Authors: Runnan Fang, Yuan Liang, Xiaobin Wang, Jialong Wu, Shuofei Qiao, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang

    Abstract: Large Language Model (LLM)-based agents excel at diverse tasks, yet they suffer from brittle procedural memory that is manually engineered or entangled in static parameters. In this work, we investigate strategies to endow agents with a learnable, updatable, and lifelong procedural memory. We propose Memp, which distills past agent trajectories into both fine-grained, step-by-step instructions and…

    Submitted 13 August, 2025; v1 submitted 8 August, 2025; originally announced August 2025.

    Comments: Work in progress

  48. arXiv:2508.05748  [pdf, ps, other

    cs.IR

    WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

    Authors: Xinyu Geng, Peng Xia, Zhen Zhang, Xinyu Wang, Qiuchen Wang, Ruixue Ding, Chenxi Wang, Jialong Wu, Yida Zhao, Kuan Li, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

    Abstract: Web agents such as Deep Research have demonstrated superhuman cognitive abilities, capable of solving highly challenging information-seeking problems. However, most research remains primarily text-centric, overlooking visual information in the real world. This makes multimodal Deep Research highly challenging, as such agents require much stronger reasoning abilities in perception, logic, knowledge…

    Submitted 31 August, 2025; v1 submitted 7 August, 2025; originally announced August 2025.

  49. arXiv:2508.04195  [pdf, ps, other

    cs.SD cs.AI cs.LG

    NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations

    Authors: Huan Liao, Qinke Ni, Yuancheng Wang, Yiheng Lu, Haoyue Zhan, Pengyuan Xie, Qiang Zhang, Zhizheng Wu

    Abstract: Paralinguistic vocalizations, including non-verbal sounds like laughter and breathing as well as lexicalized interjections such as "uhm" and "oh", are integral to natural spoken communication. Despite their importance in conveying affect, intent, and interactional cues, these vocalizations remain largely overlooked in conventional automatic speech recognition (ASR) and text-to-speech (TTS) systems. We presen…

    Submitted 6 August, 2025; originally announced August 2025.

  50. arXiv:2508.02128  [pdf, ps, other

    cs.LG cs.AI

    Amber Pruner: Leveraging N:M Activation Sparsity for Efficient Prefill in Large Language Models

    Authors: Tai An, Ruwu Cai, Yanzhe Zhang, Yang Liu, Hao Chen, Pengcheng Xie, Sheng Chang, Yiwu Yao, Gongyi Wang

    Abstract: In the era of large language models (LLMs), N:M sparsity has emerged as a structured compression technique critical for accelerating inference. While prior work has primarily focused on weight sparsity, it often suffers from significant accuracy degradation. Activation sparsity, though promising, is typically training-dependent and faces challenges in generalization. To address these limitations,…

    Submitted 4 August, 2025; originally announced August 2025.
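
    N:M sparsity, the structured pattern the Amber Pruner abstract builds on, keeps at most N non-zero values in every block of M consecutive elements. Below is a minimal sketch on a flat activation vector with the common 2:4 pattern; it illustrates the general pattern only, not Amber Pruner's implementation.

```python
# Hedged sketch: structured N:M sparsity. In every block of M consecutive
# elements, keep the N largest-magnitude values and zero out the rest.

def nm_sparsify(values, n: int = 2, m: int = 4):
    out = list(values)
    for start in range(0, len(out), m):
        block = out[start:start + m]
        # Indices of the N largest-magnitude entries within this block.
        keep = sorted(range(len(block)),
                      key=lambda i: abs(block[i]), reverse=True)[:n]
        for i in range(len(block)):
            if i not in keep:
                out[start + i] = 0.0
    return out

acts = [0.9, -0.1, 0.05, -1.2, 0.3, 0.0, -0.7, 0.2]
sparse = nm_sparsify(acts)  # each group of 4 keeps its 2 largest magnitudes
```

    The fixed block structure is what lets hardware (e.g., sparse tensor cores) skip the zeroed positions predictably, which is why N:M patterns accelerate inference where unstructured sparsity often does not.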