Showing 1–50 of 296 results for author: Guo, P

Searching in archive cs.
  1. arXiv:2604.09700  [pdf, ps, other]

    cs.CV cs.AI

    Attention-Guided Flow-Matching for Sparse 3D Geological Generation

    Authors: Zhixiang Lu, Mengqi Han, Peixin Guo, Tianming Bai, Jionglong Su, Fei Fang, Sifan Song

    Abstract: Constructing high-resolution 3D geological models from sparse 1D borehole and 2D surface data is a highly ill-posed inverse problem. Traditional heuristic and implicit modeling methods fundamentally fail to capture non-linear topological discontinuities under extreme sparsity, often yielding unrealistic artifacts. Furthermore, while deep generative architectures like Diffusion Models have revoluti…

    Submitted 7 April, 2026; originally announced April 2026.

  2. arXiv:2604.07798  [pdf, ps, other]

    cs.AI

    Lightweight LLM Agent Memory with Small Language Models

    Authors: Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Zhenzhen Huang, Pengcheng Zheng, Zhicheng Wang, Ping Guo, Fan Mo, Sung-Ho Bae, Jie Zou, Jiwei Wei, Yang Yang

    Abstract: Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. However, retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construction and candidate filtering. In contrast, many systems use repeated large-model…

    Submitted 9 April, 2026; originally announced April 2026.

    Comments: Accepted by ACL 2026

  3. arXiv:2604.06201  [pdf, ps, other]

    cs.CL cs.AI

    Beyond Facts: Benchmarking Distributional Reading Comprehension in Large Language Models

    Authors: Pei-Fu Guo, Ya-An Tsai, Chun-Chia Hsu, Kai-Xin Chen, Yun-Da Tsai, Kai-Wei Chang, Nanyun Peng, Mi-Yen Yeh, Shou-De Lin

    Abstract: While most reading comprehension benchmarks for LLMs focus on factual information that can be answered by localizing specific textual evidence, many real-world tasks require understanding distributional information, such as population-level trends and preferences expressed across collections of text. We introduce Text2DistBench, a reading comprehension benchmark for evaluating LLMs' ability to inf…

    Submitted 13 March, 2026; originally announced April 2026.

  4. arXiv:2604.06168  [pdf, ps, other]

    cs.CV cs.RO

    Action Images: End-to-End Policy Learning via Multiview Video Generation

    Authors: Haoyu Zhen, Zixian Gao, Qiao Sun, Yilin Zhao, Yuncong Yang, Yilun Du, Pengsheng Guo, Tsun-Hsuan Wang, Yi-Ling Qiao, Chuang Gan

    Abstract: World action models (WAMs) have emerged as a promising direction for robot policy learning, as they can leverage powerful video backbones to model the future states. However, existing approaches often rely on separate action modules, or use action representations that are not pixel-grounded, making it difficult to fully exploit the pretrained knowledge of video models and limiting transfer across…

    Submitted 14 April, 2026; v1 submitted 7 April, 2026; originally announced April 2026.

    Comments: Project Page: https://actionimages.github.io/

  5. arXiv:2604.02320  [pdf, ps, other]

    cs.CV cs.GR

    Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining

    Authors: Junxuan Li, Rawal Khirodkar, Chengan He, Zhongshi Jiang, Giljoo Nam, Lingchen Yang, Jihyun Lee, Egor Zakharov, Zhaoen Su, Rinat Abdrashitov, Yuan Dong, Julieta Martinez, Kai Li, Qingyang Tan, Takaaki Shiratori, Matthew Hu, Peihong Guo, Xuhua Huang, Ariyan Zarei, Marco Pesavento, Yichen Xu, He Wen, Teng Deng, Wyatt Borsos, Anjali Thakrar, et al. (15 additional authors not shown)

    Abstract: High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity modeling of humans with precise control over expressions and poses, but it struggles to generalize to real-world data due to limited scale and the domain gap between the studio environment and the real world. On the other hand, recent large-s…

    Submitted 7 April, 2026; v1 submitted 2 April, 2026; originally announced April 2026.

    Comments: Accepted to CVPR 2026. Website: https://junxuan-li.github.io/lca

  6. arXiv:2603.21071  [pdf, ps, other]

    cs.CV cs.AI

    CTFS: Collaborative Teacher Framework for Forward-Looking Sonar Image Semantic Segmentation with Extremely Limited Labels

    Authors: Ping Guo, Chengzhou Li, Guanchen Meng, Qi Jia, Jinyuan Liu, Zhu Liu, Yu Liu, Zhongxuan Luo, Xin Fan

    Abstract: As one of the most important underwater sensing technologies, forward-looking sonar exhibits unique imaging characteristics. Sonar images are often affected by severe speckle noise, low texture contrast, acoustic shadows, and geometric distortions. These factors make it difficult for traditional teacher-student frameworks to achieve satisfactory performance in sonar semantic segmentation tasks und…

    Submitted 22 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026 Findings

  7. arXiv:2603.19504  [pdf, ps, other]

    cs.HC cs.AI

    Beyond the Desk: Barriers and Future Opportunities for AI to Assist Scientists in Embodied Physical Tasks

    Authors: Irene Hou, Alexander Qin, Lauren Cheng, Philip J. Guo

    Abstract: More scientists are now using AI, but prior studies have examined only how they use it 'at the desk' for computer-based work. However, given that scientific work often happens 'beyond the desk' at lab and field sites, we conducted the first study of how scientific practitioners use AI for embodied physical tasks. We interviewed 12 scientific practitioners doing hands-on lab and fieldwork in domain…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: 16 pages, 4 figures, 1 table. Accepted to CHI 2026 (preprint)

  8. arXiv:2603.18260  [pdf, ps, other]

    cs.RO

    Manufacturing Micro-Patterned Surfaces with Multi-Robot Systems

    Authors: Annalisa T. Taylor, Malachi Landis, Ping Guo, Todd D. Murphey

    Abstract: Applying micro-patterns to surfaces has been shown to impart useful physical properties such as drag reduction and hydrophobicity. However, current manufacturing techniques cannot produce micro-patterned surfaces at scale due to high-cost machinery and inefficient coverage techniques such as raster-scanning. In this work, we use multiple robots, each equipped with a patterning tool, to manufacture…

    Submitted 18 March, 2026; originally announced March 2026.

  9. arXiv:2603.16822  [pdf, ps, other]

    cs.AI

    Surg$Σ$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence

    Authors: Zhitao Zeng, Mengya Xu, Jian Jiang, Pengfei Guo, Yunqiu Xu, Zhu Zhuo, Chang Han Low, Yufan He, Dong Yang, Chenxi Lin, Yiming Gu, Jiaxin Guo, Yutong Ban, Daguang Xu, Qi Dou, Yueming Jin

    Abstract: Surgical intelligence has the potential to improve the safety and consistency of surgical care, yet most existing surgical AI frameworks remain task-specific and struggle to generalize across procedures and institutions. Although multimodal foundation models, particularly multimodal large language models, have demonstrated strong cross-task capabilities across various medical domains, their advanc…

    Submitted 17 March, 2026; originally announced March 2026.

    MSC Class: 68T45 ACM Class: I.2.10

  10. arXiv:2603.13024  [pdf, ps, other]

    cs.CV cs.AI cs.LG eess.IV

    SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation

    Authors: Sampath Rapuri, Lalithkumar Seenivasan, Dominik Schneider, Roger Soberanis-Mukul, Yufan He, Hao Ding, Jiru Xu, Chenhao Yu, Chenyan Jing, Pengfei Guo, Daguang Xu, Mathias Unberath

    Abstract: A surgical world model capable of generating realistic surgical action videos with precise control over tool-tissue interactions can address fundamental challenges in surgical AI and simulation -- from data scarcity and rare event synthesis to bridging the sim-to-real gap for surgical automation. However, current video generation methods, the very core of such surgical world models, require expens…

    Submitted 13 March, 2026; originally announced March 2026.

    Comments: The manuscript is under review

  11. arXiv:2603.11992  [pdf, ps, other]

    cs.AI cs.LG

    Few-for-Many Personalized Federated Learning

    Authors: Ping Guo, Tiantian Zhang, Xi Lin, Xiang Li, Zhi-Ri Tang, Qingfu Zhang

    Abstract: Personalized Federated Learning (PFL) aims to train customized models for clients with highly heterogeneous data distributions while preserving data privacy. Existing approaches often rely on heuristics like clustering or model interpolation, which lack principled mechanisms for balancing heterogeneous client objectives. Serving $M$ clients with distinct data distributions is inherently a multi-ob…

    Submitted 12 March, 2026; originally announced March 2026.

  12. arXiv:2603.09166  [pdf, ps, other]

    cs.DS cs.CR

    Fast and Optimal Differentially Private Frequent-Substring Mining

    Authors: Peaker Guo, Rayne Holland, Hao Wu

    Abstract: Given a dataset of $n$ user-contributed strings, each of length at most $\ell$, a key problem is how to identify all frequent substrings while preserving each user's privacy. Recent work by Bernardini et al. (PODS'25) introduced a $\varepsilon$-differentially private algorithm achieving near-optimal error, but at the prohibitive cost of $O(n^2\ell^4)$ space and processing time. In this work, we pr…

    Submitted 10 March, 2026; originally announced March 2026.

    Comments: 21 pages, 2 figures, 1 table

  13. arXiv:2603.05181  [pdf, ps, other]

    cs.CV

    Mario: Multimodal Graph Reasoning with Large Language Models

    Authors: Yuanfu Sun, Kang Li, Pengkang Guo, Jiajin Liu, Qiaoyu Tan

    Abstract: Recent advances in large language models (LLMs) have opened new avenues for multimodal reasoning. Yet, most existing methods still rely on pretrained vision-language models (VLMs) to encode image-text pairs in isolation, ignoring the relational structure that real-world multimodal data naturally form. This motivates reasoning on multimodal graphs (MMGs), where each node has textual and visual attr…

    Submitted 26 March, 2026; v1 submitted 5 March, 2026; originally announced March 2026.

    Comments: CVPR 2026

  14. arXiv:2603.00057  [pdf, ps, other]

    cs.CY cs.AI cs.HC

    "Bespoke Bots": Diverse Instructor Needs for Customizing Generative AI Classroom Chatbots

    Authors: Irene Hou, Zeyu Xiong, Philip J. Guo, April Yi Wang

    Abstract: Instructors are increasingly experimenting with AI chatbots for classroom support. To investigate how instructors adapt chatbots to their own contexts, we first analyzed existing resources that provide prompts for educational purposes. We identified ten common categories of customization, such as persona, guardrails, and personalization. We then conducted interviews with ten university STEM instru…

    Submitted 10 February, 2026; originally announced March 2026.

    Comments: 10 pages, 1 figure. Accepted to CHI 2026

  15. arXiv:2602.24156  [pdf, ps, other]

    cs.RO

    Humanoid Robots as First Assistants in Endoscopic Surgery

    Authors: Sue Min Cho, Jan Emily Mangulabnan, Han Zhang, Zhekai Mao, Yufan He, Pengfei Guo, Daguang Xu, Gregory Hager, Masaru Ishii, Mathias Unberath

    Abstract: Humanoid robots have become a focal point of technological ambition, with claims of surgical capability within years in mainstream discourse. These projections are aspirational yet lack empirical grounding. To date, no humanoid has assisted a surgeon through an actual procedure, let alone performed one. The work described here breaks this new ground. Here we report a proof of concept in which a te…

    Submitted 27 February, 2026; originally announced February 2026.

  16. arXiv:2602.16152  [pdf, ps, other]

    math.CO cs.DM cs.FL

    The Smallest String Attractors of Fibonacci and Period-Doubling Words

    Authors: Mutsunori Banbara, Hideo Bannai, Peaker Guo, Dominik Köppl, Takuya Mieno, Yoshio Okamoto

    Abstract: A string attractor of a string $T[1..|T|]$ is a set of positions $Γ$ of $T$ such that any substring $w$ of $T$ has an occurrence that crosses a position in $Γ$, i.e., there is a position $i$ such that $w = T[i..i+|w|-1]$ and the intersection $[i,i+|w|-1]\cap Γ$ is nonempty. The size of the smallest string attractor of Fibonacci words is known to be $2$. We completely characterize the set of all sm…

    Submitted 17 February, 2026; originally announced February 2026.

  17. arXiv:2602.14385  [pdf, ps, other]

    cs.DS

    Sensitivity of Repetitiveness Measures to String Reversal

    Authors: Hideo Bannai, Yuto Fujie, Peaker Guo, Shunsuke Inenaga, Yuto Nakashima, Simon J. Puglisi, Cristian Urbina

    Abstract: We study the impact that string reversal can have on several repetitiveness measures. First, we exhibit an infinite family of strings where the number, $r$, of runs in the run-length encoding of the Burrows--Wheeler transform (BWT) can increase additively by $Θ(n)$ when reversing the string. This substantially improves the known $Ω(\log n)$ lower-bound for the additive sensitivity of $r$ and it is…

    Submitted 15 February, 2026; originally announced February 2026.

  18. arXiv:2602.06570  [pdf, ps, other]

    cs.CL

    Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making

    Authors: Baichuan-M3 Team: Chengfeng Dou, Fan Yang, Fei Li, Jiyuan Jia, Qiang Ju, Shuai Wang, Tianpeng Li, Xiangrong Zeng, Yijie Zhou, Hongda Zhang, Jinyang Tai, Linzhuang Sun, Peidong Guo, Yichuan Mo, Xiaochuan Wang, Hengfu Cui, Zhishou Zhang

    Abstract: We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the limitations of existing systems in open-ended consultations, Baichuan-M3 utilizes a specialized training pipeline to model the systematic workflow of a physician. Key capabilities include: (i) proactive informa…

    Submitted 6 February, 2026; originally announced February 2026.

  19. arXiv:2602.06475  [pdf, ps, other]

    cs.LG

    Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning

    Authors: Jingyao Wang, Peizheng Guo, Wenwen Qiang, Jiahuan Zhou, Huijie Guo, Changwen Zheng, Hui Xiong

    Abstract: Large language models (LLMs) excel at complex tasks with advances in reasoning capabilities. However, existing reward mechanisms remain tightly coupled to final correctness and pay little attention to the underlying reasoning process: trajectories with sound reasoning but wrong answers receive low credit, while lucky guesses with flawed logic may be highly rewarded, affecting reasoning generalizat…

    Submitted 6 February, 2026; originally announced February 2026.

  20. arXiv:2602.01662  [pdf, ps, other]

    cs.RO

    AgenticLab: A Real-World Robot Agent Platform that Can See, Think, and Act

    Authors: Pengyuan Guo, Zhonghao Mai, Zhengtong Xu, Kaidi Zhang, Heng Zhang, Zichen Miao, Arash Ajoudani, Zachary Kingston, Qiang Qiu, Yu She

    Abstract: Recent advances in large vision-language models (VLMs) have demonstrated generalizable open-vocabulary perception and reasoning, yet their real-robot manipulation capability remains unclear for long-horizon, closed-loop execution in unstructured, in-the-wild environments. Prior VLM-based manipulation pipelines are difficult to compare across different research groups' setups, and many evaluations…

    Submitted 9 March, 2026; v1 submitted 2 February, 2026; originally announced February 2026.

    Comments: Added appendix

  21. arXiv:2602.00478  [pdf, ps, other]

    cs.LG cs.AI cs.NE math.OC

    Quality-Diversity Optimization as Multi-Objective Optimization

    Authors: Xi Lin, Ping Guo, Yilu Liu, Qingfu Zhang, Jianyong Sun

    Abstract: The Quality-Diversity (QD) optimization aims to discover a collection of high-performing solutions that simultaneously exhibit diverse behaviors within a user-defined behavior space. This paradigm has stimulated significant research interest and demonstrated practical utility in domains including robot control, creative design, and adversarial sample generation. A variety of QD algorithms with dis…

    Submitted 30 January, 2026; originally announced February 2026.

  22. arXiv:2601.22803  [pdf, ps, other]

    cs.AI cs.SE

    CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning

    Authors: Ji Shi, Peiming Guo, Meishan Zhang, Miao Zhang, Xuebo Liu, Min Zhang, Weili Guan

    Abstract: Code verifiers play a critical role in post-verification for LLM-based code generation, yet existing supervised fine-tuning methods suffer from data scarcity, high failure rates, and poor inference efficiency. While reinforcement learning (RL) offers a promising alternative by optimizing models through execution-driven rewards without labeled supervision, our preliminary results show that naive RL…

    Submitted 30 January, 2026; originally announced January 2026.

    Comments: 17 pages, 3 figures

  23. arXiv:2601.12715  [pdf, ps, other]

    cs.CV cs.AI

    RSOD: Reliability-Guided Sonar Image Object Detection with Extremely Limited Labels

    Authors: Chengzhou Li, Ping Guo, Guanchen Meng, Qi Jia, Jinyuan Liu, Zhu Liu, Xiaokang Liu, Yu Liu, Zhongxuan Luo, Xin Fan

    Abstract: Object detection in sonar images is a key technology in underwater detection systems. Compared to natural images, sonar images contain fewer texture details and are more susceptible to noise, making it difficult for non-experts to distinguish subtle differences between classes. This leads to their inability to provide precise annotation data for sonar images. Therefore, designing effective object…

    Submitted 18 January, 2026; originally announced January 2026.

    Comments: Accepted by AAAI 2026, 9 pages, 10 figures

  24. arXiv:2601.12641  [pdf, ps, other]

    cs.AI

    STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models

    Authors: Xiangyu Shi, Junyang Ding, Xu Zhao, Sinong Zhan, Payal Mohapatra, Daniel Quispe, Kojo Welbeck, Jian Cao, Wei Chen, Ping Guo, Qi Zhu

    Abstract: Computer-aided design (CAD) is vital to modern manufacturing, yet model creation remains labor-intensive and expertise-heavy. To enable non-experts to translate intuitive design intent into manufacturable artifacts, recent large language models-based text-to-CAD efforts focus on command sequences or script-based formats like CadQuery. However, these formats are kernel-dependent and lack universali…

    Submitted 18 January, 2026; originally announced January 2026.

    Comments: Accepted to the Design, Automation & Test in Europe Conference (DATE) 2026

  25. arXiv:2601.08970  [pdf]

    cond-mat.mtrl-sci cs.LG

    Machine Learning-Driven Creep Law Discovery Across Alloy Compositional Space

    Authors: Hongshun Chen, Ryan Zhou, Rujing Zha, Zihan Chen, Wenpan Li, Rowan Rolark, John Patrick Reidy, Jian Cao, Ping Guo, David C. Dunand, Horacio D. Espinosa

    Abstract: High-temperature creep characterization of structural alloys traditionally relies on serial uniaxial tests, which are highly inefficient for exploring the large search space of alloy compositions and for material discovery. Here, we introduce a machine-learning-assisted, high-throughput framework for creep law identification based on a dimple array bulge instrument (DABI) configuration, which enab…

    Submitted 13 January, 2026; originally announced January 2026.

    Comments: 27 pages, 7 figures

  26. arXiv:2601.06845  [pdf, ps, other]

    cs.AI

    Code Evolution for Control: Synthesizing Policies via LLM-Driven Evolutionary Search

    Authors: Ping Guo, Chao Li, Yinglan Feng, Chaoning Zhang

    Abstract: Designing effective control policies for autonomous systems remains a fundamental challenge, traditionally addressed through reinforcement learning or manual engineering. While reinforcement learning has achieved remarkable success, it often suffers from high sample complexity, reward shaping difficulties, and produces opaque neural network policies that are hard to interpret or verify. Manual des…

    Submitted 11 January, 2026; originally announced January 2026.

  27. arXiv:2601.03267  [pdf, ps, other]

    cs.CL cs.AI

    OpenAI GPT-5 System Card

    Authors: Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, Akshay Nathan, Alan Luo, Alec Helyar, Aleksander Madry, Aleksandr Efremov, Aleksandra Spyra, Alex Baker-Whitcomb, Alex Beutel, Alex Karpenko, Alex Makelov, Alex Neitz, Alex Wei, Alexandra Barr, Alexandre Kirchmeyer, Alexey Ivanov, et al. (459 additional authors not shown)

    Abstract: This is the system card published alongside the OpenAI GPT-5 launch, August 2025. GPT-5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say 'think hard about this' in…

    Submitted 19 December, 2025; originally announced January 2026.

  28. arXiv:2512.23162  [pdf, ps, other]

    cs.RO cs.CV

    Cosmos-H-Surgical: Learning Surgical Robot Policies from Videos via World Modeling

    Authors: Yufan He, Pengfei Guo, Mengya Xu, Zhaoshuo Li, Andriy Myronenko, Dillan Imans, Bingjie Liu, Dongren Yang, Mingxue Gu, Yongnan Ji, Yueming Jin, Ren Zhao, Baiyong Shen, Daguang Xu

    Abstract: Data scarcity remains a fundamental barrier to achieving fully autonomous surgical robots. While large scale vision language action (VLA) models have shown impressive generalization in household and industrial manipulation by leveraging paired video action data from diverse domains, surgical robotics suffers from the paucity of datasets that include both visual observations and accurate robot kine…

    Submitted 10 March, 2026; v1 submitted 28 December, 2025; originally announced December 2025.

  29. arXiv:2512.21227  [pdf, ps, other]

    cond-mat.mtrl-sci cs.AI

    PhononBench: A Large-Scale Phonon-Based Benchmark for Dynamical Stability in Crystal Generation

    Authors: Xiao-Qi Han, Peng-Jie Guo, Ze-Feng Gao, Zhong-Yi Lu

    Abstract: In this work, we introduce PhononBench, the first large-scale benchmark for dynamical stability in AI-generated crystals. Leveraging the recently developed MatterSim interatomic potential, which achieves DFT-level accuracy in phonon predictions across more than 10,000 materials, PhononBench enables efficient large-scale phonon calculations and dynamical-stability analysis for 108,843 crystal struc…

    Submitted 26 December, 2025; v1 submitted 24 December, 2025; originally announced December 2025.

    Comments: 19 pages, 6 figures

  30. arXiv:2512.17137  [pdf, ps, other]

    cs.CV cs.AI

    SDUM: A Scalable Deep Unrolled Model for Universal MRI Reconstruction

    Authors: Puyang Wang, Pengfei Guo, Keyi Chai, Jinyuan Zhou, Daguang Xu, Shanshan Jiang

    Abstract: Clinical MRI encompasses diverse imaging protocols--spanning anatomical targets (cardiac, brain, knee), contrasts (T1, T2, mapping), sampling patterns (Cartesian, radial, spiral, kt-space), and acceleration factors--yet current deep learning reconstructions are typically protocol-specific, hindering generalization and deployment. We introduce Scalable Deep Unrolled Model (SDUM), a universal framew…

    Submitted 11 March, 2026; v1 submitted 18 December, 2025; originally announced December 2025.

    Comments: https://github.com/NVIDIA-Medtech/NV-Raw2insights-MRI

  31. arXiv:2512.16126  [pdf, ps, other]

    cs.LG cs.CV

    Dual-View Inference Attack: Machine Unlearning Amplifies Privacy Exposure

    Authors: Lulu Xue, Shengshan Hu, Linqiang Qian, Peijin Guo, Yechao Zhang, Minghui Li, Yanjun Zhang, Dayong Ye, Leo Yu Zhang

    Abstract: Machine unlearning is a newly popularized technique for removing specific training data from a trained model, enabling it to comply with data deletion requests. While it protects the rights of users requesting unlearning, it also introduces new privacy risks. Prior works have primarily focused on the privacy of data that has been unlearned, while the risks to retained data remain largely unexplore…

    Submitted 17 December, 2025; originally announced December 2025.

    Comments: Accepted by AAAI 2026

  32. arXiv:2512.11645  [pdf, ps, other]

    cs.CV

    FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint

    Authors: Jiapeng Tang, Kai Li, Chengxiang Yin, Liuhao Ge, Fei Jiang, Jiu Xu, Matthias Nießner, Christian Häne, Timur Bagautdinov, Egor Zakharov, Peihong Guo

    Abstract: We introduce FactorPortrait, a video diffusion method for controllable portrait animation that enables lifelike synthesis from disentangled control signals of facial expressions, head movement, and camera viewpoints. Given a single portrait image, a driving video, and camera trajectories, our method animates the portrait by transferring facial expressions and head movements from the driving video…

    Submitted 12 December, 2025; originally announced December 2025.

    Comments: Project page: https://tangjiapeng.github.io/FactorPortrait/

  33. arXiv:2512.04987  [pdf, ps, other]

    cs.CL

    Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

    Authors: Nex-AGI Team: Yuxuan Cai, Lu Chen, Qiaoling Chen, Yuyang Ding, Liwen Fan, Wenjie Fu, Yufei Gao, Honglin Guo, Pinxue Guo, Zhenhua Han, Zhengfu He, Hanglei Hu, Kai Hu, Shengjia Hua, Tianyu Huai, Baodai Huang, Li Ji, Zhen Jiang, Zhikai Lei, Bufan Li, Jiahang Lin, Lizhi Lin, Jinxiu Liu, et al. (41 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) from passive responders to autonomous agents necessitates a fundamental shift in learning paradigms -- from static imitation to incentive-driven decision making. However, this transition is significantly impeded by the lack of scalable infrastructure capable of constructing high-quality interaction signals for effective policy learning. To address this…

    Submitted 4 December, 2025; originally announced December 2025.

  34. arXiv:2512.04483  [pdf, ps, other]

    cs.CV

    DeRA: Decoupled Representation Alignment for Video Tokenization

    Authors: Pengbo Guo, Junke Wang, Zhen Xing, Chengxu Liu, Daoguo Dong, Xueming Qian, Zuxuan Wu

    Abstract: This paper presents DeRA, a novel 1D video tokenizer that decouples the spatial-temporal representation learning in video tokenization to achieve better training efficiency and performance. Specifically, DeRA maintains a compact 1D latent space while factorizing video encoding into appearance and motion streams, which are aligned with pretrained vision foundation models to capture the spatial sema…

    Submitted 4 December, 2025; originally announced December 2025.

  35. arXiv:2512.03101  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification

    Authors: Congjing Zhang, Feng Lin, Xinyi Zhao, Pei Guo, Wei Li, Lin Chen, Chaoyue Zhao, Shuai Huang

    Abstract: The advance of Large Language Models (LLMs) has greatly stimulated research interest in developing multi-modal LLM (MLLM)-based visual anomaly detection (VAD) algorithms that can be deployed in complex environments. The challenge is that in these complex environments, the anomalies are sometimes highly contextual and also ambiguous, and thereby, uncertainty quantification (UQ) is a crucial capacit…

    Submitted 1 December, 2025; originally announced December 2025.

  36. arXiv:2512.02018  [pdf, ps, other]

    cs.CV cs.RO

    Data-Centric Visual Development for Self-Driving Labs

    Authors: Anbang Liu, Guanzhong Hu, Jiayi Wang, Ping Guo, Han Liu

    Abstract: Self-driving laboratories offer a promising path toward reducing the labor-intensive, time-consuming, and often irreproducible workflows in the biological sciences. Yet their stringent precision requirements demand highly robust models whose training relies on large amounts of annotated data. However, this kind of data is difficult to obtain in routine practice, especially negative samples. In thi…

    Submitted 1 December, 2025; originally announced December 2025.

    Comments: 11 pages, 4 figures

  37. arXiv:2511.14774  [pdf, ps, other]

    cs.CL cs.AI

    LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs

    Authors: Pei-Fu Guo, Yun-Da Tsai, Chun-Chia Hsu, Kai-Xin Chen, Ya-An Tsai, Kai-Wei Chang, Nanyun Peng, Mi-Yen Yeh, Shou-De Lin

    Abstract: Evaluating cross-lingual knowledge transfer in large language models is challenging, as correct answers in a target language may arise either from genuine transfer or from prior exposure during pre-training. We present LiveCLKTBench, an automated generation pipeline specifically designed to isolate and measure cross-lingual knowledge transfer. Our pipeline identifies self-contained, time-sensitive…

    Submitted 11 April, 2026; v1 submitted 3 November, 2025; originally announced November 2025.

  38. arXiv:2511.13787  [pdf, ps, other]

    cs.LG cs.CV

    Exploring Transferability of Self-Supervised Learning by Task Conflict Calibration

    Authors: Huijie Guo, Jingyao Wang, Peizheng Guo, Xingchen Shen, Changwen Zheng, Wenwen Qiang

    Abstract: In this paper, we explore the transferability of SSL by addressing two central questions: (i) what is the representation transferability of SSL, and (ii) how can we effectively model this transferability? Transferability is defined as the ability of a representation learned from one task to support the objective of another. Inspired by the meta-learning paradigm, we construct multiple SSL tasks…

    Submitted 16 November, 2025; originally announced November 2025.

  39. arXiv:2511.12140  [pdf, ps, other]

    cs.CL cs.CV

    Seeing is Believing: Rich-Context Hallucination Detection for MLLMs via Backward Visual Grounding

    Authors: Pinxue Guo, Chongruo Wu, Xinyu Zhou, Lingyi Hong, Zhaoyu Chen, Jinglun Li, Kaixun Jiang, Sen-ching Samson Cheung, Wei Zhang, Wenqiang Zhang

    Abstract: Multimodal Large Language Models (MLLMs) have unlocked powerful cross-modal capabilities, but still significantly suffer from hallucinations. As such, accurate detection of hallucinations in MLLMs is imperative for ensuring their reliability in practical applications. To this end, guided by the principle of "Seeing is Believing", we introduce VBackChecker, a novel reference-free hallucination dete…

    Submitted 15 November, 2025; originally announced November 2025.

  40. arXiv:2511.10852  [pdf, ps, other]

    eess.SY cs.AI

    Adaptive Digital Twin of Sheet Metal Forming via Proper Orthogonal Decomposition-Based Koopman Operator with Model Predictive Control

    Authors: Yi-Ping Chen, Derick Suarez, Ying-Kuan Tsai, Vispi Karkaria, Guanzhong Hu, Zihan Chen, Ping Guo, Jian Cao, Wei Chen

    Abstract: Digital Twin (DT) technologies are transforming manufacturing by enabling real-time prediction, monitoring, and control of complex processes. Yet, applying DT to deformation-based metal forming remains challenging because of the strongly coupled spatial-temporal behavior and the nonlinear relationship between toolpath and material response. For instance, sheet-metal forming by the English wheel, a…

    Submitted 13 November, 2025; originally announced November 2025.

  41. arXiv:2511.09900  [pdf, ps, other]

    cs.AI cs.CE

    Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search

    Authors: Yaodong Yang, Yang Wang, Jinpeng Li, Pei Guo, Da Han, Guangyong Chen, Pheng-Ann Heng

    Abstract: Protein evolution through amino acid mutations is a cornerstone of life sciences. Recent advances in protein language models have shown rich evolutionary patterns, offering unprecedented potential for in-silicon directed evolution. However, existing directed evolution methods largely rely on heuristic evolution strategies and have yet to efficiently integrate the transformative protein language mo…

    Submitted 6 January, 2026; v1 submitted 12 November, 2025; originally announced November 2025.

    Comments: working in progress, 20 pages, 6 figures, 16 tables, updating template

  42. arXiv:2511.08496  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

    Authors: Bingsong Bai, Yizhong Geng, Fengping Wang, Cong Wang, Puyuan Guo, Yingming Gao, Ya Li

    Abstract: Zero-shot singing voice conversion (SVC) transforms a source singer's timbre to an unseen target speaker's voice while preserving melodic content without fine-tuning. Existing methods model speaker timbre and vocal content separately, losing essential acoustic information that degrades output quality while requiring significant computational resources. To overcome these limitations, we propose HQ-…

    Submitted 15 November, 2025; v1 submitted 11 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026 main technical track

  43. arXiv:2510.27281  [pdf, ps, other]

    cs.LG cs.AI

    HiF-DTA: Hierarchical Feature Learning Network for Drug-Target Affinity Prediction

    Authors: Minghui Li, Yuanhang Wang, Peijin Guo, Wei Wan, Shengshan Hu, Shengqing Hu

    Abstract: Accurate prediction of Drug-Target Affinity (DTA) is crucial for reducing experimental costs and accelerating early screening in computational drug discovery. While sequence-based deep learning methods avoid reliance on costly 3D structures, they still overlook simultaneous modeling of global sequence semantic features and local topological structural features within drugs and proteins, and repres…

    Submitted 31 October, 2025; originally announced October 2025.

    Comments: Accepted by International Conference on Bioinformatics and Biomedicine (BIBM 25)

  44. arXiv:2510.23968  [pdf, ps, other]

    cs.CV

    Reasoning Visual Language Model for Chest X-Ray Analysis

    Authors: Andriy Myronenko, Dong Yang, Baris Turkbey, Mariam Aboian, Sena Azamat, Esra Akcicek, Hongxu Yin, Pavlo Molchanov, Marc Edgar, Yufan He, Pengfei Guo, Yucheng Tang, Daguang Xu

    Abstract: Vision-language models (VLMs) have shown strong promise for medical image analysis, but most remain opaque, offering predictions without the transparent, stepwise reasoning clinicians rely on. We present a framework that brings chain-of-thought (CoT) reasoning to chest X-ray interpretation. Inspired by reasoning-first training paradigms, our approach is designed to learn how experts reason, not ju…

    Submitted 29 October, 2025; v1 submitted 27 October, 2025; originally announced October 2025.

    Comments: NV-Reason-CXR-3B

  45. arXiv:2510.21351  [pdf, ps, other]

    cs.CV

    Dynamic Semantic-Aware Correlation Modeling for UAV Tracking

    Authors: Xinyu Zhou, Tongxin Pan, Lingyi Hong, Pinxue Guo, Haijing Guo, Zhaoyu Chen, Kaixun Jiang, Wenqiang Zhang

    Abstract: UAV tracking can be widely applied in scenarios such as disaster rescue, environmental monitoring, and logistics transportation. However, existing UAV tracking methods predominantly emphasize speed and lack exploration in semantic awareness, which hinders the search region from extracting accurate localization information from the template. The limitation results in suboptimal performance under ty…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025

  46. arXiv:2510.20639  [pdf, ps, other]

    cs.CV

    Better Tokens for Better 3D: Advancing Vision-Language Modeling in 3D Medical Imaging

    Authors: Ibrahim Ethem Hamamci, Sezgin Er, Suprosanna Shit, Hadrien Reynaud, Dong Yang, Pengfei Guo, Marc Edgar, Daguang Xu, Bernhard Kainz, Bjoern Menze

    Abstract: Recent progress in vision-language modeling for 3D medical imaging has been fueled by large-scale computed tomography (CT) corpora with paired free-text reports, stronger architectures, and powerful pretrained models. This has enabled applications such as automated report generation and text-conditioned 3D image synthesis. Yet, current approaches struggle with high-resolution, long-sequence volume…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: NeurIPS 2025

  47. arXiv:2510.03760  [pdf, ps, other]

    cs.LG cs.AI

    EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models

    Authors: Ping Guo, Chenyu Zhu, Siyuan Chen, Fei Liu, Xi Lin, Zhichao Lu, Qingfu Zhang

    Abstract: CUDA kernel optimization has become a critical bottleneck for AI performance, as deep learning training and inference efficiency directly depends on highly optimized GPU kernels. Despite the promise of Large Language Models (LLMs) for automating kernel optimization, this field suffers from a fragmented ecosystem of isolated and incomparable approaches with unclear problem formulations. Further…

    Submitted 4 October, 2025; originally announced October 2025.

    Comments: Under Review of ICLR 2026

  48. arXiv:2509.19300  [pdf, ps, other]

    cs.CV

    CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching

    Authors: Chen Chen, Pengsheng Guo, Liangchen Song, Jiasen Lu, Rui Qian, Xinze Wang, Tsu-Jui Fu, Wei Liu, Yinfei Yang, Alex Schwing

    Abstract: Conditional generative modeling aims to learn a conditional data distribution from samples containing data-condition pairs. For this, diffusion and flow-based methods have attained compelling results. These methods use a learned (flow) model to transport an initial standard Gaussian noise that ignores the condition to the conditional data distribution. The model is hence required to learn both mas…

    Submitted 23 October, 2025; v1 submitted 23 September, 2025; originally announced September 2025.

  49. arXiv:2509.16645  [pdf, ps, other]

    cs.CV

    ADVEDM:Fine-grained Adversarial Attack against VLM-based Embodied Agents

    Authors: Yichen Wang, Hangtao Zhang, Hewen Pan, Ziqi Zhou, Xianlong Wang, Peijin Guo, Lulu Xue, Shengshan Hu, Minghui Li, Leo Yu Zhang

    Abstract: Vision-Language Models (VLMs), with their strong reasoning and planning capabilities, are widely used in embodied decision-making (EDM) tasks in embodied agents, such as autonomous driving and robotic manipulation. Recent research has increasingly explored adversarial attacks on VLMs to reveal their vulnerabilities. However, these attacks either rely on overly strong assumptions, requiring full kn…

    Submitted 20 September, 2025; originally announced September 2025.

  50. arXiv:2509.15556  [pdf, ps, other]

    cs.CL cs.AI

    Exploring Polyglot Harmony: On Multilingual Data Allocation for Large Language Models Pretraining

    Authors: Ping Guo, Yubing Ren, Binbin Liu, Fengze Liu, Haobin Lin, Yifan Zhang, Bingni Zhang, Taifeng Wang, Yin Zheng

    Abstract: Large language models (LLMs) have become integral to a wide range of applications worldwide, driving an unprecedented global demand for effective multilingual capabilities. Central to achieving robust multilingual performance is the strategic allocation of language proportions within training corpora. However, determining optimal language ratios is highly challenging due to intricate cross-lingual…

    Submitted 18 September, 2025; originally announced September 2025.