
Showing 1–50 of 154 results for author: Liang, Q

Searching in archive cs.
  1. arXiv:2512.13707  [pdf, ps, other]

    physics.bio-ph cs.LG cs.NE stat.ML

    Modular connectivity in neural networks emerges from Poisson noise-motivated regularisation, and promotes robustness and compositional generalisation

    Authors: Daoyuan Qian, Qiyao Liang, Ila Fiete

    Abstract: Circuits in the brain commonly exhibit modular architectures that factorise complex tasks, resulting in the ability to compositionally generalise and reduce catastrophic forgetting. In contrast, artificial neural networks (ANNs) appear to mix all processing, because modular solutions are difficult to find as they are vanishing subspaces in the space of possible solutions. Here, we draw inspiration…

    Submitted 5 December, 2025; originally announced December 2025.

  2. arXiv:2512.02780  [pdf, ps, other]

    cs.CV

    Rethinking Surgical Smoke: A Smoke-Type-Aware Laparoscopic Video Desmoking Method and Dataset

    Authors: Qifan Liang, Junlin Li, Zhen Han, Xihao Wang, Zhongyuan Wang, Bin Mei

    Abstract: Electrocautery or lasers will inevitably generate surgical smoke, which hinders the visual guidance of laparoscopic videos for surgical procedures. The surgical smoke can be classified into different types based on its motion patterns, leading to distinctive spatio-temporal characteristics across smoky laparoscopic videos. However, existing desmoking methods fail to account for such smoke-type-spe…

    Submitted 2 December, 2025; originally announced December 2025.

    Comments: 12 pages, 15 figures. Accepted to AAAI-26 (Main Technical Track)

  3. arXiv:2512.02580  [pdf, ps, other]

    cs.CL

    From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

    Authors: Changpeng Yang, Jinyang Wu, Yuchen Liu, Shuai Zhang, Yang Li, Qiliang Liang, Hongzhen Wang, Shuai Nie, Jiaming Xu, Runyu Shi, Ying Huang, Guoquan Zhang

    Abstract: Reinforcement learning has emerged as a paradigm for post-training large language models, boosting their reasoning capabilities. Such approaches compute an advantage value for each sample, reflecting better or worse performance than expected, thereby yielding both positive and negative signals for training. However, the indiscriminate mixing of the two signals in existing methods, especially from…

    Submitted 15 December, 2025; v1 submitted 2 December, 2025; originally announced December 2025.

    Comments: Accepted by AAAI 2026

  4. arXiv:2512.00074  [pdf, ps, other]

    cs.RO cs.CV

    Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning

    Authors: Qiwei Liang, Boyang Cai, Minghao Lai, Sitong Zhuang, Tao Lin, Yan Qin, Yixuan Ye, Jiaming Liang, Renjing Xu

    Abstract: Despite strong results on recognition and segmentation, current 3D visual pre-training methods often underperform on robotic manipulation. We attribute this gap to two factors: the lack of state-action-state dynamics modeling and the unnecessary redundancy of explicit geometric reconstruction. We introduce AFRO, a self-supervised framework that learns dynamics-aware 3D representations without acti…

    Submitted 3 December, 2025; v1 submitted 24 November, 2025; originally announced December 2025.

  5. arXiv:2510.02887  [pdf, ps, other]

    cs.SE

    GramTrans: A Better Code Representation Approach in Code Generation

    Authors: Zhao Zhang, Qingyuan Liang, Zeyu Sun, Yizhou Chen, Guoqing Wang, Yican Sun, Lu Zhang, Ge Li, Yingfei Xiong

    Abstract: Code generation has shown great promise in assisting software development. A fundamental yet underexplored question is how the choice of code representation affects model performance. While existing studies employ various representations, such as treating code as plain text, grammar rule sequences, or syntax tree sequences, they lack a principled understanding of the relationship between parsing d…

    Submitted 3 October, 2025; originally announced October 2025.

  6. arXiv:2509.24372  [pdf, ps, other]

    cs.LG cs.AI cs.NE

    Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning

    Authors: Xin Qiu, Yulu Gan, Conor F. Hayes, Qiyao Liang, Elliot Meyerson, Babak Hodjat, Risto Miikkulainen

    Abstract: Fine-tuning pre-trained large language models (LLMs) for down-stream tasks is a critical step in the AI deployment pipeline. Reinforcement learning (RL) is arguably the most prominent fine-tuning method, contributing to the birth of many state-of-the-art LLMs. In contrast, evolution strategies (ES), which once showed comparable performance to RL on models with a few million parameters, was neglect…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 24 pages, including the appendix

  7. arXiv:2509.19733  [pdf, ps, other]

    cs.CV

    Robust RGB-T Tracking via Learnable Visual Fourier Prompt Fine-tuning and Modality Fusion Prompt Generation

    Authors: Hongtao Yang, Bineng Zhong, Qihua Liang, Zhiruo Zhu, Yaozong Zheng, Ning Li

    Abstract: Recently, visual prompt tuning has been introduced to RGB-Thermal (RGB-T) tracking as a parameter-efficient fine-tuning (PEFT) method. However, these PEFT-based RGB-T tracking methods typically rely solely on spatial domain information as prompts for feature extraction. As a result, they often fail to achieve optimal performance by overlooking the crucial role of frequency-domain information in prompt le…

    Submitted 23 September, 2025; originally announced September 2025.

    Comments: Accepted by TMM2025

  8. arXiv:2509.12809  [pdf, ps, other]

    cs.SE

    SateLight: A Satellite Application Update Framework for Satellite Computing

    Authors: Jinfeng Wen, Jianshu Zhao, Zixi Zhu, Xiaomin Zhang, Qi Liang, Ao Zhou, Shangguang Wang

    Abstract: Satellite computing is an emerging paradigm that empowers satellites to perform onboard processing tasks (i.e., satellite applications), thereby reducing reliance on ground-based systems and improving responsiveness. However, enabling application software updates in this context remains a fundamental challenge due to application heterogeneity, limited ground-to-satellite bandwidth, and ha…

    Submitted 16 September, 2025; originally announced September 2025.

    Comments: This paper has been accepted for publication in ASE 2025!

  9. arXiv:2509.09731  [pdf, ps, other]

    cs.CL

    Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning

    Authors: Haiyang Yu, Yuchuan Wu, Fan Shi, Lei Liao, Jinghui Lu, Xiaodong Ge, Han Wang, Minghan Zhuo, Xuecheng Wu, Xiang Fei, Hao Feng, Guozhi Tang, An-Lan Wang, Hanshen Zhu, Yangfan He, Quanhuan Liang, Liyuan Meng, Chao Feng, Can Huang, Jingqun Tang, Bin Li

    Abstract: Chinese ancient documents, invaluable carriers of millennia of Chinese history and culture, hold rich knowledge across diverse fields but face challenges in digitization and understanding, i.e., traditional methods only scan images, while current Vision-Language Models (VLMs) struggle with their visual and linguistic complexity. Existing document benchmarks focus on English printed texts or simpli…

    Submitted 10 September, 2025; originally announced September 2025.

  10. arXiv:2509.09227  [pdf]

    eess.IV cs.CV

    Dynamic Structural Recovery Parameters Enhance Prediction of Visual Outcomes After Macular Hole Surgery

    Authors: Yinzheng Zhao, Zhihao Zhao, Rundong Jiang, Louisa Sackewitz, Quanmin Liang, Mathias Maier, Daniel Zapp, Peter Charbel Issa, Mohammad Ali Nasseri

    Abstract: Purpose: To introduce novel dynamic structural parameters and evaluate their integration within a multimodal deep learning (DL) framework for predicting postoperative visual recovery in idiopathic full-thickness macular hole (iFTMH) patients. Methods: We utilized a publicly available longitudinal OCT dataset at five stages (preoperative, 2 weeks, 3 months, 6 months, and 12 months). A stage specifi…

    Submitted 11 September, 2025; originally announced September 2025.

    Comments: TVST

    ACM Class: I.4.6

  11. arXiv:2509.08624  [pdf, ps, other]

    cs.CV cs.AI

    UOPSL: Unpaired OCT Predilection Sites Learning for Fundus Image Diagnosis Augmentation

    Authors: Zhihao Zhao, Yinzheng Zhao, Junjie Yang, Xiangtong Yao, Quanmin Liang, Daniel Zapp, Kai Huang, Nassir Navab, M. Ali Nasseri

    Abstract: Significant advancements in AI-driven multimodal medical image diagnosis have led to substantial improvements in ophthalmic disease identification in recent years. However, acquiring paired multimodal ophthalmic images remains prohibitively expensive. While fundus photography is simple and cost-effective, the limited availability of OCT data and inherent modality imbalance hinder further progress.…

    Submitted 10 September, 2025; originally announced September 2025.

    Comments: BIBM

    ACM Class: I.4.10

  12. arXiv:2509.08618  [pdf, ps, other]

    cs.CV

    CLAPS: A CLIP-Unified Auto-Prompt Segmentation for Multi-Modal Retinal Imaging

    Authors: Zhihao Zhao, Yinzheng Zhao, Junjie Yang, Xiangtong Yao, Quanmin Liang, Shahrooz Faghihroohi, Kai Huang, Nassir Navab, M. Ali Nasseri

    Abstract: Recent advancements in foundation models, such as the Segment Anything Model (SAM), have significantly impacted medical image segmentation, especially in retinal imaging, where precise segmentation is vital for diagnosis. Despite this progress, current methods face critical challenges: 1) modality ambiguity in textual disease descriptions, 2) a continued reliance on manual prompting for SAM-based…

    Submitted 10 September, 2025; originally announced September 2025.

    Comments: BIBM

    ACM Class: I.4.6

  13. arXiv:2509.00266  [pdf, ps, other]

    cs.CR

    CISAF: A Framework for Estimating the Security Posture of Academic and Research Cyberinfrastructure

    Authors: Qishen Liang, Jelena Mirkovic, Brian Kocoloski

    Abstract: Academic and research cyberinfrastructures (AR-CIs) present unique security challenges due to their collaborative nature, heterogeneous components, and the lack of practical security assessment frameworks tailored to their needs. We propose Cyber Infrastructure Security Analysis Framework (CISAF) -- a simple, systematic, mission-centric approach to analyze the security posture of a CI and prioriti…

    Submitted 7 November, 2025; v1 submitted 29 August, 2025; originally announced September 2025.

    Comments: 8 pages, 2 figures. This version is a derivative of the previous tech report, with more formal expressions

  14. arXiv:2508.19251  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    MuSpike: A Benchmark and Evaluation Framework for Symbolic Music Generation with Spiking Neural Networks

    Authors: Qian Liang, Menghaoran Tang, Yi Zeng

    Abstract: Symbolic music generation has seen rapid progress with artificial neural networks, yet remains underexplored in the biologically plausible domain of spiking neural networks (SNNs), where both standardized benchmarks and comprehensive evaluation methods are lacking. To address this gap, we introduce MuSpike, a unified benchmark and evaluation framework that systematically assesses five representati…

    Submitted 8 August, 2025; originally announced August 2025.

  15. M3HG: Multimodal, Multi-scale, and Multi-type Node Heterogeneous Graph for Emotion Cause Triplet Extraction in Conversations

    Authors: Qiao Liang, Ying Shen, Tiantian Chen, Lin Zhang

    Abstract: Emotion Cause Triplet Extraction in Multimodal Conversations (MECTEC) has recently gained significant attention in social media analysis, aiming to extract emotion utterances, cause utterances, and emotion categories simultaneously. However, the scarcity of related datasets, with only one published dataset featuring highly uniform dialogue scenarios, hinders model development in this field. To add…

    Submitted 26 August, 2025; originally announced August 2025.

    Comments: 16 pages, 8 figures. Accepted to Findings of ACL 2025

    Journal ref: Findings of ACL 2025 (2025) 11416-11431

  16. arXiv:2508.11476  [pdf, ps, other]

    cs.GR cs.CV

    SPG: Style-Prompting Guidance for Style-Specific Content Creation

    Authors: Qian Liang, Zichong Chen, Yang Zhou, Hui Huang

    Abstract: Although recent text-to-image (T2I) diffusion models excel at aligning generated images with textual prompts, controlling the visual style of the output remains a challenging task. In this work, we propose Style-Prompting Guidance (SPG), a novel sampling strategy for style-specific image generation. SPG constructs a style noise vector and leverages its directional deviation from unconditional nois…

    Submitted 15 August, 2025; originally announced August 2025.

    Comments: Accepted to the Journal track of Pacific Graphics 2025

  17. arXiv:2508.11468  [pdf, ps, other]

    cs.SE

    TRACY: Benchmarking Execution Efficiency of LLM-Based Code Translation

    Authors: Zhihao Gong, Zeyu Sun, Dong Huang, Qingyuan Liang, Jie M. Zhang, Dan Hao

    Abstract: Automatic code translation is a fundamental task in modern software development. While the advent of Large Language Models (LLMs) has significantly improved the correctness of code translation, the critical dimension of execution efficiency remains overlooked. To address this gap, we introduce TRACY, the first comprehensive benchmark designed to evaluate the execution efficiency of LLM-translated…

    Submitted 15 August, 2025; originally announced August 2025.

  18. arXiv:2508.11433  [pdf, ps, other]

    cs.CV

    MM-R1: Unleashing the Power of Unified Multimodal Large Language Models for Personalized Image Generation

    Authors: Qian Liang, Yujia Wu, Kuncheng Li, Jiwei Wei, Shiyuan He, Jinyu Guo, Ning Xie

    Abstract: Multimodal Large Language Models (MLLMs) with unified architectures excel across a wide range of vision-language tasks, yet aligning them with personalized image generation remains a significant challenge. Existing methods for MLLMs are frequently subject-specific, demanding a data-intensive fine-tuning process for every new subject, which limits their scalability. In this paper, we introduce MM-R…

    Submitted 26 August, 2025; v1 submitted 15 August, 2025; originally announced August 2025.

  19. arXiv:2508.08328  [pdf, ps, other]

    cs.RO

    Whole-Body Coordination for Dynamic Object Grasping with Legged Manipulators

    Authors: Qiwei Liang, Boyang Cai, Rongyi He, Hui Li, Tao Teng, Haihan Duan, Changxin Huang, Runhao Zeng

    Abstract: Quadrupedal robots with manipulators offer strong mobility and adaptability for grasping in unstructured, dynamic environments through coordinated whole-body control. However, existing research has predominantly focused on static-object grasping, neglecting the challenges posed by dynamic targets and thus limiting applicability in dynamic scenarios such as logistics sorting and human-robot collabo…

    Submitted 10 August, 2025; originally announced August 2025.

  20. arXiv:2507.21606  [pdf, ps, other]

    cs.CV

    Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

    Authors: Yaozong Zheng, Bineng Zhong, Qihua Liang, Ning Li, Shuxiang Song

    Abstract: The success of visual tracking has been largely driven by datasets with manual box annotations. However, these box annotations require tremendous human effort, limiting the scale and diversity of existing tracking datasets. In this work, we present a novel Self-Supervised Tracking framework named \textbf{\tracker}, designed to eliminate the need for box annotations. Specifically, a decoupled spatio…

    Submitted 29 July, 2025; originally announced July 2025.

    Comments: Accepted by AAAI2025

  21. arXiv:2507.20177  [pdf, ps, other]

    cs.CV cs.MM

    Towards Universal Modal Tracking with Online Dense Temporal Token Learning

    Authors: Yaozong Zheng, Bineng Zhong, Qihua Liang, Shengping Zhang, Guorong Li, Xianxian Li, Rongrong Ji

    Abstract: We propose a universal video-level modality-awareness tracking model with online dense temporal token learning (called {\modaltracker}). It is designed to support various tracking tasks, including RGB, RGB+Thermal, RGB+Depth, and RGB+Event, utilizing the same model architecture and parameters. Specifically, our model is designed with three core goals: Video-level Sampling. We expand the m…

    Submitted 27 July, 2025; originally announced July 2025.

    Comments: arXiv admin note: text overlap with arXiv:2401.01686

  22. arXiv:2507.19427  [pdf, ps, other]

    cs.LG cs.AI

    Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

    Authors: StepFun, :, Bin Wang, Bojun Wang, Changyi Wan, Guanzhe Huang, Hanpeng Hu, Haonan Jia, Hao Nie, Mingliang Li, Nuo Chen, Siyu Chen, Song Yuan, Wuxun Xie, Xiaoniu Song, Xing Chen, Xingping Yang, Xuelin Zhang, Yanbo Yu, Yaoyu Wang, Yibo Zhu, Yimin Jiang, Yu Zhou, Yuanwei Lu, Houyi Li , et al. (175 additional authors not shown)

    Abstract: Large language models (LLMs) face low hardware efficiency during decoding, especially for long-context reasoning tasks. This paper introduces Step-3, a 321B-parameter VLM with hardware-aware model-system co-design optimized for minimizing decoding costs. Step-3 innovates in two key dimensions: (1) A novel Multi-Matrix Factorization Attention (MFA) mechanism that significantly reduces both KV cache…

    Submitted 25 July, 2025; originally announced July 2025.

  23. arXiv:2507.15356  [pdf, ps, other]

    cs.AI

    RAD: Retrieval High-quality Demonstrations to Enhance Decision-making

    Authors: Lu Guo, Yixiang Shan, Zhengbang Zhu, Qifan Liang, Lichang Song, Ting Long, Weinan Zhang, Yi Chang

    Abstract: Offline reinforcement learning (RL) enables agents to learn policies from fixed datasets, avoiding costly or unsafe environment interactions. However, its effectiveness is often limited by dataset sparsity and the lack of transition overlap between suboptimal and expert trajectories, which makes long-horizon planning particularly challenging. Prior solutions based on synthetic data augmentation or…

    Submitted 21 July, 2025; originally announced July 2025.

  24. arXiv:2507.14975  [pdf, ps, other]

    cs.RO cs.AI

    FCRF: Flexible Constructivism Reflection for Long-Horizon Robotic Task Planning with Large Language Models

    Authors: Yufan Song, Jiatao Zhang, Zeng Gu, Qingmiao Liang, Tuocheng Hu, Wei Song, Shiqiang Zhu

    Abstract: Autonomous error correction is critical for domestic robots to achieve reliable execution of complex long-horizon tasks. Prior work has explored self-reflection in Large Language Models (LLMs) for task planning error correction; however, existing methods are constrained by inflexible self-reflection mechanisms that limit their effectiveness. Motivated by these limitations and inspired by human cog…

    Submitted 16 September, 2025; v1 submitted 20 July, 2025; originally announced July 2025.

    Comments: 8 pages, 6 figures, IROS 2025

  25. arXiv:2506.20094  [pdf, ps, other]

    cs.LG

    MEL: Multi-level Ensemble Learning for Resource-Constrained Environments

    Authors: Krishna Praneet Gudipaty, Walid A. Hanafy, Kaan Ozkara, Qianlin Liang, Jesse Milzman, Prashant Shenoy, Suhas Diggavi

    Abstract: AI inference at the edge is becoming increasingly common for low-latency services. However, edge environments are power- and resource-constrained, and susceptible to failures. Conventional failure resilience approaches, such as cloud failover or compressed backups, often compromise latency or accuracy, limiting their effectiveness for critical edge inference services. In this paper, we propose Mul…

    Submitted 24 June, 2025; originally announced June 2025.

  26. arXiv:2506.18088  [pdf, ps, other]

    cs.RO cs.AI cs.CL cs.CV cs.MA

    RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

    Authors: Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Zixuan Li, Qiwei Liang, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo , et al. (1 additional authors not shown)

    Abstract: Simulation-based data synthesis has emerged as a powerful paradigm for advancing real-world robotic manipulation. Yet existing datasets remain insufficient for robust bimanual manipulation due to (1) the lack of scalable task generation methods and (2) oversimplified simulation environments. We present RoboTwin 2.0, a scalable framework for automated, large-scale generation of diverse and realisti…

    Submitted 27 August, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

    Comments: Project Page: https://robotwin-platform.github.io/, Code: https://github.com/robotwin-Platform/robotwin, Doc: https://robotwin-platform.github.io/doc/

  27. arXiv:2506.03714  [pdf, other]

    cs.CV

    FSHNet: Fully Sparse Hybrid Network for 3D Object Detection

    Authors: Shuai Liu, Mingyue Cui, Boyang Li, Quanmin Liang, Tinghe Hong, Kai Huang, Yunxiao Shan, Kai Huang

    Abstract: Fully sparse 3D detectors have recently gained significant attention due to their efficiency in long-range detection. However, sparse 3D detectors extract features only from non-empty voxels, which impairs long-range interactions and causes center features to be missing. The former weakens the feature extraction capability, while the latter hinders network optimization. To address these challenges, w…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted by CVPR2025

  28. arXiv:2506.02791  [pdf, ps, other]

    cs.SE cs.AI

    Rethinking the effects of data contamination in Code Intelligence

    Authors: Zhen Yang, Hongyi Lin, Yifan He, Jie Xu, Zeyu Sun, Shuo Liu, Pengpeng Wang, Zhongxing Yu, Qingyuan Liang

    Abstract: In recent years, code intelligence has gained increasing importance in the field of automated software engineering. Meanwhile, the widespread adoption of Pretrained Language Models (PLMs) and Large Language Models (LLMs) has raised concerns regarding data contamination and its potential impact on model performance evaluation. This paper presents a systematic empirical study to investigate the fine…

    Submitted 8 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  29. arXiv:2506.00096   

    q-bio.GN cs.AI

    PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset

    Authors: Liangrui Pan, Qingchun Liang, Shen Zhao, Songqing Fan, Shaoliang Peng

    Abstract: Accurately predicting gene mutations, mutation subtypes and their exons in lung cancer is critical for personalized treatment planning and prognostic assessment. Faced with regional disparities in medical resources and the high cost of genomic assays, using artificial intelligence to infer these mutations and exon variants from routine histopathology images could greatly facilitate precision thera…

    Submitted 26 November, 2025; v1 submitted 30 May, 2025; originally announced June 2025.

    Comments: Withdrawn due to issues related to data permissions/ethics

  30. arXiv:2506.00034  [pdf, ps, other]

    cs.RO cs.CV

    GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving

    Authors: Shuai Liu, Quanmin Liang, Zefeng Li, Boyang Li, Kai Huang

    Abstract: Multi-sensor fusion is crucial for improving the performance and robustness of end-to-end autonomous driving systems. Existing methods predominantly adopt either attention-based flatten fusion or bird's eye view fusion through geometric transformations. However, these approaches often suffer from limited interpretability or dense computational overhead. In this paper, we introduce GaussianFusion,…

    Submitted 27 October, 2025; v1 submitted 26 May, 2025; originally announced June 2025.

    Comments: Accepted at NeurIPS2025 (Spotlight)

  31. arXiv:2505.20068  [pdf]

    cs.HC cs.AI

    On the Same Page: Dimensions of Perceived Shared Understanding in Human-AI Interaction

    Authors: Qingyu Liang, Jaime Banks

    Abstract: Shared understanding plays a key role in the effective communication in and performance of human-human interactions. With the increasingly common integration of AI into human contexts, the future of personal and workplace interactions will likely see human-AI interaction (HAII) in which the perception of shared understanding is important. Existing literature has addressed the processes and effects…

    Submitted 26 May, 2025; originally announced May 2025.

  32. arXiv:2504.18736  [pdf, ps, other]

    cs.CL

    EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers

    Authors: Jianyou Wang, Weili Cao, Kaicheng Wang, Xiaoyue Wang, Ashish Dalvi, Gino Prasad, Qishan Liang, Hsuan-lin Her, Ming Wang, Qin Yang, Gene W. Yeo, David E. Neal, Maxim Khan, Christopher D. Rosin, Ramamohan Paturi, Leon Bergen

    Abstract: We study the task of automatically finding evidence relevant to hypotheses in biomedical papers. Finding relevant evidence is an important step when researchers investigate scientific hypotheses. We introduce EvidenceBench to measure models' performance on this task, which is created by a novel pipeline that consists of hypothesis generation and sentence-by-sentence annotation of biomedical papers…

    Submitted 7 August, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

    Comments: Published at Conference on Language Modeling (COLM) 2025

  33. arXiv:2504.05779  [pdf, other]

    cs.CV

    FASR-Net: Unsupervised Shadow Removal Leveraging Inherent Frequency Priors

    Authors: Tao Lin, Qingwang Wang, Qiwei Liang, Minghua Tang, Yuxuan Sun

    Abstract: Shadow removal is challenging due to the complex interaction of geometry, lighting, and environmental factors. Existing unsupervised methods often overlook shadow-specific priors, leading to incomplete shadow recovery. To address this issue, we propose a novel unsupervised Frequency Aware Shadow Removal Network (FASR-Net), which leverages the inherent frequency characteristics of shadow regions. S…

    Submitted 8 April, 2025; originally announced April 2025.

  34. arXiv:2503.18034  [pdf, ps, other]

    cs.CV cs.CL

    Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models

    Authors: Qiao Liang, Yanjiang Liu, Weixiang Zhou, Ben He, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun, Yingfei Sun

    Abstract: Does the prior knowledge of the vision encoder constrain the capability boundary of Multi-modal Large Language Models (MLLMs)? While most existing research treats MLLMs as unified systems optimized through end-to-end training, the impact of vision encoder's prior knowledge is seldom investigated. In this work, we introduce a novel metric, $Rank_e$, to quantify the effect of prior knowledge of the…

    Submitted 30 May, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

  35. arXiv:2503.13799  [pdf, other]

    cs.CV cs.AI

    SMILE: a Scale-aware Multiple Instance Learning Method for Multicenter STAS Lung Cancer Histopathology Diagnosis

    Authors: Liangrui Pan, Xiaoyu Li, Yutao Dou, Qiya Song, Jiadi Luo, Qingchun Liang, Shaoliang Peng

    Abstract: Spread through air spaces (STAS) represents a newly identified aggressive pattern in lung cancer, which is known to be associated with adverse prognostic factors and complex pathological features. Pathologists currently rely on time-consuming manual assessments, which are highly subjective and prone to variation. This highlights the urgent need for automated and precise diagnostic solutions. 2,97…

    Submitted 17 March, 2025; originally announced March 2025.

  36. arXiv:2503.11085  [pdf, other]

    cs.SE

    Prompt Alchemy: Automatic Prompt Refinement for Enhancing Code Generation

    Authors: Sixiang Ye, Zeyu Sun, Guoqing Wang, Liwei Guo, Qingyuan Liang, Zheng Li, Yong Liu

    Abstract: Code generation has emerged as a key task to automate software development by converting high-level descriptions into executable code. Large language models (LLMs) excel at this but depend heavily on input prompt quality. Manual prompt engineering can be time-consuming and inconsistent, limiting LLM effectiveness. This paper introduces Prochemy, an innovative method for automatically refining promp…

    Submitted 14 March, 2025; originally announced March 2025.

  37. arXiv:2503.06625  [pdf, other]

    cs.CV

    Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking

    Authors: Chaocan Xue, Bineng Zhong, Qihua Liang, Yaozong Zheng, Ning Li, Yuanliang Xue, Shuxiang Song

    Abstract: Vision transformers (ViTs) have emerged as a popular backbone for visual tracking. However, complete ViT architectures are too cumbersome to deploy for unmanned aerial vehicle (UAV) tracking, which places extreme emphasis on efficiency. In this study, we discover that many layers within lightweight ViT-based trackers tend to learn relatively redundant and repetitive target representations. Based on this…

    Submitted 9 March, 2025; originally announced March 2025.

  38. arXiv:2503.06621  [pdf, other]

    cs.CV

    Dynamic Updates for Language Adaptation in Visual-Language Tracking

    Authors: Xiaohai Li, Bineng Zhong, Qihua Liang, Zhiyi Mo, Jian Nong, Shuxiang Song

    Abstract: The consistency between the semantic information provided by the multi-modal reference and the tracked object is crucial for visual-language (VL) tracking. However, existing VL tracking frameworks rely on static multi-modal references to locate dynamic objects, which can lead to semantic discrepancies and reduce the robustness of the tracker. To address this issue, we propose a novel vision-langua…

    Submitted 9 March, 2025; originally announced March 2025.

  39. arXiv:2503.05507  [pdf, ps, other]

    cs.PL cs.AI

    Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?

    Authors: Qingyuan Liang, Zhao Zhang, Zeyu Sun, Zheng Lin, Qi Luo, Yueyi Xiao, Yizhou Chen, Yuqun Zhang, Haotian Zhang, Lu Zhang, Bin Chen, Yingfei Xiong

    Abstract: Grammar serves as a cornerstone in programming languages and software engineering, providing frameworks to define the syntactic space and program structure. Existing research demonstrates the effectiveness of grammar-based code representations in small-scale models, showing their ability to reduce syntax errors and enhance performance. However, as language models scale to the billion level or beyo…

    Submitted 10 December, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

  40. arXiv:2502.18927  [pdf, other]

    cs.IR stat.ME

    A Multifacet Hierarchical Sentiment-Topic Model with Application to Multi-Brand Online Review Analysis

    Authors: Qiao Liang, Xinwei Deng

    Abstract: Multi-brand analysis based on review comments and ratings is a commonly used strategy to compare different brands in marketing. It can help consumers make more informed decisions and help marketers understand their brand's position in the market. In this work, we propose a multifacet hierarchical sentiment-topic model (MH-STM) to detect brand-associated sentiment polarities towards multiple compar…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 21 pages, 6 figures, 4 tables

  41. Corotational Hinge-based Thin Plates/Shells

    Authors: Qixin Liang

    Abstract: We present six thin plate/shell models, derived from three distinct types of curvature operators formulated within the corotational frame, for simulating both rest-flat and rest-curved triangular meshes. Each curvature operator derives a curvature expression corresponding to both a plate model and a shell model. The corotational edge-based hinge model uses an edge-based stencil to compute directio…

    Submitted 4 June, 2025; v1 submitted 15 February, 2025; originally announced February 2025.

    Comments: Accepted at Eurographics 2025

    Journal ref: Comput. Graph. Forum 44(2), e70022 (Proc. Eurographics 2025)

  42. arXiv:2502.10248  [pdf, other]

    cs.CV cs.CL

    Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

    Authors: Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, et al. (90 additional authors not shown)

    Abstract: We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length. A deep compression Variational Autoencoder, Video-VAE, is designed for video generation tasks, achieving 16x16 spatial and 8x temporal compression ratios, while maintaining exceptional video reconstruction quality. User prompts are encoded…

    Submitted 24 February, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: 36 pages, 14 figures

  43. arXiv:2502.06583  [pdf, other]

    cs.CV

    Adaptive Perception for Unified Visual Multi-modal Object Tracking

    Authors: Xiantao Hu, Bineng Zhong, Qihua Liang, Zhiyi Mo, Liangtao Shi, Ying Tai, Jian Yang

    Abstract: Recently, many multi-modal trackers prioritize RGB as the dominant modality, treating other modalities as auxiliary, and fine-tune separately on various multi-modal tasks. This imbalance in modality dependence limits the ability of methods to dynamically utilize complementary information from each modality in complex scenarios, making it challenging to fully perceive the advantages of multi-modality.…

    Submitted 10 February, 2025; originally announced February 2025.

  44. arXiv:2501.18797  [pdf, other]

    cs.LG cs.AI stat.ML

    Compositional Generalization via Forced Rendering of Disentangled Latents

    Authors: Qiyao Liang, Daoyuan Qian, Liu Ziyin, Ila Fiete

    Abstract: Composition (the ability to generate myriad variations from finite means) is believed to underlie powerful generalization. However, compositional generalization remains a key challenge for deep learning. A widely held assumption is that learning disentangled (factorized) representations naturally supports this kind of extrapolation. Yet, empirical results are mixed, with many generative models faili…

    Submitted 24 May, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: 9 pages, 4 figures, plus appendix

  45. arXiv:2501.12079  [pdf, ps, other]

    cs.SE

    Directional Diffusion-Style Code Editing Pre-training

    Authors: Qingyuan Liang, Zeyu Sun, Qihao Zhu, Junhao Hu, Yifan Zhao, Yizhou Chen, Mingxuan Zhu, Guoqing Wang, Lu Zhang

    Abstract: Code pre-trained models have shown promising effectiveness in various software engineering tasks. Among these tasks, many tasks are related to software evolution and/or code editing. However, existing code pre-trained models often overlook the real-world code editing data and the evolutionary nature of the editing process. In this paper, to simulate the step-by-step code editing process of human d…

    Submitted 10 December, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

  46. arXiv:2501.11938  [pdf, ps, other]

    cs.RO eess.SY

    Navigating Robot Swarm Through a Virtual Tube with Flow-Adaptive Distribution Control

    Authors: Yongwei Zhang, Shuli Lv, Kairong Liu, Quanyi Liang, Quan Quan, Zhikun She

    Abstract: With the rapid development of robot swarm technology and its diverse applications, navigating robot swarms through complex environments has emerged as a critical research direction. To ensure safe navigation and avoid potential collisions with obstacles, the concept of virtual tubes has been introduced to define safe and navigable regions. However, current control methods in virtual tubes face the…

    Submitted 13 August, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: 8 pages (brief paper), 12 figures

  47. arXiv:2501.02216  [pdf, other]

    cs.SE

    Automatically Learning a Precise Measurement for Fault Diagnosis Capability of Test Cases

    Authors: Yifan Zhao, Zeyu Sun, Guoqing Wang, Qingyuan Liang, Yakun Zhang, Yiling Lou, Dan Hao, Lu Zhang

    Abstract: Prevalent Fault Localization (FL) techniques rely on tests to localize buggy program elements. Tests could be treated as fuel to further boost FL by providing more debugging information. Therefore, it is highly valuable to measure the Fault Diagnosis Capability (FDC) of a test for diagnosing faults, so as to select or generate tests to better help FL. To this end, researchers have proposed many FD…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: This paper has been accepted by TOSEM

  48. arXiv:2501.00758  [pdf, other]

    cs.CV

    Less is More: Token Context-aware Learning for Object Tracking

    Authors: Chenlong Xu, Bineng Zhong, Qihua Liang, Yaozong Zheng, Guorong Li, Shuxiang Song

    Abstract: Recently, several studies have shown that utilizing contextual information to perceive target states is crucial for object tracking. They typically capture context by incorporating multiple video frames. However, these naive frame-context methods fail to consider the importance of each patch within a reference frame, making them susceptible to noise and redundant tokens, which deteriorates trackin…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: Accepted by AAAI 2025

  49. arXiv:2501.00473  [pdf, other]

    cs.DL

    Quantifying the Dynamics of Harm Caused by Retracted Research

    Authors: Yunyou Huang, Jiahui Zhao, Dandan Cui, Zhengxin Yang, Bingjie Xia, Qi Liang, Wenjing Liu, Li Ma, Suqin Tang, Tianyong Hao, Zhifei Zhang, Wanling Gao, Jianfeng Zhan

    Abstract: Despite enormous efforts devoted to understanding the characteristics and impacts of retracted papers, little is known about the mechanisms underlying the dynamics of their harm and the dynamics of its propagation. Here, we propose a citation-based framework to quantify the harm caused by retracted papers, aiming to uncover why their harm persists and spreads so widely. We uncover an "attention esca…

    Submitted 18 February, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

  50. arXiv:2412.17429  [pdf, ps, other]

    cs.SE

    Condor: A Code Discriminator Integrating General Semantics with Code Details

    Authors: Qingyuan Liang, Zhao Zhang, Chen Liu, Zeyu Sun, Wenjie Zhang, Yizhou Chen, Zixiao Zhao, Qi Luo, Wentao Wang, Yanjie Jiang, Yingfei Xiong, Lu Zhang

    Abstract: LLMs demonstrate significant potential across various software engineering tasks. However, they still face challenges in generating correct code on the first attempt when addressing complex requirements. Introducing a discriminator to select reliable outputs from multiple generated results is an effective way to enhance their reliability and stability. Currently, these discriminators fall into two…

    Submitted 10 December, 2025; v1 submitted 23 December, 2024; originally announced December 2024.