
Showing 1–50 of 879 results for author: Huang, Q

Searching in archive cs.
  1. arXiv:2512.20312  [pdf, ps, other]

    cs.LG cs.AI

    TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning

    Authors: Saisai Yang, Qingyi Huang, Jing Yuan, Liangyu Zha, Kai Tang, Yuhang Yang, Ning Wang, Yucheng Wei, Liyao Li, Wentao Ye, Hao Chen, Tao Zhang, Junlin Zhou, Haobo Wang, Gang Chen, Junbo Zhao

    Abstract: Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improved natural language interaction with such structured data, they often fall short in handling the complex, multi-step reasoning and robust code execution required for real-world table tasks. Reinforcement Learnin…

    Submitted 23 December, 2025; originally announced December 2025.

  2. arXiv:2512.19049  [pdf, ps, other]

    cs.CV

    Decoupled Generative Modeling for Human-Object Interaction Synthesis

    Authors: Hwanhee Jung, Seunggwan Lee, Jeongyoon Yoon, SeungHyeon Kim, Giljoo Nam, Qixing Huang, Sangpil Kim

    Abstract: Synthesizing realistic human-object interaction (HOI) is essential for 3D computer vision and robotics, underpinning animation and embodied control. Existing approaches often require manually specified intermediate waypoints and place all optimization objectives on a single network, which increases complexity, reduces flexibility, and leads to errors such as unsynchronized human and object motion…

    Submitted 22 December, 2025; originally announced December 2025.

  3. arXiv:2512.18196  [pdf, ps, other]

    cs.CL

    Training LLMs with LogicReward for Faithful and Rigorous Reasoning

    Authors: Jundong Xu, Hao Fei, Huichi Zhou, Xin Quan, Qijun Huang, Shengqiong Wu, William Yang Wang, Mong-Li Lee, Wynne Hsu

    Abstract: Although LLMs exhibit strong reasoning capabilities, existing training methods largely depend on outcome-based feedback, which can produce correct answers with flawed reasoning. Prior work introduces supervision on intermediate steps but still lacks guarantees of logical soundness, which is crucial in high-stakes scenarios where logical consistency is paramount. To address this, we propose LogicRe…

    Submitted 19 December, 2025; originally announced December 2025.

    Comments: Preprint

  4. arXiv:2512.17781  [pdf, ps, other]

    cs.CV cs.GR

    LiteGE: Lightweight Geodesic Embedding for Efficient Geodesics Computation and Non-Isometric Shape Correspondence

    Authors: Yohanes Yudhi Adikusuma, Qixing Huang, Ying He

    Abstract: Computing geodesic distances on 3D surfaces is fundamental to many tasks in 3D vision and geometry processing, with deep connections to tasks such as shape correspondence. Recent learning-based methods achieve strong performance but rely on large 3D backbones, leading to high memory usage and latency, which limit their use in interactive or resource-constrained settings. We introduce LiteGE, a lig…

    Submitted 23 December, 2025; v1 submitted 19 December, 2025; originally announced December 2025.

  5. arXiv:2512.14503  [pdf, ps, other]

    cs.IR cs.CL

    RecGPT-V2 Technical Report

    Authors: Chao Yi, Dian Chen, Gaoyang Guo, Jiakai Tang, Jian Wu, Jing Yu, Mao Zhang, Wen Chen, Wenjun Yang, Yujie Luo, Yuning Jiang, Zhujin Gao, Bo Zheng, Binbin Cao, Changfa Wu, Dixuan Wang, Han Wu, Haoyi Hu, Kewei Zhu, Lang Tian, Lin Yang, Qiqi Huang, Siqi Yang, Wenbo Su, Xiaoxiao He, et al. (10 additional authors not shown)

    Abstract: Large language models (LLMs) have demonstrated remarkable potential in transforming recommender systems from implicit behavioral pattern matching to explicit intent reasoning. While RecGPT-V1 successfully pioneered this paradigm by integrating LLM-based reasoning into user interest mining and item tag prediction, it suffers from four fundamental limitations: (1) computational inefficiency and cogn…

    Submitted 16 December, 2025; originally announced December 2025.

  6. arXiv:2512.11218  [pdf, ps, other]

    cs.RO cs.CV

    Seeing to Act, Prompting to Specify: A Bayesian Factorization of Vision Language Action Policy

    Authors: Kechun Xu, Zhenjie Zhu, Anzhe Chen, Shuqi Zhao, Qing Huang, Yifei Yang, Haojian Lu, Rong Xiong, Masayoshi Tomizuka, Yue Wang

    Abstract: The pursuit of out-of-distribution generalization in Vision-Language-Action (VLA) models is often hindered by catastrophic forgetting of the Vision-Language Model (VLM) backbone during fine-tuning. While co-training with external reasoning data helps, it requires experienced tuning and data-related overhead. Beyond such external dependencies, we identify an intrinsic cause within VLA datasets: mod…

    Submitted 11 December, 2025; originally announced December 2025.

  7. arXiv:2512.09277  [pdf, ps, other]

    cs.DC cs.AR

    Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens

    Authors: Yanpeng Yu, Haiyue Ma, Krish Agarwal, Nicolai Oswald, Qijing Huang, Hugo Linsenmaier, Chunhui Mei, Ritchie Zhao, Ritika Borkar, Bita Darvish Rouhani, David Nellans, Ronny Krashinsky, Anurag Khandelwal

    Abstract: Expert Parallelism (EP) permits Mixture of Experts (MoE) models to scale beyond a single GPU. To address load imbalance across GPUs in EP, existing approaches aim to balance the number of tokens each GPU processes. Surprisingly, we find that this objective degrades performance rather than improving it when processing is memory-bound - a common occurrence in MoE serving, especially in the decode ph…

    Submitted 9 December, 2025; originally announced December 2025.

  8. arXiv:2512.09200  [pdf, ps, other]

    cs.IR

    Meta Lattice: Model Space Redesign for Cost-Effective Industry-Scale Ads Recommendations

    Authors: Liang Luo, Yuxin Chen, Zhengyu Zhang, Mengyue Hang, Andrew Gu, Buyun Zhang, Boyang Liu, Chen Chen, Chengze Fan, Dong Liang, Fan Yang, Feifan Gu, Huayu Li, Jade Nie, Jiayi Xu, Jiyan Yang, Jongsoo Park, Laming Chen, Longhao Jin, Qianru Li, Qin Huang, Shali Jiang, Shiwen Shen, Shuaiwen Wang, Sihan Zeng, et al. (17 additional authors not shown)

    Abstract: The rapidly evolving landscape of products, surfaces, policies, and regulations poses significant challenges for deploying state-of-the-art recommendation models at industry scale, primarily due to data fragmentation across domains and escalating infrastructure costs that hinder sustained quality improvements. To address this challenge, we propose Lattice, a recommendation framework centered aro…

    Submitted 14 December, 2025; v1 submitted 9 December, 2025; originally announced December 2025.

    Comments: Accepted to KDD 2026

  9. arXiv:2512.07821  [pdf, ps, other]

    cs.CV cs.AI

    WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling

    Authors: Shaoheng Fang, Hanwen Jiang, Yunpeng Bai, Niloy J. Mitra, Qixing Huang

    Abstract: Recent video generators achieve striking photorealism, yet remain fundamentally inconsistent in 3D. We present WorldReel, a 4D video generator that is natively spatio-temporally consistent. WorldReel jointly produces RGB frames together with 4D scene representations, including pointmaps, camera trajectory, and dense flow mapping, enabling coherent geometry and appearance modeling over time. Our ex…

    Submitted 8 December, 2025; originally announced December 2025.

  10. arXiv:2512.07360  [pdf, ps, other]

    cs.CV cs.AI

    Structure-Aware Feature Rectification with Region Adjacency Graphs for Training-Free Open-Vocabulary Semantic Segmentation

    Authors: Qiming Huang, Hao Ai, Jianbo Jiao

    Abstract: Benefiting from the inductive biases learned from large-scale datasets, open-vocabulary semantic segmentation (OVSS) leverages the power of vision-language models, such as CLIP, to achieve remarkable progress without requiring task-specific training. However, due to CLIP's pre-training nature on image-text pairs, it tends to focus on global semantic alignment, resulting in suboptimal performance w…

    Submitted 8 December, 2025; originally announced December 2025.

    Comments: Accepted to WACV2026

  11. arXiv:2512.04248  [pdf, ps, other]

    cs.CV cs.AI

    MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models

    Authors: Shaoheng Fang, Chaohui Yu, Fan Wang, Qixing Huang

    Abstract: We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom employs a two-stage design in which the 3D layout is used throughout to enforce multi-view consistency. The first stage employs novel representations to effectively bridge the 3D layout and consistent image-based condition signals for…

    Submitted 3 December, 2025; originally announced December 2025.

  12. arXiv:2512.03453  [pdf, ps, other]

    cs.CV

    GeoVideo: Introducing Geometric Regularization into Video Generation Model

    Authors: Yunpeng Bai, Shaoheng Fang, Chaohui Yu, Fan Wang, Qixing Huang

    Abstract: Recent advances in video generation have enabled the synthesis of high-quality and visually realistic clips using diffusion transformer models. However, most existing approaches operate purely in the 2D pixel space and lack explicit mechanisms for modeling 3D structures, often resulting in temporally inconsistent geometries, implausible motions, and structural artifacts. In this work, we introduce…

    Submitted 3 December, 2025; originally announced December 2025.

    Comments: Project Page: https://geovideo.github.io/GeoVideo/

  13. arXiv:2512.03067  [pdf, ps, other]

    cs.SI cs.AI

    Quantifying the Potential to Escape Filter Bubbles: A Behavior-Aware Measure via Contrastive Simulation

    Authors: Difu Feng, Qianqian Xu, Zitai Wang, Cong Hua, Zhiyong Yang, Qingming Huang

    Abstract: Nowadays, recommendation systems have become crucial to online platforms, shaping user exposure by accurate preference modeling. However, such an exposure strategy can also reinforce users' existing preferences, leading to a notorious phenomenon named filter bubbles. Given its negative effects, such as group polarization, increasing attention has been paid to exploring reasonable measures to filte…

    Submitted 27 November, 2025; originally announced December 2025.

  14. arXiv:2512.01830  [pdf, ps, other]

    cs.CV

    OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic

    Authors: Songyan Zhang, Wenhui Huang, Zhan Chen, Chua Jiahao Collister, Qihang Huang, Chen Lv

    Abstract: Recently, two-stage fine-tuning strategies, e.g., acquiring essential driving knowledge through supervised fine-tuning (SFT) and further enhancing decision-making and planning via reinforcement fine-tuning (RFT), have shown strong potential in advancing the knowledge-driven autonomous driving (AD) paradigm. However, the learning nature of SFT still limits the generalization of reasoning, thereby c…

    Submitted 2 December, 2025; v1 submitted 1 December, 2025; originally announced December 2025.

  15. Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations

    Authors: Yangbangyan Jiang, Qianqian Xu, Huiyang Shao, Zhiyong Yang, Shilong Bao, Xiaochun Cao, Qingming Huang

    Abstract: As a variant of the Area Under the ROC Curve (AUC), the partial AUC (PAUC) focuses on a specific range of false positive rate (FPR) and/or true positive rate (TPR) in the ROC curve. It is a pivotal evaluation metric in real-world scenarios with both class imbalance and decision constraints. However, selecting instances within these constrained intervals during its calculation is NP-hard, and thus…

    Submitted 30 November, 2025; originally announced December 2025.

  16. arXiv:2512.00881  [pdf, ps, other]

    cs.AI

    Hybrid-DMKG: A Hybrid Reasoning Framework over Dynamic Multimodal Knowledge Graphs for Multimodal Multihop QA with Knowledge Editing

    Authors: Li Yuan, Qingfei Huang, Bingshan Zhu, Yi Cai, Qingbao Huang, Changmeng Zheng, Zikun Deng, Tao Wang

    Abstract: Multimodal Knowledge Editing (MKE) extends traditional knowledge editing to settings involving both textual and visual modalities. However, existing MKE benchmarks primarily assess final answer correctness while neglecting the quality of intermediate reasoning and robustness to visually rephrased inputs. To address this limitation, we introduce MMQAKE, the first benchmark for multimodal multihop q…

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: Accepted by AAAI 2026

  17. arXiv:2512.00852  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    One Swallow Does Not Make a Summer: Understanding Semantic Structures in Embedding Spaces

    Authors: Yandong Sun, Qiang Huang, Ziwei Xu, Yiqun Sun, Yixuan Tang, Anthony K. H. Tung

    Abstract: Embedding spaces are fundamental to modern AI, translating raw data into high-dimensional vectors that encode rich semantic relationships. Yet, their internal structures remain opaque, with existing approaches often sacrificing semantic coherence for structural regularity or incurring high computational overhead to improve interpretability. To address these challenges, we introduce the Semantic Fi…

    Submitted 30 November, 2025; originally announced December 2025.

  18. arXiv:2511.23150  [pdf, ps, other]

    cs.CV

    Cascaded Robust Rectification for Arbitrary Document Images

    Authors: Chaoyun Wang, Quanxin Huang, I-Chao Shen, Takeo Igarashi, Nanning Zheng, Caigui Jiang

    Abstract: Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. Driven by the insight that complex transformations can be decomposed and resolved progressively, we introduce a novel multi-stage framework that progressively reverses distinct distortion types in a coarse-to-fine manner. Specifically, our framework…

    Submitted 28 November, 2025; originally announced November 2025.

  19. arXiv:2511.22950  [pdf, ps, other]

    cs.CV cs.RO

    RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video

    Authors: Haiyang Mei, Qiming Huang, Hai Ci, Mike Zheng Shou

    Abstract: Accurate robot segmentation is a fundamental capability for robotic perception. It enables precise visual servoing for VLA systems, scalable robot-centric data augmentation, accurate real-to-sim transfer, and reliable safety monitoring in dynamic human-robot environments. Despite the strong capabilities of modern segmentation models, surprisingly it remains challenging to segment robots. This is d…

    Submitted 28 November, 2025; originally announced November 2025.

    Comments: Project page: https://github.com/showlab/RobotSeg

  20. arXiv:2511.22170  [pdf, ps, other]

    cs.CV

    Partially Shared Concept Bottleneck Models

    Authors: Delong Zhao, Qiang Huang, Di Yan, Yiqun Sun, Jun Yu

    Abstract: Concept Bottleneck Models (CBMs) enhance interpretability by introducing a layer of human-understandable concepts between inputs and predictions. While recent methods automate concept generation using Large Language Models (LLMs) and Vision-Language Models (VLMs), they still face three fundamental challenges: poor visual grounding, concept redundancy, and the absence of principled metrics to balan…

    Submitted 27 November, 2025; originally announced November 2025.

    Comments: 14 pages, 7 figures, 11 tables, Accepted to AAAI 2026

  21. arXiv:2511.21631  [pdf, ps, other]

    cs.CV cs.AI

    Qwen3-VL Technical Report

    Authors: Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, et al. (39 additional authors not shown)

    Abstract: We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family includes both dense (2B/4B/8B/32B) and mixture-of-experts (30B-A3B/235B-A22B) variants to accommodate d…

    Submitted 27 November, 2025; v1 submitted 26 November, 2025; originally announced November 2025.

    Comments: 42 pages

  22. arXiv:2511.21519  [pdf, ps, other]

    cs.CV

    Self-Paced Learning for Images of Antinuclear Antibodies

    Authors: Yiyang Jiang, Guangwu Qian, Jiaxin Wu, Qi Huang, Qing Li, Yongkang Wu, Xiao-Yong Wei

    Abstract: Antinuclear antibody (ANA) testing is a crucial method for diagnosing autoimmune disorders, including lupus, Sjögren's syndrome, and scleroderma. Despite its importance, manual ANA detection is slow, labor-intensive, and demands years of training. ANA detection is complicated by over 100 coexisting antibody types, resulting in vast fluorescent pattern combinations. Although machine learning and de…

    Submitted 26 November, 2025; originally announced November 2025.

    Comments: IEEE Transactions on Medical Imaging

  23. arXiv:2511.21002  [pdf, ps, other]

    cs.CV cs.AI

    Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning

    Authors: Xiaoxing You, Qiang Huang, Lingyu Li, Chi Zhang, Xiaopeng Liu, Min Zhang, Jun Yu

    Abstract: News image captioning aims to produce journalistically informative descriptions by combining visual content with contextual cues from associated articles. Despite recent advances, existing methods struggle with three key challenges: (1) incomplete information coverage, (2) weak cross-modal alignment, and (3) suboptimal visual-entity grounding. To address these issues, we introduce MERGE, the first…

    Submitted 25 November, 2025; originally announced November 2025.

    Comments: Accepted to AAAI 2026

  24. arXiv:2511.20280  [pdf, ps, other]

    cs.CV

    Bootstrapping Physics-Grounded Video Generation through VLM-Guided Iterative Self-Refinement

    Authors: Yang Liu, Xilin Zhao, Peisong Wen, Siran Dai, Qingming Huang

    Abstract: Recent progress in video generation has led to impressive visual quality, yet current models still struggle to produce results that align with real-world physical principles. To this end, we propose an iterative self-refinement framework that leverages large language models and vision-language models to provide physics-aware guidance for video generation. Specifically, we introduce a multimodal ch…

    Submitted 25 November, 2025; originally announced November 2025.

    Comments: ICCV 2025 Physics-IQ Challenge Third Place Solution

  25. arXiv:2511.19343  [pdf, ps, other]

    cs.CV

    Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning

    Authors: Qihan Huang, Haofei Zhang, Rong Wei, Yi Wang, Rui Tang, Mingli Song, Jie Song

    Abstract: RL (reinforcement learning) methods (e.g., GRPO) for MLLM (Multimodal LLM) perception ability have attracted wide research interest owing to their remarkable generalization ability. Nevertheless, existing reinforcement learning methods still face the problem of low data quality, where data samples cannot elicit diverse responses from MLLMs, thus restricting the exploration scope for MLLM reinforcemen…

    Submitted 24 November, 2025; originally announced November 2025.

  26. arXiv:2511.19221  [pdf, ps, other]

    cs.CV cs.RO

    Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving

    Authors: Jianhua Han, Meng Tian, Jiangtong Zhu, Fan He, Huixin Zhang, Sitong Guo, Dechang Zhu, Hao Tang, Pei Xu, Yuze Guo, Minzhe Niu, Haojie Zhu, Qichao Dong, Xuechao Yan, Siyuan Dong, Lu Hou, Qingqiu Huang, Xiaosong Jia, Hang Xu

    Abstract: Autonomous driving heavily relies on accurate and robust spatial perception. Many failures arise from inaccuracies and instability, especially in long-tail scenarios and complex interactions. However, current vision-language models are weak at spatial grounding and understanding, and VLA systems built on them therefore show limited perception and localization ability. To address these challenges,…

    Submitted 24 November, 2025; originally announced November 2025.

  27. arXiv:2511.16123  [pdf, ps, other]

    cs.SE

    Domain-constrained Synthesis of Inconsistent Key Aspects in Textual Vulnerability Descriptions

    Authors: Linyi Han, Shidong Pan, Zhenchang Xing, Sofonias Yitagesu, Xiaowang Zhang, Zhiyong Feng, Jiamou Sun, Qing Huang

    Abstract: Textual Vulnerability Descriptions (TVDs) are crucial for security analysts to understand and address software vulnerabilities. However, the key aspect inconsistencies in TVDs from different repositories pose challenges for achieving a comprehensive understanding of vulnerabilities. Existing approaches aim to mitigate inconsistencies by aligning TVDs with external knowledge bases, but they often d…

    Submitted 20 November, 2025; originally announced November 2025.

  28. arXiv:2511.12547  [pdf, ps, other]

    cs.CV

    HiGFA: Hierarchical Guidance for Fine-grained Data Augmentation with Diffusion Models

    Authors: Zhiguang Lu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang

    Abstract: Generative diffusion models show promise for data augmentation. However, applying them to fine-grained tasks presents a significant challenge: ensuring synthetic images accurately capture the subtle, category-defining features critical for high fidelity. Standard approaches, such as text-based Classifier-Free Guidance (CFG), often lack the required specificity, potentially generating misleading ex…

    Submitted 30 November, 2025; v1 submitted 16 November, 2025; originally announced November 2025.

  29. arXiv:2511.11238  [pdf, ps, other]

    cs.LG cs.AI

    Virtual Width Networks

    Authors: Seed, Baisheng Li, Banggu Wu, Bole Ma, Bowen Xiao, Chaoyi Zhang, Cheng Li, Chengyi Wang, Chengyin Xu, Chi Zhang, Chong Hu, Daoguang Zan, Defa Zhu, Dongyu Xu, Du Li, Faming Wu, Fan Xia, Ge Zhang, Guang Shi, Haobin Chen, Hongyu Zhu, Hongzhi Huang, Huan Zhou, Huanzhang Dou, Jianhui Duan, et al. (94 additional authors not shown)

    Abstract: We introduce Virtual Width Networks (VWN), a framework that delivers the benefits of wider representations without incurring the quadratic cost of increasing the hidden size. VWN decouples representational width from backbone width, expanding the embedding space while keeping backbone compute nearly constant. In our large-scale experiment, an 8-times expansion accelerates optimization by over 2 ti…

    Submitted 17 November, 2025; v1 submitted 14 November, 2025; originally announced November 2025.

  30. arXiv:2511.09837  [pdf, ps, other]

    cs.DC

    MoFa: A Unified Performance Modeling Framework for LLM Pretraining

    Authors: Lu Zhao, Rong Shi, Shaoqing Zhang, Shangchao Su, Ziqing Yin, Zhiyan Cui, Hongfeng Sun, Baoguo He, Yueqiang Chen, Liang Dong, Xiyuan Li, Lingbin Wang, Lijun Ma, Qiang Huang, Ting Liu, Chong Wang, Can Wei

    Abstract: The exponential growth in LLM scales, with parameters soaring from billions to trillions, has necessitated distributed pretraining across large clusters comprising thousands to tens of thousands of devices. While hybrid parallelization strategies enable such pretraining, the vast combinatorial strategy space introduces significant optimization challenges. Traditional manual tuning methods incur pr…

    Submitted 20 November, 2025; v1 submitted 12 November, 2025; originally announced November 2025.

  31. arXiv:2511.07901  [pdf, ps, other]

    cs.AI

    DANS-KGC: Diffusion Based Adaptive Negative Sampling for Knowledge Graph Completion

    Authors: Haoning Li, Qinghua Huang

    Abstract: Negative sampling (NS) strategies play a crucial role in knowledge graph representation. In order to overcome the limitations of existing negative sampling strategies, such as vulnerability to false negatives, limited generalization, and lack of control over sample hardness, we propose DANS-KGC (Diffusion-based Adaptive Negative Sampling for Knowledge Graph Completion). DANS-KGC comprises three ke…

    Submitted 11 November, 2025; originally announced November 2025.

  32. arXiv:2511.07665  [pdf, ps, other]

    cs.AR cs.AI

    FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing

    Authors: Yuzhe Fu, Changchun Zhou, Hancheng Ye, Bowen Duan, Qiyu Huang, Chiyue Wei, Cong Guo, Hai "Helen" Li, Yiran Chen

    Abstract: Three-dimensional (3D) point clouds are increasingly used in applications such as autonomous driving, robotics, and virtual reality (VR). Point-based neural networks (PNNs) have demonstrated strong performance in point cloud analysis, originally targeting small-scale inputs. However, as PNNs evolve to process large-scale point clouds with hundreds of thousands of points, all-to-all computation and…

    Submitted 15 December, 2025; v1 submitted 10 November, 2025; originally announced November 2025.

    Comments: Accepted for publication in HPCA2026. Codes are released at https://github.com/Yuzhe-Fu/FractalCloud

  33. arXiv:2511.06859  [pdf, ps, other]

    cs.LG cs.AI

    TuckA: Hierarchical Compact Tensor Experts for Efficient Fine-Tuning

    Authors: Qifeng Lei, Zhiyong Yang, Qianqian Xu, Cong Hua, Peisong Wen, Qingming Huang

    Abstract: Efficiently fine-tuning pre-trained models for downstream tasks is a key challenge in the era of foundation models. Parameter-efficient fine-tuning (PEFT) presents a promising solution, achieving performance comparable to full fine-tuning by updating only a small number of adaptation weights per layer. Traditional PEFT methods typically rely on a single expert, where the adaptation weight is a low…

    Submitted 10 November, 2025; originally announced November 2025.

  34. arXiv:2511.02206  [pdf, ps, other]

    cs.CV

    Language-Enhanced Generative Modeling for Amyloid PET Synthesis from MRI and Blood Biomarkers

    Authors: Zhengjie Zhang, Xiaoxie Mao, Qihao Guo, Shaoting Zhang, Qi Huang, Mu Zhou, Fang Xie, Mianxin Liu

    Abstract: Background: Alzheimer's disease (AD) diagnosis heavily relies on amyloid-beta positron emission tomography (Abeta-PET), which is limited by high cost and limited accessibility. This study explores whether Abeta-PET spatial patterns can be predicted from blood-based biomarkers (BBMs) and MRI scans. Methods: We collected Abeta-PET images, T1-weighted MRI scans, and BBMs from 566 participants. A lang…

    Submitted 16 November, 2025; v1 submitted 3 November, 2025; originally announced November 2025.

    Comments: 31 pages, 8 figures

  35. arXiv:2511.01866  [pdf, ps, other]

    cs.DC cs.AI cs.AR

    EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs

    Authors: Benjamin Kubwimana, Qijing Huang

    Abstract: The edge intelligence paradigm is increasingly in demand for emerging autonomous systems, such as robotics. Beyond ensuring privacy-preserving operation and resilience in connectivity-limited environments, edge deployment offers significant energy and cost advantages over cloud-based solutions. However, deploying large language models (LLMs) for reasoning tasks on edge GPUs faces critical challenges…

    Submitted 21 October, 2025; originally announced November 2025.

    Comments: Published in the Proceedings of the 2025 IEEE International Symposium on Workload Characterization (IISWC 2025)

  36. arXiv:2511.01078  [pdf, ps, other]

    cs.MA

    Predictive Auxiliary Learning for Belief-based Multi-Agent Systems

    Authors: Qinwei Huang, Stefan Wang, Simon Khan, Garrett Katz, Qinru Qiu

    Abstract: The performance of multi-agent reinforcement learning (MARL) in partially observable environments depends on effectively aggregating information from observations, communications, and reward signals. While most existing multi-agent systems primarily rely on rewards as the only feedback for policy training, our research shows that introducing auxiliary predictive tasks can significantly enhance lea…

    Submitted 2 November, 2025; originally announced November 2025.

  37. arXiv:2510.27140  [pdf, ps, other]

    cs.CR

    Measuring the Security of Mobile LLM Agents under Adversarial Prompts from Untrusted Third-Party Channels

    Authors: Chenghao Du, Quanfeng Huang, Tingxuan Tang, Zihao Wang, Adwait Nadkarni, Yue Xiao

    Abstract: Large Language Models (LLMs) have transformed software development, enabling AI-powered applications known as LLM-based agents that promise to automate tasks across diverse apps and workflows. Yet, the security implications of deploying such agents in adversarial mobile environments remain poorly understood. In this paper, we present the first systematic study of security risks in mobile LLM agent…

    Submitted 5 November, 2025; v1 submitted 30 October, 2025; originally announced October 2025.

  38. arXiv:2510.26231  [pdf]

    cs.IR

    DiSE: A diffusion probabilistic model for automatic structure elucidation of organic compounds

    Authors: Haochen Chen, Qi Huang, Anan Wu, Wenhao Zhang, Jianliang Ye, Jianming Wu, Kai Tan, Xin Lu, Xin Xu

    Abstract: Automatic structure elucidation is essential for self-driving laboratories as it enables the system to become truly autonomous. This capability closes the experimental feedback loop, ensuring that machine learning models receive reliable structure information for real-time decision-making and optimization. Herein, we present DiSE, an end-to-end diffusion-based generative model that integrates mul…

    Submitted 30 October, 2025; originally announced October 2025.

  39. arXiv:2510.25193  [pdf, ps, other]

    eess.SP cs.SD

    State Space and Self-Attention Collaborative Network with Feature Aggregation for DOA Estimation

    Authors: Qi You, Qinghua Huang, Yi-Cheng Lin

    Abstract: Accurate direction-of-arrival (DOA) estimation for sound sources is challenging due to the continuous changes in acoustic characteristics across time and frequency. In such scenarios, accurate localization relies on the ability to aggregate relevant features and model temporal dependencies effectively. In time series modeling, achieving a balance between model performance and computational efficie…

    Submitted 29 October, 2025; originally announced October 2025.

  40. arXiv:2510.24105  [pdf, ps, other]

    cs.CV cs.LG

    Enhancing Pre-trained Representation Classifiability can Boost its Interpretability

    Authors: Shufan Shen, Zhaobo Qi, Junshu Sun, Qingming Huang, Qi Tian, Shuhui Wang

    Abstract: The visual representation of a pre-trained model prioritizes the classifiability on downstream tasks, while the widespread applications for pre-trained visual models have posed new requirements for representation interpretability. However, it remains unclear whether the pre-trained representations can achieve high interpretability and classifiability simultaneously. To answer this question, we qua…

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: ICLR 2025 (Spotlight)

  41. arXiv:2510.24037  [pdf, ps, other]

    cs.CV cs.LG

    Kernelized Sparse Fine-Tuning with Bi-level Parameter Competition for Vision Models

    Authors: Shufan Shen, Junshu Sun, Shuhui Wang, Qingming Huang

    Abstract: Parameter-efficient fine-tuning (PEFT) aims to adapt pre-trained vision models to downstream tasks. Among PEFT paradigms, sparse tuning achieves remarkable performance by adjusting only the weights most relevant to downstream tasks, rather than densely tuning the entire weight matrix. Current methods follow a two-stage paradigm. First, they locate task-relevant weights by gradient information, whic…

    Submitted 27 October, 2025; originally announced October 2025.

  42. arXiv:2510.23382  [pdf, ps, other]

    cs.CV

    An Efficient Remote Sensing Super Resolution Method Exploring Diffusion Priors and Multi-Modal Constraints for Crop Type Mapping

    Authors: Songxi Yang, Tang Sui, Qunying Huang

    Abstract: Super resolution offers a way to harness medium- or even low-resolution but historically valuable remote sensing image archives. Generative models, especially diffusion models, have recently been applied to remote sensing super resolution (RSSR), yet several challenges exist. First, diffusion models are effective but require expensive resources for training from scratch and have slow inference speeds. Seco…

    Submitted 27 October, 2025; originally announced October 2025.

    Comments: 41 pages

  43. arXiv:2510.22200  [pdf, ps, other]

    cs.CV

    LongCat-Video Technical Report

    Authors: Meituan LongCat Team, Xunliang Cai, Qilong Huang, Zhuoliang Kang, Hongyu Li, Shijun Liang, Liya Ma, Siyu Ren, Xiaoming Wei, Rixu Xie, Tong Zhang

    Abstract: Video generation is a critical pathway toward world models, with efficient long video inference as a key capability. Toward this end, we introduce LongCat-Video, a foundational video generation model with 13.6B parameters, delivering strong performance across multiple video generation tasks. It particularly excels in efficient and high-quality long video generation, representing our first step tow…

    Submitted 28 October, 2025; v1 submitted 25 October, 2025; originally announced October 2025.

  44. arXiv:2510.22115  [pdf, ps, other]

    cs.CL cs.AI

    Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

    Authors: Ling Team, Ang Li, Ben Liu, Binbin Hu, Bing Li, Bingwei Zeng, Borui Ye, Caizhi Tang, Changxin Tian, Chao Huang, Chao Zhang, Chen Qian, Chenchen Ju, Chenchen Li, Chengfu Tang, Chilin Fu, Chunshao Ren, Chunwei Wu, Cong Zhang, Cunyin Peng, Dafeng Xu, Daixin Wang, Dalong Zhang, Dingnan Jin, Dingyuan Zhu, et al. (117 additional authors not shown)

    Abstract: We introduce Ling 2.0, a reasoning-oriented language foundation series built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three…

    Submitted 6 November, 2025; v1 submitted 24 October, 2025; originally announced October 2025.

    Comments: Ling 2.0 Technical Report

  45. arXiv:2510.21324  [pdf, ps, other]

    cs.AI cs.MA

    CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation

    Authors: Jinhui Lou, Yan Yang, Zhou Yu, Zhenqi Fu, Weidong Han, Qingming Huang, Jun Yu

    Abstract: Chest X-ray (CXR) plays a pivotal role in clinical diagnosis, and a variety of task-specific and foundation models have been developed for automatic CXR interpretation. However, these models often struggle to adapt to new diagnostic tasks and complex reasoning scenarios. Recently, LLM-based agent models have emerged as a promising paradigm for CXR analysis, enhancing the model's capability through too…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: 10 pages, 4 figures, 7 tables

  46. arXiv:2510.21323  [pdf, ps, other]

    cs.CV cs.LG

    VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set

    Authors: Shufan Shen, Junshu Sun, Qingming Huang, Shuhui Wang

    Abstract: The alignment of vision-language representations endows current Vision-Language Models (VLMs) with strong multi-modal reasoning capabilities. However, the interpretability of the alignment component remains uninvestigated due to the difficulty in mapping the semantics of multi-modal representations into a unified concept set. To address this problem, we propose VL-SAE, a sparse autoencoder that en…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025

  47. arXiv:2510.21267  [pdf, ps, other]

    cs.LG

    Relieving the Over-Aggregating Effect in Graph Transformers

    Authors: Junshu Sun, Wanxing Chang, Chenxue Yang, Qingming Huang, Shuhui Wang

    Abstract: Graph attention has demonstrated superior performance in graph learning tasks. However, learning from global interactions can be challenging due to the large number of nodes. In this paper, we discover a new phenomenon termed over-aggregating. Over-aggregating arises when a large volume of messages is aggregated into a single node with less discrimination, leading to the dilution of the key messag…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025

  48. arXiv:2510.20385  [pdf, ps, other]

    cs.CV

    Positional Encoding Field

    Authors: Yunpeng Bai, Haoxiang Li, Qixing Huang

    Abstract: Diffusion Transformers (DiTs) have emerged as the dominant architecture for visual generation, powering state-of-the-art image and video models. By representing images as patch tokens with positional encodings (PEs), DiTs combine Transformer scalability with spatial and temporal inductive biases. In this work, we revisit how DiTs organize visual content and discover that patch tokens exhibit a sur…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: 8 pages, 9 figures

  49. arXiv:2510.19405  [pdf]

    cs.CY

    Designing Knowledge Tools: How Students Transition from Using to Creating Generative AI in STEAM classroom

    Authors: Qian Huang, Nachamma Sockalingam, Thijs Willems, King Wang Poon

    Abstract: This study explores how graduate students in an urban planning program transitioned from passive users of generative AI to active creators of custom GPT-based knowledge tools. Drawing on Self-Determination Theory (SDT), which emphasizes the psychological needs of autonomy, competence, and relatedness as foundations for intrinsic motivation, the research investigates how the act of designing AI too…

    Submitted 22 October, 2025; originally announced October 2025.

    Comments: to be published in IEEE TALE 2025

  50. arXiv:2510.19342  [pdf]

    cs.CY cs.AI

    To Use or to Refuse? Re-Centering Student Agency with Generative AI in Engineering Design Education

    Authors: Thijs Willems, Sumbul Khan, Qian Huang, Bradley Camburn, Nachamma Sockalingam, King Wang Poon

    Abstract: This pilot study traces students' reflections on the use of AI in a 13-week foundational design course enrolling over 500 first-year engineering and architecture students at the Singapore University of Technology and Design. The course was an AI-enhanced design course, with several interventions to equip students with AI-based design skills. Students were required to reflect on whether the technol…

    Submitted 22 October, 2025; originally announced October 2025.

    Comments: to be published in IEEE TALE 2025