Showing 1–50 of 917 results for author: Sun, C

Searching in archive cs.
  1. arXiv:2604.05695  [pdf, ps, other]

    cs.CV

    Let Geometry GUIDE: Layer-wise Unrolling of Geometric Priors in Multimodal LLMs

    Authors: Chongyu Wang, Ting Huang, Chunyu Sun, Xinyu Ning, Di Wang, Hao Tang

    Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable progress in 2D visual tasks but still exhibit limited physical spatial awareness when processing real-world visual streams. Recently, feed-forward geometric foundation models, which implicitly extract geometric priors, have provided a new pathway to address this issue. However, existing geometry-aware MLLMs are predominantly constra…

    Submitted 7 April, 2026; originally announced April 2026.

  2. arXiv:2604.01562  [pdf, ps, other]

    cs.SD cs.AI cs.CL cs.CY cs.HC

    Acoustic and perceptual differences between standard and accented Chinese speech and their voice clones

    Authors: Tianle Yang, Chengzhe Sun, Phil Rose, Siwei Lyu

    Abstract: Voice cloning is often evaluated in terms of overall quality, but less is known about accent preservation and its perceptual consequences. We compare standard and heavily accented Mandarin speech and their voice clones using a combined computational and perceptual design. Embedding-based analyses show no reliable accented-standard difference in original-clone distances across systems. In the perce…

    Submitted 1 April, 2026; originally announced April 2026.

  3. arXiv:2604.00601  [pdf, ps, other]

    cs.CV

    KG-CMI: Knowledge graph enhanced cross-Mamba interaction for medical visual question answering

    Authors: Xianyao Zheng, Hong Yu, Hui Cui, Changming Sun, Xiangyu Li, Ran Su, Leyi Wei, Jia Zhou, Junbo Wang, Qiangguo Jin

    Abstract: Medical visual question answering (Med-VQA) is a crucial multimodal task in clinical decision support and telemedicine. Recent methods fail to fully leverage domain-specific medical knowledge, making it difficult to accurately associate lesion features in medical images with key diagnostic criteria. Additionally, classification-based approaches typically rely on predefined answer sets. Treating Me…

    Submitted 1 April, 2026; originally announced April 2026.

  4. arXiv:2603.29432  [pdf, ps, other]

    cs.LG eess.SP

    mtslearn: Machine Learning in Python for Medical Time Series

    Authors: Zhongheng Jiang, Yuechao Zhao, Donglin Xie, Chenxi Sun, Rongchen Lu, Silu Luo, Zisheng Liang, Shenda Hong

    Abstract: Medical time-series data captures the dynamic progression of patient conditions, playing a vital role in modern clinical decision support systems. However, real-world clinical data is highly heterogeneous and inconsistently formatted. Furthermore, existing machine learning tools often have steep learning curves and fragmented workflows. Consequently, a significant gap remains between cutting-edge…

    Submitted 31 March, 2026; originally announced March 2026.

  5. arXiv:2603.29171  [pdf, ps, other]

    cs.CV cs.LG

    Segmentation of Gray Matters and White Matters from Brain MRI data

    Authors: Chang Sun, Rui Shi, Tsukasa Koike, Tetsuro Sekine, Akio Morita, Tetsuya Sakai

    Abstract: Accurate segmentation of brain tissues such as gray matter and white matter from magnetic resonance imaging is essential for studying brain anatomy, diagnosing neurological disorders, and monitoring disease progression. Traditional methods, such as FSL FAST, produce tissue probability maps but often require task-specific adjustments and face challenges with diverse imaging conditions. Recent found…

    Submitted 4 April, 2026; v1 submitted 30 March, 2026; originally announced March 2026.

  6. arXiv:2603.27998  [pdf, ps, other]

    eess.AS cs.LG

    BiFormer3D: Grid-Free Time-Domain Reconstruction of Head-Related Impulse Responses with a Spatially Encoded Transformer

    Authors: Shaoheng Xu, Chunyi Sun, Jihui Zhang, Amy Bastine, Prasanga N. Samarasinghe, Thushara D. Abhayapala, Hongdong Li

    Abstract: Individualized head-related impulse responses (HRIRs) enable binaural rendering, but dense per-listener measurements are costly. We address HRIR spatial up-sampling from sparse per-listener measurements: given a few measured HRIRs for a listener, predict HRIRs at unmeasured target directions. Prior learning methods often work in the frequency domain, rely on minimum-phase assumptions or separate t…

    Submitted 29 March, 2026; originally announced March 2026.

    Comments: The paper was submitted for review to Interspeech 2026

  7. arXiv:2603.27690  [pdf, ps, other]

    cs.CV

    Customized Visual Storytelling with Unified Multimodal LLMs

    Authors: Wei-Hua Li, Cheng Sun, Chu-Song Chen

    Abstract: Multimodal story customization aims to generate coherent story flows conditioned on textual descriptions, reference identity images, and shot types. While recent progress in story generation has shown promising results, most approaches rely on text-only inputs. A few studies incorporate character identity cues (e.g., facial ID), but lack broader multimodal conditioning. In this work, we introduce…

    Submitted 29 March, 2026; originally announced March 2026.

    Comments: Paper accepted to the CVPR 2026 Workshop on Generative AI for Storytelling (CVPRW)

  8. arXiv:2603.27500  [pdf, ps, other]

    cs.CV

    Streamlined Open-Vocabulary Human-Object Interaction Detection

    Authors: Chang Sun, Dongliang Liao, Changxing Ding

    Abstract: Open-vocabulary human-object interaction (HOI) detection aims to localize and recognize all human-object interactions in an image, including those unseen during training. Existing approaches usually rely on the collaboration between a conventional HOI detector and a Vision-Language Model (VLM) to recognize unseen HOI categories. However, feature fusion in this paradigm is challenging due to signif…

    Submitted 28 March, 2026; originally announced March 2026.

  9. arXiv:2603.27420  [pdf, ps, other]

    cs.DC cs.AI cs.LG

    CarbonEdge: Carbon-Aware Deep Learning Inference Framework for Sustainable Edge Computing

    Authors: Guilin Zhang, Wulan Guo, Ziqi Tan, Chuanyi Sun, Hailong Jiang

    Abstract: Deep learning applications at the network edge lead to a significant growth in AI-related carbon emissions, presenting a critical sustainability challenge. The existing edge computing frameworks optimize for latency and throughput, but they largely ignore the environmental impact of inference workloads. This paper introduces CarbonEdge, a carbon-aware deep learning inference framework that extends…

    Submitted 31 March, 2026; v1 submitted 28 March, 2026; originally announced March 2026.

    ACM Class: C.2.4; I.2.6

  10. arXiv:2603.26595  [pdf, ps, other]

    cs.LG hep-ex

    PQuantML: A Tool for End-to-End Hardware-aware Model Compression

    Authors: Roope Niemi, Anastasiia Petrovych, Arghya Ranjan Das, Enrico Lupi, Chang Sun, Dimitrios Danopoulos, Marlon Joshua Helbing, Mia Liu, Sebastian Dittmeier, Michael Kagan, Vladimir Loncar, Maurizio Pierini

    Abstract: PQuantML is a new open-source, hardware-aware neural network model compression library tailored to end-to-end workflows. Motivated by the need to deploy performant models to environments with strict latency constraints, PQuantML simplifies training of compressed models by providing a unified interface to apply pruning and quantization, either jointly or individually. The library implements multipl…

    Submitted 27 March, 2026; originally announced March 2026.

  11. arXiv:2603.26183  [pdf, ps, other]

    cs.CV

    DUGAE: Unified Geometry and Attribute Enhancement via Spatiotemporal Correlations for G-PCC Compressed Dynamic Point Clouds

    Authors: Pan Zhao, Hui Yuan, Chang Sun, Chongzhen Tian, Raouf Hamzaoui, Sam Kwong

    Abstract: Existing post-decoding quality enhancement methods for point clouds are designed for static data and typically process each frame independently. As a result, they cannot effectively exploit the spatiotemporal correlations present in point cloud sequences. We propose a unified geometry and attribute enhancement framework (DUGAE) for G-PCC compressed dynamic point clouds that explicitly exploits inte…

    Submitted 27 March, 2026; originally announced March 2026.

  12. arXiv:2603.25607  [pdf]

    cs.CV cs.AI

    DeepFAN, a transformer-based deep learning model for human-artificial intelligence collaborative assessment of incidental pulmonary nodules in CT scans: a multi-reader, multi-case trial

    Authors: Zhenchen Zhu, Ge Hu, Weixiong Tan, Kai Gao, Chao Sun, Zhen Zhou, Kepei Xu, Wei Han, Meixia Shang, Xiaoming Qiu, Yiqing Tan, Jinhua Wang, Zhoumeng Ying, Li Peng, Wei Song, Lan Song, Zhengyu Jin, Nan Hong, Yizhou Yu

    Abstract: The widespread adoption of CT has notably increased the number of detected lung nodules. However, current deep learning methods for classifying benign and malignant nodules often fail to comprehensively integrate global and local features, and most of them have not been validated through clinical trials. To address this, we developed DeepFAN, a transformer-based model trained on over 10K pathology…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: 28 pages for main text and 37 pages for supplementary information, 7 figures in main text and 9 figures in supplementary information

  13. arXiv:2603.25551  [pdf, ps, other]

    cs.AI

    Voxtral TTS

    Authors: Mistral-AI: Alexander H. Liu, Alexis Tacnet, Andy Ehrenberg, Andy Lo, Chen-Yo Sun, Guillaume Lample, Henry Lagarde, Jean-Malo Delignon, Jaeyoung Kim, John Harvill, Khyathi Raghavi Chandu, Lorenzo Signoretti, Margaret Jennings, Patrick von Platen, Pavankumar Reddy Muddireddy, Rohin Arora, Sanchit Gandhi, Samuel Humeau, Soham Ghosh, Srijan Mishra, Van Phung, Abdelaziz Bounhar, Abhinav Rastogi, et al. (164 additional authors not shown)

    Abstract: We introduce Voxtral TTS, an expressive multilingual text-to-speech model that generates natural speech from as little as 3 seconds of reference audio. Voxtral TTS adopts a hybrid architecture that combines auto-regressive generation of semantic speech tokens with flow-matching for acoustic tokens. These tokens are encoded and decoded with Voxtral Codec, a speech tokenizer trained from scratch wit…

    Submitted 6 April, 2026; v1 submitted 26 March, 2026; originally announced March 2026.

  14. arXiv:2603.25226  [pdf, ps, other]

    cs.SE cs.AI cs.CL cs.MA

    WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing

    Authors: Fanheng Kong, Jingyuan Zhang, Yang Yue, Chenxi Sun, Yang Tian, Shi Feng, Xiaocui Yang, Daling Wang, Yu Tian, Jun Du, Wenchong Zeng, Han Li, Kun Gai

    Abstract: The emergence of Large Language Models (LLMs) has catalyzed a paradigm shift in programming, giving rise to "vibe coding", where users can build complete projects and even control computers using natural language instructions. This paradigm has driven automated webpage development, but it introduces a new requirement about how to automatically verify whether the web functionalities are reliably im…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: 24 pages, code: https://github.com/friedrichor/WebTestBench

  15. arXiv:2603.23249  [pdf, ps, other]

    cs.LG cs.AI math.OC

    A Learning Method with Gap-Aware Generation for Heterogeneous DAG Scheduling

    Authors: Ruisong Zhou, Haijun Zou, Li Zhou, Chumin Sun, Zaiwen Wen

    Abstract: Efficient scheduling of directed acyclic graphs (DAGs) in heterogeneous environments is challenging due to resource capacities and dependencies. In practice, the need for adaptability across environments with varying resource pools and task types, alongside rapid schedule generation, complicates these challenges. We propose WeCAN, an end-to-end reinforcement learning framework for heterogeneous DA…

    Submitted 24 March, 2026; originally announced March 2026.

    Comments: 30 pages, 8 figures

    MSC Class: 90C27; 68T20

  16. arXiv:2603.22687  [pdf, ps, other]

    cs.CV

    GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning

    Authors: Jiayin Sun, Caixia Sun, Boyu Yang, Hailin Li, Xiao Chen, Yi Zhang, Errui Ding, Liang Li, Chao Deng, Junlan Feng

    Abstract: Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities. However, they struggle to perceive fine-grained geometric structures, constraining their ability of geometric understanding and visual reasoning. To address this, we propose GeoTikzBridge, a framework that enhances local geometric perception and visual reasoning through tikz-based cod…

    Submitted 26 March, 2026; v1 submitted 23 March, 2026; originally announced March 2026.

    Comments: accepted by CVPR 2026

  17. arXiv:2603.21887  [pdf, ps, other]

    cs.RO

    IGV-RRT: Prior-Real-Time Observation Fusion for Active Object Search in Changing Environments

    Authors: Wei Zhang, Ping Gong, Yujie Wang, Minghui Bai, Rongfeng Ye, Yinchuan Wang, Yachao Wang, Leilei Yao, Teng Chen, Chen Sun, Chaoqun Wang

    Abstract: Object Goal Navigation (ObjectNav) in temporally changing indoor environments is challenging because object relocation can invalidate historical scene knowledge. To address this issue, we propose a probabilistic planning framework that combines uncertainty-aware scene priors with online target relevance estimates derived from a Vision Language Model (VLM). The framework contains a dual-layer seman…

    Submitted 23 March, 2026; originally announced March 2026.

  18. arXiv:2603.21078  [pdf, ps, other]

    cs.CL cs.AI cs.SD

    Assessing the Ability of Neural TTS Systems to Model Consonant-Induced F0 Perturbation

    Authors: Tianle Yang, Chengzhe Sun, Phil Rose, Cassandra L. Jacobs, Siwei Lyu

    Abstract: This study proposes a segmental-level prosodic probing framework to evaluate neural TTS models' ability to reproduce consonant-induced f0 perturbation, a fine-grained segmental-prosodic effect that reflects local articulatory mechanisms. We compare synthetic and natural speech realizations for thousands of words, stratified by lexical frequency, using Tacotron 2 and FastSpeech 2 trained on the sam…

    Submitted 22 March, 2026; originally announced March 2026.

    Comments: Accepted for publication in Computer Speech & Language

    Journal ref: Tianle Yang, Chengzhe Sun, Phil Rose, Cassandra L. Jacobs, and Siwei Lyu. 2026. Assessing the Ability of Neural TTS Systems to Model Consonant-Induced F0 Perturbation. Computer Speech & Language 100: 101983

  19. arXiv:2603.19637  [pdf, ps, other]

    cs.CV

    UniBioTransfer: A Unified Framework for Multiple Biometrics Transfer

    Authors: Caiyi Sun, Yujing Sun, Xiangyu Li, Yuhang Zheng, Yiming Ren, Jiamin Wang, Yuexin Ma, Siu-Ming Yiu

    Abstract: Deepface generation has traditionally followed a task-driven paradigm, where distinct tasks (e.g., face transfer and hair transfer) are addressed by task-specific models. Nevertheless, this single-task setting severely limits model generalization and scalability. A unified model capable of solving multiple deepface generation tasks in a single pass represents a promising and practical direction, y…

    Submitted 20 March, 2026; originally announced March 2026.

  20. arXiv:2603.19584  [pdf, ps, other]

    cs.AI eess.SY

    PowerLens: Taming LLM Agents for Safe and Personalized Mobile Power Management

    Authors: Xingyu Feng, Chang Sun, Yuzhu Wang, Zhangbing Zhou, Chengwen Luo, Zhuangzhuang Chen, Xiaomin Ouyang, Huanqi Yang

    Abstract: Battery life remains a critical challenge for mobile devices, yet existing power management mechanisms rely on static rules or coarse-grained heuristics that ignore user activities and personal preferences. We present PowerLens, a system that tames the reasoning power of Large Language Models (LLMs) for safe and personalized mobile power management on Android devices. The key idea is that LLMs' co…

    Submitted 19 March, 2026; originally announced March 2026.

  21. arXiv:2603.18743  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Memento-Skills: Let Agents Design Agents

    Authors: Huichi Zhou, Siyuan Guo, Anjie Liu, Zhongwei Yu, Ziqin Gong, Bowen Zhao, Zhixun Chen, Menglong Zhang, Yihang Chen, Jinsong Li, Runyu Yang, Qiangbin Liu, Xinlei Yu, Jianmin Zhou, Na Wang, Chunyang Sun, Jun Wang

    Abstract: We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specific agents through experience. The system is built on a memory-based reinforcement learning framework with stateful prompts, where reusable skills (stored as structured markdown files) serve as…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: Memento-Skills Technical Report

  22. arXiv:2603.18697  [pdf, ps, other]

    cs.LG

    OCP: Orthogonal Constrained Projection for Sparse Scaling in Industrial Commodity Recommendation

    Authors: Chen Sun, Beilin Xu, Boheng Tan, Jiacheng Wang, Yuefeng Sun, Rite Bo, Ying He, Yaqiang Zang, Pinghua Gong

    Abstract: In industrial commodity recommendation systems, the representation quality of Item-Id vocabularies directly impacts the scalability and generalization ability of recommendation models. A key challenge is that traditional Item-Id vocabularies, when subjected to sparse scaling, suffer from low-frequency information interference, which restricts their expressive power for massive item sets and leads…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: 5 pages, 4 figures

  23. arXiv:2603.18477  [pdf, ps, other]

    cs.PL

    Leveraging Large Language Models for Generalizing Peephole Optimizations

    Authors: Chunhao Liao, Hongxu Xu, Xintong Zhou, Zhenyang Xu, Chengnian Sun

    Abstract: Peephole optimizations are a core component of modern optimizing compilers. They rewrite specific instructions into semantically equivalent but more efficient forms. In practice, creating a new peephole optimization often starts from a concrete optimization instance and requires lifting it into a more general rewrite rule that matches a wider range of instruction patterns. This generalization step i…

    Submitted 19 March, 2026; originally announced March 2026.

  24. arXiv:2603.17610  [pdf, ps, other]

    cs.LG

    AdaMuS: Adaptive Multi-view Sparsity Learning for Dimensionally Unbalanced Data

    Authors: Cai Xu, Changhao Sun, Ziyu Guan, Wei Zhao

    Abstract: Multi-view learning primarily aims to fuse multiple features to describe data comprehensively. Most prior studies implicitly assume that different views share similar dimensions. In practice, however, severe dimensional disparities often exist among different views, leading to the unbalanced multi-view learning issue. For example, in emotion recognition tasks, video frames often reach dimensions o…

    Submitted 31 March, 2026; v1 submitted 18 March, 2026; originally announced March 2026.

    Comments: 15 pages. Submitted to IEEE Transactions on Image Processing

  25. arXiv:2603.16551  [pdf, ps, other]

    cs.CV cs.AI

    CompDiff: Hierarchical Compositional Diffusion for Fair and Zero-Shot Intersectional Medical Image Generation

    Authors: Mahmoud Ibrahim, Bart Elen, Chang Sun, Gokhan Ertaylan, Michel Dumontier

    Abstract: Generative models are increasingly used to augment medical imaging datasets for fairer AI. Yet a key assumption often goes unexamined: that generators themselves produce equally high-quality images across demographic groups. Models trained on imbalanced data can inherit these imbalances, yielding degraded synthesis quality for rare subgroups and struggling with demographic intersections absent fro…

    Submitted 17 March, 2026; originally announced March 2026.

  26. arXiv:2603.15822  [pdf, ps, other]

    cs.CV

    Beyond the Embedding Bottleneck: Adaptive Retrieval-Augmented 3D CT Report Generation

    Authors: Renjie Liang, Yiling Ma, Yang Xing, Zhengkang Fan, Jinqian Pan, Chengkun Sun, Li Li, Kuang Gong, Jie Xu

    Abstract: Automated radiology report generation from 3D CT volumes often suffers from incomplete pathology coverage. We provide empirical evidence that this limitation stems from a representational bottleneck: contrastive 3D CT embeddings encode discriminative pathology signals, yet exhibit severe dimensional concentration, with as few as 2 effective dimensions out of 512. Corroborating this, scaling the la…

    Submitted 16 March, 2026; originally announced March 2026.

  27. arXiv:2603.12702  [pdf, ps, other]

    cs.IR cs.CL cs.LG

    FGTR: Fine-Grained Multi-Table Retrieval via Hierarchical LLM Reasoning

    Authors: Chaojie Sun, Bin Cao, Tiantian Li, Chenyu Hou, Ruizhe Li, Jing Fan

    Abstract: With the rapid advancement of large language models (LLMs), growing efforts have been made on LLM-based table retrieval. However, existing studies typically focus on single-table queries, and implement retrieval by similarity matching after encoding the entire table. These methods usually result in low accuracy due to their coarse-grained encoding, which incorporates much query-irrelevant data, and are also…

    Submitted 30 March, 2026; v1 submitted 13 March, 2026; originally announced March 2026.

    Comments: Work in progress; 10 pages, 5 figures, 4 tables

  28. arXiv:2603.07787  [pdf, ps, other]

    cs.LG

    Vision Transformers that Never Stop Learning

    Authors: Caihao Sun, Mingqi Yuan, Shiyuan Wang, Jiayu Chen

    Abstract: Loss of plasticity refers to the progressive inability of a model to adapt to new tasks and poses a fundamental challenge for continual learning. While this phenomenon has been extensively studied in homogeneous neural architectures, such as multilayer perceptrons, its mechanisms in structurally heterogeneous, attention-based models such as Vision Transformers (ViTs) remain underexplored. In this…

    Submitted 8 March, 2026; originally announced March 2026.

  29. arXiv:2603.07093  [pdf, ps, other]

    cs.CV

    Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction

    Authors: Xu Chen, Rui Gao, Xinjie Zhang, Haoyu Zhang, Che Sun, Zhi Gao, Yuwei Wu, Yunde Jia

    Abstract: Achieving natural dyadic interaction requires generating facial expressions that are emotionally appropriate and socially aligned with human preference. Human feedback offers a compelling mechanism to guide such alignment, yet how to effectively incorporate this feedback into facial expression generation remains underexplored. In this paper, we propose a facial expression generation method aligned…

    Submitted 7 March, 2026; originally announced March 2026.

  30. arXiv:2603.07068  [pdf, ps, other]

    cs.RO

    Morphology-Independent Facial Expression Imitation for Human-Face Robots

    Authors: Xu Chen, Rui Gao, Che Sun, Zhehang Liu, Yuwei Wu, Shuo Yang, Yunde Jia

    Abstract: Accurate facial expression imitation on human-face robots is crucial for achieving natural human-robot interaction. Most existing methods have achieved photorealistic expression imitation through mapping 2D facial landmarks to a robot's actuator commands. Their imitation of landmark trajectories is susceptible to interference from facial morphology, which would lead to a performance drop. In this…

    Submitted 7 March, 2026; originally announced March 2026.

  31. arXiv:2603.07043  [pdf, ps, other]

    cs.CV

    Fine-Grained 3D Facial Reconstruction for Micro-Expressions

    Authors: Che Sun, Xinjie Zhang, Rui Gao, Xu Chen, Yuwei Wu, Yunde Jia

    Abstract: Recent advances in 3D facial expression reconstruction have demonstrated remarkable performance in capturing macro-expressions, yet the reconstruction of micro-expressions remains unexplored. This novel task is particularly challenging due to the subtle, transient, and low-intensity nature of micro-expressions, which complicate the extraction of stable and discriminative features essential for acc…

    Submitted 7 March, 2026; originally announced March 2026.

  32. arXiv:2603.04383  [pdf, ps, other]

    cs.CY cs.CR cs.IR cs.LG cs.SI

    Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy

    Authors: Chen Sun, Yash Vekaria, Zubair Shafiq, Rishab Nithyanand

    Abstract: YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparency and ethics, especially when creators fail to disclose their affiliate relationships. Although regulatory agencies like the US Federal Trade Commission (FTC) have issued guidelines to address these issues, non-compliance and consumer harm persist…

    Submitted 4 March, 2026; originally announced March 2026.

    Comments: ICWSM 2026

  33. arXiv:2603.03978  [pdf, ps, other]

    cs.RO cs.GR

    Map-Agnostic And Interactive Safety-Critical Scenario Generation via Multi-Objective Tree Search

    Authors: Wenyun Li, Zejian Deng, Chen Sun

    Abstract: Generating safety-critical scenarios is essential for validating the robustness of autonomous driving systems, yet existing methods often struggle to produce collisions that are both realistic and diverse while ensuring explicit interaction logic among traffic participants. This paper presents a novel framework for traffic-flow level safety-critical scenario generation via multi-objective Monte Ca…

    Submitted 4 March, 2026; originally announced March 2026.

  34. arXiv:2603.03881  [pdf, ps, other]

    cs.CR cs.AI cs.CL cs.CY cs.HC

    On the Suitability of LLM-Driven Agents for Dark Pattern Audits

    Authors: Chen Sun, Yash Vekaria, Rishab Nithyanand

    Abstract: As LLM-driven agents begin to autonomously navigate the web, their ability to interpret and respond to manipulative interface design becomes critical. A fundamental question that emerges is: can such agents reliably recognize patterns of friction, misdirection, and coercion in interface design (i.e., dark patterns)? We study this question in a setting where the workflows are consequential: website…

    Submitted 4 March, 2026; originally announced March 2026.

  35. arXiv:2603.03269  [pdf, ps, other]

    cs.CV cs.LG

    LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

    Authors: Junyi Zhang, Charles Herrmann, Junhwa Hur, Chen Sun, Ming-Hsuan Yang, Forrester Cole, Trevor Darrell, Deqing Sun

    Abstract: Feedforward geometric foundation models achieve strong short-window reconstruction, yet scaling them to minutes-long videos is bottlenecked by quadratic attention complexity or limited effective memory in recurrent designs. We present LoGeR (Long-context Geometric Reconstruction), a novel architecture that scales dense 3D reconstruction to extremely long sequences without post-optimization. LoGeR…

    Submitted 3 March, 2026; originally announced March 2026.

    Comments: Project page: https://LoGeR-project.github.io/

  36. arXiv:2603.02532  [pdf, ps, other]

    cs.CV

    EIMC: Efficient Instance-aware Multi-modal Collaborative Perception

    Authors: Kang Yang, Peng Wang, Lantao Li, Tianci Bu, Chen Sun, Deying Li, Yongcai Wang

    Abstract: Multi-modal collaborative perception calls for great attention to enhancing the safety of autonomous driving. However, current multi-modal approaches remain a "local fusion to communication" sequence, which fuses multi-modal data locally and needs high bandwidth to transmit an individual's feature data before collaborative fusion. EIMC innovatively proposes an early collaborative paradigm. It in…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: 9 pages, 8 figures, 7 tables

  37. arXiv:2602.22381  [pdf, ps, other]

    cs.CV cs.AI

    Enhancing Renal Tumor Malignancy Prediction: Deep Learning with Automatic 3D CT Organ Focused Attention

    Authors: Zhengkang Fan, Chengkun Sun, Russell Terry, Jie Xu, Longin Jan Latecki

    Abstract: Accurate prediction of malignancy in renal tumors is crucial for informing clinical decisions and optimizing treatment strategies. However, existing imaging modalities lack the necessary accuracy to reliably predict malignancy before surgical intervention. While deep learning has shown promise in malignancy prediction using 3D CT images, traditional approaches often rely on manual segmentation to…

    Submitted 25 February, 2026; originally announced February 2026.

    Comments: 5 pages, 2 figures, Accepted at IEEE ISBI 2026

  38. arXiv:2602.21611  [pdf, ps, other]

    cs.SE cs.AI

    Structurally Aligned Subtask-Level Memory for Software Engineering Agents

    Authors: Kangning Shen, Jingyuan Zhang, Chenxi Sun, Wencong Zeng, Yang Yue

    Abstract: Large Language Models (LLMs) have demonstrated significant potential as autonomous software engineering (SWE) agents. Recent work has further explored augmenting these agents with memory mechanisms to support long-horizon reasoning. However, these approaches typically operate at a coarse instance granularity, treating the entire problem-solving episode as the atomic unit of storage and retrieval.…

    Submitted 25 February, 2026; originally announced February 2026.

  39. arXiv:2602.20537  [pdf, ps, other]

    cs.CV

    PFGNet: A Fully Convolutional Frequency-Guided Peripheral Gating Network for Efficient Spatiotemporal Predictive Learning

    Authors: Xinyong Cai, Changbin Sun, Yong Wang, Hongyu Yang, Yuankai Wu

    Abstract: Spatiotemporal predictive learning (STPL) aims to forecast future frames from past observations and is essential across a wide range of applications. Compared with recurrent or hybrid architectures, pure convolutional models offer superior efficiency and full parallelism, yet their fixed receptive fields limit their ability to adaptively capture spatially varying motion patterns. Inspired by biolo…

    Submitted 19 March, 2026; v1 submitted 23 February, 2026; originally announced February 2026.

    Comments: Accepted to CVPR 2026

  40. arXiv:2602.18746  [pdf, ps, other]

    cs.CV

    MIRROR: Multimodal Iterative Reasoning via Reflection on Visual Regions

    Authors: Haoyu Zhang, Yuwei Wu, Pengxiang Li, Xintong Zhang, Zhi Gao, Rui Gao, Mingyang Gao, Che Sun, Yunde Jia

    Abstract: In the era of Vision-Language Models (VLMs), enhancing multimodal reasoning capabilities remains a critical challenge, particularly in handling ambiguous or complex visual inputs, where initial inferences often lead to hallucinations or logic errors. Existing VLMs often produce plausible yet ungrounded answers, and even when prompted to "reflect", their corrections may remain detached from the ima…

    Submitted 24 February, 2026; v1 submitted 21 February, 2026; originally announced February 2026.

  41. arXiv:2602.18422  [pdf, ps, other]

    cs.CV

    Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

    Authors: Linxi Xie, Lisong C. Sun, Ashley Neall, Tong Wu, Shengqu Cai, Gordon Wetzstein

    Abstract: Extended reality (XR) demands generative models that respond to users' tracked real-world motion, yet current video world models accept only coarse control signals such as text or keyboard input, limiting their utility for embodied interaction. We introduce a human-centric video world model that is conditioned on both tracked head pose and joint-level hand poses. For this purpose, we evaluate exis…

    Submitted 20 February, 2026; originally announced February 2026.

    Comments: Project page here: https://codeysun.github.io/generated-reality

  42. arXiv:2602.14135  [pdf, ps, other]

    cs.AI cs.CR cs.CY

    ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI

    Authors: Haibo Tong, Feifei Zhao, Linghao Feng, Ruoyu Wu, Ruolin Chen, Lu Jia, Zhou Zhao, Jindong Li, Tenglong Li, Erliang Lin, Shuai Yang, Enmeng Lu, Yinqian Sun, Qian Zhang, Zizhe Ruan, Jinyu Fan, Zeyang Yue, Ping Wu, Huangrui Li, Chengyi Sun, Yi Zeng

    Abstract: Rapidly evolving AI exhibits increasingly strong autonomy and goal-directed capabilities, accompanied by derivative systemic risks that are more unpredictable, difficult to control, and potentially irreversible. However, current AI safety evaluation systems suffer from critical limitations such as restricted risk dimensions and failed frontier risk detection. The lagging safety benchmarks and alig…

    Submitted 26 February, 2026; v1 submitted 15 February, 2026; originally announced February 2026.

  43. arXiv:2602.13335  [pdf, ps, other]

    cs.CV

    Meningioma Analysis and Diagnosis using Limited Labeled Samples

    Authors: Jiamiao Lu, Wei Wu, Ke Gao, Ping Mao, Weichuan Zhang, Tuo Wang, Lingkun Ma, Jiapan Guo, Zanyi Wu, Yuqing Hu, Changming Sun

    Abstract: The biological behavior and treatment response of meningiomas depend on their grade, making an accurate diagnosis essential for treatment planning and prognosis assessment. We observed that the weighted fusion of spatial-frequency domain features significantly influences meningioma classification performance. Notably, the contribution of specific frequency bands obtained by discrete wavelet transf…

    Submitted 11 February, 2026; originally announced February 2026.

    Comments: 19 pages, 7 figures

  44. arXiv:2602.12732  [pdf, ps, other]

    cs.NI

    PEMI: Transparent Performance Enhancements for QUIC

    Authors: Jie Zhang, Lei Zhang, Ziyi Wang, Chenxiang Sun, Yuming Hu, Xiaohui Xie, Zeqi Lai, Yong Cui

    Abstract: QUIC, as the transport layer of the next-generation Web stack (HTTP/3), natively provides security and performance improvements over TCP-based stacks. However, since QUIC provides end-to-end encryption for both data and packet headers, in-network assistance such as a Performance-Enhancing Proxy (PEP) is unavailable for QUIC. To achieve optimizations similar to those for TCP, some works seek to collaborate end…

    Submitted 13 February, 2026; originally announced February 2026.

  45. arXiv:2602.11298  [pdf, ps, other]

    cs.AI

    Voxtral Realtime

    Authors: Mistral-AI: Alexander H. Liu, Andy Ehrenberg, Andy Lo, Chen-Yo Sun, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen, Pavankumar Reddy Muddireddy, Rohin Arora, Sanchit Gandhi, Sandeep Subramanian, Soham Ghosh, Srijan Mishra, Abhinav Rastogi, Adrien Sadé, Alan Jeffares, Albert Jiang, Alexandre Cahill, Alexandre Gavaudan, Alexandre Sablayrolles, Amélie Héliou, Amos You, et al. (144 additional authors not shown)

    Abstract: We introduce Voxtral Realtime, a natively streaming automatic speech recognition model that matches offline transcription quality at sub-second latency. Unlike approaches that adapt offline models through chunking or sliding windows, Voxtral Realtime is trained end-to-end for streaming, with explicit alignment between audio and text streams. Our architecture builds on the Delayed Streams Modeling…

    Submitted 6 April, 2026; v1 submitted 11 February, 2026; originally announced February 2026.

  46. Fast Person Detection Using YOLOX With AI Accelerator For Train Station Safety

    Authors: Mas Nurul Achmadiah, Novendra Setyawan, Achmad Arif Bryantono, Chi-Chia Sun, Wen-Kai Kuo

    Abstract: Recently, image processing has advanced rapidly and has been applied in many fields, including health, industry, and transportation. In the transportation sector, object detection is widely used to improve security, for example, in traffic security and passenger crossings at train stations. Some accidents occur in the train crossing area at the station, such as when passengers are careless while passing through the y…

    Submitted 11 February, 2026; originally announced February 2026.

    Comments: 6 pages, 8 figures, 2 tables. Presented at 2024 International Electronics Symposium (IES). IEEE DOI: 10.1109/IES63037.2024.10665874

    Journal ref: 2024 International Electronics Symposium (IES), pp. 504-509, 2024

  47. arXiv:2602.09870  [pdf, ps, other]

    cs.CL

    Steer2Edit: From Activation Steering to Component-Level Editing

    Authors: Chung-En Sun, Ge Yan, Zimo Wang, Tsui-Wei Weng

    Abstract: Steering methods influence Large Language Model behavior by identifying semantic directions in hidden representations, but are typically realized through inference-time activation interventions that apply a fixed, global modification to the model's internal states. While effective, such interventions often induce unfavorable attribute-utility trade-offs under strong control, as they ignore the fac…

    Submitted 2 March, 2026; v1 submitted 10 February, 2026; originally announced February 2026.

  48. arXiv:2602.09712  [pdf, ps, other]

    cs.CL

    TraceMem: Weaving Narrative Memory Schemata from User Conversational Traces

    Authors: Yiming Shu, Pei Liu, Tiange Zhang, Ruiyang Gao, Jun Ma, Chen Sun

    Abstract: Sustaining long-term interactions remains a bottleneck for Large Language Models (LLMs), as their limited context windows struggle to manage dialogue histories that extend over time. Existing memory systems often treat interactions as disjointed snippets, failing to capture the underlying narrative coherence of the dialogue stream. We propose TraceMem, a cognitively-inspired framework that weaves…

    Submitted 10 February, 2026; originally announced February 2026.

  49. Energy-Efficient Fast Object Detection on Edge Devices for IoT Systems

    Authors: Mas Nurul Achmadiah, Afaroj Ahamad, Chi-Chia Sun, Wen-Kai Kuo

    Abstract: This paper presents an Internet of Things (IoT) application that utilizes an AI classifier for fast-object detection using the frame difference method. This method, with its shorter duration, is the most efficient and suitable for fast-object detection in IoT systems, which require energy-efficient applications compared to end-to-end methods. We have implemented this technique on three edge device…

    Submitted 10 February, 2026; originally announced February 2026.

    Comments: 14 pages, 12 figures

    Journal ref: IEEE Internet of Things Journal, vol. 12, no. 11, pp. 16681-16693, June 2025

  50. arXiv:2602.07774  [pdf, ps, other]

    cs.IR cs.AI

    Generative Reasoning Re-ranker

    Authors: Mingfu Liang, Yufei Li, Jay Xu, Kavosh Asadi, Xi Liu, Shuo Gu, Kaushik Rangadurai, Frank Shyu, Shuaiwen Wang, Song Yang, Zhijing Li, Jiang Liu, Mengying Sun, Fei Tian, Xiaohan Wei, Chonglin Sun, Jacob Tao, Shike Mei, Wenlin Chen, Santanu Kolay, Sandeep Pandey, Hamed Firooz, Luke Simon

    Abstract: Recent studies increasingly explore Large Language Models (LLMs) as a new paradigm for recommendation systems due to their scalability and world knowledge. However, existing work has three key limitations: (1) most efforts focus on retrieval and ranking, while the reranking phase, critical for refining final recommendations, is largely overlooked; (2) LLMs are typically used in zero-shot or superv…

    Submitted 22 February, 2026; v1 submitted 7 February, 2026; originally announced February 2026.

    Comments: 31 pages