Showing 1–50 of 65 results for author: Sheng, H

Searching in archive cs.
  1. arXiv:2603.28474  [pdf, ps, other]

    cs.CV cs.AI

    CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains

    Authors: Wenhan Wang, Zhixiang Zhou, Zhongtian Ma, Yanzhu Chen, Ziyu Lin, Hao Sheng, Pengfei Liu, Honglin Ma, Wenqi Shao, Qiaosheng Zhang, Yu Qiao

    Abstract: The connoisseurship of antique Chinese porcelain demands extensive historical expertise, material understanding, and aesthetic sensitivity, making it difficult for non-specialists to engage. To democratize cultural-heritage understanding and assist expert connoisseurship, we introduce CiQi-Agent -- a domain-specific Porcelain Connoisseurship Agent for intelligent analysis of antique Chinese porcel…

    Submitted 30 March, 2026; originally announced March 2026.

  2. arXiv:2603.25188  [pdf, ps, other]

    cs.CV

    AnyID: Ultra-Fidelity Universal Identity-Preserving Video Generation from Any Visual References

    Authors: Jiahao Wang, Hualian Sheng, Sijia Cai, Yuxiao Yang, Weizhan Zhang, Caixia Yan, Bing Deng, Jieping Ye

    Abstract: Identity-preserving video generation offers powerful tools for creative expression, allowing users to customize videos featuring their beloved characters. However, prevailing methods are typically designed and optimized for a single identity reference. This underlying assumption restricts creative flexibility by inadequately accommodating diverse real-world input formats. Relying on a single sourc…

    Submitted 26 March, 2026; originally announced March 2026.

  3. arXiv:2603.10178  [pdf, ps, other]

    cs.CV cs.CL

    Video-Based Reward Modeling for Computer-Use Agents

    Authors: Linxin Song, Jieyu Zhang, Huanxin Sheng, Taiwei Shi, Gupta Rahul, Yang Liu, Ranjay Krishna, Jian Kang, Jieyu Zhao

    Abstract: Computer-using agents (CUAs) are becoming increasingly capable; however, it remains difficult to scale evaluation of whether a trajectory truly fulfills a user instruction. In this work, we study reward modeling from execution video: a sequence of keyframes from an agent trajectory that is independent of the agent's internal reasoning or actions. Although video-execution modeling is method-agnosti…

    Submitted 10 March, 2026; originally announced March 2026.

  4. arXiv:2602.20530  [pdf, ps, other]

    cs.LG cs.SD eess.AS

    Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition

    Authors: Ming Li, Yong-Jin Liu, Fang Liu, Huankun Sheng, Yeying Fan, Yixiang Wei, Minnan Luo, Weizhan Zhang, Wenping Wang

    Abstract: Emotion recognition from multi-modal physiological and behavioral signals plays a pivotal role in affective computing, yet most existing models remain constrained to the prediction of singular emotions in controlled laboratory settings. Real-world human emotional experiences, by contrast, are often characterized by the simultaneous presence of multiple affective states, spurring recent interest in…

    Submitted 23 February, 2026; originally announced February 2026.

  5. arXiv:2602.10801  [pdf]

    cs.CL cs.LG

    Deep Learning-based Method for Expressing Knowledge Boundary of Black-Box LLM

    Authors: Haotian Sheng, Heyong Wang, Ming Hong, Hongman He, Junqiu Liu

    Abstract: Large Language Models (LLMs) have achieved remarkable success, however, the emergence of content generation distortion (hallucination) limits their practical applications. The core cause of hallucination lies in LLMs' lack of awareness regarding their stored internal knowledge, preventing them from expressing their knowledge state on questions beyond their internal knowledge boundaries, as humans…

    Submitted 11 February, 2026; originally announced February 2026.

  6. arXiv:2602.03237  [pdf, ps, other]

    cs.LG cs.CL

    Merging Beyond: Streaming LLM Updates via Activation-Guided Rotations

    Authors: Yuxuan Yao, Haonan Sheng, Qingsong Lv, Han Wu, Shuqi Liu, Zehua Liu, Zengyan Liu, Jiahui Gao, Haochen Tan, Xiaojin Fu, Haoli Bai, Hing Cheung So, Zhijiang Guo, Linqi Song

    Abstract: The escalating scale of Large Language Models (LLMs) necessitates efficient adaptation techniques. Model merging has gained prominence for its efficiency and controllability. However, existing merging techniques typically serve as post-hoc refinements or focus on mitigating task interference, often failing to capture the dynamic optimization benefits of supervised fine-tuning (SFT). In this work,…

    Submitted 3 February, 2026; originally announced February 2026.

  7. arXiv:2601.03267  [pdf, ps, other]

    cs.CL cs.AI

    OpenAI GPT-5 System Card

    Authors: Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, Akshay Nathan, Alan Luo, Alec Helyar, Aleksander Madry, Aleksandr Efremov, Aleksandra Spyra, Alex Baker-Whitcomb, Alex Beutel, Alex Karpenko, Alex Makelov, Alex Neitz, Alex Wei, Alexandra Barr, Alexandre Kirchmeyer, Alexey Ivanov, et al. (459 additional authors not shown)

    Abstract: This is the system card published alongside the OpenAI GPT-5 launch, August 2025. GPT-5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say 'think hard about this' in…

    Submitted 19 December, 2025; originally announced January 2026.

  8. arXiv:2512.18814  [pdf, ps, other]

    cs.CV

    EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer

    Authors: Yuxiao Yang, Hualian Sheng, Sijia Cai, Jing Lin, Jiahao Wang, Bing Deng, Junzhe Lu, Haoqian Wang, Jieping Ye

    Abstract: Video generation models have advanced significantly, yet they still struggle to synthesize complex human movements due to the high degrees of freedom in human articulation. This limitation stems from the intrinsic constraints of pixel-only training objectives, which inherently bias models toward appearance fidelity at the expense of learning underlying kinematic principles. To address this, we int…

    Submitted 21 December, 2025; originally announced December 2025.

    Comments: 26 pages, 16 figures

  9. arXiv:2512.02685  [pdf, ps, other]

    cs.CV

    Unsupervised Structural Scene Decomposition via Foreground-Aware Slot Attention with Pseudo-Mask Guidance

    Authors: Huankun Sheng, Ming Li, Yixiang Wei, Yeying Fan, Yu-Hui Wen, Tieliang Gong, Yong-Jin Liu

    Abstract: Recent advances in object-centric representation learning have shown that slot attention-based methods can effectively decompose visual scenes into object slot representations without supervision. However, existing approaches typically process foreground and background regions indiscriminately, often resulting in background interference and suboptimal instance discovery performance on real-world d…

    Submitted 10 December, 2025; v1 submitted 2 December, 2025; originally announced December 2025.

  10. arXiv:2510.02204  [pdf, ps, other]

    cs.CL

    Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents

    Authors: Lingzhong Dong, Ziqi Zhou, Shuaibo Yang, Haiyue Sheng, Pengzhou Cheng, Zongru Wu, Zheng Wu, Gongshen Liu, Zhuosheng Zhang

    Abstract: Mobile-use agents powered by vision-language models (VLMs) have shown great potential in interpreting natural language instructions and generating corresponding actions based on mobile graphical user interface. Recent studies suggest that incorporating chain-of-thought (CoT) reasoning tends to improve the execution accuracy. However, existing evaluations emphasize execution accuracy while neglecti…

    Submitted 2 October, 2025; originally announced October 2025.

  11. arXiv:2510.00120  [pdf]

    cs.HC cs.RO

    The Formation of Trust in Autonomous Vehicles after Interacting with Robotaxis on Public Roads

    Authors: Xiang Chang, Zhijie Yi, Yichang Liu, Hongling Sheng, Dengbo He

    Abstract: This study investigates how pedestrian trust, receptivity, and behavior evolve during interactions with Level-4 autonomous vehicles (AVs) at uncontrolled urban intersections in a naturalistic setting. While public acceptance is critical for AV adoption, most prior studies relied on simplified simulations or field tests. We conducted a real-world experiment in a commercial Robotaxi operation zone,…

    Submitted 30 September, 2025; originally announced October 2025.

    Comments: Proceedings of the 69th HFES International Annual Meeting

  12. arXiv:2509.18658  [pdf, ps, other]

    cs.CL

    Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction

    Authors: Huanxin Sheng, Xinyi Liu, Hangfeng He, Jieyu Zhao, Jian Kang

    Abstract: LLM-as-a-judge has become a promising paradigm for using large language models (LLMs) to evaluate natural language generation (NLG), but the uncertainty of its evaluation remains underexplored. This lack of reliability may limit its deployment in many applications. This work presents the first framework to analyze the uncertainty by offering a prediction interval of LLM-based scoring via conformal…

    Submitted 23 September, 2025; originally announced September 2025.

    Comments: To appear in EMNLP 2025. Our code and data are available at https://github.com/BruceSheng1202/Analyzing_Uncertainty_of_LLM-as-a-Judge

  13. arXiv:2509.13615  [pdf, ps, other]

    cs.AI cs.CL cs.HC

    See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles

    Authors: Zongru Wu, Rui Mao, Zhiyuan Tian, Pengzhou Cheng, Tianjie Ju, Zheng Wu, Lingzhong Dong, Haiyue Sheng, Zhuosheng Zhang, Gongshen Liu

    Abstract: The advent of multimodal agents facilitates effective interaction within graphical user interface (GUI), especially in ubiquitous GUI control. However, their inability to reliably execute toggle control instructions remains a key bottleneck. To investigate this, we construct a state control benchmark with binary toggle instructions derived from public datasets. Evaluation results of existing agent…

    Submitted 18 March, 2026; v1 submitted 16 September, 2025; originally announced September 2025.

    Comments: Accepted at CVPR 2026

  14. arXiv:2508.16420  [pdf, ps, other]

    cs.LG

    Double Check My Desired Return: Transformer with Target Alignment for Offline Reinforcement Learning

    Authors: Yue Pei, Hongming Zhang, Chao Gao, Martin Müller, Mengxiao Zhu, Hao Sheng, Ziliang Chen, Liang Lin, Haogang Zhu

    Abstract: Offline reinforcement learning (RL) has achieved significant advances in domains such as robotic control, autonomous driving, and medical decision-making. Most existing methods primarily focus on training policies that maximize cumulative returns from a given dataset. However, many real-world applications require precise control over policy performance levels, rather than simply pursuing the best…

    Submitted 28 September, 2025; v1 submitted 22 August, 2025; originally announced August 2025.

  15. arXiv:2508.11467  [pdf, ps, other]

    cs.DC cs.PF

    Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method

    Authors: Shifang Liu, Huiyuan Li, Hongjiao Sheng, Haoyuan Gui, Xiaoyu Zhang

    Abstract: Singular Value Decomposition (SVD) is a fundamental matrix factorization technique in linear algebra, widely applied in numerous matrix-related problems. However, traditional SVD approaches are hindered by slow panel factorization and frequent CPU-GPU data transfers in heterogeneous systems, despite advancements in GPU computational capabilities. In this paper, we introduce a GPU-centered SVD algo…

    Submitted 15 August, 2025; originally announced August 2025.

  16. arXiv:2506.15838  [pdf, ps, other]

    cs.CV

    EchoShot: Multi-Shot Portrait Video Generation

    Authors: Jiahao Wang, Hualian Sheng, Sijia Cai, Weizhan Zhang, Caixia Yan, Yachuang Feng, Bing Deng, Jieping Ye

    Abstract: Video diffusion models substantially boost the productivity of artistic workflows with high-quality portrait video generative capacity. However, prevailing pipelines are primarily constrained to single-shot creation, while real-world applications urge for multiple shots with identity consistency and flexible content controllability. In this work, we propose EchoShot, a native and scalable multi-sh…

    Submitted 16 June, 2025; originally announced June 2025.

  17. arXiv:2505.03807  [pdf, other]

    cs.HC cs.AI cs.CV cs.MA

    Facilitating Video Story Interaction with Multi-Agent Collaborative System

    Authors: Yiwen Zhang, Jianing Hao, Zhan Wang, Hongling Sheng, Wei Zeng

    Abstract: Video story interaction enables viewers to engage with and explore narrative content for personalized experiences. However, existing methods are limited to user selection, specially designed narratives, and lack customization. To address this, we propose an interactive system based on user intent. Our system uses a Vision Language Model (VLM) to enable machines to understand video stories, combini…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: Prepared and submitted in 2024

  18. arXiv:2504.17384  [pdf, other]

    physics.geo-ph cs.AI

    On the workflow, opportunities and challenges of developing foundation model in geophysics

    Authors: Hanlin Sheng, Xinming Wu, Hang Gao, Haibin Di, Sergey Fomel, Jintao Li, Xu Si

    Abstract: Foundation models, as a mainstream technology in artificial intelligence, have demonstrated immense potential across various domains in recent years, particularly in handling complex tasks and multimodal data. In the field of geophysics, although the application of foundation models is gradually expanding, there is currently a lack of comprehensive reviews discussing the full workflow of integrati…

    Submitted 25 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

  19. Knitting Robots: A Deep Learning Approach for Reverse-Engineering Fabric Patterns

    Authors: Haoliang Sheng, Songpu Cai, Xingyu Zheng, Meng Cheng Lau

    Abstract: Knitting, a cornerstone of textile manufacturing, is uniquely challenging to automate, particularly in terms of converting fabric designs into precise, machine-readable instructions. This research bridges the gap between textile production and robotic automation by proposing a novel deep learning-based pipeline for reverse knitting to integrate vision-based robotic systems into textile manufacturi…

    Submitted 18 April, 2025; originally announced April 2025.

    Journal ref: Electronics, 14(8), 1605 (2025)

  20. arXiv:2504.05343  [pdf, other]

    cs.LG cs.AI

    AROMA: Autonomous Rank-one Matrix Adaptation

    Authors: Hao Nan Sheng, Zhi-yong Wang, Mingrui Yang, Hing Cheung So

    Abstract: As large language models continue to grow in size, parameter-efficient fine-tuning (PEFT) has become increasingly crucial. While low-rank adaptation (LoRA) offers a solution through low-rank updates, its static rank allocation may yield suboptimal results. Adaptive low-rank adaptation (AdaLoRA) improves this with dynamic allocation but remains sensitive to initial and target rank configurations. W…

    Submitted 11 April, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

  21. arXiv:2412.16720  [pdf, other]

    cs.AI

    OpenAI o1 System Card

    Authors: OpenAI, Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich, et al. (238 additional authors not shown)

    Abstract: The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-ar…

    Submitted 21 December, 2024; originally announced December 2024.

  22. EVA-S3PC: Efficient, Verifiable, Accurate Secure Matrix Multiplication Protocol Assembly and Its Application in Regression

    Authors: Shizhao Peng, Tianrui Liu, Tianle Tao, Derun Zhao, Hao Sheng, Haogang Zhu

    Abstract: Efficient multi-party secure matrix multiplication is crucial for privacy-preserving machine learning, but existing mixed-protocol frameworks often face challenges in balancing security, efficiency, and accuracy. This paper presents an efficient, verifiable and accurate secure three-party computing (EVA-S3PC) framework that addresses these challenges with elementary 2-party and 3-party matrix oper…

    Submitted 5 November, 2024; originally announced November 2024.

    Comments: 18 pages, 22 figures

  23. arXiv:2410.19488  [pdf, other]

    cs.CV

    MM-WLAuslan: Multi-View Multi-Modal Word-Level Australian Sign Language Recognition Dataset

    Authors: Xin Shen, Heming Du, Hongwei Sheng, Shuyun Wang, Hui Chen, Huiqiang Chen, Zhuojie Wu, Xiaobiao Du, Jiaying Ying, Ruihan Lu, Qingzheng Xu, Xin Yu

    Abstract: Isolated Sign Language Recognition (ISLR) focuses on identifying individual sign language glosses. Considering the diversity of sign languages across geographical regions, developing region-specific ISLR datasets is crucial for supporting communication and research. Auslan, as a sign language specific to Australia, still lacks a dedicated large-scale word-level dataset for the ISLR task. To fill t…

    Submitted 25 October, 2024; originally announced October 2024.

  24. arXiv:2409.04962  [pdf, other]

    physics.geo-ph cs.LG

    A foundation model enpowered by a multi-modal prompt engine for universal seismic geobody interpretation across surveys

    Authors: Hang Gao, Xinming Wu, Luming Liang, Hanlin Sheng, Xu Si, Gao Hui, Yaxing Li

    Abstract: Seismic geobody interpretation is crucial for structural geology studies and various engineering applications. Existing deep learning methods show promise but lack support for multi-modal inputs and struggle to generalize to different geobody types or surveys. We introduce a promptable foundation model for interpreting any geobodies across seismic surveys. This model integrates a pre-trained visio…

    Submitted 13 September, 2024; v1 submitted 7 September, 2024; originally announced September 2024.

  25. arXiv:2408.17274  [pdf, ps, other]

    cs.LG eess.SP

    The Transferability of Downsamped Sparse Graph Convolutional Networks

    Authors: Qinji Shu, Hang Sheng, Feng Ji, Hui Feng, Bo Hu

    Abstract: To accelerate the training of graph convolutional networks (GCNs) on real-world large-scale sparse graphs, downsampling methods are commonly employed as a preprocessing step. However, the effects of graph sparsity and topological structure on the transferability of downsampling methods have not been rigorously analyzed or theoretically guaranteed, particularly when the topological structure is aff…

    Submitted 8 September, 2024; v1 submitted 30 August, 2024; originally announced August 2024.

  26. arXiv:2408.12396  [pdf, other]

    cs.CV physics.geo-ph

    Cross-Domain Foundation Model Adaptation: Pioneering Computer Vision Models for Geophysical Data Analysis

    Authors: Zhixiang Guo, Xinming Wu, Luming Liang, Hanlin Sheng, Nuo Chen, Zhengfa Bi

    Abstract: We explore adapting foundation models (FMs) from the computer vision domain to geoscience. FMs, large neural networks trained on massive datasets, excel in diverse tasks with remarkable adaptability and generality. However, geoscience faces challenges like lacking curated training datasets and high computational costs for developing specialized FMs. This study considers adapting FMs from computer…

    Submitted 22 August, 2024; originally announced August 2024.

  27. FTF-ER: Feature-Topology Fusion-Based Experience Replay Method for Continual Graph Learning

    Authors: Jinhui Pang, Changqing Lin, Xiaoshuai Hao, Rong Yin, Zixuan Wang, Zhihui Zhang, Jinglin He, Huang Tai Sheng

    Abstract: Continual graph learning (CGL) is an important and challenging task that aims to extend static GNNs to dynamic task flow scenarios. As one of the mainstream CGL methods, the experience replay (ER) method receives widespread attention due to its superior performance. However, existing ER methods focus on identifying samples by feature significance or topological relevance, which limits their utiliz…

    Submitted 8 August, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

  28. arXiv:2407.06109  [pdf, ps, other]

    cs.CV

    PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models

    Authors: Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu

    Abstract: Controllable generation is considered a potentially vital approach to address the challenge of annotating 3D data, and the precision of such controllable generation becomes particularly imperative in the context of data production for autonomous driving. Existing methods focus on the integration of diverse generative information into controlling inputs, utilizing frameworks such as GLIGEN or Contr…

    Submitted 15 July, 2025; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCV 2025

  29. arXiv:2406.08152  [pdf, other]

    cs.CV

    CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer

    Authors: Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, Jieping Ye

    Abstract: The field of 3D object detection from point clouds is rapidly advancing in computer vision, aiming to accurately and efficiently detect and localize objects in three-dimensional space. Current 3D detectors commonly fall short in terms of flexibility and scalability, with ample room for advancements in performance. In this paper, our objective is to address these limitations by introducing two fram…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures

  30. arXiv:2406.04875  [pdf, ps, other]

    cs.CV

    3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

    Authors: Xiaobiao Du, Yida Wang, Haiyang Sun, Zhuojie Wu, Hongwei Sheng, Shuyun Wang, Jiaying Ying, Ming Lu, Tianqing Zhu, Kun Zhan, Xin Yu

    Abstract: 3D cars are commonly used in self-driving systems, virtual/augmented reality, and games. However, existing 3D car datasets are either synthetic or low-quality, limiting their applications in practical scenarios and presenting a significant gap toward high-quality real-world 3D car datasets. In this paper, we propose the first large-scale 3D real car dataset, termed 3DRealCar, offering three distin…

    Submitted 29 June, 2025; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Project Page: https://xiaobiaodu.github.io/3drealcar

    Journal ref: ICCV2025

  31. arXiv:2405.10681  [pdf, other]

    cs.IR

    Know in AdVance: Linear-Complexity Forecasting of Ad Campaign Performance with Evolving User Interest

    Authors: XiaoYu Wang, YongHui Guo, Hui Sheng, Peili Lv, Chi Zhou, Wei Huang, ShiQin Ta, Dongbo Huang, XiuJin Yang, Lan Xu, Hao Zhou, Yusheng Ji

    Abstract: Real-time Bidding (RTB) advertisers wish to know in advance the expected cost and yield of ad campaigns to avoid trial-and-error expenses. However, Campaign Performance Forecasting (CPF), a sequence modeling task involving tens of thousands of ad auctions, poses challenges of evolving user interest, auction representation, and long context, making coarse-grained and static-modeling method…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures, accepted at ACM SIGKDD 2024

  32. arXiv:2405.09883  [pdf, other]

    cs.CV

    RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception

    Authors: Xiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song, Jieping Ye

    Abstract: We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird's Eye View (BEV) approaches for more challenging traffic scenes. The highlights of RoScenes include significantly large perception area, full scene coverage and crowded traffic. More specifically, our dataset achieves surprising 21.13M 3D annotations within…

    Submitted 4 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: ECCV 2024. Extended version. 33 pages, 21 figures, 13 tables. https://github.com/xiaosu-zhu/RoScenes

  33. arXiv:2402.06499  [pdf, other]

    cs.CV

    BarlowTwins-CXR : Enhancing Chest X-Ray abnormality localization in heterogeneous data with cross-domain self-supervised learning

    Authors: Haoyue Sheng, Linrui Ma, Jean-Francois Samson, Dianbo Liu

    Abstract: Background: Chest X-ray imaging-based abnormality localization, essential in diagnosing various diseases, faces significant clinical challenges due to complex interpretations and the growing workload of radiologists. While recent advances in deep learning offer promising solutions, there is still a critical issue of domain inconsistency in cross-domain transfer learning, which hampers the efficien…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 15 pages, 7 figures, 3 tables

    ACM Class: I.2.1; J.3; I.4.9

  34. arXiv:2310.05425  [pdf, other]

    cs.AI cs.CV

    Divide and Ensemble: Progressively Learning for the Unknown

    Authors: Hu Zhang, Xin Shen, Heming Du, Huiqiang Chen, Chen Liu, Hongwei Sheng, Qingzheng Xu, MD Wahiduzzaman Khan, Qingtao Yu, Tianqing Zhu, Scott Chapman, Zi Huang, Xin Yu

    Abstract: In the wheat nutrient deficiencies classification challenge, we present the DividE and EnseMble (DEEM) method for progressive test data predictions. We find that (1) test images are provided in the challenge; (2) samples are equipped with their collection dates; (3) the samples of different dates show notable discrepancies. Based on the findings, we partition the dataset into discrete groups by th…

    Submitted 9 October, 2023; originally announced October 2023.

  35. arXiv:2309.02320  [pdf, other]

    physics.geo-ph cs.AI cs.LG

    SeisCLIP: A seismology foundation model pre-trained by multi-modal data for multi-purpose seismic feature extraction

    Authors: Xu Si, Xinming Wu, Hanlin Sheng, Jun Zhu, Zefeng Li

    Abstract: Training specific deep learning models for particular tasks is common across various domains within seismology. However, this approach encounters two limitations: inadequate labeled data for certain tasks and limited generalization across regions. To address these challenges, we develop SeisCLIP, a seismology foundation model trained through contrastive learning from multi-modal data. It consists…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 27 pages, 9 figures, 4 tables

  36. arXiv:2307.06577  [pdf, other]

    cs.CV cs.AI

    RVD: A Handheld Device-Based Fundus Video Dataset for Retinal Vessel Segmentation

    Authors: MD Wahiduzzaman Khan, Hongwei Sheng, Hu Zhang, Heming Du, Sen Wang, Minas Theodore Coroneo, Farshid Hajati, Sahar Shariflou, Michael Kalloniatis, Jack Phu, Ashish Agar, Zi Huang, Mojtaba Golzan, Xin Yu

    Abstract: Retinal vessel segmentation is generally grounded in image-based datasets collected with bench-top devices. The static images naturally lose the dynamic characteristics of retina fluctuation, resulting in diminished dataset richness, and the usage of bench-top devices further restricts dataset scalability due to its limited accessibility. Considering these limitations, we introduce the first video…

    Submitted 13 July, 2023; originally announced July 2023.

  37. arXiv:2306.14397  [pdf, other]

    cs.SE cs.CY

    Discriminating Human-authored from ChatGPT-Generated Code Via Discernable Feature Analysis

    Authors: Li Ke, Hong Sheng, Fu Cai, Zhang Yunhe, Liu Ming

    Abstract: The ubiquitous adoption of Large Language Generation Models (LLMs) in programming has underscored the importance of differentiating between human-written code and code generated by intelligent models. This paper specifically aims to distinguish code generated by ChatGPT from that authored by humans. Our investigation reveals disparities in programming style, technical level, and readability betwee…

    Submitted 4 July, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

    Comments: 11 pages, 8 figures, 3 tables

  38. arXiv:2306.14176  [pdf, other]

    cs.CL

    Sentence-level Event Detection without Triggers via Prompt Learning and Machine Reading Comprehension

    Authors: Tongtao Ling, Lei Chen, Huangxu Sheng, Zicheng Cai, Hai-Lin Liu

    Abstract: The traditional way of sentence-level event detection involves two important subtasks: trigger identification and trigger classifications, where the identified event trigger words are used to classify event types from sentences. However, trigger classification highly depends on abundant annotated trigger words and the accuracy of trigger identification. In a real scenario, annotating trigger words…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: 14 pages, accepted by ADMA 2023

  39. arXiv:2305.06043  [pdf, other]

    cs.CV

    Autonomous Stabilization of Retinal Videos for Streamlining Assessment of Spontaneous Venous Pulsations

    Authors: Hongwei Sheng, Xin Yu, Feiyu Wang, MD Wahiduzzaman Khan, Hexuan Weng, Sahar Shariflou, S. Mojtaba Golzan

    Abstract: Spontaneous retinal Venous Pulsations (SVP) are rhythmic changes in the caliber of the central retinal vein and are observed in the optic disc region (ODR) of the retina. Its absence is a critical indicator of various ocular or neurological abnormalities. Recent advances in imaging technology have enabled the development of portable smartphone-based devices for observing the retina and assessment…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: EMBC, 4 pages, 6 figures

  40. arXiv:2301.01842  [pdf, other]

    cs.CV cs.CY

    Detecting Neighborhood Gentrification at Scale via Street-level Visual Data

    Authors: Tianyuan Huang, Timothy Dai, Zhecheng Wang, Hesu Yoon, Hao Sheng, Andrew Y. Ng, Ram Rajagopal, Jackelyn Hwang

    Abstract: Neighborhood gentrification plays a significant role in shaping the social and economic well-being of both individuals and communities at large. While some efforts have been made to detect gentrification in cities, existing approaches rely mainly on estimated measures from survey data, require substantial work of human labeling, and are limited in characterizing the neighborhood as a whole. We pro…

    Submitted 4 January, 2023; originally announced January 2023.

  41. arXiv:2210.17163  [pdf, other]

    cs.LO

    HHLPy: Practical Verification of Hybrid Systems using Hoare Logic

    Authors: Huanhuan Sheng, Alexander Bentkamp, Bohua Zhan

    Abstract: We present a tool for verification of hybrid systems expressed in the sequential fragment of HCSP (Hybrid Communicating Sequential Processes). The tool permits annotating HCSP programs with pre- and postconditions, invariants, and proof rules for reasoning about ordinary differential equations. Verification conditions are generated from the annotations following the rules of hybrid Hoare logic. We…

    Submitted 21 February, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

  42. arXiv:2208.14027  [pdf, ps, other]

    cs.IT math.NA

    Optimal Probabilistic Constellation Shaping for Covert Communications

    Authors: Shuai Ma, Yunqi Zhang, Haihong Sheng, Hang Li, Jia Shi, Long Yang, Youlong Wu, Naofal Al-Dhahir, Shiyin Li

    Abstract: In this paper, we investigate the optimal probabilistic constellation shaping design for covert communication systems from a practical view. Different from conventional covert communications with equiprobable constellations modulation, we propose nonequiprobable constellations modulation schemes to further enhance the covert rate. Specifically, we derive covert rate expressions for practical discr…

    Submitted 30 August, 2022; originally announced August 2022.

  43. arXiv:2208.05878  [pdf, ps, other]

    cs.IT eess.SP

    Covert Beamforming Design for Integrated Radar Sensing and Communication Systems

    Authors: Shuai Ma, Haihong Sheng, Ruixin Yang, Hang Li, Youlong Wu, Chao Shen, Naofal Al-Dhahir, Shiyin Li

    Abstract: We propose covert beamforming design frameworks for integrated radar sensing and communication (IRSC) systems, where the radar can covertly communicate with legitimate users under the cover of the probing waveforms without being detected by the eavesdropper. Specifically, by jointly designing the target detection beamformer and communication beamformer, we aim to maximize the radar detection mutua…

    Submitted 11 August, 2022; originally announced August 2022.

  44. arXiv:2207.09332  [pdf, other]

    cs.CV

    Rethinking IoU-based Optimization for Single-stage 3D Object Detection

    Authors: Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Jianqiang Huang, Xian-Sheng Hua, Min-Jian Zhao, Gim Hee Lee

    Abstract: Since Intersection-over-Union (IoU) based optimization maintains the consistency of the final IoU prediction metric and losses, it has been widely used in both regression and classification branches of single-stage 2D object detectors. Recently, several 3D object detection methods have adopted IoU-based optimization and directly replaced the 2D IoU with 3D IoU. However, such a direct computation in 3D is…

    Submitted 20 July, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV2022. The code is available at https://github.com/hlsheng1/RDIoU

  45. arXiv:2201.12693  [pdf, other]

    cs.CV cs.CY

    Extracting Built Environment Features for Planning Research with Computer Vision: A Review and Discussion of State-of-the-Art Approaches

    Authors: Meiqing Li, Hao Sheng

    Abstract: This is an extended abstract for a presentation at The 17th International Conference on CUPUM - Computational Urban Planning and Urban Management in June 2021. This study presents an interdisciplinary synthesis of the state-of-the-art approaches in computer vision technologies to extract built environment features that could improve the robustness of empirical research in planning. We discussed th…

    Submitted 21 March, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

    Comments: CUPUM 2021 (The 17th International Conference on Computational Urban Planning and Urban Management)

  46. arXiv:2108.10723  [pdf, other]

    cs.CV

    Improving 3D Object Detection with Channel-wise Transformer

    Authors: Hualian Sheng, Sijia Cai, Yuan Liu, Bing Deng, Jianqiang Huang, Xian-Sheng Hua, Min-Jian Zhao

    Abstract: Though 3D object detection from point clouds has achieved rapid progress in recent years, the lack of flexible and high-performance proposal refinement remains a great hurdle for existing state-of-the-art two-stage detectors. Previous works on refining 3D proposals have relied on human-designed components such as keypoints sampling, set abstraction and multi-scale feature fusion to produce powerfu…

    Submitted 14 September, 2021; v1 submitted 22 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV2021

  47. arXiv:2106.06515  [pdf, other]

    cs.LG stat.ME stat.ML

    Probability Paths and the Structure of Predictions over Time

    Authors: Zhiyuan Jerry Lin, Hao Sheng, Sharad Goel

    Abstract: In settings ranging from weather forecasts to political prognostications to financial projections, probability estimates of future binary outcomes often evolve over time. For example, the estimated likelihood of rain on a specific day changes by the hour as new information becomes available. Given a collection of such probability paths, we introduce a Bayesian framework -- which we call the Gaussi…

    Submitted 4 November, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  48. arXiv:2105.02489  [pdf, other]

    cs.LG cs.CV

    Learning Neighborhood Representation from Multi-Modal Multi-Graph: Image, Text, Mobility Graph and Beyond

    Authors: Tianyuan Huang, Zhecheng Wang, Hao Sheng, Andrew Y. Ng, Ram Rajagopal

    Abstract: Recent urbanization has coincided with the enrichment of geotagged data, such as street view and point-of-interest (POI). Region embedding enhanced by the richer data modalities has enabled researchers and city administrators to understand the built environment, socioeconomics, and the dynamics of cities better. While some efforts have been made to simultaneously use multi-modal inputs, existing m…

    Submitted 6 May, 2021; originally announced May 2021.

  49. arXiv:2105.01764  [pdf, other]

    cs.CY cs.CV

    Surveilling Surveillance: Estimating the Prevalence of Surveillance Cameras with Street View Data

    Authors: Hao Sheng, Keniel Yao, Sharad Goel

    Abstract: The use of video surveillance in public spaces -- both by government agencies and by private citizens -- has attracted considerable attention in recent years, particularly in light of rapid advances in face-recognition technology. But it has been difficult to systematically measure the prevalence and placement of cameras, hampering efforts to assess the implications of surveillance on privacy and…

    Submitted 30 August, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: We now credit Turtiainen et al. (2020) both for creating a state-of-the-art camera detection model and for suggesting that computer vision could, in theory, be applied to street view data to map surveillance cameras. Also, we discovered a coding error in our image sampling strategy that corrupted our analysis of camera density over time. We have now removed the results of that analysis

  50. arXiv:2101.09645  [pdf, other]

    cs.LG stat.ML

    Multi-Task Time Series Forecasting With Shared Attention

    Authors: Zekai Chen, Jiaze E, Xiao Zhang, Hao Sheng, Xiuzheng Cheng

    Abstract: Time series forecasting is a key component in many industrial and business decision processes and recurrent neural network (RNN) based models have achieved impressive progress on various time series forecasting tasks. However, most of the existing methods focus on single-task forecasting problems by learning separately based on limited supervised objectives, which often suffer from insufficient tr…

    Submitted 23 January, 2021; originally announced January 2021.

    Comments: Accepted by ICDMW 2020