Showing 1–50 of 118 results for author: Ke, Y

  1. arXiv:2512.20073  [pdf, ps, other]

    cs.AR

    3D Stack In-Sensor-Computing (3DS-ISC): Accelerating Time-Surface Construction for Neuromorphic Event Cameras

    Authors: Hongyang Shang, Shuai Dong, Ye Ke, Arindam Basu

    Abstract: This work proposes a 3D Stack In-Sensor-Computing (3DS-ISC) architecture for efficient event-based vision processing. A real-time normalization method using an exponential decay function is introduced to construct the time-surface, reducing hardware usage while preserving temporal information. The circuit design utilizes the leakage characterization of Dynamic Random Access Memory (DRAM) for timest…

    Submitted 23 December, 2025; originally announced December 2025.

  2. arXiv:2512.06613  [pdf, ps, other]

    cs.CV

    Hierarchical Deep Learning for Diatom Image Classification: A Multi-Level Taxonomic Approach

    Authors: Yueying Ke

    Abstract: Accurate taxonomic identification of diatoms is essential for aquatic ecosystem monitoring, yet conventional methods depend heavily on expert taxonomists. Recent deep learning approaches improve automation, but most treat diatom recognition as flat classification, predicting only one taxonomic rank. We investigate whether embedding taxonomic hierarchy into neural network architectures can improve…

    Submitted 11 December, 2025; v1 submitted 6 December, 2025; originally announced December 2025.

    Comments: Version 2: Corrected reference details, improved architectural diagram, and enhanced writing for clarity and precision. Added a table illustrating the masking mechanism. No changes to experimental results or conclusions. 11 pages, 6 figures, 3 tables

    ACM Class: I.4.8; I.5.4

  3. arXiv:2512.06362  [pdf, ps, other]

    cs.AR

    A 33.6-136.2 TOPS/W Nonlinear Analog Computing-In-Memory Macro for Multi-bit LSTM Accelerator in 65 nm CMOS

    Authors: Junyi Yang, Xinyu Luo, Ye Ke, Zheng Wang, Hongyang Shang, Shuai Dong, Zhengnan Fu, Xiaofeng Yang, Hongjie Liu, Arindam Basu

    Abstract: The energy efficiency of analog computing-in-memory (ACIM) accelerators for recurrent neural networks, particularly long short-term memory (LSTM) networks, is limited by the high proportion of nonlinear (NL) operations typically executed digitally. To address this, we propose an LSTM accelerator incorporating an ACIM macro with reconfigurable (1-5 bit) nonlinear in-memory (NLIM) analog-to-digital co…

    Submitted 6 December, 2025; originally announced December 2025.

  4. arXiv:2512.02346  [pdf, ps, other]

    cs.AR

    Near-Memory Architecture for Threshold-Ordinal Surface-Based Corner Detection of Event Cameras

    Authors: Hongyang Shang, An Guo, Shuai Dong, Junyi Yang, Ye Ke, Arindam Basu

    Abstract: Event-based Cameras (EBCs) are widely utilized in surveillance and autonomous driving applications due to their high speed and low power consumption. Corners are essential low-level features in event-driven computer vision, and novel algorithms utilizing event-based representations, such as Threshold-Ordinal Surface (TOS), have been developed for corner detection. However, the implementation of th…

    Submitted 1 December, 2025; originally announced December 2025.

  5. arXiv:2511.22166  [pdf, ps, other]

    cs.AR

    CADC: Crossbar-Aware Dendritic Convolution for Efficient In-memory Computing

    Authors: Shuai Dong, Junyi Yang, Ye Ke, Hongyang Shang, Arindam Basu

    Abstract: Convolutional neural networks (CNNs) are computationally intensive and often accelerated using crossbar-based in-memory computing (IMC) architectures. However, large convolutional layers must be partitioned across multiple crossbars, generating numerous partial sums (psums) that require additional buffer, transfer, and accumulation, thus introducing significant system-level overhead. Inspired by d…

    Submitted 27 November, 2025; originally announced November 2025.

  6. arXiv:2511.22117  [pdf, ps, other]

    cs.CR cs.CC

    Privacy-preserving formal concept analysis: A homomorphic encryption-based concept construction

    Authors: Qiangqiang Chen, Yunfeng Ke, Shen Li, Jinhai Li

    Abstract: Formal Concept Analysis (FCA) is extensively used in knowledge extraction, cognitive concept learning, and data mining. However, its computational demands on large-scale datasets often require outsourcing to external computing services, raising concerns about the leakage of sensitive information. To address this challenge, we propose a novel approach to enhance data security and privacy in FCA-bas…

    Submitted 27 November, 2025; originally announced November 2025.

  7. arXiv:2511.12626  [pdf, ps, other]

    cs.CR cs.GT

    Prrr: Personal Random Rewards for Blockchain Reporting

    Authors: Hongyin Chen, Yubin Ke, Xiaotie Deng, Ittay Eyal

    Abstract: Smart contracts, the stateful programs running on blockchains, often rely on reports. Publishers are paid to publish these reports on the blockchain. Designing protocols that incentivize timely reporting is the prevalent reporting problem. But existing solutions face a security-performance trade-off: Relying on a small set of trusted publishers introduces centralization risks, while allowing open…

    Submitted 16 November, 2025; originally announced November 2025.

  8. arXiv:2511.09147  [pdf, ps, other]

    cs.CV cs.AI

    PressTrack-HMR: Pressure-Based Top-Down Multi-Person Global Human Mesh Recovery

    Authors: Jiayue Yuan, Fangting Xie, Guangwen Ouyang, Changhai Ma, Ziyu Wu, Heyu Ding, Quan Wan, Yi Ke, Yuchen Wu, Xiaohui Cai

    Abstract: Multi-person global human mesh recovery (HMR) is crucial for understanding crowd dynamics and interactions. Traditional vision-based HMR methods sometimes face limitations in real-world scenarios due to mutual occlusions, insufficient lighting, and privacy concerns. Human-floor tactile interactions offer an occlusion-free and privacy-friendly alternative for capturing human motion. Existing resear…

    Submitted 13 November, 2025; v1 submitted 12 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI-2026

  9. arXiv:2511.05901  [pdf]

    cs.CL cs.AI

    Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations, Clinical Applications, and Ethical Considerations

    Authors: Rui Yang, Matthew Yu Heng Wong, Huitao Li, Xin Li, Wentao Zhu, Jingchi Liao, Kunyu Yu, Jonathan Chong Kai Liew, Weihao Xuan, Yingjian Chen, Yuhe Ke, Jasmine Chiat Ling Ong, Douglas Teodoro, Chuan Hong, Daniel Shi Wei Ting, Nan Liu

    Abstract: The rapid growth of medical knowledge and increasing complexity of clinical practice pose challenges. In this context, large language models (LLMs) have demonstrated value; however, inherent limitations remain. Retrieval-augmented generation (RAG) technologies show potential to enhance their clinical applicability. This study reviewed RAG applications in medicine. We found that research primarily…

    Submitted 13 November, 2025; v1 submitted 8 November, 2025; originally announced November 2025.

  10. arXiv:2510.26830  [pdf, ps, other]

    cs.LG cs.CR

    SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation

    Authors: Guangzhi Su, Shuchang Huang, Yutong Ke, Zhuohang Liu, Long Qian, Kaizhu Huang

    Abstract: Multimodal large language models (MLLMs) have achieved impressive performance across diverse tasks by jointly reasoning over textual and visual inputs. Despite their success, these models remain highly vulnerable to adversarial manipulations, raising concerns about their safety and reliability in deployment. In this work, we first generalize an approach for generating adversarial images within the…

    Submitted 29 October, 2025; originally announced October 2025.

  11. arXiv:2510.22033  [pdf, ps, other]

    cs.LG q-bio.QM stat.ML

    Linearized Optimal Transport for Analysis of High-Dimensional Point-Cloud and Single-Cell Data

    Authors: Tianxiang Wang, Yingtong Ke, Dhananjay Bhaskar, Smita Krishnaswamy, Alexander Cloninger

    Abstract: Single-cell technologies generate high-dimensional point clouds of cells, enabling detailed characterization of complex patient states and treatment responses. Yet each patient is represented by an irregular point cloud rather than a simple vector, making it difficult to directly quantify and compare biological differences between individuals. Nonlinear methods such as kernels and neural networks…

    Submitted 29 October, 2025; v1 submitted 24 October, 2025; originally announced October 2025.

    Comments: 11 pages, 5 figures

    MSC Class: 68T05

  12. arXiv:2510.08614  [pdf]

    cs.CL

    Gender Bias in Large Language Models for Healthcare: Assignment Consistency and Clinical Implications

    Authors: Mingxuan Liu, Yuhe Ke, Wentao Zhu, Mayli Mertens, Yilin Ning, Jingchi Liao, Chuan Hong, Daniel Shu Wei Ting, Yifan Peng, Danielle S. Bitterman, Marcus Eng Hock Ong, Nan Liu

    Abstract: The integration of large language models (LLMs) into healthcare holds promise to enhance clinical decision-making, yet their susceptibility to biases remains a critical concern. Gender has long influenced physician behaviors and patient outcomes, raising concerns that LLMs assuming human-like roles, such as clinicians or medical educators, may replicate or amplify gender-related biases. Using case…

    Submitted 7 October, 2025; originally announced October 2025.

  13. arXiv:2509.24350  [pdf, ps, other]

    cs.CV cs.AI

    Dynamic Orchestration of Multi-Agent System for Real-World Multi-Image Agricultural VQA

    Authors: Yan Ke, Xin Yu, Heming Du, Scott Chapman, Helen Huang

    Abstract: Agricultural visual question answering is essential for providing farmers and researchers with accurate and timely knowledge. However, many existing approaches are predominantly developed for evidence-constrained settings such as text-only queries or single-image cases. This design prevents them from coping with real-world agricultural scenarios that often require multi-image inputs with complemen…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 13 pages, 2 figures, 2 tables

  14. arXiv:2509.24231  [pdf]

    cs.CV

    EVLF-FM: Explainable Vision Language Foundation Model for Medicine

    Authors: Yang Bai, Haoran Cheng, Yang Zhou, Jun Zhou, Arun Thirunavukarasu, Yuhe Ke, Jie Yao, Kanae Fukutsu, Chrystie Wan Ning Quek, Ashley Hong, Laura Gutierrez, Zhen Ling Teo, Darren Shu Jeng Ting, Brian T. Soetikno, Christopher S. Nielsen, Tobias Elze, Zengxiang Li, Linh Le Dinh, Hiok Hong Chan, Victor Koh, Marcus Tan, Kelvin Z. Li, Leonard Yip, Ching Yu Cheng, Yih Chung Tham, et al. (18 additional authors not shown)

    Abstract: Despite the promise of foundation models in medical AI, current systems remain limited - they are modality-specific and lack transparent reasoning processes, hindering clinical adoption. To address this gap, we present EVLF-FM, a multimodal vision-language foundation model (VLM) designed to unify broad diagnostic capability with fine-grain explainability. The development and testing of EVLF-FM enc…

    Submitted 28 September, 2025; originally announced September 2025.

  15. arXiv:2509.18910  [pdf, ps, other]

    cs.CV

    MoiréNet: A Compact Dual-Domain Network for Image Demoiréing

    Authors: Shuwei Guo, Simin Luan, Yan Ke, Zeyd Boukhers, John See, Cong Yang

    Abstract: Moiré patterns arise from spectral aliasing between display pixel lattices and camera sensor grids, manifesting as anisotropic, multi-scale artifacts that pose significant challenges for digital image demoiréing. We propose MoiréNet, a convolutional neural U-Net-based framework that synergistically integrates frequency and spatial domain features for effective artifact removal. MoiréNet introduces…

    Submitted 23 September, 2025; originally announced September 2025.

  16. arXiv:2509.10059  [pdf, ps, other]

    cs.CV cs.AI

    Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration

    Authors: Yue Zhou, Litong Feng, Mengcheng Lan, Xue Yang, Qingyun Li, Yiping Ke, Xue Jiang, Wayne Zhang

    Abstract: Mathematical reasoning is critical for tasks such as precise distance and area computations, trajectory estimations, and spatial analysis in unmanned aerial vehicle (UAV) based remote sensing, yet current vision-language models (VLMs) have not been adequately tested in this domain. To address this gap, we introduce AVI-Math, the first benchmark to rigorously evaluate multimodal mathematical reason…

    Submitted 12 September, 2025; originally announced September 2025.

    Comments: 17 pages, 16 figures

  17. arXiv:2509.06321  [pdf, ps, other]

    cs.CV

    Text4Seg++: Advancing Image Segmentation via Generative Language Modeling

    Authors: Mengcheng Lan, Chaofeng Chen, Jiaxing Xu, Zongrui Li, Yiping Ke, Xudong Jiang, Yingchen Yu, Yunqing Zhao, Song Bai

    Abstract: Multimodal Large Language Models (MLLMs) have shown exceptional capabilities in vision-language tasks. However, effectively integrating image segmentation into these models remains a significant challenge. In this work, we propose a novel text-as-mask paradigm that casts image segmentation as a text generation problem, eliminating the need for additional decoders and significantly simplifying the…

    Submitted 8 September, 2025; originally announced September 2025.

    Comments: Extended version of our conference paper arXiv:2410.09855

  18. arXiv:2508.16648  [pdf, ps, other]

    cs.LG cs.AI physics.flu-dyn

    LatentFlow: Cross-Frequency Experimental Flow Reconstruction from Sparse Pressure via Latent Mapping

    Authors: Junle Liu, Chang Liu, Yanyu Ke, Qiuxiang Huang, Jiachen Zhao, Wenliang Chen, K. T. Tse, Gang Hu

    Abstract: Acquiring temporally high-frequency and spatially high-resolution turbulent wake flow fields in particle image velocimetry (PIV) experiments remains a significant challenge due to hardware limitations and measurement noise. In contrast, temporal high-frequency measurements of spatially sparse wall pressure are more readily accessible in wind tunnel experiments. In this study, we propose a novel cr…

    Submitted 19 August, 2025; originally announced August 2025.

    Comments: The paper is submitted to IAAI26. Total 9 pages with 8 figures

  19. arXiv:2508.03183  [pdf]

    physics.flu-dyn cs.AI cs.CE

    Spatiotemporal wall pressure forecast of a rectangular cylinder with physics-aware DeepU-Fourier neural network

    Authors: Junle Liu, Chang Liu, Yanyu Ke, Wenliang Chen, Kihing Shum, Tim K. T. Tse, Gang Hu

    Abstract: The wall pressure is of great importance in understanding the forces and structural responses induced by fluid. Recent works have investigated the potential of deep learning techniques in predicting mean pressure coefficients and fluctuating pressure coefficients, but most existing deep learning frameworks are limited to predicting a single snapshot using full spatial information. To forecast s…

    Submitted 7 December, 2025; v1 submitted 5 August, 2025; originally announced August 2025.

  20. arXiv:2508.01285  [pdf, ps, other]

    cs.AI cs.IR stat.AP

    BioDisco: Multi-agent hypothesis generation with dual-mode evidence, iterative feedback and temporal evaluation

    Authors: Yujing Ke, Kevin George, Kathan Pandya, David Blumenthal, Maximilian Sprang, Gerrit Großmann, Sebastian Vollmer, David Antony Selby

    Abstract: Identifying novel hypotheses is essential to scientific research, yet this process risks being overwhelmed by the sheer volume and complexity of available information. Existing automated methods often struggle to generate novel and evidence-grounded hypotheses, lack robust iterative refinement and rarely undergo rigorous temporal evaluation for future discovery potential. To address this, we propo…

    Submitted 24 November, 2025; v1 submitted 2 August, 2025; originally announced August 2025.

    Comments: 12 pages main content, 31 including appendices. 8 figures

  21. arXiv:2507.00185  [pdf]

    eess.IV cs.AI cs.CV

    Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)

    Authors: Yang Zhou, Chrystie Wan Ning Quek, Jun Zhou, Yan Wang, Yang Bai, Yuhe Ke, Jie Yao, Laura Gutierrez, Zhen Ling Teo, Darren Shu Jeng Ting, Brian T. Soetikno, Christopher S. Nielsen, Tobias Elze, Zengxiang Li, Linh Le Dinh, Lionel Tim-Ee Cheng, Tran Nguyen Tuan Anh, Chee Leong Cheng, Tien Yin Wong, Nan Liu, Iain Beehuat Tan, Tony Kiat Hon Lim, Rick Siow Mong Goh, Yong Liu, Daniel Shu Wei Ting

    Abstract: Current artificial intelligence models for medical imaging are predominantly single modality and single disease. Attempts to create multimodal and multi-disease models have resulted in inconsistent clinical accuracy. Furthermore, training these models typically requires large, labour-intensive, well-labelled datasets. We developed MerMED-FM, a state-of-the-art multimodal, multi-specialty foundatio…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: 42 pages, 3 composite figures, 4 tables

  22. arXiv:2506.23667  [pdf, ps, other]

    cs.CL

    L0: Reinforcement Learning to Become General Agents

    Authors: Junjie Zhang, Jingyi Xi, Zhuoyang Song, Junyu Lu, Yuhua Ke, Ting Sun, Yukun Yang, Jiaxing Zhang, Songxin Zhang, Zejian Xie

    Abstract: Training large language models (LLMs) to act as autonomous agents for multi-turn, long-horizon tasks still poses significant challenges in scalability and training efficiency. To address this, we introduce L-Zero (L0), a scalable, end-to-end training pipeline for general-purpose agents. Featuring a low-cost, extensible, and sandboxed concurrent agent worker pool, L0 lowers the barrier for applying rei…

    Submitted 30 June, 2025; originally announced June 2025.

  23. arXiv:2506.10014  [pdf, ps, other]

    cs.LG

    NOCL: Node-Oriented Conceptualization LLM for Graph Tasks without Message Passing

    Authors: Wei Li, Mengcheng Lan, Jiaxing Xu, Yiping Ke

    Abstract: Graphs are essential for modeling complex interactions across domains such as social networks, biology, and recommendation systems. Traditional Graph Neural Networks, particularly Message Passing Neural Networks (MPNNs), rely heavily on supervised learning, limiting their generalization and applicability in label-scarce scenarios. Recent self-supervised approaches still require labeled fine-tuning…

    Submitted 28 May, 2025; originally announced June 2025.

    Comments: 10 pages, 4 figures. arXiv admin note: text overlap with arXiv:1703.00552, arXiv:1403.2844 by other authors

  24. arXiv:2506.06221  [pdf, ps, other]

    cs.RO cs.LG

    BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly

    Authors: Yan Shen, Ruihai Wu, Yubin Ke, Xinyuan Song, Zeyi Li, Xiaoqi Li, Hongwei Fan, Haoran Lu, Hao Dong

    Abstract: Shape assembly, the process of combining parts into a complete whole, is a crucial robotic skill with broad real-world applications. Among various assembly tasks, geometric assembly--where broken parts are reassembled into their original form (e.g., reconstructing a shattered bowl)--is particularly challenging. This requires the robot to recognize geometric cues for grasping, assembly, and subsequ…

    Submitted 10 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  25. arXiv:2505.22362  [pdf, ps, other]

    cs.LG

    Directed Homophily-Aware Graph Neural Network

    Authors: Aihu Zhang, Jiaxing Xu, Mengcheng Lan, Shili Xiang, Yiping Ke

    Abstract: Graph Neural Networks (GNNs) have achieved significant success in various learning tasks on graph-structured data. Nevertheless, most GNNs struggle to generalize to heterophilic neighborhoods. Additionally, many GNNs ignore the directional nature of real-world graphs, resulting in suboptimal performance on directed graphs with asymmetric structures. In this work, we propose Directed Homophily-awar…

    Submitted 30 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  26. arXiv:2505.14190  [pdf, ps, other]

    cs.LG cs.AI

    $α$-GAN by Rényi Cross Entropy

    Authors: Ni Ding, Miao Qiao, Jiaxing Xu, Yiping Ke, Xiaoyu Zhang

    Abstract: This paper proposes $α$-GAN, a generative adversarial network using Rényi measures. The value function is formulated, by Rényi cross entropy, as an expected certainty measure incurred by the discriminator's soft decision as to where the sample is from, true population or the generator. The discriminator tries to maximize the Rényi certainty about sample source, while the generator wants to reduce…

    Submitted 20 May, 2025; originally announced May 2025.

  27. arXiv:2505.10261  [pdf]

    cs.CL cs.AI

    The Evolving Landscape of Generative Large Language Models and Traditional Natural Language Processing in Medicine

    Authors: Rui Yang, Huitao Li, Matthew Yu Heng Wong, Yuhe Ke, Xin Li, Kunyu Yu, Jingchi Liao, Jonathan Chong Kai Liew, Sabarinath Vinod Nair, Jasmine Chiat Ling Ong, Irene Li, Douglas Teodoro, Chuan Hong, Daniel Shu Wei Ting, Nan Liu

    Abstract: Natural language processing (NLP) has been traditionally applied to medicine, and generative large language models (LLMs) have become prominent recently. However, the differences between them across different medical tasks remain underexplored. We analyzed 19,123 studies, finding that generative LLMs demonstrate advantages in open-ended tasks, while traditional NLP dominates in information extract…

    Submitted 15 May, 2025; originally announced May 2025.

  28. Deep Learning Empowered Sub-Diffraction Terahertz Backpropagation Single-Pixel Imaging

    Authors: Yongsheng Zhu, Shaojing Liu, Ximiao Wang, Runli Li, Haili Yang, Jiali Wang, Hongjia Zhu, Yanlin Ke, Ningsheng Xu, Huanjun Chen, Shaozhi Deng

    Abstract: Terahertz single-pixel imaging (THz SPI) has garnered widespread attention for its potential to overcome challenges associated with THz focal plane arrays. However, the inherently long wavelength of THz waves limits imaging resolution, while achieving subwavelength resolution requires harsh experimental conditions and time-consuming processes. Here, we propose a sub-diffraction THz backpropagation…

    Submitted 3 August, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  29. arXiv:2505.06544  [pdf, ps, other]

    eess.SP cs.NE

    Event-based Neural Spike Detection Using Spiking Neural Networks for Neuromorphic iBMI Systems

    Authors: Chanwook Hwang, Biyan Zhou, Ye Ke, Vivek Mohan, Jong Hwan Ko, Arindam Basu

    Abstract: Implantable brain-machine interfaces (iBMIs) are evolving to record from thousands of neurons wirelessly but face challenges in data bandwidth, power consumption, and implant size. We propose a novel Spiking Neural Network Spike Detector (SNN-SPD) that processes event-based neural data generated via delta modulation and pulse count modulation, converting signals into sparse events. By leveraging t…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 4 pages, 2 figures, to be published in 2025 IEEE International Symposium on Circuits and Systems (ISCAS) proceedings

  30. arXiv:2504.18829  [pdf, ps, other]

    cs.RO cs.CV

    Dexonomy: Synthesizing All Dexterous Grasp Types in a Grasp Taxonomy

    Authors: Jiayi Chen, Yubin Ke, Lin Peng, He Wang

    Abstract: Generalizable dexterous grasping with suitable grasp types is a fundamental skill for intelligent robots. Developing such skills requires a large-scale and high-quality dataset that covers numerous grasp types (i.e., at least those categorized by the GRASP taxonomy), but collecting such data is extremely challenging. Existing automatic grasp synthesis methods are often limited to specific grasp ty…

    Submitted 2 September, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

    Comments: Accepted by Robotics: Science and Systems (RSS 2025)

  31. arXiv:2504.17261  [pdf, other]

    cs.LG cs.AI

    Symbolic Representation for Any-to-Any Generative Tasks

    Authors: Jiaqi Chen, Xiaoye Zhu, Yue Wang, Tianyang Liu, Xinhui Chen, Ying Chen, Chak Tou Leong, Yifei Ke, Joseph Liu, Yiwen Yuan, Julian McAuley, Li-jia Li

    Abstract: We propose a symbolic generative task description language and a corresponding inference engine capable of representing arbitrary multimodal tasks as structured symbolic flows. Unlike conventional generative models that rely on large-scale training and implicit neural representations to learn cross-modal mappings, often at high computational cost and with limited flexibility, our framework introdu…

    Submitted 24 April, 2025; originally announced April 2025.

  32. arXiv:2504.16096  [pdf, other]

    q-bio.NC cs.AI cs.CV

    BrainPrompt: Multi-Level Brain Prompt Enhancement for Neurological Condition Identification

    Authors: Jiaxing Xu, Kai He, Yue Tang, Wei Li, Mengcheng Lan, Xia Dong, Yiping Ke, Mengling Feng

    Abstract: Neurological conditions, such as Alzheimer's Disease, are challenging to diagnose, particularly in the early stages where symptoms closely resemble healthy controls. Existing brain network analysis methods primarily focus on graph-based models that rely solely on imaging data, which may overlook important non-imaging factors and limit the model's predictive power and interpretability. In this pape…

    Submitted 19 May, 2025; v1 submitted 12 April, 2025; originally announced April 2025.

    Comments: Early accepted by MICCAI 2025

  33. arXiv:2504.05344  [pdf]

    cs.LG cs.AI

    Divergent Paths: Separating Homophilic and Heterophilic Learning for Enhanced Graph-level Representations

    Authors: Han Lei, Jiaxing Xu, Xia Dong, Yiping Ke

    Abstract: Graph Convolutional Networks (GCNs) are predominantly tailored for graphs displaying homophily, where similar nodes connect, but often fail on heterophilic graphs. The strategy of adopting distinct approaches to learn from homophilic and heterophilic components in node-level tasks has been widely discussed and proven effective both theoretically and experimentally. However, in graph-level tasks, r…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: 10 pages, 6 figures

  34. arXiv:2503.14881  [pdf, other]

    cs.LG cs.AI cs.CV

    Exploring the Limits of KV Cache Compression in Visual Autoregressive Transformers

    Authors: Bo Chen, Xiaoyu Li, Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song

    Abstract: A fundamental challenge in Visual Autoregressive models is the substantial memory overhead required during inference to store previously generated representations. Despite various attempts to mitigate this issue through compression techniques, prior works have not explicitly formalized the problem of KV-cache compression in this context. In this work, we take the first step in formally defining th…

    Submitted 19 March, 2025; originally announced March 2025.

  35. arXiv:2503.05505  [pdf, other]

    cs.CL

    Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework

    Authors: Yusong Ke, Hongru Lin, Yuting Ruan, Junya Tang, Li Li

    Abstract: Large language models (LLMs) are increasingly adopted in medical question-answering (QA) scenarios. However, LLMs can generate hallucinations and nonfactual information, undermining their trustworthiness in high-stakes medical tasks. Conformal Prediction (CP) provides a statistically rigorous framework for marginal (average) coverage guarantees but has limited exploration in medical QA. This paper…

    Submitted 8 May, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: Published by Mathematics

  36. arXiv:2502.16490  [pdf, ps, other]

    cs.LG cs.AI cs.CC cs.CV

    On Computational Limits of FlowAR Models: Expressivity and Efficiency

    Authors: Chengyue Gong, Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song

    Abstract: The expressive power and computational complexity of deep visual generative models, such as flow-based and autoregressive (AR) models, have gained considerable interest for their wide-ranging applications in generative tasks. However, the theoretical characterization of their expressiveness through the lens of circuit complexity remains underexplored, particularly for the state-of-the-art architec…

    Submitted 23 February, 2025; originally announced February 2025.

  37. arXiv:2502.01688  [pdf, other]

    cs.LG q-bio.NC

    BrainOOD: Out-of-distribution Generalizable Brain Network Analysis

    Authors: Jiaxing Xu, Yongqiang Chen, Xia Dong, Mengcheng Lan, Tiancheng Huang, Qingtian Bian, James Cheng, Yiping Ke

    Abstract: In neuroscience, identifying distinct patterns linked to neurological disorders, such as Alzheimer's and Autism, is critical for early diagnosis and effective intervention. Graph Neural Networks (GNNs) have shown promise in analyzing brain networks, but there are two major challenges in using GNNs: (1) distribution shifts in multi-site brain network data, leading to poor Out-of-Distribution (OOD…

    Submitted 2 February, 2025; originally announced February 2025.

  38. arXiv:2502.00693  [pdf, other]

    cs.CR

    DPBloomfilter: Securing Bloom Filters with Differential Privacy

    Authors: Yekun Ke, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song

    Abstract: The Bloom filter is a simple yet space-efficient probabilistic data structure that supports membership queries for dramatically large datasets. It is widely utilized and implemented across various industrial scenarios, often handling massive datasets that include sensitive user information necessitating privacy preservation. To address the challenge of maintaining privacy within the Bloom filter,…

    Submitted 2 February, 2025; originally announced February 2025.

  39. ABXI: Invariant Interest Adaptation for Task-Guided Cross-Domain Sequential Recommendation

    Authors: Qingtian Bian, Marcus Vinícius de Carvalho, Tieying Li, Jiaxing Xu, Hui Fang, Yiping Ke

    Abstract: Cross-Domain Sequential Recommendation (CDSR) has recently gained attention for countering data sparsity by transferring knowledge across domains. A common approach merges domain-specific sequences into cross-domain sequences, serving as bridges to connect domains. One key challenge is to correctly extract the shared knowledge among these sequences and appropriately transfer it. Most existing work…

    Submitted 13 February, 2025; v1 submitted 25 January, 2025; originally announced January 2025.

    Comments: Accepted by WebConf '25 (WWW '25)

  40. arXiv:2501.04377  [pdf, other]

    cs.LG cs.AI cs.CC cs.CV

    On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis

    Authors: Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song

    Abstract: Recently, Visual Autoregressive ($\mathsf{VAR}$) Models introduced a groundbreaking advancement in the field of image generation, offering a scalable approach through a coarse-to-fine "next-scale prediction" paradigm. Suppose that $n$ represents the height and width of the last VQ code map generated by $\mathsf{VAR}$ models, the state-of-the-art algorithm in [Tian, Jiang, Yuan, Peng and Wang, Ne…

    Submitted 2 February, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

  41. arXiv:2501.04299  [pdf, ps, other]

    stat.ML cs.AI cs.CC cs.CL cs.LG

    Circuit Complexity Bounds for Visual Autoregressive Model

    Authors: Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

    Abstract: Understanding the expressive ability of a specific model is essential for grasping its capacity limitations. Recently, several studies have established circuit complexity bounds for Transformer architecture. Besides, the Visual AutoRegressive (VAR) model has risen to be a prominent method in the field of image generation, outperforming previous techniques, such as Diffusion Transformers, in genera…

    Submitted 8 January, 2025; originally announced January 2025.

  42. arXiv:2412.18096  [pdf]

    cs.AI

    Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine

    Authors: Yu He Ke, Liyuan Jin, Kabilan Elangovan, Bryan Wen Xi Ong, Chin Yang Oh, Jacqueline Sim, Kenny Wei-Tsen Loh, Chai Rick Soh, Jonathan Ming Hua Cheng, Aaron Kwang Yang Lee, Daniel Shu Wei Ting, Nan Liu, Hairil Rizal Abdullah

    Abstract: Large Language Models (LLMs) are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks. This study describes the development and evaluation of the PErioperative AI CHatbot (PEACH), a secure LLM-based system integrated with local perioperative guidelines to support preoperative clinical decision-making. PEACH was embedded with 35 institutional perioperative proto…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: 21 pages, 3 figures, 1 graphical abstract

  43. arXiv:2412.16490  [pdf, ps, other]

    cs.RO

    BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization

    Authors: Jiayi Chen, Yubin Ke, He Wang

    Abstract: Robotic dexterous grasping is important for interacting with the environment. To unleash the potential of data-driven models for dexterous grasping, a large-scale, high-quality dataset is essential. While gradient-based optimization offers a promising way for constructing such datasets, previous works suffer from limitations, such as inefficiency, strong assumptions in the grasp quality energy, or…

    Submitted 2 September, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

    Comments: ICRA 2025

  44. arXiv:2412.07804  [pdf, other]

    eess.IV cs.AI cs.CV

    XLSTM-HVED: Cross-Modal Brain Tumor Segmentation and MRI Reconstruction Method Using Vision XLSTM and Heteromodal Variational Encoder-Decoder

    Authors: Shenghao Zhu, Yifei Chen, Shuo Jiang, Weihong Chen, Chang Liu, Yuanhan Wang, Xu Chen, Yifan Ke, Feiwei Qin, Changmiao Wang, Zhu Zhu

    Abstract: Neurogliomas are among the most aggressive forms of cancer, presenting considerable challenges in both treatment and monitoring due to their unpredictable biological behavior. Magnetic resonance imaging (MRI) is currently the preferred method for diagnosing and monitoring gliomas. However, the lack of specific imaging techniques often compromises the accuracy of tumor segmentation during the imagi…

    Submitted 5 March, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

    Comments: 5 pages, 2 figures

    Journal ref: ISBI 2025

  45. arXiv:2412.06061  [pdf, other]

    cs.LG cs.AI

    Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond

    Authors: Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

    Abstract: The application of transformer-based models on time series forecasting (TSF) tasks has long been popular to study. However, many of these works fail to beat the simple linear residual model, and the theoretical understanding of this issue is still limited. In this work, we propose the first theoretical explanation of the inefficiency of transformers on TSF tasks. We attribute the mechanism behind…

    Submitted 28 February, 2025; v1 submitted 8 December, 2024; originally announced December 2024.

    Comments: CPAL 2025

  46. arXiv:2411.13050  [pdf, other]

    cs.AR

    Topkima-Former: Low-energy, Low-Latency Inference for Transformers using top-k In-memory ADC

    Authors: Shuai Dong, Junyi Yang, Xiaoqi Peng, Hongyang Shang, Ye Ke, Xiaofeng Yang, Hongjie Liu, Arindam Basu

    Abstract: The Transformer model has gained prominence as a popular deep neural network architecture for natural language processing (NLP) and computer vision (CV) applications. However, the extensive use of nonlinear operations, like softmax, poses a performance bottleneck during transformer inference and comprises up to 40% of the total latency. Hence, we propose innovations at the circuit, architecture, and al…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: 7 pages

  47. arXiv:2411.11904  [pdf, other]

    cs.CV

    GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding

    Authors: Yue Zhou, Mengcheng Lan, Xiang Li, Litong Feng, Yiping Ke, Xue Jiang, Qingyun Li, Xue Yang, Wayne Zhang

    Abstract: Remote sensing (RS) visual grounding aims to use natural language expression to locate specific objects (in the form of the bounding box or segmentation mask) in RS images, enhancing human interaction with intelligent RS interpretation systems. Early research in this area was primarily based on horizontal bounding boxes (HBBs), but as more diverse RS datasets have become available, tasks involving…

    Submitted 10 May, 2025; v1 submitted 16 November, 2024; originally announced November 2024.

    Comments: 9 pages, 5 figures

  48. arXiv:2410.11279  [pdf, other]

    cs.LG cs.AI math.NA

    Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study

    Authors: Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

    Abstract: Recent empirical studies have identified fixed point iteration phenomena in deep neural networks, where the hidden state tends to stabilize after several layers, showing minimal change in subsequent layers. This observation has spurred the development of practical methodologies, such as accelerating inference by bypassing certain layers once the hidden state stabilizes, selectively fine-tuning lay…

    Submitted 15 October, 2024; originally announced October 2024.

  49. arXiv:2410.10117  [pdf, other]

    cs.CV cs.CR

    StegaINR4MIH: steganography by implicit neural representation for multi-image hiding

    Authors: Weina Dong, Jia Liu, Lifeng Chen, Wenquan Sun, Xiaozhong Pan, Yan Ke

    Abstract: Multi-image hiding, which embeds multiple secret images into a cover image and is able to recover these images with high quality, has gradually become a research hotspot in the field of image steganography. However, due to the need to embed a large amount of data in a limited cover image space, issues such as contour shadowing or color distortion often arise, posing significant challenges for mult…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: 46 pages, 14 figures

  50. arXiv:2410.09855  [pdf, other]

    cs.CV

    Text4Seg: Reimagining Image Segmentation as Text Generation

    Authors: Mengcheng Lan, Chaofeng Chen, Yue Zhou, Jiaxing Xu, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang

    Abstract: Multimodal Large Language Models (MLLMs) have shown exceptional capabilities in vision-language tasks; however, effectively integrating image segmentation into these models remains a significant challenge. In this paper, we introduce Text4Seg, a novel text-as-mask paradigm that casts image segmentation as a text generation problem, eliminating the need for additional decoders and significantly sim…

    Submitted 17 February, 2025; v1 submitted 13 October, 2024; originally announced October 2024.

    Comments: ICLR 2025. Project page: https://mc-lan.github.io/Text4Seg/