
Showing 1–50 of 165 results for author: Bai, B

Searching in archive cs.
  1. arXiv:2512.15784  [pdf, ps, other]

    cs.AI cs.LG

    Beyond Training: Enabling Self-Evolution of Agents with MOBIMEM

    Authors: Zibin Liu, Cheng Zhang, Xi Zhao, Yunfei Feng, Bingyu Bai, Dahu Feng, Erhu Feng, Yubin Xia, Haibo Chen

    Abstract: Large Language Model (LLM) agents are increasingly deployed to automate complex workflows in mobile and desktop environments. However, current model-centric agent architectures struggle to self-evolve post-deployment: improving personalization, capability, and efficiency typically requires continuous model retraining/fine-tuning, which incurs prohibitive computational overheads and suffers from an…

    Submitted 15 December, 2025; originally announced December 2025.

  2. arXiv:2512.13070  [pdf, ps, other]

    cs.AI cs.CL

    M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization

    Authors: Bizhe Bai, Hongming Wu, Peng Ye, Tao Chen

    Abstract: Self-supervised reinforcement learning (RL) presents a promising approach for enhancing the reasoning capabilities of Large Language Models (LLMs) without reliance on expensive human-annotated data. However, we find that existing methods suffer from a critical failure mode under long-horizon training: a "policy collapse" where performance precipitously degrades. We diagnose this instability and de…

    Submitted 15 December, 2025; originally announced December 2025.

    Comments: 7 pages, 5 figures, accepted at the NeurIPS 2025 Workshop on Efficient Reasoning

  3. arXiv:2511.22131  [pdf]

    cs.CV cs.LG physics.med-ph

    Autonomous labeling of surgical resection margins using a foundation model

    Authors: Xilin Yang, Musa Aydin, Yuhong Lu, Sahan Yoruc Selcuk, Bijie Bai, Yijie Zhang, Andrew Birkeland, Katjana Ehrlich, Julien Bec, Laura Marcu, Nir Pillar, Aydogan Ozcan

    Abstract: Assessing resection margins is central to pathological specimen evaluation and has profound implications for patient outcomes. Current practice employs physical inking, which is applied variably, and cautery artifacts can obscure the true margin on histological sections. We present a virtual inking network (VIN) that autonomously localizes the surgical cut surface on whole-slide images, reducing r…

    Submitted 27 November, 2025; originally announced November 2025.

    Comments: 20 Pages, 5 Figures

  4. arXiv:2511.08496  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

    Authors: Bingsong Bai, Yizhong Geng, Fengping Wang, Cong Wang, Puyuan Guo, Yingming Gao, Ya Li

    Abstract: Zero-shot singing voice conversion (SVC) transforms a source singer's timbre to an unseen target speaker's voice while preserving melodic content without fine-tuning. Existing methods model speaker timbre and vocal content separately, losing essential acoustic information that degrades output quality while requiring significant computational resources. To overcome these limitations, we propose HQ-…

    Submitted 15 November, 2025; v1 submitted 11 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026 main technical track

  5. arXiv:2511.01202  [pdf, ps, other]

    cs.IT cs.AI

    Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs

    Authors: Bo Bai

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in numerous real-world applications. While the vast majority of research conducted from an experimental perspective is progressing rapidly, it demands substantial computational power, data, and other resources. Therefore, how to open the black-box of LLMs from a theoretical standpoint has become a critical challenge. This paper…

    Submitted 2 November, 2025; originally announced November 2025.

  6. arXiv:2510.14703  [pdf, ps, other]

    cs.AI

    ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling

    Authors: Jianghao Lin, Yuanyuan Shi, Xin Peng, Renjie Ding, Hairui Wang, Yuxuan Peng, Bizhe Bai, Weixi Song, Fengshuo Bai, Huacan Chai, Weinan Zhang, Fei Huang, Ying Wen

    Abstract: Large language models (LLMs) are increasingly demonstrating strong capabilities as autonomous agents, with function calling serving as a core mechanism for interaction with the environment. Meanwhile, inference scaling has become a cutting-edge technique to enhance LLM performance by allocating more computational resources during the inference process. However, current research on inference scalin…

    Submitted 16 October, 2025; originally announced October 2025.

  7. arXiv:2509.20861  [pdf, ps, other]

    cs.CR

    FlowXpert: Context-Aware Flow Embedding for Enhanced Traffic Detection in IoT Network

    Authors: Chao Zha, Haolin Pan, Bing Bai, Jiangxing Wu, Ruyun Zhang

    Abstract: In the Internet of Things (IoT) environment, continuous interaction among a large number of devices generates complex and dynamic network traffic, which poses significant challenges to rule-based detection approaches. Machine learning (ML)-based traffic detection technology, capable of identifying anomalous patterns and potential threats within this traffic, serves as a critical component in ensur…

    Submitted 25 September, 2025; originally announced September 2025.

  8. arXiv:2509.14946  [pdf, ps, other]

    eess.AS cs.CL

    SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding

    Authors: Bingsong Bai, Qihang Lu, Wenbing Yang, Zihan Sun, Yueran Hou, Peilei Jia, Songbai Pu, Ruibo Fu, Yingming Gao, Ya Li, Jun Gao

    Abstract: Paralinguistic sounds, like laughter and sighs, are crucial for synthesizing more realistic and engaging speech. However, existing methods typically depend on proprietary datasets, while publicly available resources often suffer from incomplete speech, inaccurate or missing timestamps, and limited real-world relevance. To address these problems, we propose an automated framework for generating lar…

    Submitted 28 September, 2025; v1 submitted 18 September, 2025; originally announced September 2025.

    Comments: Submitted to ICASSP 2026. Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    ACM Class: I.2.7

  9. arXiv:2509.09681  [pdf, ps, other]

    cs.IR cs.AI cs.CL cs.LG

    DB3 Team's Solution For Meta KDD Cup' 25

    Authors: Yikuan Xia, Jiazun Chen, Yirui Zhan, Suifeng Zhao, Weipeng Jiang, Chaorui Zhang, Wei Han, Bo Bai, Jun Gao

    Abstract: This paper presents the db3 team's winning solution for the Meta CRAG-MM Challenge 2025 at KDD Cup'25. Addressing the challenge's unique multi-modal, multi-turn question answering benchmark (CRAG-MM), we developed a comprehensive framework that integrates tailored retrieval pipelines for different tasks with a unified LLM-tuning approach for hallucination control. Our solution features (1) domain-…

    Submitted 12 August, 2025; originally announced September 2025.

  10. arXiv:2508.04355  [pdf, ps, other]

    cs.IT

    Grid-like Error-Correcting Codes for Matrix Multiplication with Better Correcting Capability

    Authors: Hao Shi, Zhengyi Jiang, Zhongyi Huang, Bo Bai, Gong Zhang, Hanxu Hou

    Abstract: Matrix multiplication over the real field constitutes a foundational operation in the training of deep learning models, serving as a computational cornerstone for both forward and backward propagation processes. However, the presence of silent data corruption (SDC) in large-scale distributed training environments poses a significant threat to model convergence and predictive accuracy, particularly…

    Submitted 6 August, 2025; originally announced August 2025.

  11. arXiv:2507.18028  [pdf, ps, other]

    cs.CL cs.AI

    NeuralDB: Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database

    Authors: Weizhi Fei, Hao Shi, Jing Xu, Jingchen Peng, Jiazheng Li, Jingzhao Zhang, Bo Bai, Wei Han, Zhenyuan Chen, Xueyan Niu

    Abstract: Efficiently editing knowledge stored in large language models (LLMs) enables model updates without large-scale training. One possible solution is Locate-and-Edit (L&E), allowing simultaneous modifications of a massive number of facts. However, such editing may compromise the general abilities of LLMs and even result in forgetting edited facts when scaling up to thousands of edits. In this paper,…

    Submitted 23 July, 2025; originally announced July 2025.

  12. arXiv:2506.09931  [pdf, ps, other]

    cs.IT eess.SP

    Faster-than-Nyquist Signaling is Good for Single-Carrier ISAC: An Analytical Study

    Authors: Shuangyang Li, Fan Liu, Yifeng Xiong, Weijie Yuan, Baoming Bai, Christos Masouros, Giuseppe Caire

    Abstract: In this paper, we provide an analytical study of single-carrier faster-than-Nyquist (FTN) signaling for integrated sensing and communications (ISAC). Our derivations show that FTN is advantageous for ISAC, and reveal new insights that these advantages come from the fact that FTN signaling can effectively avoid the spectral aliasing due to the mismatch between the symbol rate and the bandwidth of t…

    Submitted 11 June, 2025; originally announced June 2025.

  13. Graph Evidential Learning for Anomaly Detection

    Authors: Chunyu Wei, Wenji Hu, Xingjia Hao, Yunhai Wang, Yueguo Chen, Bing Bai, Fei Wang

    Abstract: Graph anomaly detection faces significant challenges due to the scarcity of reliable anomaly-labeled datasets, driving the development of unsupervised methods. Graph autoencoders (GAEs) have emerged as a dominant approach by reconstructing graph structures and node features while deriving anomaly scores from reconstruction errors. However, relying solely on reconstruction error for anomaly detecti…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted by KDD25

  14. arXiv:2505.21200  [pdf, ps, other]

    cs.CV

    Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models

    Authors: Xudong Tan, Yaoxin Yang, Peng Ye, Jialin Zheng, Bizhe Bai, Xinyi Wang, Jia Hao, Tao Chen

    Abstract: Vision-Language-Action (VLA) models have emerged as a powerful paradigm for general-purpose robot control through natural language instructions. However, their high inference cost, stemming from large-scale token computation and autoregressive decoding, poses significant challenges for real-time deployment and edge applications. While prior work has primarily focused on architectural optimization, w…

    Submitted 27 May, 2025; originally announced May 2025.

  15. arXiv:2504.14653  [pdf, ps, other]

    cs.IT eess.SP

    Wireless Large AI Model: Shaping the AI-Native Future of 6G and Beyond

    Authors: Fenghao Zhu, Xinquan Wang, Siming Jiang, Xinyi Li, Maojun Zhang, Yixuan Chen, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Zhaoyang Zhang, Richeng Jin, Yongming Huang, Wei Feng, Tingting Yang, Baoming Bai, Feifei Gao, Kun Yang, Yuanwei Liu, Sami Muhaidat, Chau Yuen, Kaibin Huang, Kai-Kit Wong, Dusit Niyato, Ying-Chang Liang, Mérouane Debbah

    Abstract: The emergence of sixth-generation and beyond communication systems is expected to fundamentally transform digital experiences through introducing unparalleled levels of intelligence, efficiency, and connectivity. A promising technology poised to enable this revolutionary vision is the wireless large AI model (WLAM), characterized by its exceptional capabilities in data processing, inference, and d…

    Submitted 18 December, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  16. arXiv:2504.06271  [pdf, other]

    cs.IR cs.AI cs.CL

    ER-RAG: Enhance RAG with ER-Based Unified Modeling of Heterogeneous Data Sources

    Authors: Yikuan Xia, Jiazun Chen, Yirui Zhan, Suifeng Zhao, Weipeng Jiang, Chaorui Zhang, Wei Han, Bo Bai, Jun Gao

    Abstract: Large language models (LLMs) excel in question-answering (QA) tasks, and retrieval-augmented generation (RAG) enhances their precision by incorporating external evidence from diverse sources like web pages, databases, and knowledge graphs. However, current RAG methods rely on agent-specific strategies for individual data sources, posing challenges in low-resource or black-box environments and complic…

    Submitted 2 March, 2025; originally announced April 2025.

  17. arXiv:2503.23959  [pdf, other]

    cs.CV

    Local Information Matters: Inference Acceleration For Grounded Conversation Generation Models Through Adaptive Local-Aware Token Pruning

    Authors: Bizhe Bai, Jianjian Cao, Yadan Luo, Tao Chen

    Abstract: Grounded Conversation Generation (GCG) is an emerging vision-language task that requires models to generate natural language responses seamlessly intertwined with corresponding object segmentation masks. Recent models, such as GLaMM and OMG-LLaVA, achieve pixel-level grounding but incur significant computational costs due to processing a large number of visual tokens. Existing token pruning method…

    Submitted 1 April, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  18. arXiv:2501.12959  [pdf, other]

    cs.CL

    Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference

    Authors: Weizhi Fei, Xueyan Niu, Guoqing Xie, Yingqing Liu, Bo Bai, Wei Han

    Abstract: Although applications involving long-context inputs are crucial for the effective utilization of large language models (LLMs), they also result in increased computational costs and reduced performance. To address this challenge, we propose an efficient, training-free prompt compression method that retains key information within compressed prompts. We identify specific attention heads in transforme…

    Submitted 5 February, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

  19. arXiv:2501.12135  [pdf, ps, other]

    cs.IT

    Revisit the AWGN-goodness of Polar-like Lattices

    Authors: Ling Liu, Junjiang Yu, Shanxiang Lyu, Baoming Bai

    Abstract: This paper aims to provide a comprehensive introduction to lattices constructed based on polar-like codes and demonstrate some of their key properties, such as AWGN goodness. We first present polar lattices directly from the perspective of their generator matrix. Next, we discuss their connection with the recently proposed PAC (polarization adjusted convolutional) lattices and analyze the structur…

    Submitted 14 November, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: 8 pages, 5 figures

  20. arXiv:2501.11931  [pdf, ps, other]

    cs.IT

    Construction of Simultaneously Good Polar Codes and Polar Lattices

    Authors: Ling Liu, Ruimin Yuan, Shanxiang Lyu, Cong Ling, Baoming Bai

    Abstract: In this work, we investigate the simultaneous goodness of polar codes and polar lattices. The simultaneous goodness of a lattice or a code means that it is optimal for both channel coding and source coding simultaneously. The existence of such lattices was proven by using random lattice ensembles. Our work provides an explicit construction based on the polarization technique.

    Submitted 22 January, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: 7 pages, 3 figures, submitted to IEEE for publication

  21. arXiv:2410.15521  [pdf]

    physics.optics cs.CV physics.app-ph

    Lying mirror

    Authors: Yuhang Li, Shiqi Chen, Bijie Bai, Aydogan Ozcan

    Abstract: We introduce an all-optical system, termed the "lying mirror", to hide input information by transforming it into misleading, ordinary-looking patterns that effectively camouflage the underlying image data and deceive the observers. This misleading transformation is achieved through passive light-matter interactions of the incident light with an optimized structured diffractive surface, enabling th…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: 21 Pages, 8 Figures

  22. arXiv:2409.00670  [pdf, other]

    cs.LG cs.SI

    Towards Faster Graph Partitioning via Pre-training and Inductive Inference

    Authors: Meng Qin, Chaorui Zhang, Yu Gao, Yibin Ding, Weipeng Jiang, Weixi Zhang, Wei Han, Bo Bai

    Abstract: Graph partitioning (GP) is a classic problem that divides the node set of a graph into densely-connected blocks. Following the IEEE HPEC Graph Challenge and recent advances in pre-training techniques (e.g., large-language models), we propose PR-GPT (Pre-trained & Refined Graph ParTitioning) based on a novel pre-training & refinement paradigm. We first conduct the offline pre-training of a deep gra…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Champion winner of IEEE HPEC 2024 Graph Challenge (https://graphchallenge.mit.edu/champions)

  23. arXiv:2408.15491  [pdf, other]

    cs.CL

    Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

    Authors: Haowen Hou, Fei Ma, Binwen Bai, Xinxin Zhu, Fei Yu

    Abstract: Large Language Models (LLMs) have garnered widespread attention due to their remarkable performance across various tasks. However, to mitigate the issue of hallucinations, LLMs often incorporate a retrieval-augmented pipeline to provide them with rich external knowledge and context. Nevertheless, challenges stem from inaccurate and coarse-grained context retrieved from the retriever. Supplying irrel…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 20 pages

  24. arXiv:2408.08681  [pdf, other]

    cs.LG math.NA math.PR

    A Mean Field Ansatz for Zero-Shot Weight Transfer

    Authors: Xingyuan Chen, Wenwei Kuang, Lei Deng, Wei Han, Bo Bai, Goncalo dos Reis

    Abstract: The pre-training cost of large language models (LLMs) is prohibitive. One cutting-edge approach to reduce the cost is zero-shot weight transfer, also known as model growth for some cases, which magically transfers the weights trained in a small model to a large model. However, there are still some theoretical mysteries behind the weight transfer. In this paper, inspired by prior applications of me…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 40 pages, 6 Figures, 1 table

  25. arXiv:2407.19484  [pdf, ps, other]

    cs.IT

    Error Correction Decoding Algorithms of RS Codes Based on An Earlier Termination Algorithm to Find The Error Locator Polynomial

    Authors: Zhengyi Jiang, Hao Shi, Zhongyi Huang, Linqi Song, Bo Bai, Gong Zhang, Hanxu Hou

    Abstract: Reed-Solomon (RS) codes are widely used to correct errors in storage systems. Finding the error locator polynomial is one of the key steps in the error correction procedure of RS codes. Modular Approach (MA) is an effective algorithm for solving the Welch-Berlekamp (WB) key-equation problem to find the error locator polynomial that needs $2t$ steps, where $t$ is the error correction capability. In…

    Submitted 28 July, 2024; originally announced July 2024.

  26. arXiv:2407.11529  [pdf, other]

    eess.IV cs.AI cs.CV

    Cross-Phase Mutual Learning Framework for Pulmonary Embolism Identification on Non-Contrast CT Scans

    Authors: Bizhe Bai, Yan-Jie Zhou, Yujian Hu, Tony C. W. Mok, Yilang Xiang, Le Lu, Hongkun Zhang, Minfeng Xu

    Abstract: Pulmonary embolism (PE) is a life-threatening condition where rapid and accurate diagnosis is imperative yet difficult due to predominantly atypical symptomatology. Computed tomography pulmonary angiography (CTPA) is acknowledged as the gold standard imaging tool in clinics, yet it can be contraindicated for emergency department (ED) patients and represents an onerous procedure, thus necessitating…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Early accept by MICCAI 2024

  27. arXiv:2406.17223  [pdf, ps, other]

    cs.IT

    On Zero-Error Capacity of Graphs with One Edge

    Authors: Qi Cao, Qi Chen, Baoming Bai

    Abstract: In this paper, we study the zero-error capacity of channels with memory, which are represented by graphs. We provide a method to construct code for any graph with one edge, thereby determining a lower bound on its zero-error capacity. Moreover, this code can achieve zero-error capacity when the symbols in a vertex with degree one are the same. We further apply our method to the one-edge graphs rep…

    Submitted 24 June, 2024; originally announced June 2024.

  28. arXiv:2406.12331  [pdf, other]

    cs.CL cs.AI

    Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding

    Authors: Weizhi Fei, Xueyan Niu, Guoqing Xie, Yanhua Zhang, Bo Bai, Lei Deng, Wei Han

    Abstract: Current Large Language Models (LLMs) face inherent limitations due to their pre-defined context lengths, which impede their capacity for multi-hop reasoning within extensive textual contexts. While existing techniques like Retrieval-Augmented Generation (RAG) have attempted to bridge this gap by sourcing external information, they fall short when direct answers are not readily available. We introd…

    Submitted 18 June, 2024; originally announced June 2024.

  29. arXiv:2406.05692  [pdf, other]

    cs.SD cs.AI eess.AS

    SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

    Authors: Bingsong Bai, Fengping Wang, Yingming Gao, Ya Li

    Abstract: Diffusion-based singing voice conversion (SVC) models have shown better synthesis quality compared to traditional methods. However, in cross-domain SVC scenarios, where there is a significant disparity in pitch between the source and target voice domains, the models tend to generate audio with hoarseness, posing challenges in achieving high-quality vocal outputs. Therefore, in this paper, we prop…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  30. arXiv:2405.08707  [pdf, other]

    cs.LG

    Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

    Authors: Xueyan Niu, Bo Bai, Lei Deng, Wei Han

    Abstract: Increasing the size of a Transformer does not always lead to enhanced performance. This phenomenon cannot be explained by the empirical scaling laws. Furthermore, the model's enhanced performance is closely associated with its memorization of the training samples. We present a theoretical framework that sheds light on the memorization during pre-training of transformer-based language models. We mo…

    Submitted 27 November, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  31. arXiv:2405.04051  [pdf, ps, other]

    cs.IT

    On the quantization goodness of polar lattices

    Authors: Ling Liu, Shanxiang Lyu, Cong Ling, Baoming Bai

    Abstract: In this work, we prove that polar lattices, when tailored for lossy compression, are quantization-good in the sense that their normalized second moments approach $\frac{1}{2\pi e}$ as the dimension of lattices increases. It has been predicted by Zamir et al. \cite{ZamirQZ96} that the Entropy Coded Dithered Quantization (ECDQ) system using quantization-good lattices can achieve the rate-distortion bou…

    Submitted 20 January, 2025; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 13 pages, 5 figures, a journal version of the IEEE ITW conference paper

  32. arXiv:2405.02713  [pdf, other]

    cs.IT

    Set Transformation: Trade-off Between Repair Bandwidth and Sub-packetization

    Authors: Hao Shi, Zhengyi Jiang, Zhongyi Huang, Bo Bai, Gong Zhang, Hanxu Hou

    Abstract: Maximum distance separable (MDS) codes facilitate the achievement of elevated levels of fault tolerance in storage systems while incurring minimal redundancy overhead. Reed-Solomon (RS) codes are typical MDS codes with the sub-packetization level being one; however, they require large repair bandwidth defined as the total amount of symbols downloaded from other surviving nodes during single-node f…

    Submitted 4 May, 2024; originally announced May 2024.

  33. arXiv:2404.04681  [pdf, other]

    cs.IT

    Computation and Critical Transitions of Rate-Distortion-Perception Functions With Wasserstein Barycenter

    Authors: Chunhui Chen, Xueyan Niu, Wenhao Ye, Hao Wu, Bo Bai

    Abstract: The information rate-distortion-perception (RDP) function characterizes the three-way trade-off between description rate, average distortion, and perceptual quality measured by discrepancy between probability distributions and has been applied to emerging areas in communications empowered by generative modeling. We study several variants of the RDP functions through the lens of optimal transport t…

    Submitted 30 October, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: This paper was presented in part at the 2023 IEEE International Symposium on Information Theory

  34. arXiv:2404.00837  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    Automated HER2 Scoring in Breast Cancer Images Using Deep Learning and Pyramid Sampling

    Authors: Sahan Yoruc Selcuk, Xilin Yang, Bijie Bai, Yijie Zhang, Yuzhu Li, Musa Aydin, Aras Firat Unal, Aditya Gomatam, Zhen Guo, Darrow Morgan Angus, Goren Kolodney, Karine Atlan, Tal Keidar Haran, Nir Pillar, Aydogan Ozcan

    Abstract: Human epidermal growth factor receptor 2 (HER2) is a critical protein in cancer cell growth that signifies the aggressiveness of breast cancer (BC) and helps predict its prognosis. Accurate assessment of immunohistochemically (IHC) stained tissue slides for HER2 expression levels is essential for both treatment guidance and understanding of cancer mechanisms. Nevertheless, the traditional workflow…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 21 Pages, 7 Figures

    Journal ref: BME Frontiers (2024)

  35. arXiv:2403.14192  [pdf, ps, other]

    cs.IT eess.SP

    Fundamentals of Delay-Doppler Communications: Practical Implementation and Extensions to OTFS

    Authors: Shuangyang Li, Peter Jung, Weijie Yuan, Zhiqiang Wei, Jinhong Yuan, Baoming Bai, Giuseppe Caire

    Abstract: The recently proposed orthogonal time frequency space (OTFS) modulation, which is a typical Delay-Doppler (DD) communication scheme, has attracted significant attention thanks to its appealing performance over doubly-selective channels. In this paper, we present the fundamentals of general DD communications from the viewpoint of the Zak transform. We start our study by constructing DD domain basis…

    Submitted 21 March, 2024; originally announced March 2024.

  36. arXiv:2403.09100  [pdf]

    physics.med-ph cs.CV cs.LG eess.IV physics.optics

    Virtual birefringence imaging and histological staining of amyloid deposits in label-free tissue using autofluorescence microscopy and deep learning

    Authors: Xilin Yang, Bijie Bai, Yijie Zhang, Musa Aydin, Sahan Yoruc Selcuk, Zhen Guo, Gregory A. Fishbein, Karine Atlan, William Dean Wallace, Nir Pillar, Aydogan Ozcan

    Abstract: Systemic amyloidosis is a group of diseases characterized by the deposition of misfolded proteins in various organs and tissues, leading to progressive organ dysfunction and failure. Congo red stain is the gold standard chemical stain for the visualization of amyloid deposits in tissue sections, as it forms complexes with the misfolded proteins and shows a birefringence pattern under polarized lig…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 20 Pages, 5 Figures

    Journal ref: Nature Communications (2024)

  37. arXiv:2402.08934  [pdf, other]

    eess.IV cs.CV

    Extreme Video Compression with Pre-trained Diffusion Models

    Authors: Bohan Li, Yiming Liu, Xueyan Niu, Bo Bai, Lei Deng, Deniz Gündüz

    Abstract: Diffusion models have achieved remarkable success in generating high quality image and video data. More recently, they have also been used for image compression with high perceptual quality. In this paper, we present a novel approach to extreme video compression leveraging the predictive power of diffusion-based generative models at the decoder. The conditional diffusion model takes several neural…

    Submitted 13 February, 2024; originally announced February 2024.

  38. arXiv:2402.02397  [pdf]

    physics.optics cs.CV cs.NE

    Multiplexed all-optical permutation operations using a reconfigurable diffractive optical network

    Authors: Guangdong Ma, Xilin Yang, Bijie Bai, Jingxi Li, Yuhang Li, Tianyi Gan, Che-Yung Shen, Yijie Zhang, Yuzhu Li, Mona Jarrahi, Aydogan Ozcan

    Abstract: Large-scale and high-dimensional permutation operations are important for various applications in, e.g., telecommunications and encryption. Here, we demonstrate the use of all-optical diffractive computing to execute a set of high-dimensional permutation operations between an input and output field-of-view through layer rotations in a diffractive optical network. In this reconfigurable multiplexed…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 37 Pages, 10 Figures

    Journal ref: Laser & Photonics Reviews (2024)

  39. arXiv:2401.08923  [pdf]

    physics.optics cs.CV physics.app-ph

    Subwavelength Imaging using a Solid-Immersion Diffractive Optical Processor

    Authors: Jingtian Hu, Kun Liao, Niyazi Ulas Dinc, Carlo Gigli, Bijie Bai, Tianyi Gan, Xurong Li, Hanlong Chen, Xilin Yang, Yuhang Li, Cagatay Isil, Md Sadman Sakib Rahman, Jingxi Li, Xiaoyong Hu, Mona Jarrahi, Demetri Psaltis, Aydogan Ozcan

    Abstract: Phase imaging is widely used in biomedical imaging, sensing, and material characterization, among other fields. However, direct imaging of phase objects with subwavelength resolution remains a challenge. Here, we demonstrate subwavelength imaging of phase and amplitude objects based on all-optical diffractive encoding and decoding. To resolve subwavelength features of an object, the diffractive im…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 32 Pages, 9 Figures

    Journal ref: eLight (2024)

  40. arXiv:2401.07856  [pdf]

    physics.optics cs.CV physics.app-ph

    Information hiding cameras: optical concealment of object information into ordinary images

    Authors: Bijie Bai, Ryan Lee, Yuhang Li, Tianyi Gan, Yuntian Wang, Mona Jarrahi, Aydogan Ozcan

    Abstract: Data protection methods like cryptography, despite being effective, inadvertently signal the presence of secret communication, thereby drawing undue attention. Here, we introduce an optical information hiding camera integrated with an electronic decoder, optimized jointly through deep learning. This information hiding-decoding system employs a diffractive optical processor as its front-end, which…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 26 Pages, 8 Figures

    Journal ref: Science Advances (2024)

  41. arXiv:2312.12358  [pdf, other]

    cs.IT eess.SP

    Localization and Discrete Beamforming with a Large Reconfigurable Intelligent Surface

    Authors: Baojia Luo, Yili Deng, Miaomiao Dong, Zhongyi Huang, Xiang Chen, Wei Han, Bo Bai

    Abstract: In millimeter-wave (mmWave) cellular systems, reconfigurable intelligent surfaces (RISs) are foreseeably deployed with a large number of reflecting elements to achieve high beamforming gains. The large-sized RIS will make radio links fall in the near-field localization regime with spatial non-stationarity issues. Moreover, the discrete phase restriction on the RIS reflection coefficient incurs exp…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 13 pages

  42. arXiv:2312.09571  [pdf, other]

    cs.CL cs.IT

    Extending Context Window of Large Language Models via Semantic Compression

    Authors: Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

    Abstract: Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long texts. We propose a novel semantic compression method that enables generalization to texts that are 6-8 times longer, without incurring significant computational c…

    Submitted 15 December, 2023; originally announced December 2023.

  43. arXiv:2312.01560  [pdf, ps, other]

    cs.SI

    RaftGP: Random Fast Graph Partitioning

    Authors: Yu Gao, Meng Qin, Yibin Ding, Li Zeng, Chaorui Zhang, Weixi Zhang, Wei Han, Rongqian Zhao, Bo Bai

    Abstract: Graph partitioning (GP), a.k.a. community detection, is a classic problem that divides the node set of a graph into densely-connected blocks. Following prior work on the IEEE HPEC Graph Challenge benchmark and recent advances in graph machine learning, we propose a novel RAndom FasT Graph Partitioning (RaftGP) method based on an efficient graph embedding scheme. It uses the Gaussian random project…

    Submitted 3 December, 2023; originally announced December 2023.

  44. arXiv:2311.10349  [pdf, other]

    eess.IV cs.CV cs.LG

    Pseudo Label-Guided Data Fusion and Output Consistency for Semi-Supervised Medical Image Segmentation

    Authors: Tao Wang, Yuanbin Chen, Xinlin Zhang, Yuanbo Zhou, Junlin Lan, Bizhe Bai, Tao Tan, Min Du, Qinquan Gao, Tong Tong

    Abstract: Supervised learning algorithms based on Convolutional Neural Networks have become the benchmark for medical image segmentation tasks, but their effectiveness heavily relies on a large amount of labeled data. However, annotating medical image datasets is a laborious and time-consuming process. Inspired by semi-supervised algorithms that use both labeled and unlabeled data for training, we propose t…

    Submitted 17 November, 2023; originally announced November 2023.

  45. arXiv:2311.02108  [pdf, other]

    cs.HC cs.AI

    A Virtual Reality Training System for Automotive Engines Assembly and Disassembly

    Authors: Gongjin Lan, Qiangqiang Lai, Bing Bai, Zirui Zhao, Qi Hao

    Abstract: Automotive engine assembly and disassembly are common and crucial programs in the automotive industry. Traditional education trains students to learn automotive engine assembly and disassembly in lecture courses and then to operate with physical engines, which are generally low effectiveness and high cost. In this work, we developed a multi-layer structured Virtual Reality (VR) system to provide s…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 10 pages, 9 figures

  46. arXiv:2310.03384  [pdf]

    physics.optics cs.NE

    Complex-valued universal linear transformations and image encryption using spatially incoherent diffractive networks

    Authors: Xilin Yang, Md Sadman Sakib Rahman, Bijie Bai, Jingxi Li, Aydogan Ozcan

    Abstract: As an optical processor, a Diffractive Deep Neural Network (D2NN) utilizes engineered diffractive surfaces designed through machine learning to perform all-optical information processing, completing its tasks at the speed of light propagation through thin optical layers. With sufficient degrees-of-freedom, D2NNs can perform arbitrary complex-valued linear transformations using spatially coherent l…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: 16 Pages, 3 Figures

    Journal ref: Advanced Photonics Nexus (2024)

  47. arXiv:2309.15889  [pdf, other]

    eess.IV cs.CV cs.IT cs.LG cs.MM

    High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models

    Authors: Selim F. Yilmaz, Xueyan Niu, Bo Bai, Wei Han, Lei Deng, Deniz Gunduz

    Abstract: We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver. Specifically, we are interested in the perception-distortion trade-off in the practical finite block length regime, in which separate source and channel coding can be highly suboptimal. W…

    Submitted 20 September, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: 6 pages, 5 figures. Published at INFOCOM 2024 Workshops

  48. arXiv:2309.01963  [pdf, other]

    cs.IT

    Generalized Simple Regenerating Codes: Trading Sub-packetization and Fault Tolerance

    Authors: Zhengyi Jiang, Hao Shi, Zhongyi Huang, Bo Bai, Gong Zhang, Hanxu Hou

    Abstract: Maximum distance separable (MDS) codes have the optimal trade-off between storage efficiency and fault tolerance, which are widely used in distributed storage systems. As typical non-MDS codes, simple regenerating codes (SRCs) can achieve both smaller repair bandwidth and smaller repair locality than traditional MDS codes in repairing single-node erasure. In this paper, we propose {\em generaliz…

    Submitted 5 September, 2023; originally announced September 2023.

  49. arXiv:2308.15019  [pdf]

    physics.optics cs.CV cs.NE physics.app-ph

    Pyramid diffractive optical networks for unidirectional image magnification and demagnification

    Authors: Bijie Bai, Xilin Yang, Tianyi Gan, Jingxi Li, Deniz Mengu, Mona Jarrahi, Aydogan Ozcan

    Abstract: Diffractive deep neural networks (D2NNs) are composed of successive transmissive layers optimized using supervised deep learning to all-optically implement various computational tasks between an input and output field-of-view (FOV). Here, we present a pyramid-structured diffractive optical network design (which we term P-D2NN), optimized specifically for unidirectional image magnification and dema…

    Submitted 31 July, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: 41 Pages, 11 Figures

    Journal ref: Light: Science & Applications (2024)

  50. arXiv:2308.14527  [pdf, ps, other]

    cs.IT

    MDS Array Codes With Small Sub-packetization Levels and Small Repair Degrees

    Authors: Jie Li, Yi Liu, Xiaohu Tang, Yunghsiang S. Han, Bo Bai, Gong Zhang

    Abstract: High-rate minimum storage regenerating (MSR) codes are known to require a large sub-packetization level, which can make meta-data management difficult and hinder implementation in practical systems. A few maximum distance separable (MDS) array code constructions have been proposed to attain a much smaller sub-packetization level by sacrificing a bit of repair bandwidth. However, to the best of our…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Submitted to the IEEE Transactions on Information Theory