Showing 1–50 of 100 results for author: Bai, Q

  1. arXiv:2512.16924  [pdf, ps, other]

    cs.CV

    The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

    Authors: Hanlin Wang, Hao Ouyang, Qiuyu Wang, Yue Yu, Yihao Meng, Wen Wang, Ka Leong Cheng, Shuailei Ma, Qingyan Bai, Yixuan Li, Cheng Chen, Yanhong Zeng, Xing Zhu, Yujun Shen, Qifeng Chen

    Abstract: We present WorldCanvas, a framework for promptable world events that enables rich, user-directed simulation by combining text, trajectories, and reference images. Unlike text-only approaches and existing trajectory-controlled image-to-video methods, our multimodal approach combines trajectories -- encoding motion, timing, and visibility -- with natural language for semantic intent and reference im…

    Submitted 18 December, 2025; originally announced December 2025.

    Comments: Project page and code: https://worldcanvas.github.io/

  2. arXiv:2512.05470  [pdf, ps, other]

    cs.SE

    Everything is Context: Agentic File System Abstraction for Context Engineering

    Authors: Xiwei Xu, Robert Mao, Quan Bai, Xuewu Gu, Yechao Li, Liming Zhu

    Abstract: Generative AI (GenAI) has reshaped software system design by introducing foundation models as pre-trained subsystems that redefine architectures and operations. The emerging challenge is no longer model fine-tuning but context engineering: how systems capture, structure, and govern external knowledge, memory, tools, and human input to enable trustworthy reasoning. Existing practices such as prompt…

    Submitted 5 December, 2025; originally announced December 2025.

    Comments: Submitted

  3. arXiv:2512.03046  [pdf, ps, other]

    cs.CV

    MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues

    Authors: Zichen Liu, Yue Yu, Hao Ouyang, Qiuyu Wang, Shuailei Ma, Ka Leong Cheng, Wen Wang, Qingyan Bai, Yuxuan Zhang, Yanhong Zeng, Yixuan Li, Xing Zhu, Yujun Shen, Qifeng Chen

    Abstract: We propose MagicQuill V2, a novel system that introduces a layered composition paradigm to generative image editing, bridging the gap between the semantic power of diffusion models and the granular control of traditional graphics software. While diffusion transformers excel at holistic generation, their use of singular, monolithic prompts fails to disentangle distinct user intentions for…

    Submitted 2 December, 2025; originally announced December 2025.

    Comments: Code and demo available at https://magicquill.art/v2/

  4. arXiv:2511.14539  [pdf, ps, other]

    cs.CV

    Learning Compact Latent Space for Representing Neural Signed Distance Functions with High-fidelity Geometry Details

    Authors: Qiang Bai, Bojian Wu, Xi Yang, Zhizhong Han

    Abstract: Neural signed distance functions (SDFs) have been a vital representation to represent 3D shapes or scenes with neural networks. An SDF is an implicit function that can query signed distances at specific coordinates for recovering a 3D surface. Although implicit functions work well on a single shape or scene, they pose obstacles when analyzing multiple SDFs with high-fidelity geometry details, due…

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: Accepted as a Poster paper at the AAAI Conference on Artificial Intelligence (AAAI-26)

  5. arXiv:2510.15742  [pdf, ps, other]

    cs.CV

    Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

    Authors: Qingyan Bai, Qiuyu Wang, Hao Ouyang, Yue Yu, Hanlin Wang, Wen Wang, Ka Leong Cheng, Shuailei Ma, Yanhong Zeng, Zichen Liu, Yinghao Xu, Yujun Shen, Qifeng Chen

    Abstract: Instruction-based video editing promises to democratize content creation, yet its progress is severely hampered by the scarcity of large-scale, high-quality training data. We introduce Ditto, a holistic framework designed to tackle this fundamental challenge. At its heart, Ditto features a novel data generation pipeline that fuses the creative diversity of a leading image editor with an in-context…

    Submitted 16 December, 2025; v1 submitted 17 October, 2025; originally announced October 2025.

    Comments: Project page: https://ezioby.github.io/Ditto_page Code: https://github.com/EzioBy/Ditto

  6. arXiv:2509.22964  [pdf, ps, other]

    cs.LG cs.AI

    Functional Critic Modeling for Provably Convergent Off-Policy Actor-Critic

    Authors: Qinxun Bai, Yuxuan Han, Wei Xu, Zhengyuan Zhou

    Abstract: Off-policy reinforcement learning (RL) with function approximation offers an effective way to improve sample efficiency by reusing past experience. Within this setting, the actor-critic (AC) framework has achieved strong empirical success. However, both the critic and actor learning is challenging for the off-policy AC methods: first of all, in addition to the classic "deadly triad" instability of…

    Submitted 14 October, 2025; v1 submitted 26 September, 2025; originally announced September 2025.

  7. arXiv:2509.03859  [pdf, ps, other]

    cs.RO

    Learning Multi-Stage Pick-and-Place with a Legged Mobile Manipulator

    Authors: Haichao Zhang, Haonan Yu, Le Zhao, Andrew Choi, Qinxun Bai, Yiqing Yang, Wei Xu

    Abstract: Quadruped-based mobile manipulation presents significant challenges in robotics due to the diversity of required skills, the extended task horizon, and partial observability. After presenting a multi-stage pick-and-place task as a succinct yet sufficiently rich setup that captures key desiderata for quadruped-based mobile manipulation, we propose an approach that can train a visuo-motor policy ent…

    Submitted 8 September, 2025; v1 submitted 3 September, 2025; originally announced September 2025.

    Comments: Accepted to IEEE Robotics and Automation Letters (RA-L). Tech Report: arXiv:2501.09905

  8. arXiv:2509.03281  [pdf, ps, other]

    cs.NE

    A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks

    Authors: Qianyi Bai, Haiteng Wang, Qiang Yu

    Abstract: While spiking neural networks (SNNs) provide a biologically inspired and energy-efficient computational framework, their robustness and the dynamic advantages inherent to biological neurons remain significantly underutilized owing to oversimplified neuron models. In particular, conventional leaky integrate-and-fire (LIF) neurons often omit the dynamic conductance mechanisms inherent in biological…

    Submitted 3 September, 2025; originally announced September 2025.

  9. arXiv:2507.17735  [pdf, ps, other]

    eess.AS cs.SD

    Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data

    Authors: Qibing Bai, Sho Inoue, Shuai Wang, Zhongjie Jiang, Yannan Wang, Haizhou Li

    Abstract: Accent normalization converts foreign-accented speech into native-like speech while preserving speaker identity. We propose a novel pipeline using self-supervised discrete tokens and non-parallel training data. The system extracts tokens from source speech, converts them through a dedicated model, and synthesizes the output using flow matching. Our method demonstrates superior performance over a f…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: Accepted to INTERSPEECH 2025

  10. arXiv:2506.24123  [pdf, ps, other]

    cs.CV

    Calligrapher: Freestyle Text Image Customization

    Authors: Yue Ma, Qingyan Bai, Hao Ouyang, Ka Leong Cheng, Qiuyu Wang, Hongyu Liu, Zichen Liu, Haofan Wang, Jingye Chen, Yujun Shen, Qifeng Chen

    Abstract: We introduce Calligrapher, a novel diffusion-based framework that innovatively integrates advanced text customization with artistic typography for digital calligraphy and design applications. Addressing the challenges of precise style control and data dependency in typographic customization, our framework incorporates three key technical contributions. First, we develop a self-distillation mechani…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Project page: https://calligrapher2025.github.io/Calligrapher Code: https://github.com/Calligrapher2025/Calligrapher

  11. arXiv:2506.09513  [pdf, ps, other]

    cs.CL cs.AI cs.MA

    ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

    Authors: Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Deli Zhao, Wenbing Huang, Tingyang Xu, Qifeng Bai, Yu Rong

    Abstract: Reasoning-based large language models have excelled in mathematics and programming, yet their potential in knowledge-intensive medical question answering remains underexplored and insufficiently validated in clinical contexts. To bridge this gap, we introduce ReasonMed, the largest medical reasoning dataset to date, comprising 370k high-quality examples distilled from 1.75 million initial reasonin…

    Submitted 9 October, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

    Comments: 28 pages, 6 figures, 7 tables

  12. arXiv:2505.15138  [pdf, ps, other]

    cs.LG cs.AI

    Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm

    Authors: Yang Xu, Swetha Ganesh, Washim Uddin Mondal, Qinbo Bai, Vaneet Aggarwal

    Abstract: This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) with general parametrization. We propose a Primal-Dual Natural Actor-Critic algorithm that adeptly manages constraints while ensuring a high convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of $\tilde{\mathcal{O}}(1/\sqrt{T})$ over a horizon…

    Submitted 9 December, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: NeurIPS 2025

  13. arXiv:2503.06518  [pdf, other]

    cs.LG cs.AI

    Towards Superior Quantization Accuracy: A Layer-sensitive Approach

    Authors: Feng Zhang, Yanbin Liu, Weihua Li, Jie Lv, Xiaodan Wang, Quan Bai

    Abstract: Large Vision and Language Models have exhibited remarkable human-like intelligence in tasks such as natural language comprehension, problem-solving, logical reasoning, and knowledge retrieval. However, training and serving these models require substantial computational resources, posing a significant barrier to their widespread application and further research. To mitigate this challenge, various…

    Submitted 9 March, 2025; originally announced March 2025.

  14. arXiv:2503.00413  [pdf, other]

    cs.CV cs.LG

    CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering

    Authors: Tianyu Huai, Jie Zhou, Xingjiao Wu, Qin Chen, Qingchun Bai, Ze Zhou, Liang He

    Abstract: Multimodal large language models (MLLMs) have garnered widespread attention from researchers due to their remarkable understanding and generation capabilities in visual language tasks (e.g., visual question answering). However, the rapid pace of knowledge updates in the real world makes offline training of MLLMs costly, and when faced with non-stationary data streams, MLLMs suffer from catastrophi…

    Submitted 1 March, 2025; originally announced March 2025.

    Comments: 10 pages, 4 figures, accepted by CVPR 2025

  15. arXiv:2502.02945  [pdf, other]

    cs.CL cs.AI

    LLM-KT: Aligning Large Language Models with Knowledge Tracing using a Plug-and-Play Instruction

    Authors: Ziwei Wang, Jie Zhou, Qin Chen, Min Zhang, Bo Jiang, Aimin Zhou, Qinchun Bai, Liang He

    Abstract: The knowledge tracing (KT) problem is an extremely important topic in personalized education, which aims to predict whether students can correctly answer the next question based on their past question-answer records. Prior work on this task mainly focused on learning the sequence of behaviors based on the IDs or textual information. However, these studies usually fail to capture students' sufficie…

    Submitted 5 February, 2025; originally announced February 2025.

  16. arXiv:2501.13394  [pdf, ps, other]

    cs.LG cs.AI

    Concurrent Learning with Aggregated States via Randomized Least Squares Value Iteration

    Authors: Yan Chen, Qinxun Bai, Yiteng Zhang, Shi Dong, Maria Dimakopoulou, Qi Sun, Zhengyuan Zhou

    Abstract: Designing learning agents that explore efficiently in a complex environment has been widely recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions on a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents…

    Submitted 15 June, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

  17. arXiv:2501.09905  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    SLIM: Sim-to-Real Legged Instructive Manipulation via Long-Horizon Visuomotor Learning

    Authors: Haichao Zhang, Haonan Yu, Le Zhao, Andrew Choi, Qinxun Bai, Break Yang, Wei Xu

    Abstract: We present a low-cost legged mobile manipulation system that solves long-horizon real-world tasks, trained by reinforcement learning purely in simulation. This system is made possible by 1) a hierarchical design of a high-level policy for visual-mobile manipulation following task instructions, and a low-level quadruped locomotion policy, 2) a teacher and student training pipeline for the high leve…

    Submitted 29 January, 2025; v1 submitted 16 January, 2025; originally announced January 2025.

  18. arXiv:2412.21079  [pdf, other]

    cs.CV

    Edicho: Consistent Image Editing in the Wild

    Authors: Qingyan Bai, Hao Ouyang, Yinghao Xu, Qiuyu Wang, Ceyuan Yang, Ka Leong Cheng, Yujun Shen, Qifeng Chen

    Abstract: As a verified need, consistent editing across in-the-wild images remains a technical challenge arising from various unmanageable factors, like object poses, lighting conditions, and photography environments. Edicho steps in with a training-free solution based on diffusion models, featuring a fundamental design principle of using explicit image correspondence to direct editing. Specifically, the ke…

    Submitted 14 January, 2025; v1 submitted 30 December, 2024; originally announced December 2024.

    Comments: Project page: https://ant-research.github.io/edicho/

  19. arXiv:2411.08307  [pdf, ps, other]

    cs.AI cs.MM cs.SD eess.AS

    PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation

    Authors: Yungang Yi, Weihua Li, Matthew Kuo, Quan Bai

    Abstract: AI-based music generation has made significant progress in recent years. However, generating symbolic music that is both long-structured and expressive remains a significant challenge. In this paper, we propose PerceiverS (Segmentation and Scale), a novel architecture designed to address this issue by leveraging both Effective Segmentation and Multi-Scale attention mechanisms. Our approach enhance…

    Submitted 21 September, 2025; v1 submitted 12 November, 2024; originally announced November 2024.

    ACM Class: I.2.7; H.5.5

    Journal ref: IEEE Transactions on Audio, Speech, and Language Processing, 2025

  20. arXiv:2411.00259  [pdf, other]

    cs.LG

    Enhancing Diversity in Bayesian Deep Learning via Hyperspherical Energy Minimization of CKA

    Authors: David Smerkous, Qinxun Bai, Fuxin Li

    Abstract: Particle-based Bayesian deep learning often requires a similarity metric to compare two networks. However, naive similarity metrics lack permutation invariance and are inappropriate for comparing networks. Centered Kernel Alignment (CKA) on feature kernels has been proposed to compare deep networks but has not been used as an optimization objective in Bayesian deep learning. In this paper, we expl…

    Submitted 31 October, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  21. arXiv:2410.08345  [pdf, other]

    cs.AI

    Large Legislative Models: Towards Efficient AI Policymaking in Economic Simulations

    Authors: Henry Gasztowtt, Benjamin Smith, Vincent Zhu, Qinxun Bai, Edwin Zhang

    Abstract: The improvement of economic policymaking presents an opportunity for broad societal benefit, a notion that has inspired research towards AI-driven policymaking tools. AI policymaking holds the potential to surpass human performance through the ability to process data quickly at scale. However, existing RL-based methods exhibit sample inefficiency, and are further limited by an inability to flexibl…

    Submitted 10 October, 2024; originally announced October 2024.

  22. arXiv:2408.11408  [pdf, other]

    cs.CV

    Latent Feature and Attention Dual Erasure Attack against Multi-View Diffusion Models for 3D Assets Protection

    Authors: Jingwei Sun, Xuchong Zhang, Changfeng Sun, Qicheng Bai, Hongbin Sun

    Abstract: Multi-View Diffusion Models (MVDMs) enable remarkable improvements in the field of 3D geometric reconstruction, but the issue regarding intellectual property has received increasing attention due to unauthorized imitation. Recently, some works have utilized adversarial attacks to protect copyright. However, all these works focus on single-image generation tasks which only need to consider the inne…

    Submitted 7 April, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by ICME 2025

  23. arXiv:2407.15415  [pdf, other]

    cs.CL

    LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models

    Authors: Xi Chen, Songyang Zhang, Qibing Bai, Kai Chen, Satoshi Nakamura

    Abstract: We introduce LLaST, a framework for building high-performance Large Language model based Speech-to-text Translation systems. We address the limitations of end-to-end speech translation (E2E ST) models by exploring model architecture design and optimization techniques tailored for LLMs. Our approach includes LLM-based speech translation architecture design, ASR-augmented training, multilingual data…

    Submitted 22 July, 2024; originally announced July 2024.

  24. arXiv:2407.15233  [pdf, other]

    cs.CV

    LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer

    Authors: Yu Li, Yifan Chen, Gongye Liu, Fei Yin, Qingyan Bai, Jie Wu, Hongfa Wang, Ruihang Chu, Yujiu Yang

    Abstract: Layout generation is a foundation task of graphic design, which requires the integration of visual aesthetics and harmonious expression of content delivery. However, existing methods still face challenges in generating precise and visually appealing layouts, including blocking, overlapping, small-sized, or spatial misalignment. We found that these methods overlook the crucial balance between learn…

    Submitted 22 November, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  25. Imbalanced Graph-Level Anomaly Detection via Counterfactual Augmentation and Feature Learning

    Authors: Zitong Wang, Xuexiong Luo, Enfeng Song, Qiuqing Bai, Fu Lin

    Abstract: Graph-level anomaly detection (GLAD) has already gained significant importance and has become a popular field of study, attracting considerable attention across numerous downstream works. The core focus of this domain is to capture and highlight the anomalous information within given graph datasets. In most existing studies, anomalies are often the instances of few. The stark imbalance misleads cu…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 12 pages, 4 figures, SSDBM2024

  26. arXiv:2406.11481  [pdf, other]

    cs.LG cs.AI

    Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

    Authors: Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai

    Abstract: Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology, mechanics, and finance. The primary objective in these applications is to maximize the average reward. Real-world scenarios often necessitate adherence to specific co…

    Submitted 17 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.02042

    Journal ref: Foundations and Trends in Optimization: Vol. 6: No. 4, pp 193-298, 2024

  27. arXiv:2406.10367  [pdf, other]

    cs.LG

    Disentangled Hyperbolic Representation Learning for Heterogeneous Graphs

    Authors: Qijie Bai, Changli Nie, Haiwei Zhang, Zhicheng Dou, Xiaojie Yuan

    Abstract: Heterogeneous graphs have attracted a lot of research interest recently due to their success in representing complex real-world systems. However, existing methods have two pain points in embedding them into low-dimensional spaces: the mixing of structural and semantic information, and the distributional mismatch between data and embedding spaces. These two challenges require representation methods…

    Submitted 14 June, 2024; originally announced June 2024.

  28. arXiv:2406.05551  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

    Authors: Zhijun Liu, Shuai Wang, Sho Inoue, Qibing Bai, Haizhou Li

    Abstract: Audio language models have recently emerged as a promising approach for various audio generation tasks, relying on audio tokenizers to encode waveforms into sequences of discrete symbols. Audio tokenization often poses a necessary compromise between code bitrate and reconstruction accuracy. When dealing with low-bitrate audio codes, language models are constrained to process only a subset of the i…

    Submitted 8 June, 2024; originally announced June 2024.

  29. arXiv:2406.04679  [pdf, other]

    eess.IV cs.CV

    XctDiff: Reconstruction of CT Images with Consistent Anatomical Structures from a Single Radiographic Projection Image

    Authors: Qingze Bai, Tiange Liu, Zhi Liu, Yubing Tong, Drew Torigian, Jayaram Udupa

    Abstract: In this paper, we present XctDiff, an algorithm framework for reconstructing CT from a single radiograph, which decomposes the reconstruction process into two easily controllable tasks: feature extraction and CT reconstruction. Specifically, we first design a progressive feature extraction strategy that is able to extract robust 3D priors from radiographs. Then, we use the extracted prior informat…

    Submitted 13 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  30. arXiv:2404.11869  [pdf, other]

    cs.LG cs.SI

    An Efficient Loop and Clique Coarsening Algorithm for Graph Classification

    Authors: Xiaorui Qi, Qijie Bai, Yanlong Wen, Haiwei Zhang, Xiaojie Yuan

    Abstract: Graph Transformers (GTs) have made remarkable achievements in graph-level tasks. However, most existing works regard graph structures as a form of guidance or bias for enhancing node representations, which focuses on node-central perspectives and lacks explicit representations of edges and structures. One natural question arises as to whether we can leverage a hypernode to represent some structure…

    Submitted 9 December, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  31. arXiv:2404.04906  [pdf, other]

    cs.HC cs.IR

    Balancing Information Perception with Yin-Yang: Agent-Based Information Neutrality Model for Recommendation Systems

    Authors: Mengyan Wang, Yuxuan Hu, Shiqing Wu, Weihua Li, Quan Bai, Verica Rupar

    Abstract: While preference-based recommendation algorithms effectively enhance user engagement by recommending personalized content, they often result in the creation of "filter bubbles". These bubbles restrict the range of information users interact with, inadvertently reinforcing their existing viewpoints. Previous research has focused on modifying these underlying algorithms to tackle this issue. Yet,…

    Submitted 7 April, 2024; originally announced April 2024.

  32. arXiv:2402.15525  [pdf, other]

    cs.CL cs.CY

    Detecting misinformation through Framing Theory: the Frame Element-based Model

    Authors: Guan Wang, Rebecca Frederick, Jinglong Duan, William Wong, Verica Rupar, Weihua Li, Quan Bai

    Abstract: In this paper, we delve into the rapidly evolving challenge of misinformation detection, with a specific focus on the nuanced manipulation of narrative frames - an under-explored area within the AI community. The potential for Generative AI models to generate misleading narratives underscores the urgency of this problem. Drawing from communication and framing theories, we posit that the presentati…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 17 pages, 9 figures, 7 tables

  33. arXiv:2402.15289  [pdf, other]

    cs.CL cs.LG

    Let's Rectify Step by Step: Improving Aspect-based Sentiment Analysis with Diffusion Models

    Authors: Shunyu Liu, Jie Zhou, Qunxi Zhu, Qin Chen, Qingchun Bai, Jun Xiao, Liang He

    Abstract: Aspect-Based Sentiment Analysis (ABSA) stands as a crucial task in predicting the sentiment polarity associated with identified aspects within text. However, a notable challenge in ABSA lies in precisely determining the aspects' boundaries (start and end indices), especially for long ones, due to users' colloquial expressions. We propose DiffusionABSA, a novel diffusion model tailored for ABSA, wh…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Accepted to LREC-COLING 2024, submission version

  34. arXiv:2402.14000  [pdf, other]

    cs.CV

    Real-time 3D-aware Portrait Editing from a Single Image

    Authors: Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen

    Abstract: This work presents 3DPE, a practical method that can efficiently edit a face image following given prompts, like reference images or text descriptions, in a 3D-aware manner. To this end, a lightweight module is distilled from a 3D portrait generator and a text-to-image model, which provide prior knowledge of face geometry and superior editing capability, respectively. Such a design brings two comp…

    Submitted 18 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: ECCV 2024 camera-ready version. Project page: https://github.com/EzioBy/3dpe

  35. arXiv:2402.02042  [pdf, ps, other]

    cs.LG cs.AI

    Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm

    Authors: Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

    Abstract: This paper explores the realm of infinite horizon average reward Constrained Markov Decision Processes (CMDPs). To the best of our knowledge, this work is the first to delve into the regret and constraint violation analysis of average reward CMDPs with a general policy parametrization. To address this challenge, we propose a primal dual-based policy gradient algorithm that adeptly manages the cons…

    Submitted 30 October, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Journal ref: NeurIPS 2024

  36. arXiv:2312.02877  [pdf, other]

    cs.CV

    DIPR: Efficient Point Cloud Registration via Dynamic Iteration

    Authors: Yang Ai, Qiang Bai, Jindong Li, Xi Yang

    Abstract: Point cloud registration (PCR) is an essential task in 3D vision. Existing methods achieve increasingly higher accuracy. However, a large proportion of non-overlapping points in point cloud registration consume a lot of computational resources while negatively affecting registration accuracy. To overcome this challenge, we introduce a novel Efficient Point Cloud Registration via Dynamic Iteration…

    Submitted 24 August, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  37. arXiv:2310.04342  [pdf, other]

    cs.DB cs.NI

    Minerva: Decentralized Collaborative Query Processing over InterPlanetary File System

    Authors: Zhiyi Yao, Bowen Ding, Qianlan Bai, Yuedong Xu

    Abstract: Data silos create barriers in accessing and utilizing data dispersed over networks. Directly sharing data easily suffers from the long downloading time, the single point failure and the untraceable data usage. In this paper, we present Minerva, a peer-to-peer cross-cluster data query system based on InterPlanetary File System (IPFS). Minerva makes use of the distributed Hash table (DHT) lookup to…

    Submitted 8 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  38. arXiv:2309.11730  [pdf, other]

    eess.AS cs.SD

    Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

    Authors: Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li

    Abstract: Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets. To boost the system performance, researchers leverage large pretrained models such as WavLM to transfer learned high-level features to the downstream speaker recognition task. However, this approach introduces extra parameters as the pretrained model remains in the inference s…

    Submitted 26 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: submitted to ICASSP 2024

  39. arXiv:2309.01922  [pdf, ps, other]

    cs.LG cs.AI

    Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

    Authors: Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

    Abstract: In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Distinguishing itself from existing works within this context, our approach harnesses the power of the general policy gradient-based algorithm, liberating it from the constraints of assuming a linear MDP structure. We propose a policy gradient-based algorithm and show its global convergence property. We th…

    Submitted 2 February, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

    Journal ref: AAAI 2024

  40. arXiv:2308.07926  [pdf, other]

    cs.CV

    CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

    Authors: Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen

    Abstract: We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fi…

    Submitted 12 December, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Project Webpage: https://qiuyu96.github.io/CoDeF/, Code: https://github.com/qiuyu96/CoDeF

  41. arXiv:2307.02797  [pdf, other]

    cs.IR cs.AI

    BHEISR: Nudging from Bias to Balance -- Promoting Belief Harmony by Eliminating Ideological Segregation in Knowledge-based Recommendations

    Authors: Mengyan Wang, Yuxuan Hu, Zihan Yuan, Chenting Jiang, Weihua Li, Shiqing Wu, Quan Bai

    Abstract: In the realm of personalized recommendation systems, the increasing concern is the amplification of belief imbalance and user biases, a phenomenon primarily attributed to the filter bubble. Addressing this critical issue, we introduce an innovative intermediate agency (BHEISR) between users and existing recommendation systems to attenuate the negative repercussions of the filter bubble effect in e…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: 26 pages

    MSC Class: 68T07 ACM Class: I.2.6; I.2.7

  42. arXiv:2306.05537  [pdf, other]

    cs.CL

    AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization

    Authors: Guan Wang, Weihua Li, Edmund M-K. Lai, Quan Bai

    Abstract: The rapid growth of information on the Internet has led to an overwhelming amount of opinions and comments on various activities, products, and services. This makes it difficult and time-consuming for users to process all the available information when making decisions. Text summarization, a Natural Language Processing (NLP) task, has been widely explored to help users quickly retrieve relevant in…

    Submitted 25 May, 2023; originally announced June 2023.

    Comments: 21 pages, 4 figures, 7 tables

  43. arXiv:2305.08272  [pdf, other]

    cs.DB

    QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting

    Authors: Qiushi Bai, Sadeem Alsudais, Chen Li

    Abstract: SQL query performance is critical in database applications, and query rewriting is a technique that transforms an original query into an equivalent query with better performance. In a wide range of database-supported systems, there is a unique problem where both the application and database layer are black boxes, and the developers need to use their knowledge about the data and domain to rewrite…

    Submitted 14 May, 2023; originally announced May 2023.

  44. HGWaveNet: A Hyperbolic Graph Neural Network for Temporal Link Prediction

    Authors: Qijie Bai, Changli Nie, Haiwei Zhang, Dongming Zhao, Xiaojie Yuan

    Abstract: Temporal link prediction, aiming to predict future edges between paired nodes in a dynamic graph, is of vital importance in diverse applications. However, existing methods are mainly built upon uniform Euclidean space, which has been found to conflict with the power-law distributions of real-world graphs and to be unable to represent the hierarchical connections between nodes effectively. With respec…

    Submitted 3 May, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: Accepted by Web Conference (WWW) 2023

    Journal ref: WWW '23: Proceedings of the ACM Web Conference 2023 (523-532)

  45. $\text{H}^2\text{TNE}$: Temporal Heterogeneous Information Network Embedding in Hyperbolic Spaces

    Authors: Qijie Bai, Jiawen Guo, Haiwei Zhang, Changli Nie, Lin Zhang, Xiaojie Yuan

    Abstract: Temporal heterogeneous information network (temporal HIN) embedding, aiming to represent various types of nodes of different timestamps into low dimensional spaces while preserving structural and semantic information, is of vital importance in diverse real-life tasks. Researchers have made great efforts on temporal HIN embedding in Euclidean spaces and got some considerable achievements. However,…

    Submitted 14 June, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

    Journal ref: The Semantic Web-ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23-27, 2022, Proceedings (pp. 179-195)

  46. arXiv:2304.01999  [pdf, other]

    cs.CV

    Revisiting the Evaluation of Image Synthesis with GANs

    Authors: Mengping Yang, Ceyuan Yang, Yichi Zhang, Qingyan Bai, Yujun Shen, Bo Dai

    Abstract: A good metric, which promises a reliable comparison between solutions, is essential for any well-defined task. Unlike most vision tasks that have per-sample ground-truth, image synthesis tasks target generating unseen data and hence are usually evaluated through a distributional distance between one set of real samples and another set of generated samples. This study presents an empirical investig…

    Submitted 23 October, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023 datasets and benchmarks track

  47. arXiv:2303.00815  [pdf, other]

    cs.CL cs.AI

    Soft Prompt Guided Joint Learning for Cross-Domain Sentiment Analysis

    Authors: Jingli Shi, Weihua Li, Quan Bai, Yi Yang, Jianhua Jiang

    Abstract: Aspect term extraction is a fundamental task in fine-grained sentiment analysis, which aims at detecting customers' opinion targets from reviews of products or services. Traditional supervised models can achieve promising results with annotated datasets; however, their performance decreases dramatically when they are applied to the task of cross-domain aspect term extraction. Existing cross-domain…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: 22 pages

  48. arXiv:2302.08505  [pdf, other]

    cs.CV cs.AI

    Rapid-Motion-Track: Markerless Tracking of Fast Human Motion with Deeper Learning

    Authors: Renjie Li, Chun Yu Lao, Rebecca St. George, Katherine Lawler, Saurabh Garg, Son N. Tran, Quan Bai, Jane Alty

    Abstract: Objective: The coordination of human movement directly reflects function of the central nervous system. Small deficits in movement are often the first sign of an underlying neurological problem. The objective of this research is to develop a new end-to-end, deep learning-based system, Rapid-Motion-Track (RMT) that can track the fastest human movement accurately when webcams or laptop cameras are us…

    Submitted 18 January, 2023; originally announced February 2023.

  49. arXiv:2302.01443  [pdf, other]

    cs.AI

    DOR: A Novel Dual-Observation-Based Approach for News Recommendation Systems

    Authors: Mengyan Wang, Weihua Li, Jingli Shi, Shiqing Wu, Quan Bai

    Abstract: Online social media platforms offer access to a vast amount of information, but sifting through the abundance of news can be overwhelming and tiring for readers. Personalised recommendation algorithms can help users find information that interests them. However, most existing models rely solely on observations of user behaviour, such as viewing history, ignoring the connections between the news an…

    Submitted 2 February, 2023; originally announced February 2023.

    MSC Class: 68T07

  50. arXiv:2212.03752  [pdf, other]

    cs.CV eess.IV

    GLeaD: Improving GANs with A Generator-Leading Task

    Authors: Qingyan Bai, Ceyuan Yang, Yinghao Xu, Xihui Liu, Yujiu Yang, Yujun Shen

    Abstract: Generative adversarial network (GAN) is formulated as a two-player game between a generator (G) and a discriminator (D), where D is asked to differentiate whether an image comes from real data or is produced by G. Under such a formulation, D plays as the rule maker and hence tends to dominate the competition. Towards a fairer game in GANs, we propose a new paradigm for adversarial training, which…

    Submitted 6 June, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: CVPR2023. Project page: https://ezioby.github.io/glead/ Code: https://github.com/EzioBy/glead/