-
Physics-Enforced Neural Ordinary Differential Equation for Chemical Kinetics Optimization in Reaction-Diffusion Systems
Authors:
Feixue Cai,
Hua Zhou,
Zhuyin Ren
Abstract:
Calibrating chemical kinetics in a reaction-diffusion system is challenging because of complex dynamics governed by tightly coupled chemistry and transport, while experimental observations are often sparse and noisy. We propose a physics-consistent diffusion-chemistry coupled neural ordinary differential equation (Diff-Chem Neural ODE) that embeds Arrhenius-structured reaction neurons into a fully differentiable streamline formulation and explicitly accounts for diffusion coupling. This design enables direct gradient-based analysis of kinetic parameters without sampling-based pretraining. We validate this method on burner-stabilized flat and stagnation reacting flows using mechanisms spanning different stiffness ranges. The proposed method reproduces species profiles with near-reference accuracy, whereas a pure-chemistry Neural ODE that neglects diffusion coupling may misplace ignition and generate an incorrect thin reaction zone. Diff-Chem Neural ODE is more robust than the pure-chemistry Neural ODE and provides substantial speedups for gradient evaluation compared with fully discretized computations. In kinetics refinement, optimizing only a limited set of "primal" species reduces the loss by over 98% and simultaneously recovers unobserved variables, demonstrating physically consistent global control. Finally, tests with 1-20% noise in the objective show stable convergence without local overfitting, supporting its applicability under noisy measurements.
Submitted 30 March, 2026;
originally announced March 2026.
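The core idea of an Arrhenius-structured reaction neuron can be sketched as a rate law whose trainable parameters enter a fixed physical form, coupled to a diffusion term. Below is a minimal single-species toy model; all parameter values and the clamped 1-D grid are illustrative assumptions, not the paper's actual mechanism or streamline formulation:

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)

def arrhenius_rate(T, ln_A, beta, Ea):
    """Arrhenius-structured 'reaction neuron': the trainable parameters
    (ln_A, beta, Ea) enter a fixed, physically interpretable rate law."""
    return math.exp(ln_A) * T**beta * math.exp(-Ea / (R * T))

def rhs(Y, T, ln_A, beta, Ea, D, dx):
    """Reaction-diffusion right-hand side on a 1-D grid with clamped
    boundaries: dY/dt = D * d2Y/dx2 - k(T) * Y (single-species toy)."""
    k = arrhenius_rate(T, ln_A, beta, Ea)
    out = []
    for i in range(len(Y)):
        left = Y[max(i - 1, 0)]
        right = Y[min(i + 1, len(Y) - 1)]
        out.append(D * (left - 2 * Y[i] + right) / dx**2 - k * Y[i])
    return out
```

Because gradients of any trajectory loss flow through (ln_A, beta, Ea) inside the physical form, calibration stays within Arrhenius-consistent kinetics rather than drifting into a free-form neural network.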
-
Climate Prompting: Generating the Madden-Julian Oscillation using Video Diffusion and Low-Dimensional Conditioning
Authors:
Sulian Thual,
Feiyang Cai,
Jingjing Wang,
Feng Luo
Abstract:
Generative Deep Learning is a powerful tool for modeling the Madden-Julian oscillation (MJO) in the tropics, yet its relationship to traditional theoretical frameworks remains poorly understood. Here we propose a video diffusion model, trained on atmospheric reanalysis, to synthesize long MJO sequences conditioned on key low-dimensional metrics. The generated MJOs capture key features, including composites, power spectra, and multiscale structures such as convectively coupled waves, despite some bias. We then prompt the model to generate more tractable MJOs based on intentionally idealized low-dimensional conditionings, for example a perpetual MJO or an isolated modulation by seasons and/or the El Niño-Southern Oscillation. This enables deconstructing the underlying processes and identifying physical drivers. The present approach provides a practical framework for bridging the gap between low-dimensional MJO theory and high-resolution atmospheric complexity and will help tropical atmosphere prediction.
Submitted 23 March, 2026;
originally announced March 2026.
-
DynamicVGGT: Learning Dynamic Point Maps for 4D Scene Reconstruction in Autonomous Driving
Authors:
Zhuolin He,
Jing Li,
Guanghao Li,
Xiaolei Chen,
Jiacheng Tang,
Siyang Zhang,
Zhounan Jin,
Feipeng Cai,
Bin Li,
Jian Pu,
Jia Cai,
Xiangyang Xue
Abstract:
Dynamic scene reconstruction in autonomous driving remains a fundamental challenge due to significant temporal variations, moving objects, and complex scene dynamics. Existing feed-forward 3D models have demonstrated strong performance in static reconstruction but still struggle to capture dynamic motion. To address these limitations, we propose DynamicVGGT, a unified feed-forward framework that extends VGGT from static 3D perception to dynamic 4D reconstruction. Our goal is to model point motion within feed-forward 3D models in a dynamic and temporally coherent manner. To this end, we jointly predict the current and future point maps within a shared reference coordinate system, allowing the model to implicitly learn dynamic point representations through temporal correspondence. To efficiently capture temporal dependencies, we introduce a Motion-aware Temporal Attention (MTA) module that learns motion continuity. Furthermore, we design a Dynamic 3D Gaussian Splatting Head that explicitly models point motion by predicting Gaussian velocities using learnable motion tokens under scene flow supervision. It refines dynamic geometry through continuous 3D Gaussian optimization. Extensive experiments on autonomous driving datasets demonstrate that DynamicVGGT significantly outperforms existing methods in reconstruction accuracy, achieving robust feed-forward 4D dynamic scene reconstruction under complex driving scenarios.
Submitted 9 March, 2026;
originally announced March 2026.
-
Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning
Authors:
Dan Qiao,
Binbin Chen,
Fengyu Cai,
Jianlong Chen,
Wenhao Li,
Fuxin Jiang,
Zuzhi Chen,
Hongyuan Zha,
Tieying Zhang,
Baoxiang Wang
Abstract:
Multi-Agent Debate (MAD) has shown promise in leveraging collective intelligence to improve reasoning and reduce hallucinations, yet it remains unclear how information exchange shapes the underlying ability. Empirically, MAD exhibits paradoxical phenomena, such as accuracy improvement accompanied by a substantial increase in token entropy, and a remarkable divergence between homogeneous and heterogeneous model combinations. In this paper, we propose a Bayesian uncertainty analysis framework for MAD, which decomposes total predictive uncertainty into epistemic uncertainty reducible by debate context and aleatoric uncertainty induced by internal model noise. Across multiple model configurations, we find that effective debate hinges on achieving high epistemic gain under controlled aleatoric cost. Building on this insight, we design an uncertainty-guided multi-agent reinforcement learning (MARL) algorithm that explicitly optimizes aleatoric noise reduction and epistemic information utilization. Experiments show that our training significantly improves post-debate accuracy and stability, and enhances individual reasoning beyond single-agent RL, providing a unified Bayesian uncertainty perspective for understanding and improving MAD.
Submitted 1 March, 2026;
originally announced March 2026.
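The Bayesian decomposition described above has a standard entropy form: total predictive uncertainty splits into expected entropy (aleatoric) plus mutual information (epistemic). A minimal sketch over categorical predictions; the sampling scheme, e.g. one distribution per debate round or stochastic forward pass, is an assumption for illustration:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a categorical distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def decompose_uncertainty(samples):
    """samples: list of categorical predictive distributions.
    total     = H( E[p] )           (predictive entropy)
    aleatoric = E[ H(p) ]           (expected entropy, model noise)
    epistemic = total - aleatoric   (mutual information, reducible)"""
    n = len(samples)
    mean_p = [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]
    total = entropy(mean_p)
    aleatoric = sum(entropy(s) for s in samples) / n
    return total, aleatoric, total - aleatoric
```

When all samples agree, the epistemic term vanishes; disagreement between agents shows up entirely as epistemic uncertainty, which is the part debate context can reduce.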
-
A Large-Scale Dataset for Molecular Structure-Language Description via a Rule-Regularized Method
Authors:
Feiyang Cai,
Guijuan He,
Yi Hu,
Jingjing Wang,
Joshua Luo,
Tianyu Zhu,
Srikanth Pilla,
Gang Li,
Ling Liu,
Feng Luo
Abstract:
Molecular function is largely determined by structure. Accurately aligning molecular structure with natural language is therefore essential for enabling large language models (LLMs) to reason about downstream chemical tasks. However, the substantial cost of human annotation makes it infeasible to construct large-scale, high-quality datasets of structure-grounded descriptions. In this work, we propose a fully automated annotation framework for generating precise molecular structure descriptions at scale. Our approach builds upon and extends a rule-based chemical nomenclature parser to interpret IUPAC names and construct enriched, structured XML metadata that explicitly encodes molecular structure. This metadata is then used to guide LLMs in producing accurate natural-language descriptions. Using this framework, we curate a large-scale dataset of approximately $163$k molecule-description pairs. A rigorous validation protocol combining LLM-based and expert human evaluation on a subset of $2,000$ molecules demonstrates a high description precision of $98.6\%$. The resulting dataset provides a reliable foundation for future molecule-language alignment, and the proposed annotation method is readily extensible to larger datasets and broader chemical tasks that rely on structural descriptions.
Submitted 10 February, 2026; v1 submitted 2 February, 2026;
originally announced February 2026.
-
WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving
Authors:
Mingwang Xu,
Jiahao Cui,
Feipeng Cai,
Hanlin Shang,
Zhihao Zhu,
Shan Luan,
Yifang Xu,
Neng Zhang,
Yaoyi Li,
Jia Cai,
Siyu Zhu
Abstract:
End-to-end autonomous driving systems based on vision-language-action (VLA) models integrate multimodal sensor inputs and language instructions to generate planning and control signals. While autoregressive large language models and continuous diffusion policies are prevalent, the potential of discrete masked diffusion for trajectory generation remains largely unexplored. This paper presents WAM-Diff, a VLA framework that employs masked diffusion to iteratively refine a discrete sequence representing future ego-trajectories. Our approach features three key innovations: a systematic adaptation of masked diffusion for autonomous driving that supports flexible, non-causal decoding orders; scalable model capacity via a sparse MoE architecture trained jointly on motion prediction and driving-oriented visual question answering (VQA); and online reinforcement learning using Group Sequence Policy Optimization (GSPO) to optimize sequence-level driving rewards. Remarkably, our model achieves 91.0 PDMS on NAVSIM-v1 and 89.7 EPDMS on NAVSIM-v2, demonstrating the effectiveness of masked diffusion for autonomous driving. The approach provides a promising alternative to autoregressive and diffusion-based policies, supporting scenario-aware decoding strategies for trajectory generation. The code for this paper will be released publicly at: https://github.com/fudan-generative-vision/WAM-Diff
Submitted 6 December, 2025;
originally announced December 2025.
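Masked-diffusion decoding itself can be sketched independently of the driving stack: start from a fully masked token sequence and iteratively commit the positions the model is most confident about, in a non-causal order. The `predict` callable below is a hypothetical stand-in for the model's trajectory head, not WAM-Diff's actual interface:

```python
def masked_diffusion_decode(seq_len, predict, n_steps=4):
    """Iteratively unmask a discrete sequence.
    predict(tokens) -> list of (best_token, confidence) per position,
    where still-masked positions are represented by None."""
    MASK = None
    tokens = [MASK] * seq_len
    per_step = max(1, seq_len // n_steps)
    while MASK in tokens:
        preds = predict(tokens)
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        # non-causal order: commit the most confident positions first
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:
            tokens[i] = preds[i][0]
    return tokens
```

Because the unmasking order is chosen at inference time, the same trained model supports different scenario-aware decoding strategies, unlike a fixed left-to-right autoregressive decoder.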
-
WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving
Authors:
Yifang Xu,
Jiahao Cui,
Feipeng Cai,
Zhihao Zhu,
Hanlin Shang,
Shan Luan,
Mingwang Xu,
Neng Zhang,
Yaoyi Li,
Jia Cai,
Siyu Zhu
Abstract:
We introduce WAM-Flow, a vision-language-action (VLA) model that casts ego-trajectory planning as discrete flow matching over a structured token space. In contrast to autoregressive decoders, WAM-Flow performs fully parallel, bidirectional denoising, enabling coarse-to-fine refinement with a tunable compute-accuracy trade-off. Specifically, the approach combines a metric-aligned numerical tokenizer that preserves scalar geometry via triplet-margin learning, a geometry-aware flow objective, and a simulator-guided GRPO alignment that integrates safety, ego progress, and comfort rewards while retaining parallel generation. A multi-stage adaptation converts a pre-trained autoregressive backbone (Janus-1.5B) from causal decoding to a non-causal flow model and strengthens road-scene competence through continued multimodal pretraining. Thanks to consistency-model training and parallel decoding at inference, WAM-Flow achieves superior closed-loop performance against autoregressive and diffusion-based VLA baselines, with 1-step inference attaining 89.1 PDMS and 5-step inference reaching 90.3 PDMS on the NAVSIM v1 benchmark. These results establish discrete flow matching as a promising new paradigm for end-to-end autonomous driving. The code will be publicly available soon.
Submitted 11 December, 2025; v1 submitted 5 December, 2025;
originally announced December 2025.
-
MOTIF-RF: Multi-template On-chip Transformer Synthesis Incorporating Frequency-domain Self-transfer Learning for RFIC Design Automation
Authors:
Houbo He,
Yizhou Xu,
Lei Xia,
Yaolong Hu,
Fan Cai,
Taiyun Chi
Abstract:
This paper presents a systematic study on developing multi-template machine learning (ML) surrogate models and applying them to the inverse design of transformers (XFMRs) in radio-frequency integrated circuits (RFICs). Our study starts with benchmarking four widely used ML architectures, including MLP-, CNN-, UNet-, and GT-based models, using the same datasets across different XFMR topologies. To improve modeling accuracy beyond these baselines, we then propose a new frequency-domain self-transfer learning technique that exploits correlations between adjacent frequency bands, leading to around 30%-50% accuracy improvement in S-parameter prediction. Building on these models, we further develop an inverse design framework based on the covariance matrix adaptation evolutionary strategy (CMA-ES) algorithm. This framework is validated using multiple impedance-matching tasks, all demonstrating fast convergence and trustworthy performance. These results advance the goal of AI-assisted specs-to-GDS automation for RFICs and provide RFIC designers with actionable tools for integrating AI into their workflows.
Submitted 26 November, 2025;
originally announced November 2025.
-
MLLMRec: A Preference Reasoning Paradigm with Graph Refinement for Multimodal Recommendation
Authors:
Yuzhuo Dang,
Xin Zhang,
Zhiqiang Pan,
Yuxiao Duan,
Wanyu Chen,
Fei Cai,
Honghui Chen
Abstract:
Multimodal recommendation combines the user historical behaviors with the modal features of items to capture the tangible user preferences, presenting superior performance compared to the conventional ID-based recommender systems. However, existing methods still encounter two key problems in the representation learning of users and items, respectively: (1) the initialization of multimodal user representations is either agnostic to historical behaviors or contaminated by irrelevant modal noise, and (2) the widely used KNN-based item-item graph contains noisy edges with low similarities and lacks audience co-occurrence relationships. To address such issues, we propose MLLMRec, a novel preference reasoning paradigm with graph refinement for multimodal recommendation. Specifically, on the one hand, the item images are first converted into high-quality semantic descriptions using a multimodal large language model (MLLM), thereby bridging the semantic gap between visual and textual modalities. Then, we construct a behavioral description list for each user and feed it into the MLLM to reason about the purified user preference profiles that contain the latent interaction intents. On the other hand, we develop the threshold-controlled denoising and topology-aware enhancement strategies to refine the suboptimal item-item graph, thereby improving the accuracy of item representation learning. Extensive experiments on three publicly available datasets demonstrate that MLLMRec achieves the state-of-the-art performance with an average improvement of 21.48% over the optimal baselines. The source code is provided at https://github.com/Yuzhuo-Dang/MLLMRec.git.
Submitted 24 January, 2026; v1 submitted 21 August, 2025;
originally announced August 2025.
-
Investigation on high-order planar Hall effect in trigonal PtBi$_2$
Authors:
Fangqi Cai,
Mingxi Chi,
Yingjie Hu,
Heyao Liu,
Yangyang Chen,
Chao Jing,
Wei Ren,
He Wang
Abstract:
Trigonal PtBi$_2$ (t-PtBi$_2$) is a Weyl semimetal whose triply degenerate points near the Fermi level endow it with rich electronic properties. Previous studies have already measured the planar Hall effect (PHE) and in-plane anisotropic magnetoresistance (AMR) of t-PtBi$_2$. We noticed that their experimental results exhibited high-order features in both the PHE and AMR, yet these features were not systematically investigated. In our work, we conducted more systematic measurements and analyses of the PHE and AMR in t-PtBi$_2$. Both PHE and AMR show high-order features at low temperatures and strong magnetic fields, and these features share a similar temperature and magnetic field dependence with the turn-on behavior of the resistance-temperature curves, indicating a common physical origin. We further summarize the critical conditions for the emergence of high-order PHE in t-PtBi$_2$, which will help to understand the origin of the high-order features. In addition, we performed computational simulations of the AMR of t-PtBi$_2$, and the results were consistent with the experiments, indicating that the high-order features result from the combined contribution of Fermi surface anisotropy and the scaling behavior of magnetoresistance. Our findings will contribute to a deeper understanding of the origins of high-order features in non-magnetic topological materials.
Submitted 19 July, 2025;
originally announced July 2025.
-
Revela: Dense Retriever Learning via Language Modeling
Authors:
Fengyu Cai,
Tong Chen,
Xinran Zhao,
Sihao Chen,
Hongming Zhang,
Sherry Tongshuang Wu,
Iryna Gurevych,
Heinz Koeppl
Abstract:
Dense retrievers play a vital role in accessing external and specialized knowledge to augment language models (LMs). Training dense retrievers typically requires annotated query-document pairs, which are costly to create and scarce in specialized domains (e.g., code) or in complex settings (e.g., requiring reasoning). These practical challenges have sparked growing interest in self-supervised retriever learning. Since LMs are trained to capture token-level dependencies through a self-supervised learning objective (i.e., next token prediction), we can analogously cast retrieval as learning dependencies among chunks of tokens. This analogy naturally leads to the question: How can we adapt self-supervised learning objectives in the spirit of language modeling to train retrievers?
To answer this question, we introduce Revela, a unified and scalable training framework for self-supervised retriever learning via language modeling. Revela models semantic dependencies among documents by conditioning next token prediction on local and cross-document context through an in-batch attention mechanism. This attention is weighted by retriever-computed similarity scores, enabling the retriever to be optimized as part of language modeling. We evaluate Revela on domain-specific (CoIR), reasoning-intensive (BRIGHT), and general-domain (BEIR) benchmarks across various retriever backbones. Without annotated or synthetic query-document pairs, Revela surpasses larger supervised models and proprietary APIs on both CoIR and BRIGHT. It achieves BEIR's unsupervised SoTA with ~1000x less training data and 10x less compute. Performance increases with batch size and model size, highlighting Revela's scalability and its promise for self-supervised retriever learning.
Submitted 20 February, 2026; v1 submitted 19 June, 2025;
originally announced June 2025.
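The coupling described above, retriever scores gating cross-document context inside the language model, can be sketched in miniature. Everything here (embeddings as plain lists, dot-product similarity, a single mixed context vector) is a simplified assumption for illustration, not Revela's actual in-batch attention mechanism:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_context_mix(query_emb, doc_embs, doc_hidden):
    """Retriever-computed similarities (dot products here) weight how
    much each in-batch document's hidden state contributes to the LM
    context, so the next-token loss back-propagates into the retriever."""
    scores = [sum(q * d for q, d in zip(query_emb, e)) for e in doc_embs]
    weights = softmax(scores)
    dim = len(doc_hidden[0])
    return [sum(w * h[j] for w, h in zip(weights, doc_hidden)) for j in range(dim)]
```

The key design point is that the retriever is optimized purely through the language-modeling objective: no annotated query-document pairs appear anywhere in the loss.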
-
MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers
Authors:
Jushaan Singh Kalra,
Xinran Zhao,
To Eun Kim,
Fengyu Cai,
Fernando Diaz,
Tongshuang Wu
Abstract:
Retrieval-augmented Generation (RAG) is powerful, but its effectiveness hinges on which retrievers we use and how. Different retrievers offer distinct, often complementary signals: BM25 captures lexical matches; dense retrievers, semantic similarity. Yet in practice, we typically fix a single retriever based on heuristics, which fails to generalize across diverse information needs. Can we dynamically select and integrate multiple retrievers for each individual query, without the need for manual selection? In our work, we validate this intuition with quantitative analysis and introduce mixture of retrievers: a zero-shot, weighted combination of heterogeneous retrievers. Extensive experiments show that such mixtures are effective and efficient: Despite totaling just 0.8B parameters, this mixture outperforms every individual retriever and even larger 7B models by +10.8% and +3.9% on average, respectively. Further analysis also shows that this mixture framework can help incorporate specialized non-oracle human information sources as retrievers to achieve good collaboration, with a 58.9% relative performance improvement over simulated humans alone.
Submitted 18 June, 2025;
originally announced June 2025.
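A zero-shot weighted mixture of heterogeneous retrievers can be sketched as per-retriever score normalization followed by a weighted sum. The min-max normalization and per-query weights below are illustrative assumptions, not necessarily the paper's exact fusion rule:

```python
def normalize(scores):
    """Min-max normalize one retriever's scores so heterogeneous
    score scales (e.g. BM25 vs. dense cosine) become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def mixture_rank(per_retriever_scores, weights):
    """per_retriever_scores: {retriever_name: {doc_id: score}}
    weights: {retriever_name: weight}, chosen per query.
    Returns doc_ids ranked by the weighted, normalized mixture score."""
    fused = {}
    for name, scores in per_retriever_scores.items():
        for doc, s in normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weights.get(name, 0.0) * s
    return sorted(fused, key=fused.get, reverse=True)
```

Because the fusion is a weighted sum over normalized scores, a human information source can be added as just another entry in `per_retriever_scores`, which is how non-oracle human retrievers slot into the same framework.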
-
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
Authors:
Jiahui Geng,
Fengyu Cai,
Shaobo Cui,
Qing Li,
Liangwei Chen,
Chenyang Lyu,
Haonan Li,
Derui Zhu,
Walter Pretschner,
Heinz Koeppl,
Fakhri Karray
Abstract:
Code retrieval is essential in modern software development, as it boosts code reuse and accelerates debugging. However, current benchmarks primarily emphasize functional relevance while neglecting critical dimensions of software quality. Motivated by this gap, we introduce CoQuIR, the first large-scale, multilingual benchmark specifically designed to evaluate quality-aware code retrieval across four key dimensions: correctness, efficiency, security, and maintainability. CoQuIR provides fine-grained quality annotations for 42,725 queries and 134,907 code snippets in 11 programming languages, and is accompanied by two quality-centric evaluation metrics: Pairwise Preference Accuracy and Margin-based Ranking Score. Using CoQuIR, we benchmark 23 retrieval models, covering both open-source and proprietary systems, and find that even top-performing models frequently fail to distinguish buggy or insecure code from their more robust counterparts. Furthermore, we conduct preliminary investigations into training methods that explicitly encourage retrievers to recognize code quality. Using synthetic datasets, we demonstrate promising improvements in quality-aware metrics across various models, without sacrificing semantic relevance. Downstream code generation experiments further validate the effectiveness of our approach. Overall, our work highlights the importance of integrating quality signals into code retrieval systems, laying the groundwork for more trustworthy and robust software development tools.
Submitted 27 August, 2025; v1 submitted 31 May, 2025;
originally announced June 2025.
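A Pairwise Preference Accuracy metric of the kind named above can be sketched as the fraction of quality-contrast pairs that a ranking orders correctly; the exact shape here is an assumption inferred from the abstract, not the paper's formal definition:

```python
def pairwise_preference_accuracy(ranking, pairs):
    """ranking: doc_ids ordered by retriever score, best first.
    pairs: (higher_quality_doc, lower_quality_doc) tuples of snippets
    with equivalent functional relevance but contrasting quality.
    Returns the fraction of pairs where the higher-quality snippet
    is ranked above its lower-quality counterpart."""
    pos = {d: i for i, d in enumerate(ranking)}
    hits = sum(1 for good, bad in pairs if pos[good] < pos[bad])
    return hits / len(pairs)
```

A retriever that is blind to quality scores about 0.5 on such pairs, which is exactly the failure mode the benchmark is designed to expose.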
-
FastFLUX: Pruning FLUX with Block-wise Replacement and Sandwich Training
Authors:
Fuhan Cai,
Yong Guo,
Jie Li,
Wenbo Li,
Jian Chen,
Xiangzhong Fang
Abstract:
Recent advancements in text-to-image (T2I) generation have led to the emergence of highly expressive models such as diffusion transformers (DiTs), exemplified by FLUX. However, their massive parameter sizes lead to slow inference, high memory usage, and poor deployability. Existing acceleration methods (e.g., single-step distillation and attention pruning) often suffer from significant performance degradation and incur substantial training costs. To address these limitations, we propose FastFLUX, an architecture-level pruning framework designed to enhance the inference efficiency of FLUX. At its core is the Block-wise Replacement with Linear Layers (BRLL) method, which replaces structurally complex residual branches in ResBlocks with lightweight linear layers while preserving the original shortcut connections for stability. Furthermore, we introduce Sandwich Training (ST), a localized fine-tuning strategy that leverages LoRA to supervise neighboring blocks, mitigating performance drops caused by structural replacement. Experiments show that our FastFLUX maintains high image quality under both qualitative and quantitative evaluations, while significantly improving inference speed, even with 20\% of the hierarchy pruned. Our code will be available soon.
Submitted 13 January, 2026; v1 submitted 10 June, 2025;
originally announced June 2025.
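The BRLL idea, swapping the residual branch for a linear map while keeping the shortcut, reduces to `y = x + (Wx + b)`. A toy 1-D sketch (the real method operates on DiT ResBlocks with learned weights, not hand-built matrices):

```python
def linear_resblock(x, W, b):
    """BRLL sketch: the ResBlock's structurally complex residual
    branch f(x) is replaced by a single linear map, but the original
    shortcut connection is preserved for stability:
        y = x + (W x + b)"""
    n = len(x)
    branch = [sum(W[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
    return [x[i] + branch[i] for i in range(n)]
```

The preserved shortcut means the block degrades gracefully: with a zeroed branch it is exactly the identity, which is what makes localized fine-tuning of the replacement (the paper's Sandwich Training) feasible.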
-
Fast Non-Line-of-Sight Transient Data Simulation and an Open Benchmark Dataset
Authors:
Yingjie Shi,
Jinye Miao,
Taotao Qin,
Fuyao Cai,
Yi Wei,
Lingfeng Liu,
Tongyao Li,
Chenyang Wu,
Huan Liang,
Yuyang Yin,
Lianfa Bai,
Enlai Guo,
Jing Han
Abstract:
Non-Line-of-Sight (NLOS) imaging reconstructs the shape and depth of hidden objects from picosecond-resolved transient signals, offering potential applications in autonomous driving, security, and medical diagnostics. However, current NLOS experiments rely on expensive hardware and complex system alignment, limiting their scalability. This manuscript presents a simplified simulation method that generates NLOS transient data by modeling light-intensity transport rather than performing conventional path tracing, significantly enhancing computational efficiency. All scene elements, including the relay surface, hidden target, stand-off distance, detector time resolution, and acquisition window, are fully parameterized, allowing for rapid configuration of test scenarios. Reconstructions based on the simulated data accurately recover hidden geometries, validating the effectiveness of the approach. The proposed tool reduces the entry barrier for NLOS research and supports the optimization of system design.
Submitted 4 June, 2025;
originally announced June 2025.
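The intensity-transport idea can be sketched as binning each hidden-surface patch's two-bounce path length into a time histogram, with inverse-square falloff per path segment. Everything below (point geometry, the falloff model, parameter names) is an illustrative assumption rather than the paper's simulator:

```python
import math

C = 3.0e8  # speed of light, m/s (approximate)

def transient_histogram(scan_pt, sensor_pt, surface_pts, bin_width, n_bins):
    """Toy intensity-transport NLOS model: each hidden surface patch
    returns light whose arrival time is set by the two bounce path
    lengths and whose intensity falls off as 1/r^2 per segment."""
    hist = [0.0] * n_bins
    for p in surface_pts:
        d1 = math.dist(scan_pt, p)    # relay point -> hidden patch
        d2 = math.dist(p, sensor_pt)  # hidden patch -> detector
        t = (d1 + d2) / C
        b = int(t / bin_width)
        if b < n_bins:
            hist[b] += 1.0 / (d1**2 * d2**2)
    return hist
```

Because no paths are traced, the cost is linear in the number of surface patches times scan points, which is what makes rapid scenario sweeps over detector resolution and stand-off distance practical.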
-
MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation
Authors:
Feiyang Cai,
Jiahui Bai,
Tao Tang,
Guijuan He,
Joshua Luo,
Tianyu Zhu,
Srikanth Pilla,
Gang Li,
Ling Liu,
Feng Luo
Abstract:
Precise recognition, editing, and generation of molecules are essential prerequisites for both chemists and AI systems tackling various chemical tasks. We present MolLangBench, a comprehensive benchmark designed to evaluate fundamental molecule-language interface tasks: language-prompted molecular structure recognition, editing, and generation. To ensure high-quality, unambiguous, and deterministic outputs, we construct the recognition tasks using automated cheminformatics tools, and curate editing and generation tasks through rigorous expert annotation and validation. MolLangBench supports the evaluation of models that interface language with different molecular representations, including linear strings, molecular images, and molecular graphs. Evaluations of state-of-the-art models reveal significant limitations: the strongest model (GPT-5) achieves $86.2\%$ and $85.5\%$ accuracy on recognition and editing tasks, which are intuitively simple for humans, and performs even worse on the generation task, reaching only $43.0\%$ accuracy. These results highlight the shortcomings of current AI systems in handling even preliminary molecular recognition and manipulation tasks. We hope MolLangBench will catalyze further research toward more effective and reliable AI systems for chemical applications.The dataset and code can be accessed at https://huggingface.co/datasets/ChemFM/MolLangBench and https://github.com/TheLuoFengLab/MolLangBench, respectively.
Submitted 23 March, 2026; v1 submitted 20 May, 2025;
originally announced May 2025.
-
Micro-tip manipulated origami for robust twisted few-layer graphene
Authors:
Ruo-Jue Zou,
Long Deng,
Si-Min Xue,
Feng-Fei Cai,
Ling-Hui Tong,
Yang Zhang,
Yuan Tian,
Li Zhang,
Lijie Zhang,
Zhihui Qin,
Long-Jing Yin
Abstract:
Twisted few-layer graphene (tFLG) has emerged as an ideal model system for investigating novel strongly correlated and topological phenomena. However, the experimental construction of tFLG with high structural stability is still challenging. Here, we introduce a highly accessible method for fabricating robust tFLG by polymer micro-tip manipulated origami. Using a self-prepared polymer micro-tip, composed of multiple dimethylpolysiloxane, poly(vinyl chloride), and graphite sheets, to fold graphene layers, we fabricated tFLG with different twist angles (0°-30°) and various layer configurations, including twisted bilayers (1+1), twisted double-bilayers (2+2), twisted double-trilayers (3+3), and thicker stacks. Even ABC-stacked tFLG was created, such as twisted ABC/ABC and ABC/ABA graphene coexisting in an ABC-ABA domain-wall region. We found that the origami-fabricated tFLG exhibits high stability against thermal and mechanical perturbations, including heating and transferring, which could be attributed to its special folding and tearing structures. Moreover, based on the rich variety of samples, we revealed twist-angle- and stacking-order-dependent Raman characteristics of tFLG, which are valuable for understanding stacking-modulated phonon spectroscopy. Our experiments provide a simple and efficient approach to construct structurally robust tFLG, paving the way for the study of highly stable twisted van der Waals heterostructures.
Submitted 26 April, 2025;
originally announced April 2025.
-
SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders
Authors:
Qing Li,
Jiahui Geng,
Derui Zhu,
Fengyu Cai,
Chenyang Lyu,
Fakhri Karray
Abstract:
Unlearning methods for vision-language models (VLMs) have primarily adapted techniques from large language models (LLMs), relying on weight updates that demand extensive annotated forget sets. Moreover, these methods perform unlearning at a coarse granularity, often leading to excessive forgetting and reduced model utility. To address this issue, we introduce SAUCE, a novel method that leverages sparse autoencoders (SAEs) for fine-grained and selective concept unlearning in VLMs. Briefly, SAUCE first trains SAEs to capture high-dimensional, semantically rich sparse features. It then identifies the features most relevant to the target concept for unlearning. During inference, it selectively modifies these features to suppress specific concepts while preserving unrelated information. We evaluate SAUCE on two distinct VLMs, LLaVA-v1.5-7B and LLaMA-3.2-11B-Vision-Instruct, across two types of tasks: concrete concept unlearning (objects and sports scenes) and abstract concept unlearning (emotions, colors, and materials), encompassing a total of 60 concepts. Extensive experiments demonstrate that SAUCE outperforms state-of-the-art methods by 18.04% in unlearning quality while maintaining comparable model utility. Furthermore, we investigate SAUCE's robustness against widely used adversarial attacks, its transferability across models, and its scalability in handling multiple simultaneous unlearning requests. Our findings establish SAUCE as an effective and scalable solution for selective concept unlearning in VLMs.
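The inference-time mechanism can be pictured with a toy sparse autoencoder: encode an activation into sparse features, zero the features linked to the target concept, and decode back. The dictionary and functions below are illustrative stand-ins, not SAUCE's trained SAEs.

```python
# Illustrative sketch (not the paper's implementation): suppress a concept by
# zeroing its sparse-autoencoder features in an activation vector.
# `sae_encode`/`sae_decode` are hypothetical stand-ins for a trained SAE.

def sae_encode(x, dictionary):
    # project the activation onto each (toy) dictionary feature; ReLU -> sparse codes
    return [max(0.0, sum(xi * di for xi, di in zip(x, d))) for d in dictionary]

def sae_decode(codes, dictionary):
    dim = len(dictionary[0])
    out = [0.0] * dim
    for c, d in zip(codes, dictionary):
        for i in range(dim):
            out[i] += c * d[i]
    return out

def suppress_concept(x, dictionary, concept_feature_ids):
    codes = sae_encode(x, dictionary)
    for i in concept_feature_ids:
        codes[i] = 0.0          # selectively ablate concept-linked features only
    return sae_decode(codes, dictionary)

# toy 2-feature dictionary over 2-dim activations: feature 1 carries the concept
D = [[1.0, 0.0], [0.0, 1.0]]
x = [0.7, 0.3]
print(suppress_concept(x, D, concept_feature_ids=[1]))  # -> [0.7, 0.0]
```

Because only the ablated features change, information carried by the remaining features passes through untouched, which is the fine-grained selectivity the method targets.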
Submitted 20 March, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Revisiting $B_{c}^-\to J/ψ(η_c) L^-$ decays within the SM and beyond in QCD factorization
Authors:
Wei-Jun Deng,
Fang-Min Cai,
Xin-Qiang Li,
Yan Shi,
Ya-Dong Yang
Abstract:
Motivated by the deviations observed between the data and the SM predictions of $\mathcal{B}(\bar{B}_s^0\to D_s^+ π^-)$ and $\mathcal{B}(\bar{B}_d^0\to D^+ K^-)$, we revisit the $B_{c}^{-}\to J/ψ(η_{c}) L^{-}$ decays, with $L=π, K^{(*)}, ρ$, both within the SM and beyond. Since these processes are also mediated by $b\to c \bar{u} d(s)$ transitions and hence dominated by the colour-allowed tree topology, the QCD factorization (QCDF) is expected to hold in the heavy-quark limit. Firstly, we update the SM predictions of these decays by including the nonfactorizable vertex corrections up to the NNLO in $α_s$. It is found that, relative to the LO results, the branching ratios of these decays up to the NLO and NNLO corrections are always enhanced, with a relative amount given by $δ_{\text{NLO}} = (\mathcal{B}^\text{NLO}-\mathcal{B}^\text{LO})/\mathcal{B}^\text{LO} \approx +6\%$ and $δ_{\text{NNLO}} = (\mathcal{B}^\text{NNLO}-\mathcal{B}^\text{LO})/\mathcal{B}^\text{LO} \approx +9\%$, respectively. To minimize the uncertainties brought by $V_{cb}$ and the transition form factors, we construct the ratios $R_{J/ψ(η_{c}) L}$, $R_{(s)L}^{(\ast)}$, and $R_{π/μν_μ}$, which are then used to constrain the model-independent new physics (NP) Wilson coefficients. After considering the latest Belle data and the updated $B_{(s)}\to D_{(s)}^{(*)}$ form factors, we find that the deviations can still be explained by the NP four-quark operators with $(1+γ_{5}) \otimes (1-γ_{5})$ and $(1+γ_{5}) \otimes (1+γ_{5})$ structures, while the solution with $γ^μ(1+γ_{5}) \otimes γ_μ(1-γ_{5})$ structure does not work anymore, under the combined constraints from $R_{(s)L}^{(\ast)}$ at the $2σ$ level. Furthermore, the ratio $R_{π/μν_μ}$, once measured precisely, could provide complementary constraint.
Submitted 12 August, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models
Authors:
Jiahui Geng,
Qing Li,
Herbert Woisetschlaeger,
Zongxiong Chen,
Fengyu Cai,
Yuxia Wang,
Preslav Nakov,
Hans-Arno Jacobsen,
Fakhri Karray
Abstract:
This study investigates the machine unlearning techniques within the context of large language models (LLMs), referred to as \textit{LLM unlearning}. LLM unlearning offers a principled approach to removing the influence of undesirable data (e.g., sensitive or illegal information) from LLMs, while preserving their overall utility without requiring full retraining. Despite growing research interest, there is no comprehensive survey that systematically organizes existing work and distills key insights; here, we aim to bridge this gap. We begin by introducing the definition and the paradigms of LLM unlearning, followed by a comprehensive taxonomy of existing unlearning studies. Next, we categorize current unlearning approaches, summarizing their strengths and limitations. Additionally, we review evaluation metrics and benchmarks, providing a structured overview of current assessment methodologies. Finally, we outline promising directions for future research, highlighting key challenges and opportunities in the field.
Submitted 31 May, 2025; v1 submitted 22 February, 2025;
originally announced March 2025.
-
ChemFM as a Scaling Law Guided Foundation Model Pre-trained on Informative Chemicals
Authors:
Feiyang Cai,
Katelin Zacour,
Tianyu Zhu,
Tzuen-Rong Tzeng,
Yongping Duan,
Ling Liu,
Srikanth Pilla,
Gang Li,
Feng Luo
Abstract:
Traditional AI methods often rely on task-specific model designs and training, which constrain both the scalability of model size and generalization across different tasks. Here, we introduce ChemFM, a large foundation model specifically developed for chemicals. By conducting a series of scaling experiments, we identify UniChem as the informative molecular database for pre-training the foundation model. ChemFM comprises 3 billion parameters and is pre-trained on 178 million molecules using self-supervised causal language modeling to extract generalizable molecular representations. This model can be adapted to diverse downstream chemical applications using either full-parameter or parameter-efficient fine-tuning methods. ChemFM consistently outperforms state-of-the-art task-specific AI models across all tested tasks. Notably, it achieves up to 67.48% performance improvement across 34 property prediction benchmarks, up to 33.80% reduction in mean average deviation between conditioned and actual properties of generated molecules in conditional molecular generation tasks, and up to 3.7% top-1 accuracy improvement across 4 reaction prediction datasets. Moreover, ChemFM demonstrates its superior performance in predicting antibiotic activity and cytotoxicity, highlighting its potential to advance the discovery of novel antibiotics. Furthermore, we demonstrate that, as a foundation model, ChemFM exhibits strong data efficiency, requiring significantly fewer labeled training samples to achieve state-of-the-art performance. We anticipate that ChemFM will significantly advance chemistry research by providing a foundation model capable of effectively generalizing across a broad range of tasks with minimal additional training.
Submitted 5 November, 2025; v1 submitted 28 October, 2024;
originally announced October 2024.
-
Exotic hybrid pseudopotentials at finite temperature and chemical potential
Authors:
Le Zhang,
Fei-Yang Cai,
Xun Chen
Abstract:
Using gauge/gravity duality, we study the exotic hybrid pseudopotentials at finite temperature and chemical potential. The $Σ$ hybrid meson can be described by a model that includes an object called a ``defect'' on a string linking the quark and antiquark. It was first proposed by Andreev and perfectly described the $Σ_u^-$ hybrid potential at zero temperature and chemical potential. In this paper, we extend this model to finite chemical potential and compare the separation distances and pseudopotentials of $Σ_g^+$ and $Σ_u^-$. Unlike the $Σ_g^+$ ground state, the $Σ_u^-$ hybrid pseudopotentials no longer behave as Coulomb-like at short distances. In addition, temperature and chemical potential have a significant impact on the $Σ_u^-$ hybrid pseudopotentials: the screening distances and hybrid pseudopotentials of $Σ_u^-$ decrease significantly with increasing temperature and chemical potential. Finally, we draw the melting diagram of $Σ_g^+$ and $Σ_u^-$ in the $T-μ$ plane and confirm that the quark-antiquark pair in the $Σ_u^-$ excited state melts more easily than in the $Σ_g^+$ ground state.
Submitted 8 October, 2024;
originally announced October 2024.
-
CP asymmetries of $t \to c γ$ and $t \to cg$ decays in the aligned two-Higgs-doublet model
Authors:
Fang-Min Cai,
Rui-Lin Fan,
Xin-Qiang Li,
Ya-Dong Yang
Abstract:
We study the CP asymmetries of the rare top-quark decays $t \to c γ$ and $t \to cg$ in the aligned two-Higgs-doublet model (A2HDM), which is generically characterized by new sources of CP violation beyond the Standard Model (SM). Specifically, the branching ratios and CP asymmetries of these rare top-quark decays are explicitly formulated, with an emphasis on the origins of weak and strong phases in the A2HDM. Taking into account the most relevant constraints on this model, we evaluate the variations of these observables with respect to the model parameters. It is found that the branching ratios of $t \to c γ$ and $t \to cg$ decays can maximally reach up to $1.47\times10^{-10}$ and $4.86\times10^{-9}$ respectively, which are about four and three orders of magnitude higher than the corresponding SM predictions. While the branching ratios are almost independent of the relative phase $\varphi$ between the two alignment parameters $ς_u$ and $ς_d$ within the allowed parameter space, the CP asymmetries are found to be very sensitive to $\varphi$. When the two alignment parameters are complex with a non-zero $\varphi$ varied within the range $[50^\circ,150^\circ]$, the magnitudes of the CP asymmetries can be significantly enhanced relative to both the SM and the real case. In particular, the maximum absolute values of the CP asymmetries can even reach up to $\mathcal{O}(1)$ for these two decay modes, in the range $\varphi \in [70^\circ,100^\circ]$. These interesting observations could be utilized to discriminate the SM and the different scenarios of the A2HDM.
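For orientation, the direct CP asymmetry discussed here follows the standard definition (sign conventions may differ from the paper's):

```latex
% Standard direct CP asymmetry for t -> c gamma (analogous for t -> c g);
% a non-zero value requires both a weak (CP-odd) and a strong (CP-even)
% relative phase between the interfering amplitudes.
\mathcal{A}_{CP} \;=\;
  \frac{\Gamma(t \to c\gamma) - \Gamma(\bar{t} \to \bar{c}\gamma)}
       {\Gamma(t \to c\gamma) + \Gamma(\bar{t} \to \bar{c}\gamma)}
  \;\propto\; \sin\Delta\varphi_{\text{weak}} \, \sin\Delta\delta_{\text{strong}}
```

This is why the abstract stresses tracking the origins of both weak and strong phases in the A2HDM.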
Submitted 14 March, 2025; v1 submitted 6 September, 2024;
originally announced September 2024.
-
Neural Dynamics Model of Visual Decision-Making: Learning from Human Experts
Authors:
Jie Su,
Fang Cai,
Shu-Kuo Zhao,
Xin-Yi Wang,
Tian-Yi Qian,
Da-Hui Wang,
Bo Hong
Abstract:
Uncovering the fundamental neural correlates of biological intelligence, developing mathematical models, and conducting computational simulations are critical for advancing new paradigms in artificial intelligence (AI). In this study, we implemented a comprehensive visual decision-making model that spans from visual input to behavioral output, using a neural dynamics modeling approach. Drawing inspiration from the key components of the dorsal visual pathway in primates, our model not only aligns closely with human behavior and reflects neural activities in primates but also achieves accuracy comparable to convolutional neural networks (CNNs). Moreover, magnetic resonance imaging (MRI) identified key neuroimaging features, such as structural connections and functional connectivity, that are associated with performance in perceptual decision-making tasks. A neuroimaging-informed fine-tuning approach was introduced and applied to the model, leading to performance improvements that paralleled the behavioral variations observed among subjects. Compared to classical deep learning models, our model more accurately replicates the behavioral performance of biological intelligence, relying on the structural characteristics of biological neural networks rather than extensive training data, and demonstrating enhanced resilience to perturbation.
Submitted 3 September, 2024;
originally announced September 2024.
-
SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting
Authors:
Wenrui Li,
Fucheng Cai,
Yapeng Mi,
Zhe Yang,
Wangmeng Zuo,
Xingtao Wang,
Xiaopeng Fan
Abstract:
Text-driven 3D scene generation has seen significant advancements recently. However, most existing methods generate single-view images using generative models and then stitch them together in 3D space. This independent generation for each view often results in spatial inconsistency and implausibility in the 3D scenes. To address this challenge, we propose a novel text-driven 3D-consistent scene generation model: SceneDreamer360. Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation and employs 3D Gaussian Splatting (3DGS) to ensure consistency across multi-view panoramic images. Specifically, SceneDreamer360 enhances the fine-tuned Panfusion generator with a three-stage panoramic enhancement, enabling the generation of high-resolution, detail-rich panoramic images. During the 3D scene construction, a novel point cloud fusion initialization method is used, producing higher quality and spatially consistent point clouds. Our extensive experiments demonstrate that compared to other methods, SceneDreamer360 with its panoramic image generation and 3DGS can produce higher quality, spatially consistent, and visually appealing 3D scenes from any text prompt. Our codes are available at \url{https://github.com/liwrui/SceneDreamer360}.
Submitted 13 October, 2024; v1 submitted 24 August, 2024;
originally announced August 2024.
-
$\textit{GeoHard}$: Towards Measuring Class-wise Hardness through Modelling Class Semantics
Authors:
Fengyu Cai,
Xinran Zhao,
Hongming Zhang,
Iryna Gurevych,
Heinz Koeppl
Abstract:
Recent advances in measuring hardness-wise properties of data guide language models in sample selection within low-resource scenarios. However, class-specific properties are overlooked for task setup and learning. How will these properties influence model learning and is it generalizable across datasets? To answer this question, this work formally initiates the concept of $\textit{class-wise hardness}$. Experiments across eight natural language understanding (NLU) datasets demonstrate a consistent hardness distribution across learning paradigms, models, and human judgment. Subsequent experiments unveil a notable challenge in measuring such class-wise hardness with instance-level metrics in previous works. To address this, we propose $\textit{GeoHard}$ for class-wise hardness measurement by modeling class geometry in the semantic embedding space. $\textit{GeoHard}$ surpasses instance-level metrics by over 59 percent in $\textit{Pearson}$'s correlation when measuring class-wise hardness. Our analysis theoretically and empirically underscores the generality of $\textit{GeoHard}$ as a fresh perspective on data diagnosis. Additionally, we showcase how understanding class-wise hardness can practically aid in improving task learning.
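As a rough illustration of what "modeling class geometry in the semantic embedding space" can mean, the sketch below scores a class as harder when its intra-class spread is large relative to its distance from other classes. The specific ratio and function names are illustrative assumptions, not GeoHard's actual formulation.

```python
# Toy class-wise hardness from embedding geometry (illustrative only):
# a class whose points spread widely relative to its separation from the
# nearest other class is considered "harder".
import math

def centroid(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_hardness(embeddings_by_class):
    cents = {c: centroid(vs) for c, vs in embeddings_by_class.items()}
    hardness = {}
    for c, vs in embeddings_by_class.items():
        spread = sum(dist(v, cents[c]) for v in vs) / len(vs)          # intra-class
        sep = min(dist(cents[c], cents[o]) for o in cents if o != c)   # inter-class
        hardness[c] = spread / sep   # larger -> more overlap -> harder class
    return hardness

data = {
    "A": [[0.0, 0.0], [0.2, 0.0]],   # tight cluster, far from B -> easy
    "B": [[3.0, 0.0], [3.2, 0.0]],
    "C": [[0.4, 1.0], [1.6, 1.0]],   # wide cluster near A -> harder
}
h = class_hardness(data)
print(h["C"] > h["A"])  # -> True
```

Because the measure depends only on class-level geometry, it needs no per-instance training signal, which matches the class-wise (rather than instance-level) framing above.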
Submitted 17 July, 2024;
originally announced July 2024.
-
$\texttt{MixGR}$: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity
Authors:
Fengyu Cai,
Xinran Zhao,
Tong Chen,
Sihao Chen,
Hongming Zhang,
Iryna Gurevych,
Heinz Koeppl
Abstract:
Recent studies show the growing significance of document retrieval in the generation of LLMs, i.e., RAG, within the scientific domain by bridging their knowledge gap. However, dense retrievers often struggle with domain-specific retrieval and complex query-document relationships, particularly when query segments correspond to various parts of a document. To alleviate such prevalent challenges, this paper introduces $\texttt{MixGR}$, which improves dense retrievers' awareness of query-document matching across various levels of granularity in queries and documents using a zero-shot approach. $\texttt{MixGR}$ fuses various metrics based on these granularities to a united score that reflects a comprehensive query-document similarity. Our experiments demonstrate that $\texttt{MixGR}$ outperforms previous document retrieval by 24.7%, 9.8%, and 6.9% on nDCG@5 with unsupervised, supervised, and LLM-based retrievers, respectively, averaged on queries containing multiple subqueries from five scientific retrieval datasets. Moreover, the efficacy of two downstream scientific question-answering tasks highlights the advantage of $\texttt{MixGR}$ to boost the application of LLMs in the scientific domain. The code and experimental datasets are available.
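The fusion step can be pictured with a generic rank-fusion routine: each granularity pairing (whole query vs. whole document, subquery vs. proposition, and so on) contributes its own ranking, and the rankings are merged into one score. The reciprocal-rank fusion below is a stand-in assumption; MixGR's actual fused metric is defined in the paper.

```python
# Generic sketch of fusing per-granularity retrieval rankings into one score
# (reciprocal-rank fusion shown here as an illustrative stand-in).

def fuse_rrf(rankings, k=60):
    # rankings: one ranked list of doc ids per granularity-level metric
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# doc-level metric prefers d1; two subquery-level metrics both prefer d2,
# so the fused ranking promotes d2
fused = fuse_rrf([["d1", "d2", "d3"],
                  ["d2", "d1", "d3"],
                  ["d2", "d3", "d1"]])
print(fused[0])  # -> "d2"
```

The zero-shot character of the approach is visible here: fusion needs only the rankings themselves, no training on the target domain.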
Submitted 1 November, 2024; v1 submitted 15 July, 2024;
originally announced July 2024.
-
Spatio-temporal Patterns between ENSO and Weather-related Power Outages in the Continental United States
Authors:
Long Huo,
Xin Chen,
Kaiwen Li,
Fengying Cai,
Jürgen Kurths
Abstract:
El Niño-Southern Oscillation (ENSO) exhibits significant impacts on the frequency of extreme weather events, and its socio-economic implications prevail on a global scale. However, a fundamental gap still exists in understanding the relationship between ENSO and weather-related power outages in the continental United States. Through a 24-year (2000-2023) composite and statistical analysis, our study reveals that higher power outage numbers (PONs) are observed from the developing winter to the decaying summer of La Niña phases. In particular, during the decaying spring, high La Niña intensity favors the occurrence of power outages over the west coast and east of the United States by modulating the frequency of extreme precipitation and heatwaves. Furthermore, projected increasing heatwaves from the Coupled Model Intercomparison Project Phase 6 (CMIP6) indicate that spring-time PONs over the eastern United States are about 11 times higher in the mid-term future (2041-2060) and almost 26 times higher in the long-term future (2081-2100), compared with 2000-2023. Our study provides a strong recommendation for building a more climate-resilient power system.
Submitted 20 June, 2024;
originally announced June 2024.
-
Nano-Patterned Pt-Based Metallic Glass Electrocatalysts with In-Situ Copper Oxide Foam for Enhanced Hydrogen Evolution
Authors:
Fei-Fan Cai,
Baran Sarac,
Adnan Akman,
Juan J. Londoño,
Selin Gümrükcü,
Lukas Schweiger,
Martin Hantusch,
Jan Schroers,
Andreas Blatter,
Annett Gebert,
Florian Spieckermann,
Jürgen Eckert
Abstract:
Hydrogen is a promising energy carrier for replacing fossil fuels, and hydrogen production via hydrogen evolution reaction (HER) is an environmentally friendly option if electrocatalysts with low overpotentials and long-term stability are used. In this work, the electrocatalytic performance of $\mathrm{Pt_{57.5}Cu_{14.7}Ni_{5.3}P_{22.5}}$ bulk metallic glass (BMG) with flat, micro-patterned, and nano-patterned surfaces for HER in 0.5 M H2SO4 is studied. The nano-patterned Pt-BMG demonstrates outstanding long-term stability and self-improving behavior with a final overpotential of 150 mV and a Tafel slope of 42 $\mathrm{mV dec^{-1}}$ after 1000 linear sweep voltammetry (LSV) cycles, which is respectively 42% and 37% lower than in the first LSV cycle. X-ray photoelectron spectroscopy (XPS) and Auger electron spectroscopy (AES) indicate the formation of a layer of CuO/Cu2O foam deposited on top of the nano-patterned surface during the stability test of 1000 LSV cycles. A three-step process is proposed to explain the formation of CuxO foam via dynamic hydrogen bubble templating (DHBT) electrodeposition from Cu dissolution of the Pt-BMG without using copper salt. This work provides a method to create CuxO foams that could be used for various applications. Moreover, nano-patterned BMGs with DHBT deposition offer a feasible strategy to synthesize metal or metal-oxide foams.
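For readers unfamiliar with the reported Tafel slope: it is the slope b in the Tafel law eta = a + b*log10(j), with eta the overpotential and j the current density. The snippet below fits b by least squares from synthetic (eta, j) points constructed to match the reported 42 mV dec^-1 value; the data are made up for illustration, not taken from the paper.

```python
# Fit a Tafel slope b (mV per decade) from points in the linear Tafel region,
# eta = a + b*log10(j). Synthetic data only; not measurements from the paper.
import math

def tafel_slope(eta_mV, j_mA_cm2):
    xs = [math.log10(j) for j in j_mA_cm2]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(eta_mV) / n
    # ordinary least-squares slope of eta vs log10(j)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, eta_mV)) / \
           sum((x - xbar) ** 2 for x in xs)

j = [0.1, 1.0, 10.0]                 # current densities spanning two decades
eta = [108.0, 150.0, 192.0]          # eta = 150 + 42*log10(j)
print(round(tafel_slope(eta, j)))    # -> 42
```

A smaller slope means less extra overpotential is needed per decade of current, which is why the 37% slope reduction after cycling signals self-improving HER kinetics.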
Submitted 20 June, 2024;
originally announced June 2024.
-
Scalable Surrogate Verification of Image-based Neural Network Control Systems using Composition and Unrolling
Authors:
Feiyang Cai,
Chuchu Fan,
Stanley Bak
Abstract:
Verifying safety of neural network control systems that use images as input is a difficult problem because, from a given system state, there is no known way to mathematically model what images are possible in the real-world. We build on recent work that considers a surrogate verification approach, training a conditional generative adversarial network (cGAN) as an image generator in place of the real world. This enables set-based formal analysis of the closed-loop system, providing analysis beyond simulation and testing. While existing work is effective on small examples, excessive overapproximation both within a single control period and across multiple control periods limits its scalability. We propose approaches to overcome these two sources of error. First, we overcome one-step error by composing the system's dynamics along with the cGAN and neural network controller, without losing the dependencies between input states and the control outputs as in the monotonic analysis of the system dynamics. Second, we reduce multi-step error by repeating the single-step composition, essentially unrolling multiple steps of the control loop into a large neural network. We then leverage existing network verification tools to compute accurate reachable sets for multiple steps, avoiding the accumulation of abstraction error at each step. We demonstrate the effectiveness of our approach in terms of both accuracy and scalability using two case studies: an autonomous aircraft taxiing system and an advanced emergency braking system. On the aircraft taxiing system, the converged reachable set is 175% larger using the prior baseline method compared with our proposed approach. On the emergency braking system, with 24x the number of image output variables from the cGAN, the baseline method fails to prove any states are safe, whereas our improvements enable set-based safety analysis.
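The composition-and-unrolling idea can be sketched abstractly: fold the cGAN image generator, the neural network controller, and the plant dynamics into one step map, then repeat that map k times so the whole k-step closed loop becomes a single composite function that a network-verification tool can analyze at once. The three toy callables below are placeholders for illustration, not the paper's trained models.

```python
# Conceptual sketch of composing the surrogate image generator, controller,
# and dynamics into one step, then unrolling k control periods.

def cgan(state):                 # surrogate "camera": state -> image features
    return [2.0 * s for s in state]

def controller(image):           # stand-in NN controller: image -> action
    return -0.25 * sum(image)

def dynamics(state, u, dt=0.1):  # simple plant: integrate the control action
    return [s + dt * u for s in state]

def composed_step(state):
    # single-step composition keeps the state/control dependency intact,
    # instead of bounding each component in isolation
    return dynamics(state, controller(cgan(state)))

def unroll(state, k):
    # multi-step unrolling: one big composite map over k control periods,
    # so abstraction error does not accumulate step by step
    for _ in range(k):
        state = composed_step(state)
    return state

print(unroll([1.0], 3))
```

With these toy pieces each step contracts the state by a factor 0.95, so the unrolled map is simply that contraction applied k times; a verifier would propagate a state set through the same composite network.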
Submitted 28 April, 2025; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Finetuning Large Language Model for Personalized Ranking
Authors:
Zhuoxi Bai,
Ning Wu,
Fengyu Cai,
Xinyi Zhu,
Yun Xiong
Abstract:
Large Language Models (LLMs) have demonstrated remarkable performance across various domains, motivating researchers to investigate their potential use in recommendation systems. However, directly applying LLMs to recommendation tasks has proven challenging due to the significant disparity between the data used for pre-training LLMs and the specific requirements of recommendation tasks. In this study, we introduce Direct Multi-Preference Optimization (DMPO), a streamlined framework designed to bridge the gap and enhance the alignment of LLMs for recommendation tasks. DMPO enhances the performance of LLM-based recommenders by simultaneously maximizing the probability of positive samples and minimizing the probability of multiple negative samples. We conducted experimental evaluations to compare DMPO against traditional recommendation methods and other LLM-based recommendation approaches. The results demonstrate that DMPO significantly improves the recommendation capabilities of LLMs across three real-world public datasets in few-shot scenarios. Additionally, the experiments indicate that DMPO exhibits superior generalization ability in cross-domain recommendations. A case study elucidates the reasons behind these consistent improvements and also underscores DMPO's potential as an explainable recommendation system.
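One plausible reading of "simultaneously maximizing the probability of positive samples and minimizing the probability of multiple negative samples" is a softmax-style contrastive objective over one positive and several negatives, sketched below. The exact DMPO loss is defined in the paper; this is only an illustration of the shape of such an objective.

```python
# Illustrative multi-negative preference loss (not the paper's exact DMPO loss):
# -log p(positive | positive + negatives) under a softmax over model scores.
import math

def multi_pref_loss(pos_score, neg_scores):
    z = math.exp(pos_score) + sum(math.exp(s) for s in neg_scores)
    return -math.log(math.exp(pos_score) / z)

lo = multi_pref_loss(2.0, [0.0, -1.0])   # positive well ahead of the negatives
hi = multi_pref_loss(0.0, [2.0, 2.0])    # negatives outscore the positive
print(lo < hi)  # -> True
```

Pushing several negatives down in a single term, rather than one pairwise negative at a time, is the multi-preference aspect the abstract highlights.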
Submitted 20 June, 2024; v1 submitted 25 May, 2024;
originally announced May 2024.
-
TBI Image/Text (TBI-IT): Comprehensive Text and Image Datasets for Traumatic Brain Injury Research
Authors:
Jie Li,
Jiaying Wen,
Tongxin Yang,
Fenglin Cai,
Miao Wei,
Zhiwei Zhang,
Li Jiang
Abstract:
In this paper, we introduce TBI-IT, a new dataset in the medical field of Traumatic Brain Injury (TBI) that includes both electronic medical records (EMRs) and head CT images. The dataset is designed to improve the accuracy of artificial intelligence in the diagnosis and treatment of TBI. Built upon standard text and image data, it adds specific annotations within the EMRs that extract key content from the text, and categorizes the imaging annotations into five types: brain midline, hematoma, left cerebral ventricle, right cerebral ventricle, and fracture. TBI-IT aims to be a foundational dataset for feature learning in image segmentation and named entity recognition tasks.
Submitted 13 March, 2024;
originally announced March 2024.
-
Stereo-NEC: Enhancing Stereo Visual-Inertial SLAM Initialization with Normal Epipolar Constraints
Authors:
Weihan Wang,
Chieh Chou,
Ganesh Sevagamoorthy,
Kevin Chen,
Zheng Chen,
Ziyue Feng,
Youjie Xia,
Feiyang Cai,
Yi Xu,
Philippos Mordohai
Abstract:
We propose an accurate and robust initialization approach for stereo visual-inertial SLAM systems. Unlike the current state-of-the-art method, which heavily relies on the accuracy of a pure visual SLAM system to estimate inertial variables without updating camera poses, potentially compromising accuracy and robustness, our approach offers a different solution. We realize the crucial impact of precise gyroscope bias estimation on rotation accuracy. This, in turn, affects trajectory accuracy due to the accumulation of translation errors. To address this, we first independently estimate the gyroscope bias and use it to formulate a maximum a posteriori problem for further refinement. After this refinement, we proceed to update the rotation estimation by performing IMU integration with gyroscope bias removed from gyroscope measurements. We then leverage robust and accurate rotation estimates to enhance translation estimation via 3-DoF bundle adjustment. Moreover, we introduce a novel approach for determining the success of the initialization by evaluating the residual of the normal epipolar constraint. Extensive evaluations on the EuRoC dataset illustrate that our method excels in accuracy and robustness. It outperforms ORB-SLAM3, the current leading stereo visual-inertial initialization method, in terms of absolute trajectory error and relative rotation error, while maintaining competitive computational speed. Notably, even with 5 keyframes for initialization, our method consistently surpasses the state-of-the-art approach using 10 keyframes in rotation accuracy.
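The "IMU integration with gyroscope bias removed" step can be sketched as plain quaternion integration of bias-corrected angular rates. This is a generic illustration of that step, not the paper's implementation:

```python
import math

def quat_mul(q, r):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def integrate_gyro(rates, dt, bias):
    """Integrate bias-corrected gyroscope rates into an orientation
    quaternion (w, x, y, z); a sketch, not the paper's implementation."""
    q = (1.0, 0.0, 0.0, 0.0)
    for wx, wy, wz in rates:
        wx, wy, wz = wx - bias[0], wy - bias[1], wz - bias[2]
        norm = math.sqrt(wx*wx + wy*wy + wz*wz)
        if norm < 1e-12:
            continue  # negligible rotation this step
        half = 0.5 * norm * dt
        s = math.sin(half) / norm
        q = quat_mul(q, (math.cos(half), wx*s, wy*s, wz*s))
    return q

# Constant pi/2 rad/s about z for 1 s gives a 90-degree rotation about z.
rates = [(0.0, 0.0, math.pi / 2)] * 100
q = integrate_gyro(rates, dt=0.01, bias=(0.0, 0.0, 0.0))
```

An uncorrected bias term would accumulate linearly into the rotation here, which is exactly why the paper's precise gyroscope bias estimate matters for trajectory accuracy.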
Submitted 11 March, 2024;
originally announced March 2024.
-
HICH Image/Text (HICH-IT): Comprehensive Text and Image Datasets for Hypertensive Intracerebral Hemorrhage Research
Authors:
Jie Li,
Yulong Xia,
Tongxin Yang,
Fenglin Cai,
Miao Wei,
Zhiwei Zhang,
Li Jiang
Abstract:
In this paper, we introduce HICH-IT, a new dataset in the medical field of hypertensive intracerebral hemorrhage (HICH) that includes both electronic medical records (EMRs) and head CT images. The dataset is designed to improve the accuracy of artificial intelligence in the diagnosis and treatment of HICH. Built upon standard text and image data, it adds specific annotations within the EMRs that extract key content from the text, and categorizes the imaging annotations into four types: brain midline, hematoma, and the left and right cerebral ventricles. HICH-IT aims to be a foundational dataset for feature learning in image segmentation and named entity recognition tasks. To further characterize the dataset, we trained deep learning algorithms and report their performance. The pretrained models have been released at both www.daip.club and github.com/Deep-AI-Application-DAIP. The dataset has been uploaded to https://github.com/CYBUS123456/HICH-IT-Datasets.
Index Terms: HICH, deep learning, intraparenchymal hemorrhage, named entity recognition, novel dataset
Submitted 5 February, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Optimal estimates for mappings admitting general Poisson representations in the unit ball
Authors:
Deguang Zhong,
Fangming Cai,
Dongping Wei
Abstract:
Suppose that $1<p\leq\infty$ and $\varphi\in L^{p}(\mathbb{B}^{n},\mathbb{R}^{n}).$ In this note, we use Hölder inequality and some basic properties of hypergeometric functions to establish the sharp constant $C_{p}$ and function $C_{p}(x)$ in the following inequalities $$|u(x)|\leq \frac{C_{p}}{(1-|x|^{2})^{(n-1)/p}}\cdot||\varphi||_{L^{p}}$$
and
$$|u(x)|\leq \frac{C_{p}(x)}{(1-|x|^{2})^{(n-1)/p}}\cdot||\varphi||_{L^{p}},$$
where $u$ ranges over mappings from the unit ball $\mathbb{B}^{n}$ into $\mathbb{R}^{n}$ that admit general Poisson representations. The obtained results generalize and extend known results on harmonic mappings (\cite[Proposition 6.16]{ABR92} and \cite[Theorems 1.1 and 1.2]{DM12}) and hyperbolic harmonic mappings (\cite[Theorems 1.1 and 1.2]{CJLK20}).
Submitted 4 October, 2025; v1 submitted 25 December, 2023;
originally announced December 2023.
-
Debiasing Sequential Recommenders through Distributionally Robust Optimization over System Exposure
Authors:
Jiyuan Yang,
Yue Ding,
Yidan Wang,
Pengjie Ren,
Zhumin Chen,
Fei Cai,
Jun Ma,
Rui Zhang,
Zhaochun Ren,
Xin Xin
Abstract:
Sequential recommendation (SR) models are typically trained on user-item interactions, which are affected by system exposure bias, so the user preference learned by a biased SR model is not fully consistent with the true user preference. Exposure bias refers to the fact that user interactions depend on the subset of items exposed to the user. Existing debiasing methods do not make full use of the system exposure data and suffer from sub-optimal recommendation performance and high variance. In this paper, we propose to debias sequential recommenders through Distributionally Robust Optimization (DRO) over system exposure data. The key idea is to use DRO to optimize the worst-case error over an uncertainty set, safeguarding the model against the distributional discrepancy caused by exposure bias. The main challenges in applying DRO to exposure debiasing in SR are how to construct the uncertainty set while avoiding overestimation of user preference on biased samples, and how to evaluate the debiasing effect on a biased test set. To this end, we first introduce an exposure simulator trained on the system exposure data to calculate the exposure distribution, which is then regarded as the nominal distribution used to construct the uncertainty set of DRO. We then introduce a penalty on items with high exposure probability to avoid overestimating user preference for biased samples. Finally, we design a debiased self-normalized inverse propensity score (SNIPS) evaluator to assess the debiasing effect on the biased offline test set. We conduct extensive experiments on two real-world datasets to verify the effectiveness of the proposed methods. Experimental results demonstrate their superior exposure-debiasing performance. Codes and data are available at \url{https://github.com/nancheng58/DebiasedSR_DRO}.
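The SNIPS estimator used for biased offline evaluation has a standard self-normalized form; here is a minimal sketch (in the paper's setting the propensities would come from the exposure simulator, and the debiased variant may differ in detail):

```python
def snips_estimate(rewards, propensities):
    """Self-normalized inverse propensity score (SNIPS) estimate.

    rewards:      observed feedback (e.g. 1 = interaction, 0 = none).
    propensities: estimated exposure probability of each logged item.
    """
    weights = [1.0 / p for p in propensities]
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)

# Rarely exposed items (low propensity) are up-weighted, and the
# self-normalization keeps the estimate inside the reward range.
est = snips_estimate(rewards=[1, 0, 1, 0], propensities=[0.1, 0.9, 0.5, 0.8])
```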
Submitted 12 December, 2023;
originally announced December 2023.
-
Acoustic Vortex in Waveguide with Chiral Gradient Sawtooth Metasurface
Authors:
Zeliang Song,
Shuhuan Xie,
Yong Li,
Hua Ding,
Feiyan Cai,
Yugui Peng,
Xuefeng Zhu,
Degang Zhao
Abstract:
Acoustic vortex states with spiral phase dislocation, which can carry orbital angular momentum (OAM), have attracted considerable research interest in recent years. Mainstream methods of generating acoustic vortices rely on the Huygens-Fresnel principle to modulate the wavefront and create a spatial spiral phase dislocation. In this work, we propose an entirely new scenario to generate acoustic vortices in a waveguide with a chiral gradient sawtooth metasurface. The physical mechanism is to lift the degenerate dipole eigenmodes through the scattering effect of the chiral surface structure; their superposition then generates vortices of both opposite orders in place. Compared with existing methods of acoustic vortex production, our design has many merits: it is easy to manufacture and control, it operates over a broadband frequency range, and the sign of the vortex order can be readily flipped. Both full-wave simulations and experimental measurements validate the existence of the acoustic vortices. The torque effect of the acoustic vortices is also successfully demonstrated by rotating a foam disk as a practical application. Our work opens up a new route for generating acoustic vortices and could have potential significance in microfluidics, acoustic tweezers, ultrasonic communication, etc.
Submitted 14 January, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
-
A Survey of Confidence Estimation and Calibration in Large Language Models
Authors:
Jiahui Geng,
Fengyu Cai,
Yuxia Wang,
Heinz Koeppl,
Preslav Nakov,
Iryna Gurevych
Abstract:
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains. Despite their impressive performance, they can be unreliable due to factual errors in their generations. Assessing their confidence and calibrating them across different tasks can help mitigate risks and enable LLMs to produce better generations. There has been a lot of recent research aiming to address this, but there has been no comprehensive overview to organize it and outline the main lessons learned. The present survey aims to bridge this gap. In particular, we outline the challenges and we summarize recent technical advancements for LLM confidence estimation and calibration. We further discuss their applications and suggest promising directions for future work.
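A standard metric in the calibration literature such a survey covers is the Expected Calibration Error (ECE); a minimal sketch (the equal-width binning scheme and names below are illustrative):

```python
def expected_calibration_error(confs, correct, n_bins=10):
    """Expected Calibration Error: bin predictions by confidence and
    average |accuracy - mean confidence| weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confs, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into last bin
        bins[idx].append((c, ok))
    n = len(confs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece

# Two bins, each with a 0.05 gap between confidence and accuracy.
ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```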
Submitted 25 March, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Seeing the Unheard: dynamics of thin liquid film in holographic ultrasonic field revealed by time-resolved Schlieren imaging
Authors:
Weitao Sun,
Diyao Wang,
Yuheng Yang,
Fangyu Cai,
Mingchen Gao,
Sirui Guo
Abstract:
In this study, we introduce a unique approach that employs time-resolved Schlieren imaging to capture and visualize the dynamic changes of a thin liquid film (a mixture of water, soap and glycerin) in an ultrasonic wave field with high spatial and temporal resolution. By placing a soap film spanning a wire frame vertically in the path of light, we harnessed the vibrations induced by the ultrasonic waves, resulting in remarkable Schlieren imaging patterns. The investigation not only uncovers an unexpected branch-flow phenomenon within the film, challenging existing assumptions, but also reveals a fascinating interplay between vortex flow and branch flow. The experiments reveal a captivating spectrum of dynamic phenomena within the thin liquid films: small-scale capillary waves, large-scale standing waves, traveling waves, and intricately fused capillary-gravity wave patterns, underscoring the rich complexity of the interaction between the films and the holographic ultrasonic wave field. These diverse states of film dynamics provide a comprehensive picture of the interplay between various wave modes and fluid behavior. The ability to visualize the pressure field opens up new avenues for optimizing acoustic levitation techniques, investigating particle behavior, and exploring potential applications in materials science and bioengineering.
Submitted 5 September, 2023;
originally announced September 2023.
-
Enhanced boiling heat transfer using conducting-insulating microcavity surfaces in an electric field: A lattice Boltzmann study
Authors:
Fanming Cai,
Zhaomiao Liu,
Nan Zheng,
Yan Pang
Abstract:
The field-trap effect on a microcavity surface under an electric field is not conducive to boiling heat transfer. This numerical study found that using conducting-insulating microcavity surfaces in an electric field removes the field-trap effect, increasing the critical heat flux by more than 200%. Bubble behavior and heat transfer mechanisms on heating surfaces were further explored. The results show that a large electrical force can be generated at the junction of the conducting and insulating surfaces under the action of the electric field, which drives the bubbles in the cavity to depart quickly from the heating surface and avoids the formation of a vapor block. As the electric field intensity increases, the contact line becomes pinned, which facilitates the formation of multiple continuously open vapor-liquid separation paths on the heating surface, resulting in a significant enhancement of the boiling heat transfer performance. Finally, a modified correlation is proposed to predict the critical heat flux under a non-uniform electric field.
Submitted 17 October, 2023; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Local weighted topological pressure
Authors:
Fangzhou Cai
Abstract:
In [D. Feng, W. Huang, Variational principle for weighted topological pressure, J. Math. Pures Appl. (2016)], the authors studied weighted topological pressure and established a variational principle for it. In this paper, we introduce the notion of local weighted topological pressure and generalize Feng and Huang's main results to the localized setting.
Submitted 16 July, 2023;
originally announced July 2023.
-
SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation
Authors:
Zhengxin Lei,
Feng Xu,
Jiangtao Wei,
Feng Cai,
Feng Wang,
Ya-Qiu Jin
Abstract:
SAR images are highly sensitive to observation configurations and exhibit significant variations across viewing angles, making it challenging to represent and learn their anisotropic features. As a result, deep learning methods often generalize poorly across view angles. Inspired by the concept of neural radiance fields (NeRF), this study combines SAR imaging mechanisms with neural networks to propose a novel NeRF model for SAR image generation. Following the mapping and projection principles, a set of SAR images is modeled implicitly as a function of attenuation coefficients and scattering intensities in the 3D imaging space through a differentiable rendering equation. SAR-NeRF is then constructed to learn the distribution of attenuation coefficients and scattering intensities of voxels, where the vectorized form of the 3D voxel SAR rendering equation and the sampling relationship between 3D space voxels and 2D view-ray grids are analytically derived. Through quantitative experiments on various datasets, we thoroughly assess the multi-view representation and generalization capabilities of SAR-NeRF. Additionally, we find that a SAR-NeRF-augmented dataset can significantly improve SAR target classification performance in a few-shot learning setup, where a 10-class classification accuracy of 91.6\% is achieved using only 12 images per class.
Submitted 11 July, 2023;
originally announced July 2023.
-
Disentangled Variational Auto-encoder Enhanced by Counterfactual Data for Debiasing Recommendation
Authors:
Yupu Guo,
Fei Cai,
Xin Zhang,
Jianming Zheng,
Honghui Chen
Abstract:
Recommender systems suffer from various recommendation biases, which seriously hinder their development. In this light, a series of debiasing methods have been proposed, especially for the two most common biases, i.e., popularity bias and amplified subjective bias. However, existing debiasing methods usually concentrate on correcting a single bias. Such single-purpose debiasing neglects the bias-coupling issue, in which the recommended items are collectively attributed to multiple biases. Besides, previous work cannot tackle the lack of supervised signals brought by sparse data, which has become commonplace in recommender systems. In this work, we introduce a disentangled debiasing variational auto-encoder framework (DB-VAE) to address the single-functionality issue, as well as a counterfactual data enhancement method to mitigate the adverse effect of data sparsity. Specifically, DB-VAE first extracts two types of extreme items, each affected by only a single bias, based on collider theory; these are respectively employed to learn the latent representation of the corresponding bias, thereby realizing bias decoupling. In this way, an exact unbiased user representation can be learned from these decoupled bias representations. Furthermore, the data generation module employs Pearl's framework to produce massive counterfactual data, making up for the supervised signals missing due to data sparsity. Extensive experiments on three real-world datasets demonstrate the effectiveness of our proposed model. Moreover, the counterfactual data can further improve DB-VAE, especially on the dataset with low sparsity.
Submitted 28 June, 2023;
originally announced June 2023.
-
Discriminating Human-authored from ChatGPT-Generated Code Via Discernable Feature Analysis
Authors:
Li Ke,
Hong Sheng,
Fu Cai,
Zhang Yunhe,
Liu Ming
Abstract:
The ubiquitous adoption of Large Language Generation Models (LLMs) in programming has underscored the importance of differentiating between human-written code and code generated by intelligent models. This paper specifically aims to distinguish code generated by ChatGPT from that authored by humans. Our investigation reveals disparities in programming style, technical level, and readability between these two sources. Consequently, we develop a discriminative feature set for differentiation and evaluate its efficacy through ablation experiments. Additionally, we devise a dataset cleansing technique, which employs temporal and spatial segmentation, to mitigate the dearth of datasets and to secure high-caliber, uncontaminated datasets. To further enrich data resources, we employ "code transformation," "feature transformation," and "feature customization" techniques, generating an extensive dataset comprising 10,000 lines of ChatGPT-generated code. The salient contributions of our research include: proposing a discriminative feature set yielding high accuracy in differentiating ChatGPT-generated code from human-authored code in binary classification tasks; devising methods for generating extensive ChatGPT-generated codes; and introducing a dataset cleansing strategy that extracts immaculate, high-grade code datasets from open-source repositories, thus achieving exceptional accuracy in code authorship attribution tasks.
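For intuition, here is a toy illustration of the kind of stylometric features such a discriminator might compute over a code sample; these particular features are hypothetical and are not the paper's feature set:

```python
def style_features(code: str) -> dict:
    """Toy stylometric features of the sort one might use to separate
    human-authored from model-generated code (illustrative choices)."""
    all_lines = code.splitlines()
    lines = [l for l in all_lines if l.strip()]  # non-blank lines only
    n = max(len(lines), 1)
    comments = sum(1 for l in lines if l.lstrip().startswith("#"))
    return {
        "avg_line_len": sum(len(l) for l in lines) / n,
        "comment_ratio": comments / n,
        "blank_ratio": 1 - n / max(len(all_lines), 1),
        "indent_ratio": sum(1 for l in lines if l.startswith((" ", "\t"))) / n,
    }

snippet = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
feats = style_features(snippet)
```

In practice each sample's feature vector would feed a binary classifier; the paper's actual discriminative feature set targets programming style, technical level, and readability.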
Submitted 4 July, 2023; v1 submitted 25 June, 2023;
originally announced June 2023.
-
MsPrompt: Multi-step Prompt Learning for Debiasing Few-shot Event Detection
Authors:
Siyuan Wang,
Jianming Zheng,
Xuejun Hu,
Fei Cai,
Chengyu Song,
Xueshan Luo
Abstract:
Event detection (ED) aims to identify the key trigger words in unstructured text and predict the event types accordingly. Traditional ED models are too data-hungry to accommodate real applications with scarce labeled data. Besides, typical ED models face context-bypassing and disabled-generalization issues caused by the trigger bias stemming from ED datasets. We therefore focus on the true few-shot paradigm to satisfy low-resource scenarios. In particular, we propose a multi-step prompt learning model (MsPrompt) for debiasing few-shot event detection, which consists of three components: an under-sampling module that constructs a novel training set accommodating the true few-shot setting; a multi-step prompt module, equipped with a knowledge-enhanced ontology, that sufficiently leverages the event semantics and latent prior knowledge in the PLMs to tackle the context-bypassing problem; and a prototypical module that compensates for the weakness of classifying events with sparse data and boosts generalization performance. Experiments on two public datasets, ACE-2005 and FewEvent, show that MsPrompt outperforms the state-of-the-art models, especially in strict low-resource scenarios, reporting an 11.43% improvement in weighted F1-score against the best-performing baseline and achieving outstanding debiasing performance.
Submitted 16 May, 2023;
originally announced May 2023.
-
STCF Conceptual Design Report: Volume 1 -- Physics & Detector
Authors:
M. Achasov,
X. C. Ai,
R. Aliberti,
L. P. An,
Q. An,
X. Z. Bai,
Y. Bai,
O. Bakina,
A. Barnyakov,
V. Blinov,
V. Bobrovnikov,
D. Bodrov,
A. Bogomyagkov,
A. Bondar,
I. Boyko,
Z. H. Bu,
F. M. Cai,
H. Cai,
J. J. Cao,
Q. H. Cao,
Z. Cao,
Q. Chang,
K. T. Chao,
D. Y. Chen,
H. Chen
, et al. (413 additional authors not shown)
Abstract:
The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that by the present $τ$-Charm factory -- the BEPCII, providing a unique platform for exploring the asymmetry of matter-antimatter (charge-parity violation), in-depth studies of the internal structure of hadrons and the nature of non-perturbative strong interactions, as well as searching for exotic hadrons and physics beyond the Standard Model. The STCF project in China is under development with an extensive R\&D program. This document presents the physics opportunities at the STCF, describes conceptual designs of the STCF detector system, and discusses future plans for detector R\&D and physics case studies.
Submitted 5 October, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
On the properties of the mean orbital pseudo-metric
Authors:
Fangzhou Cai,
Dominik Kwietniak,
Jian Li,
Habibeh Pourmand
Abstract:
Given a topological dynamical system $(X,T)$, we study properties of the mean orbital pseudo-metric $\bar E$ defined by \[ \bar E(x,y)= \limsup_{n\to\infty } \min_{σ\in S_n}\frac{1}{n}\sum_{k=0}^{n-1}d(T^k(x),T^{σ(k)}(y)), \] where $x,y\in X$ and $S_n$ is the permutation group of $\{0,1,\ldots,n-1\}$. Let $\hatω_T(x)$ denote the set of measures quasi-generated by a point $x\in X$. We show that the map $x\mapsto\hatω_T(x)$ is uniformly continuous if $X$ is endowed with the pseudo-metric $\bar E$ and the space of compact subsets of the set of invariant measures is considered with the Hausdorff distance. We also obtain a new characterisation of $\bar E$-continuity, which connects it to other properties studied in the literature, like continuous pointwise ergodicity introduced by Downarowicz and Weiss. Finally, we apply our results to reprove some known results on $\bar E$-continuous and mean equicontinuous systems.
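For intuition, the finite-$n$ quantity inside the limsup, the minimum over permutations of the average pairwise distance between the two orbit segments, can be computed by brute force for small $n$ (a sketch; in general this is an assignment problem, solvable in $O(n^3)$ with the Hungarian algorithm):

```python
from itertools import permutations

def ebar_n(orbit_x, orbit_y, d):
    """Finite-n term of the mean orbital pseudo-metric:
    min over sigma in S_n of (1/n) * sum_k d(x_k, y_{sigma(k)}).
    Brute force over all n! permutations; only feasible for small n.
    """
    n = len(orbit_x)
    best = min(
        sum(d(orbit_x[k], orbit_y[s[k]]) for k in range(n))
        for s in permutations(range(n))
    )
    return best / n

# Two orbit segments on the circle [0, 1): since the permutation may
# re-pair the points, reordering one segment leaves the value unchanged.
d = lambda a, b: min(abs(a - b), 1 - abs(a - b))
ox = [0.0, 0.25, 0.5, 0.75]
v1 = ebar_n(ox, [0.75, 0.5, 0.25, 0.0], d)   # a permuted copy
v2 = ebar_n(ox, [0.1, 0.35, 0.6, 0.85], d)   # every point shifted by 0.1
```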
Submitted 20 March, 2023;
originally announced March 2023.
-
Unsupervised Visual Defect Detection with Score-Based Generative Model
Authors:
Yapeng Teng,
Haoyang Li,
Fuzhen Cai,
Ming Shao,
Siyu Xia
Abstract:
Anomaly Detection (AD), as a critical problem, has been widely discussed. In this paper, we focus on one specific problem, Visual Defect Detection (VDD), which arises in many industrial applications. In practice, defect image samples are very rare and difficult to collect. Thus, we target the unsupervised visual defect detection and localization tasks and propose a novel framework based on recent score-based generative models, which synthesize realistic images by iterative denoising through stochastic differential equations (SDEs). Our work is inspired by the fact that, with noise injected into the original image, defects may be turned back into normal appearance during the denoising (i.e., reconstruction) process. First, based on the assumption that anomalous data lie in the low-probability-density region of the normal data distribution, we explain a common phenomenon that occurs when reconstruction-based approaches are applied to VDD: normal pixels also change during the reconstruction process. Second, because of these differences in normal pixels between the reconstructed and original images, a time-dependent gradient of the normal data distribution (i.e., the score) is used as the metric for gauging defects, rather than the reconstruction loss. Third, a novel $T$-scales approach is developed to dramatically reduce the required number of iterations, accelerating inference. These practices allow our model to handle VDD in an unsupervised manner while maintaining reasonably good performance. We evaluate our method on several datasets to demonstrate its effectiveness.
Submitted 29 November, 2022;
originally announced November 2022.
-
3-D generalized analytic signal associated with linear canonical transform in Clifford biquaternion domain
Authors:
Zhen Feng Cai,
Kit Ian Kou
Abstract:
The analytic signal is a useful mathematical tool. It separates the qualitative and quantitative information of a signal in the form of the local phase and local amplitude. The Clifford Fourier transform (CFT) plays a vital role in the representation of multidimensional signals. By generalizing the CFT to the Clifford linear canonical transform (CLCT), we present a new type of Clifford biquaternionic analytic signal. Owing to the additional degrees of freedom of the CLCT, envelope detection for 3D images, with the help of this new analytic signal, achieves a better visual appearance. Synthetic examples are presented to demonstrate these advantages.
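The abstract's central notion — an analytic signal separating local amplitude (envelope) from local phase — can be illustrated in the classical 1-D setting, which the paper generalizes to 3D via the Clifford linear canonical transform. The sketch below is only the standard FFT-based Hilbert-transform construction, not the Clifford biquaternionic version; the modulated test signal is an assumed example.

```python
import numpy as np

def analytic_signal(x):
    """Classical 1-D analytic signal via the FFT: keep DC, double the
    positive frequencies, and zero the negative ones (discrete Hilbert
    transform construction)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

t = np.linspace(0, 1, 1024, endpoint=False)
envelope_true = 1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)  # slow amplitude modulation
signal = envelope_true * np.cos(2 * np.pi * 80 * t)    # fast carrier

z = analytic_signal(signal)
local_amplitude = np.abs(z)   # quantitative information: the envelope
local_phase = np.angle(z)     # qualitative information: instantaneous phase

# For this periodic, positive-envelope signal the recovery is essentially exact.
err = np.max(np.abs(local_amplitude - envelope_true))
print(err < 1e-6)
```

Envelope detection in the paper's 3D setting plays the same role, with the CLCT's extra parameters providing additional freedom in how the spectrum is split.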
Submitted 27 April, 2022;
originally announced April 2022.
-
Open Set Recognition using Vision Transformer with an Additional Detection Head
Authors:
Feiyang Cai,
Zhenkai Zhang,
Jie Liu,
Xenofon Koutsoukos
Abstract:
Deep neural networks have demonstrated prominent capacities for image classification tasks in a closed set setting, where the test data come from the same distribution as the training data. However, in a more realistic open set scenario, traditional classifiers with incomplete knowledge cannot tackle test data that do not come from the training classes. Open set recognition (OSR) aims to address this problem by simultaneously identifying unknown classes and distinguishing known classes. In this paper, we propose a novel approach to OSR based on the vision transformer (ViT) technique. Specifically, our approach employs two separate training stages. First, a ViT model is trained to perform closed set classification. Then, an additional detection head is attached to the embedded features extracted by the ViT and trained to force the representations of known data into compact class-specific clusters. Test examples are identified as known or unknown based on their distance to the cluster centers. To the best of our knowledge, this is the first work to leverage ViT for OSR, and our extensive evaluation on several OSR benchmark datasets reveals that our approach significantly outperforms other baseline methods and achieves new state-of-the-art performance.
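The decision rule in the abstract — accept or reject a test example by its distance to known-class cluster centers in feature space — can be sketched as follows. The embeddings here are synthetic stand-ins for ViT features, and the threshold value is a hypothetical placeholder (in practice it would be tuned on validation data); this is a sketch of the scoring step only, not the paper's trained detection head.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for ViT-embedded features of 3 known classes: compact clusters.
dim, n_per_class = 16, 200
centers_true = rng.normal(scale=5.0, size=(3, dim))
train_feats = np.concatenate(
    [c + 0.3 * rng.normal(size=(n_per_class, dim)) for c in centers_true]
)
train_labels = np.repeat(np.arange(3), n_per_class)

# Represent each known class by the center of its training embeddings.
class_centers = np.stack(
    [train_feats[train_labels == k].mean(axis=0) for k in range(3)]
)

def min_center_distance(x):
    """Distance from an embedding to the nearest known-class center."""
    return np.min(np.linalg.norm(class_centers - x, axis=1))

threshold = 2.0  # hypothetical; tuned on validation data in practice

known_sample = centers_true[0] + 0.3 * rng.normal(size=dim)
unknown_sample = rng.normal(scale=5.0, size=dim) + 20.0  # far from all clusters

print(min_center_distance(known_sample) < threshold)    # known -> accepted
print(min_center_distance(unknown_sample) > threshold)  # unknown -> rejected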
△ Less
Submitted 16 March, 2022;
originally announced March 2022.