-
DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation
Authors:
Li Huang,
Zhongxin Liu,
Yifan Wu,
Tao Yin,
Dong Li,
Jichao Bi,
Nankun Mu,
Hongyu Zhang,
Meng Yan
Abstract:
Large Language Models (LLMs) for code generation can replicate insecure patterns from their training data. To mitigate this, a common strategy for security hardening is to fine-tune models using supervision derived from the final transformer layer. However, this design may suffer from a final-layer bottleneck: vulnerability-discriminative cues can be distributed across layers and become less detectable near the output representations optimized for next-token prediction. To diagnose this issue, we perform layer-wise linear probing. We observe that vulnerability-related signals are most detectable in a band of intermediate-to-upper layers yet attenuate toward the final layers. Motivated by this observation, we introduce DeepGuard, a framework that leverages distributed security-relevant cues by aggregating representations from multiple upper layers via an attention-based module. The aggregated signal powers a dedicated security analyzer within a multi-objective training objective that balances security enhancement and functional correctness, and further supports a lightweight inference-time steering strategy. Extensive experiments across five code LLMs demonstrate that DeepGuard improves the secure-and-correct generation rate by an average of 11.9% over strong baselines such as SVEN. It also preserves functional correctness while exhibiting generalization to held-out vulnerability types. Our code is public at https://github.com/unknownhl/DeepGuard.
Submitted 10 April, 2026;
originally announced April 2026.
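As a rough illustration of what "aggregating representations from multiple upper layers via an attention-based module" could look like, the following minimal sketch pools per-layer vectors with softmax attention weights. It is not DeepGuard's implementation: the dot-product scoring, the `query` vector, and the names `softmax` and `aggregate_layers` are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_states, query):
    """Attention-pool one vector per layer into a single vector.

    layer_states: list of per-layer representations (equal-length lists).
    query: hypothetical scoring vector (learned in a real model; fixed here).
    """
    # Score each layer by its dot product with the query (assumed scoring rule).
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in layer_states]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the layer vectors, one dimension at a time.
    pooled = [sum(w * h[d] for w, h in zip(weights, layer_states)) for d in range(dim)]
    return pooled, weights


# Three toy "upper layer" representations of dimension 2.
layer_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = aggregate_layers(layer_states, query=[0.5, -0.5])
print(weights, pooled)
```

In a trained model the query (or a small scoring network) would be learned, so that layers carrying stronger vulnerability-related signal receive larger weights.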
-
VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment
Authors:
Tengjiao Yin,
Jinglei Shi,
Heng Guo,
Xi Wang
Abstract:
Video diffusion models lack explicit geometric supervision during training, leading to inconsistency artifacts such as object deformation, spatial drift, and depth violations in generated videos. To address this limitation, we propose a geometry-based reward model that leverages pretrained geometric foundation models to evaluate multi-view consistency through cross-frame reprojection error. Unlike previous geometric metrics that measure inconsistency in pixel space, where pixel intensity may introduce additional noise, our approach conducts error computation in a pointwise fashion, yielding a more physically grounded and robust error metric. Furthermore, we introduce a geometry-aware sampling strategy that filters out low-texture and non-semantic regions, focusing evaluation on geometrically meaningful areas with reliable correspondences to improve robustness. We apply this reward model to align video diffusion models through two complementary pathways: post-training of a bidirectional model via SFT or Reinforcement Learning and inference-time optimization of a Causal Video Model (e.g., Streaming video generator) via test-time scaling with our reward as a path verifier. Experimental results validate the effectiveness of our design, demonstrating that our geometry-based reward provides superior robustness compared to other variants. By enabling efficient inference-time scaling, our method offers a practical solution for enhancing open-source video models without requiring extensive computational resources for retraining.
Submitted 21 March, 2026; v1 submitted 17 March, 2026;
originally announced March 2026.
-
PlayWorld: Learning Robot World Models from Autonomous Play
Authors:
Tenny Yin,
Zhiting Mei,
Zhonghe Zheng,
Miyu Yamane,
David Wang,
Jade Sceats,
Samuel M. Bateman,
Lihan Zha,
Apurva Badithela,
Ola Shorinwa,
Anirudha Majumdar
Abstract:
Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still struggle to predict physically consistent robot-object interactions that are crucial in robotic manipulation. To close this gap, we present PlayWorld, a simple, scalable, and fully autonomous pipeline for training high-fidelity video world simulators from interaction experience. In contrast to prior approaches that rely on success-biased human demonstrations, PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing complex, long-tailed physical interactions essential for modeling realistic object dynamics. Experiments across diverse manipulation tasks show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data. We further demonstrate the versatility of PlayWorld in enabling fine-grained failure prediction and policy evaluation, with up to 40% improvements over human-collected data. Finally, we demonstrate how PlayWorld enables reinforcement learning in the world model, improving policy performance by 65% in success rates when deployed in the real world.
Submitted 5 April, 2026; v1 submitted 9 March, 2026;
originally announced March 2026.
-
LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer
Authors:
Lihan Zha,
Asher J. Hancock,
Mingtong Zhang,
Tenny Yin,
Yixuan Huang,
Dhruv Shah,
Allen Z. Ren,
Anirudha Majumdar
Abstract:
A long-standing goal in robotics is a generalist policy that can be deployed zero-shot on new robot embodiments without per-embodiment adaptation. Despite large-scale multi-embodiment pre-training, existing Vision-Language-Action models (VLAs) remain tightly coupled to their training embodiments and typically require costly fine-tuning. We introduce Language-Action Pre-training (LAP), a simple recipe that represents low-level robot actions directly in natural language, aligning action supervision with the pre-trained vision-language model's input-output distribution. LAP requires no learned tokenizer, no costly annotation, and no embodiment-specific architectural design. Based on LAP, we present LAP-3B, which to the best of our knowledge is the first VLA to achieve substantial zero-shot transfer to previously unseen robot embodiments without any embodiment-specific fine-tuning. Across multiple novel robots and manipulation tasks, LAP-3B attains over 50% average zero-shot success, delivering roughly a 2x improvement over the strongest prior VLAs. We further show that LAP enables efficient adaptation and favorable scaling, while unifying action prediction and VQA in a shared language-action format that yields additional gains through co-training.
Submitted 15 February, 2026; v1 submitted 11 February, 2026;
originally announced February 2026.
-
QARM V2: Quantitative Alignment Multi-Modal Recommendation for Reasoning User Sequence Modeling
Authors:
Tian Xia,
Jiaqi Zhang,
Yueyang Liu,
Hongjian Dou,
Tingya Yin,
Jiangxia Cao,
Xulei Liang,
Tianlu Xie,
Lihao Liu,
Xiang Chen,
Shen Wang,
Changxin Lao,
Haixiang Gan,
Jinkai Yu,
Keting Cen,
Lu Hao,
Xu Zhang,
Qiqiang Zhong,
Zhongbo Sun,
Yiyu Wang,
Shuang Yang,
Mingxin Wen,
Xiangyu Wu,
Shaoguo Liu,
Tingting Gao
, et al. (3 additional authors not shown)
Abstract:
With the evolution of large language models (LLMs), there is growing interest in leveraging their rich semantic understanding to enhance industrial recommendation systems (RecSys). Traditional RecSys relies on ID-based embeddings for user sequence modeling in the General Search Unit (GSU) and Exact Search Unit (ESU) paradigm, which suffers from low information density, knowledge isolation, and weak generalization ability. While LLMs offer complementary strengths with dense semantic representations and strong generalization, directly applying LLM embeddings to RecSys faces two critical challenges: representations that are mismatched with business objectives, and representations that are not learned end-to-end with downstream tasks. In this paper, we present QARM V2, a unified framework that bridges LLM semantic understanding with RecSys business requirements for user sequence modeling.
Submitted 9 February, 2026;
originally announced February 2026.
-
Mitigating Hallucination in Financial Retrieval-Augmented Generation via Fine-Grained Knowledge Verification
Authors:
Taoye Yin,
Haoyuan Hu,
Yaxin Fan,
Xinhao Chen,
Xinya Wu,
Kai Deng,
Kezun Zhang,
Feng Wang
Abstract:
In financial Retrieval-Augmented Generation (RAG) systems, models frequently rely on retrieved documents to generate accurate responses due to the time-sensitive nature of the financial domain. While retrieved documents help address knowledge gaps, model-generated responses still suffer from hallucinations that contradict the retrieved information. To mitigate this inconsistency, we propose a Reinforcement Learning framework enhanced with Fine-grained Knowledge Verification (RLFKV). Our method decomposes financial responses into atomic knowledge units and assesses the correctness of each unit to compute the fine-grained faithful reward. This reward offers more precise optimization signals, thereby improving alignment with the retrieved documents. Additionally, to prevent reward hacking (e.g., overly concise replies), we incorporate an informativeness reward that encourages the policy model to retain at least as many knowledge units as the base model. Experiments conducted on the public Financial Data Description (FDD) task and our newly proposed FDD-ANT dataset demonstrate consistent improvements, confirming the effectiveness of our approach.
Submitted 5 February, 2026;
originally announced February 2026.
-
Video Generation Models in Robotics -- Applications, Research Challenges, Future Directions
Authors:
Zhiting Mei,
Tenny Yin,
Ola Shorinwa,
Apurva Badithela,
Zhonghe Zheng,
Joseph Bruno,
Madison Bland,
Lihan Zha,
Asher Hancock,
Jaime Fernández Fisac,
Philip Dames,
Anirudha Majumdar
Abstract:
Video generation models have emerged as high-fidelity models of the physical world, capable of synthesizing high-quality videos capturing fine-grained interactions between agents and their environments conditioned on multi-modal user inputs. Their impressive capabilities address many of the long-standing challenges faced by physics-based simulators, driving broad adoption in many problem domains, e.g., robotics. For example, video models enable photorealistic, physically consistent deformable-body simulation without making prohibitive simplifying assumptions, which is a major bottleneck in physics-based simulation. Moreover, video models can serve as foundation world models that capture the dynamics of the world in a fine-grained and expressive way. They thus overcome the limited expressiveness of language-only abstractions in describing intricate physical interactions. In this survey, we provide a review of video models and their applications as embodied world models in robotics, encompassing cost-effective data generation and action prediction in imitation learning, dynamics and rewards modeling in reinforcement learning, visual planning, and policy evaluation. Further, we highlight important challenges hindering the trustworthy integration of video models in robotics, which include poor instruction following, hallucinations such as violations of physics, and unsafe content generation, in addition to fundamental limitations such as significant data curation, training, and inference costs. We present potential future directions to address these open research challenges to motivate research and ultimately facilitate broader applications, especially in safety-critical settings.
Submitted 12 January, 2026;
originally announced January 2026.
-
Intention Chain-of-Thought Prompting with Dynamic Routing for Code Generation
Authors:
Shen Li,
Li Huang,
Shaoxiong Zhan,
Weifeng Sun,
Tao Yin,
Zhongxin Liu,
Meng Yan
Abstract:
Large language models (LLMs) exhibit strong generative capabilities and have shown great potential in code generation. Existing chain-of-thought (CoT) prompting methods enhance model reasoning by eliciting intermediate steps, but suffer from two major limitations: First, their uniform application tends to induce overthinking on simple tasks. Second, they lack intention abstraction in code generation, such as explicitly modeling core algorithmic design and efficiency, leading models to focus on surface-level structures while neglecting the global problem objective. Inspired by the cognitive economy principle of engaging structured reasoning only when necessary to conserve cognitive resources, we propose RoutingGen, a novel difficulty-aware routing framework that dynamically adapts prompting strategies for code generation. For simple tasks, it adopts few-shot prompting; for more complex ones, it invokes a structured reasoning strategy, termed Intention Chain-of-Thought (ICoT), which we introduce to guide the model in capturing task intention, such as the core algorithmic logic and its time complexity. Experiments across three models and six standard code generation benchmarks show that RoutingGen achieves state-of-the-art performance in most settings, while reducing total token usage by 46.37% on average across settings. Furthermore, ICoT outperforms six existing prompting baselines on challenging benchmarks.
Submitted 15 December, 2025;
originally announced December 2025.
-
Theory of Remaining Exceptional Points from Nongeneric Splitting in Non-Hermitian Systems
Authors:
Teng Yin,
Hao Zhang
Abstract:
In non-Hermitian physics, high-order exceptional points (HOEPs), at which eigenvalues and eigenvectors coalesce, are known for their enhanced sensitivity to perturbations. Typically, they exhibit eigenvalue splitting that scales as ε^(1/n), which is referred to as the generic response. However, under certain conditions, a nongeneric response of HOEPs occurs in which the splitting follows a lower order, ε^(1/m) (m<n). A nongeneric response of HOEPs with lower-order splitting leads to remaining EPs. While the presence of these remaining EPs is acknowledged, a thorough elucidation of their fundamental properties has yet to be achieved. In this work, we demonstrate that the unsplit eigenvalue points must constitute remaining EPs in a perturbed nth-order HOEP system. Combining graph theory and topological analysis, we study the number and splitting order of the remaining EPs. This framework not only resolves a fundamental challenge in HOEPs but also paves the way for exploiting remaining EPs in applications such as anisotropic sensing and the design of Dirac exceptional points.
Submitted 15 December, 2025;
originally announced December 2025.
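The difference between the generic ε^(1/n) response and a nongeneric ε^(1/m) response can be seen in a small worked example. The 4×4 Jordan block and the two perturbations below are a toy illustration chosen for this note, not a system taken from the paper; E_{ij} denotes the matrix with a single 1 in entry (i, j).

```latex
% Toy example (an assumption for illustration): a 4th-order EP at \lambda_0.
J_4 =
\begin{pmatrix}
\lambda_0 & 1 & 0 & 0 \\
0 & \lambda_0 & 1 & 0 \\
0 & 0 & \lambda_0 & 1 \\
0 & 0 & 0 & \lambda_0
\end{pmatrix}

% Generic response: perturb the (4,1) corner.
\det\bigl(\lambda I - J_4 - \varepsilon E_{41}\bigr)
  = (\lambda - \lambda_0)^4 - \varepsilon = 0
\;\Longrightarrow\;
\lambda_k = \lambda_0 + \varepsilon^{1/4}\, e^{2\pi i k/4}, \quad k = 0, 1, 2, 3.

% Nongeneric response: perturb the (2,1) entry instead.
\det\bigl(\lambda I - J_4 - \varepsilon E_{21}\bigr)
  = (\lambda - \lambda_0)^2 \bigl[(\lambda - \lambda_0)^2 - \varepsilon\bigr] = 0
\;\Longrightarrow\;
\lambda = \lambda_0 \pm \varepsilon^{1/2}
\quad\text{and}\quad
\lambda = \lambda_0 \ \text{(double root)}.
```

In the second case the splitting scales as ε^(1/2), so m = 2 < n = 4. The unsplit double eigenvalue λ_0 has only one eigenvector, (1, 0, -ε, 0)^T, for ε ≠ 0, so in this toy case it is itself a second-order EP, which is the kind of remaining EP the abstract refers to.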
-
World Models That Know When They Don't Know - Controllable Video Generation with Calibrated Uncertainty
Authors:
Zhiting Mei,
Tenny Yin,
Micah Baker,
Ola Shorinwa,
Anirudha Majumdar
Abstract:
Recent advances in generative video models have led to significant breakthroughs in high-fidelity video synthesis, specifically in controllable video generation where the generated video is conditioned on text and action inputs, e.g., in instruction-guided video editing and world modeling in robotics. Despite these exceptional capabilities, controllable video models often hallucinate - generating future video frames that are misaligned with physical reality - which raises serious concerns in many tasks such as robot policy evaluation and planning. However, state-of-the-art video models lack the ability to assess and express their confidence, impeding hallucination mitigation. To rigorously address this challenge, we propose C3, an uncertainty quantification (UQ) method for training continuous-scale calibrated controllable video models for dense confidence estimation at the subpatch level, precisely localizing the uncertainty in each generated video frame. Our UQ method introduces three core innovations to empower video models to estimate their uncertainty. First, our method develops a novel framework that trains video models for correctness and calibration via strictly proper scoring rules. Second, we estimate the video model's uncertainty in latent space, avoiding training instability and prohibitive training costs associated with pixel-space approaches. Third, we map the dense latent-space uncertainty to interpretable pixel-level uncertainty in the RGB space for intuitive visualization, providing high-resolution uncertainty heatmaps that identify untrustworthy regions. Through extensive experiments on large-scale robot learning datasets (Bridge and DROID) and real-world evaluations, we demonstrate that our method not only provides calibrated uncertainty estimates within the training distribution, but also enables effective out-of-distribution detection.
Submitted 10 March, 2026; v1 submitted 5 December, 2025;
originally announced December 2025.
-
R-Tuning: Wavelet-Decomposed Replay and Semantic Alignment for Continual Adaptation of Pretrained Time-Series Models
Authors:
Tianyi Yin,
Jingwei Wang,
Chenze Wang,
Han Wang,
Jiexuan Cai,
Min Liu,
Yunlong Ma,
Kun Gao,
Yuting Song,
Weiming Shen
Abstract:
Pre-trained models have demonstrated exceptional generalization capabilities in time-series forecasting; however, adapting them to evolving data distributions remains a significant challenge. A key hurdle lies in accessing the original training data, as fine-tuning solely on new data often leads to catastrophic forgetting. To address this issue, we propose Replay Tuning (R-Tuning), a novel framework designed for the continual adaptation of pre-trained time-series models. R-Tuning constructs a unified latent space that captures both prior and current task knowledge through a frequency-aware replay strategy. Specifically, it augments model-generated samples via wavelet-based decomposition across multiple frequency bands, generating trend-preserving and fusion-enhanced variants to improve representation diversity and replay efficiency. To further reduce reliance on synthetic samples, R-Tuning introduces a latent consistency constraint that aligns new representations with the prior task space. This constraint guides joint optimization within a compact and semantically coherent latent space, ensuring robust knowledge retention and adaptation. Extensive experimental results demonstrate the superiority of R-Tuning, which reduces MAE and MSE by up to 46.9% and 46.8%, respectively, on new tasks, while preserving prior knowledge with gains of up to 5.7% and 6.0% on old tasks. Notably, under few-shot settings, R-Tuning outperforms all state-of-the-art baselines even when synthetic proxy samples account for only 5% of the new task dataset.
Submitted 12 November, 2025;
originally announced November 2025.
-
MoRA: On-the-fly Molecule-aware Low-Rank Adaptation Framework for LLM-based Multi-Modal Molecular Assistant
Authors:
Tao Yin,
Xiaohong Zhang,
Jiacheng Zhang,
Li Huang,
Zhibin Zhang,
Yuansong Zeng,
Jin Xie,
Meng Yan
Abstract:
Effectively integrating molecular graph structures with Large Language Models (LLMs) is a key challenge in drug discovery. Most existing multi-modal alignment methods process these structures by fine-tuning the LLM or adding a static adapter simultaneously. However, these approaches have two main limitations: (1) they optimize a shared parameter space across all molecular inputs, limiting the model's ability to capture instance-specific structural features; and (2) fine-tuning the LLM for molecular tasks can lead to catastrophic forgetting, undermining its general reasoning capabilities. In this paper, instead of static task-oriented adaptation, we propose an instance-specific parameter space alignment approach that adapts to each molecule on the fly. To this end, we introduce Molecule-aware Low-Rank Adaptation (MoRA), which produces a unique set of low-rank adaptation weights for each input molecular graph. These weights are then dynamically injected into a frozen LLM, allowing the model to adapt its reasoning to the structure of each molecular input while preserving the LLM's core knowledge. Extensive experiments demonstrate that on key molecular tasks, such as chemical reaction prediction and molecular captioning, MoRA's instance-specific dynamic adaptation outperforms statically adapted baselines, including a 14.1% relative improvement in reaction prediction exact match and a 22% reduction in error for quantum property prediction. The code is available at https://github.com/jk-sounds/MoRA.
Submitted 14 October, 2025;
originally announced October 2025.
-
ScatterAD: Temporal-Topological Scattering Mechanism for Time Series Anomaly Detection
Authors:
Tao Yin,
Xiaohong Zhang,
Shaochen Fu,
Zhibin Zhang,
Li Huang,
Yiyuan Yang,
Kaixiang Yang,
Meng Yan
Abstract:
One main challenge in time series anomaly detection for industrial IoT lies in the complex spatio-temporal couplings within multivariate data. However, traditional anomaly detection methods focus on modeling spatial or temporal dependencies independently, resulting in suboptimal representation learning and limited sensitivity to anomalous dispersion in high-dimensional spaces. In this work, we conduct an empirical analysis showing that both normal and anomalous samples tend to scatter in high-dimensional space, with anomalous samples markedly more dispersed. We formalize this dispersion phenomenon as scattering, quantified by the mean pairwise distance among sample representations, and leverage it as an inductive signal to enhance spatio-temporal anomaly detection. Technically, we propose ScatterAD to model representation scattering across temporal and topological dimensions. ScatterAD incorporates a topological encoder for capturing graph-structured scattering and a temporal encoder for constraining over-scattering through mean squared error minimization between neighboring time steps. We introduce a contrastive fusion mechanism to ensure the complementarity of the learned temporal and topological representations. Additionally, we theoretically show that maximizing the conditional mutual information between temporal and topological views improves cross-view consistency and yields more discriminative representations. Extensive experiments on multiple public benchmarks show that ScatterAD achieves state-of-the-art performance on multivariate time series anomaly detection. Code is available at this repository: https://github.com/jk-sounds/ScatterAD.
Submitted 29 September, 2025;
originally announced September 2025.
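The scattering measure named in the abstract, the mean pairwise distance among sample representations, can be sketched in a few lines. This is a minimal illustration rather than ScatterAD's code: the Euclidean metric, the function name `mean_pairwise_distance`, and the toy 2-D points are assumptions, since the abstract does not specify the distance or the representation space.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        # Fewer than two points: no pairs, so define the scattering as zero.
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Toy representations: a tight cluster versus a more dispersed one.
normal = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
anomalous = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0), (4.0, 3.0)]

print(mean_pairwise_distance(normal))     # smaller scattering
print(mean_pairwise_distance(anomalous))  # larger scattering
```

A larger value for one set of representations indicates that it is more dispersed, which is the property the abstract uses as an inductive signal.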
-
PRISM: A Framework Harnessing Unsupervised Visual Representations and Textual Prompts for Explainable MACE Survival Prediction from Cardiac Cine MRI
Authors:
Haoyang Su,
Jin-Yi Xiang,
Shaohao Rui,
Yifan Gao,
Xingyu Chen,
Tingxuan Yin,
Shaoting Zhang,
Xiaosong Wang,
Lian-Ming Wu
Abstract:
Accurate prediction of major adverse cardiac events (MACE) remains a central challenge in cardiovascular prognosis. We present PRISM (Prompt-guided Representation Integration for Survival Modeling), a self-supervised framework that integrates visual representations from non-contrast cardiac cine magnetic resonance imaging with structured electronic health records (EHRs) for survival analysis. PRISM extracts temporally synchronized imaging features through motion-aware multi-view distillation and modulates them using medically informed textual prompts to enable fine-grained risk prediction. Across four independent clinical cohorts, PRISM consistently surpasses classical survival prediction models and state-of-the-art (SOTA) deep learning baselines under internal and external validation. Further clinical findings demonstrate that the combined imaging and EHR representations derived from PRISM provide valuable insights into cardiac risk across diverse cohorts. Three distinct imaging signatures associated with elevated MACE risk are uncovered, including lateral wall dyssynchrony, inferior wall hypersensitivity, and anterior elevated focus during diastole. Prompt-guided attribution further identifies hypertension, diabetes, and smoking as dominant contributors among clinical and physiological EHR factors.
Submitted 29 January, 2026; v1 submitted 26 August, 2025;
originally announced August 2025.
-
On the direct and inverse electromagnetic scattering in a parallel-plate waveguide
Authors:
Jiawei Liang,
Maojun Li,
Tao Yin
Abstract:
This paper is devoted to a rigorous theoretical analysis of the well-posedness of the direct problem and the uniqueness of the inverse problem of electromagnetic scattering in a parallel-plate waveguide. The direct problem is reduced to an equivalent boundary value problem on a bounded domain by introducing an exact transparent boundary condition in terms of the electric-to-magnetic Calderón operator, which can be explicitly represented as a series expansion. The well-posedness of the reduced problem in appropriate Sobolev spaces is then proved via the variational approach, using some necessary properties of the Calderón operator and the Helmholtz decomposition. Finally, relying on the Green's representation formula and a reciprocity relation, the probe method is used to show the uniqueness of the inverse obstacle problem.
Submitted 19 July, 2025;
originally announced July 2025.
-
LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification
Authors:
Fengxiao Tang,
Huan Li,
Ming Zhao,
Zongzong Wu,
Shisong Peng,
Tao Yin
Abstract:
Verifying the credibility of Cyber Threat Intelligence (CTI) is essential for reliable cybersecurity defense. However, traditional approaches typically treat this task as a static classification problem, relying on handcrafted features or isolated deep learning models. These methods often lack the robustness needed to handle incomplete, heterogeneous, or noisy intelligence, and they provide limited transparency in decision-making, factors that reduce their effectiveness in real-world threat environments. To address these limitations, we propose LRCTI, a Large Language Model (LLM)-based framework designed for multi-step CTI credibility verification. The framework first employs a text summarization module to distill complex intelligence reports into concise and actionable threat claims. It then uses an adaptive multi-step evidence retrieval mechanism that iteratively identifies and refines supporting information from a CTI-specific corpus, guided by LLM feedback. Finally, a prompt-based Natural Language Inference (NLI) module is applied to evaluate the credibility of each claim while generating interpretable justifications for the classification outcome. Experiments conducted on two benchmark datasets, CTI-200 and PolitiFact, show that LRCTI improves F1-Macro and F1-Micro scores by over 5%, reaching 90.9% and 93.6%, respectively, compared to state-of-the-art baselines. These results demonstrate that LRCTI effectively addresses the core limitations of prior methods, offering a scalable, accurate, and explainable solution for automated CTI credibility verification.
Submitted 15 July, 2025;
originally announced July 2025.
-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Authors:
Gheorghe Comanici,
Eric Bieber,
Mike Schaekermann,
Ice Pasupat,
Noveen Sachdeva,
Inderjit Dhillon,
Marcel Blistein,
Ori Ram,
Dan Zhang,
Evan Rosen,
Luke Marris,
Sam Petulla,
Colin Gaffney,
Asaf Aharoni,
Nathan Lintz,
Tiago Cardal Pais,
Henrik Jacobsson,
Idan Szpektor,
Nan-Jiang Jiang,
Krishna Haridasan,
Ahmed Omran,
Nikunj Saunshi,
Dara Bahri,
Gaurav Mishra,
Eric Chu
, et al. (3410 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
Submitted 19 December, 2025; v1 submitted 7 July, 2025;
originally announced July 2025.
-
Regularized boundary integral equation methods for open-arc scattering problems in thermoelasticity
Authors:
Yixuan X. Kong,
José Pinto,
Tao Yin
Abstract:
This paper is devoted to developing novel boundary integral equation (BIE) solvers for the problem of thermoelastic scattering by open arcs with four different boundary conditions in two dimensions. The proposed methodology is inspired by the Calderón formulas satisfied by the thermoelastic boundary integral operators (BIOs) on both closed and open surfaces, whose eigenvalues are shown to accumulate at particular points depending only on the Lamé parameters. Regularized BIEs are constructed in terms of weighted BIOs on the open arc that explicitly exhibit the edge-singularity behavior of the unknown potentials, which depends on the type of boundary condition; these formulations effectively reduce the number of iterations required to solve the corresponding discretized linear systems. We implement the new formulations using regularizations of singular integrals, which reduce the strongly singular and hypersingular integrals to weakly singular integrals. Combined with spectrally accurate quadrature rules, numerical examples are presented to illustrate the accuracy and efficiency of the proposed solvers.
Submitted 8 July, 2025;
originally announced July 2025.
-
Multi-patch/multiple-scattering frequency-time hybrid solver for interior and exterior wave equation problems
Authors:
Shuai Pan,
Gang Bao,
Tao Yin,
Oscar P. Bruno
Abstract:
This paper proposes a new multiple-scattering frequency-time hybrid (FTH-MS) integral equation solver for problems of wave scattering by obstacles in two-dimensional space, including interior problems in closed cavities and problems exterior to a set of disconnected open or closed scattering obstacles. The multiple-scattering FTH-MS method is based on a partition of the domain boundary into a user-prescribed set of overlapping open arcs, along with a corresponding sequence of multiple-scattering problems that effectively decompose the interior problem into a series of open-arc wave equation subproblems. The new strategy provides a significant extension of the FTH-MS algorithm originally presented in [22], in that (1) by allowing for the use of an arbitrary number of component arcs, and not just two as in the previous contribution, the new approach affords (1a) significantly increased geometric flexibility, as well as (1b) the use of partitions for which each open arc leads to small numbers of iterations if iterative linear-algebra solvers are employed; and (2) it facilitates parallelization, as the subproblem solutions that are needed at each multiple-scattering step can be evaluated in an embarrassingly parallel fashion. Utilizing a suitably implemented Fourier transformation, each subproblem is reduced to a Helmholtz frequency-domain problem that is tackled via a uniquely solvable boundary integral equation. Similar FTH-MS methods are also presented for problems exterior to a number of bounded obstacles. All of the algorithms considered incorporate the previously introduced "time-windowing and recentering" methodology (which enables both treatment of incident signals of long duration and long-time simulation), as well as a high-frequency Fourier transform algorithm that delivers numerically dispersionless, spectrally accurate time evolution for arbitrarily long times.
Submitted 8 July, 2025;
originally announced July 2025.
-
Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?
Authors:
Zhiting Mei,
Christina Zhang,
Tenny Yin,
Justin Lidard,
Ola Shorinwa,
Anirudha Majumdar
Abstract:
Reasoning language models have set state-of-the-art (SOTA) records on many challenging benchmarks, enabled by multi-step reasoning induced using reinforcement learning. However, like previous language models, reasoning models are prone to generating confident, plausible responses that are incorrect (hallucinations). Knowing when and how much to trust these models is critical to the safe deployment of reasoning models in real-world applications. To this end, we explore uncertainty quantification of reasoning models in this work. Specifically, we ask three fundamental questions: First, are reasoning models well-calibrated? Second, does deeper reasoning improve model calibration? Finally, inspired by humans' innate ability to double-check their thought processes to verify the validity of their answers and their confidence, we ask: can reasoning models improve their calibration by explicitly reasoning about their chain-of-thought traces? We introduce introspective uncertainty quantification (UQ) to explore this direction. In extensive evaluations on SOTA reasoning models across a broad range of benchmarks, we find that reasoning models: (i) are typically overconfident, with self-verbalized confidence estimates often greater than 85% particularly for incorrect responses, (ii) become even more overconfident with deeper reasoning, and (iii) can become better calibrated through introspection (e.g., o3-Mini and DeepSeek R1) but not uniformly (e.g., Claude 3.7 Sonnet becomes more poorly calibrated). Lastly, we conclude with important research directions to design necessary UQ benchmarks and improve the calibration of reasoning models.
Submitted 17 July, 2025; v1 submitted 22 June, 2025;
originally announced June 2025.
-
WoMAP: World Models For Embodied Open-Vocabulary Object Localization
Authors:
Tenny Yin,
Zhiting Mei,
Tao Sun,
Lihan Zha,
Emily Zhou,
Jeremy Bao,
Miyu Yamane,
Ola Shorinwa,
Anirudha Majumdar
Abstract:
Language-instructed active object localization is a critical challenge for robots, requiring efficient exploration of partially observable environments. However, state-of-the-art approaches either struggle to generalize beyond demonstration datasets (e.g., imitation learning methods) or fail to generate physically grounded actions (e.g., VLMs). To address these limitations, we introduce WoMAP (World Models for Active Perception): a recipe for training open-vocabulary object localization policies that: (i) uses a Gaussian Splatting-based real-to-sim-to-real pipeline for scalable data generation without the need for expert demonstrations, (ii) distills dense reward signals from open-vocabulary object detectors, and (iii) leverages a latent world model for dynamics and reward prediction to ground high-level action proposals at inference time. Rigorous simulation and hardware experiments demonstrate WoMAP's superior performance in a broad range of zero-shot object localization tasks, with more than 9x and 2x higher success rates compared to VLM and diffusion policy baselines, respectively. Further, we show that WoMAP achieves strong generalization and sim-to-real transfer on a TidyBot.
Submitted 2 June, 2025;
originally announced June 2025.
-
Factorization method for near-field inverse scattering problems in elastodynamics
Authors:
Chun Liu,
Guanghui Hu,
Tao Yin,
Bo Zhang
Abstract:
Consider a time-harmonic elastic point source incident on a bounded obstacle which is embedded in an open space filled with a homogeneous and isotropic elastic medium. This paper is concerned with the inverse problem of recovering the location and shape of the obstacle from near-field data generated by infinitely many incident point source waves at a fixed energy. The incident point sources and the receivers for recording scattered signals are both located on a spherical closed surface, on which an outgoing-to-incoming operator is defined for facilitating the factorization of the near-field operator. Numerical examples in 2D are presented to show the validity and accuracy of the inversion algorithm.
Submitted 30 May, 2025;
originally announced May 2025.
-
Highly-accurate neutron star modeling in the Hartle-Thorne Approximation
Authors:
Carlos Conde-Ocazionez,
Tuojin Yin,
Jaquelyn Noronha-Hostler,
Nicolás Yunes
Abstract:
Future X-ray missions, such as NICER and LOFT, together with gravitational-wave observations from ground-based detectors, will provide new insights into neutron stars. Interpreting accurate observations in the future will require accurate models of their gravitational fields. In this first paper of a two-part series, we construct the perturbation equations for slowly-rotating, isolated, and unmagnetized neutron stars, extending the Hartle-Thorne approximation to seventh order in a slow-rotation expansion. We obtain exact, closed-form, analytical solutions for the exterior metric at each order in spin. From these solutions, we derive expressions for the mass and mass-current scalar multipole moments, $M_{\ell}$ and $S_{\ell}$, respectively, up to seventh order in spin frequency, using two distinct methods. This high-order expansion allows us to calculate second-, fourth-, and sixth-order relative spin corrections to the observed mass and moment of inertia; second- and fourth-order relative spin corrections to the quadrupole and octopole moments; second-order relative spin corrections to the hexadecapole and dotriacontapole moments; and leading-order-in-spin expressions for the hexacontatetrapole and hectoicosaoctapole moments. Going to seventh order in the spin-frequency approximation will enable very precise calculations of X-ray pulse profiles, as well as the I-Love-Q and three-hair relations for slowly-rotating neutron stars. These results will be valuable for breaking parameter degeneracies in future multimessenger observations.
Submitted 25 May, 2025;
originally announced May 2025.
-
Denoising Mutual Knowledge Distillation in Bi-Directional Multiple Instance Learning
Authors:
Chen Shu,
Boyu Fu,
Yiman Li,
Ting Yin,
Wenchuan Zhang,
Jie Chen,
Yuhao Yi,
Hong Bu
Abstract:
Multiple Instance Learning is the predominant method for Whole Slide Image classification in digital pathology, enabling the use of slide-level labels to supervise model training. Although MIL eliminates the tedious fine-grained annotation process for supervised learning, whether it can learn accurate bag- and instance-level classifiers remains a question. To address the issue, instance-level classifiers and instance masks were incorporated to ground the prediction on supporting patches. These methods, while practically improving the performance of MIL methods, may potentially introduce noisy labels. We propose to bridge the gap between commonly used MIL and fully supervised learning by augmenting both the bag- and instance-level learning processes with pseudo-label correction capabilities elicited from weak to strong generalization techniques. The proposed algorithm improves the performance of dual-level MIL algorithms on both bag- and instance-level predictions. Experiments on public pathology datasets showcase the advantage of the proposed methods.
Submitted 27 May, 2025; v1 submitted 17 May, 2025;
originally announced May 2025.
-
Out-of-Distribution Detection in Heterogeneous Graphs via Energy Propagation
Authors:
Tao Yin,
Chen Zhao,
Xiaoyan Liu,
Minglai Shao
Abstract:
Graph neural networks (GNNs) are proven effective in extracting complex node and structural information from graph data. While current GNNs perform well in node classification tasks within in-distribution (ID) settings, real-world scenarios often present distribution shifts, leading to the presence of out-of-distribution (OOD) nodes. OOD detection in graphs is a crucial and challenging task. Most existing research focuses on homogeneous graphs, but real-world graphs are often heterogeneous, consisting of diverse node and edge types. This heterogeneity adds complexity and enriches the informational content. To the best of our knowledge, OOD detection in heterogeneous graphs remains an underexplored area. In this context, we propose a novel methodology for OOD detection in heterogeneous graphs (OODHG) that aims to achieve two main objectives: 1) detecting OOD nodes and 2) classifying all ID nodes based on the first task's results. Specifically, we learn representations for each node in the heterogeneous graph, calculate energy values to determine whether nodes are OOD, and then classify ID nodes. To leverage the structural information of heterogeneous graphs, we introduce a meta-path-based energy propagation mechanism and an energy constraint to enhance the distinction between ID and OOD nodes. Extensive experimental findings substantiate the simplicity and effectiveness of OODHG, demonstrating its superiority over baseline models in OOD detection tasks and its accuracy in ID node classification.
Submitted 29 April, 2025;
originally announced May 2025.
-
BiasCause: Evaluate Socially Biased Causal Reasoning of Large Language Models
Authors:
Tian Xie,
Tongxin Yin,
Vaishakh Keshava,
Xueru Zhang,
Siddhartha Reddy Jonnalagadda
Abstract:
While large language models (LLMs) play increasingly significant roles in society, research shows they continue to generate content that reflects social bias against sensitive groups. Existing benchmarks effectively identify these biases, but a critical gap remains in understanding the underlying reasoning processes that produce them. This paper addresses this gap by evaluating the causal reasoning of LLMs when answering socially biased questions. We propose a formal schema that categorizes causal reasoning into three types (mistaken, biased, and contextually-grounded). We then synthesize 1788 questions covering eight sensitive attributes, with each set of questions designed to probe a specific type of causal reasoning. All questions are then manually validated, and each of them prompts the LLM to generate a causal graph behind its answer. We evaluate four state-of-the-art LLMs and find that all models exhibit biased causal reasoning on most questions eliciting it. Moreover, we discover that LLMs are also prone to "mistaken-biased" reasoning, where they first confuse correlation with causality to infer sensitive group membership and subsequently apply biased causal reasoning. By examining the cases where LLMs produce unbiased causal reasoning, we also identify three strategies LLMs employ to avoid bias (i.e., explicitly refusing to answer, avoiding sensitive attributes, and adding contextual restrictions), which provide insights for future debiasing efforts.
Submitted 10 March, 2026; v1 submitted 8 April, 2025;
originally announced April 2025.
-
A fast Fourier spectral method for the linearized Boltzmann collision operator
Authors:
Tianai Yin,
Zhenning Cai,
Yanli Wang
Abstract:
We introduce a fast Fourier spectral method to compute linearized collision operators of the Boltzmann equation for variable hard-sphere gases. While the state-of-the-art method provides a computational cost O(MN^4 log N), with N being the number of modes in each direction and M being the number of quadrature points on a hemisphere, our method reduces the cost to O(N^4 log N), removing the factor M, which could be large in our numerical tests. The method is applied in a numerical solver for the steady-state Boltzmann equation with quadratic collision operators. Numerical experiments for both spatially homogeneous and inhomogeneous Boltzmann equations have been carried out to test the accuracy and efficiency of our method.
Submitted 14 September, 2025; v1 submitted 12 March, 2025;
originally announced March 2025.
-
PGAD: Prototype-Guided Adaptive Distillation for Multi-Modal Learning in AD Diagnosis
Authors:
Yanfei Li,
Teng Yin,
Wenyi Shang,
Jingyu Liu,
Xi Wang,
Kaiyang Zhao
Abstract:
Missing modalities pose a major issue in Alzheimer's Disease (AD) diagnosis, as many subjects lack full imaging data due to cost and clinical constraints. While multi-modal learning leverages complementary information, most existing methods train only on complete data, ignoring the large proportion of incomplete samples in real-world datasets like ADNI. This reduces the effective training set and limits the full use of valuable medical data. While some methods incorporate incomplete samples, they fail to effectively address inter-modal feature alignment and knowledge transfer challenges under high missing rates. To address this, we propose a Prototype-Guided Adaptive Distillation (PGAD) framework that directly incorporates incomplete multi-modal data into training. PGAD enhances missing modality representations through prototype matching and balances learning with a dynamic sampling strategy. We validate PGAD on the ADNI dataset with varying missing rates (20%, 50%, and 70%) and demonstrate that it significantly outperforms state-of-the-art approaches. Ablation studies confirm the effectiveness of prototype matching and adaptive sampling, highlighting the potential of our framework for robust and scalable AD diagnosis in real-world clinical settings.
Submitted 26 August, 2025; v1 submitted 5 March, 2025;
originally announced March 2025.
-
Apollo-Forecast: Overcoming Aliasing and Inference Speed Challenges in Language Models for Time Series Forecasting
Authors:
Tianyi Yin,
Jingwei Wang,
Yunlong Ma,
Han Wang,
Chenze Wang,
Yukai Zhao,
Min Liu,
Weiming Shen,
Yufeng Chen
Abstract:
Encoding time series into tokens and using language models for processing has been shown to substantially augment the models' ability to generalize to unseen tasks. However, existing language models for time series forecasting encounter several obstacles, including aliasing distortion and prolonged inference times, primarily due to the limitations of quantization processes and the computational demands of large models. This paper introduces Apollo-Forecast, a novel framework that tackles these challenges with two key innovations: the Anti-Aliasing Quantization Module (AAQM) and the Race Decoding (RD) technique. AAQM adeptly encodes sequences into tokens while mitigating high-frequency noise in the original signals, thus enhancing both signal fidelity and overall quantization efficiency. RD employs a draft model to enable parallel processing and results integration, which markedly accelerates the inference speed for long-term predictions, particularly in large-scale models. Extensive experiments on various real-world datasets show that Apollo-Forecast outperforms state-of-the-art methods by 35.41% and 18.99% in WQL and MASE metrics, respectively, in zero-shot scenarios. Furthermore, our method achieves a 1.9X-2.7X acceleration in inference speed over baseline methods.
Submitted 16 December, 2024;
originally announced December 2024.
-
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Authors:
Tianwei Yin,
Qiang Zhang,
Richard Zhang,
William T. Freeman,
Fredo Durand,
Eli Shechtman,
Xun Huang
Abstract:
Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies. The generation of a single frame requires the model to process the entire sequence, including the future. We address this limitation by adapting a pretrained bidirectional diffusion transformer to an autoregressive transformer that generates frames on-the-fly. To further reduce latency, we extend distribution matching distillation (DMD) to videos, distilling a 50-step diffusion model into a 4-step generator. To enable stable and high-quality distillation, we introduce a student initialization scheme based on the teacher's ODE trajectories, as well as an asymmetric distillation strategy that supervises a causal student model with a bidirectional teacher. This approach effectively mitigates error accumulation in autoregressive generation, allowing long-duration video synthesis despite training on short clips. Our model achieves a total score of 84.27 on the VBench-Long benchmark, surpassing all previous video generation models. It enables fast streaming generation of high-quality videos at 9.4 FPS on a single GPU thanks to KV caching. Our approach also enables streaming video-to-video translation, image-to-video, and dynamic prompting in a zero-shot manner.
Submitted 23 September, 2025; v1 submitted 10 December, 2024;
originally announced December 2024.
-
Turbo3D: Ultra-fast Text-to-3D Generation
Authors:
Hanzhe Hu,
Tianwei Yin,
Fujun Luan,
Yiwei Hu,
Hao Tan,
Zexiang Xu,
Sai Bi,
Shubham Tulsiani,
Kai Zhang
Abstract:
We present Turbo3D, an ultra-fast text-to-3D system capable of generating high-quality Gaussian splatting assets in under one second. Turbo3D employs a rapid 4-step, 4-view diffusion generator and an efficient feed-forward Gaussian reconstructor, both operating in latent space. The 4-step, 4-view generator is a student model distilled through a novel Dual-Teacher approach, which encourages the student to learn view consistency from a multi-view teacher and photo-realism from a single-view teacher. By shifting the Gaussian reconstructor's inputs from pixel space to latent space, we eliminate the extra image decoding time and halve the transformer sequence length for maximum efficiency. Our method demonstrates superior 3D generation results compared to previous baselines, while operating in a fraction of their runtime.
Submitted 5 December, 2024;
originally announced December 2024.
-
Providing Differential Privacy for Federated Learning Over Wireless: A Cross-layer Framework
Authors:
Jiayu Mao,
Tongxin Yin,
Aylin Yener,
Mingyan Liu
Abstract:
Federated Learning (FL) is a distributed machine learning framework that inherently allows edge devices to maintain their local training data, thus providing some level of privacy. However, FL's model updates still pose a risk of privacy leakage, which must be mitigated. Over-the-air FL (OTA-FL) is an adapted FL design for wireless edge networks that leverages the natural superposition property of the wireless medium. We propose a wireless physical layer (PHY) design for OTA-FL which improves differential privacy (DP) through a decentralized, dynamic power control that utilizes both inherent Gaussian noise in the wireless channel and a cooperative jammer (CJ) for additional artificial noise generation when higher privacy levels are required. Although primarily implemented within the Upcycled-FL framework, where a resource-efficient method with first-order approximations is used at every even iteration to decrease the required information from clients, our power control strategy is applicable to any FL framework, including FedAvg and FedProx as shown in the paper. This adaptation showcases the flexibility and effectiveness of our design across different learning algorithms while maintaining a strong emphasis on privacy. Our design removes the need for client-side artificial noise injection for DP, utilizing a cooperative jammer to enhance privacy without affecting transmission efficiency for higher privacy demands. Privacy analysis is provided using the Moments Accountant method. We perform a convergence analysis for non-convex objectives to tackle heterogeneous data distributions, highlighting the inherent trade-offs between privacy and accuracy. Numerical results show that our approach with various FL algorithms outperforms the state-of-the-art under the same DP conditions on the non-i.i.d. FEMNIST dataset, and highlight the cooperative jammer's effectiveness in ensuring strict privacy.
Submitted 5 December, 2024;
originally announced December 2024.
-
Graph-GIC: A Smart and Parallelized Geomagnetically Induced Current Modelling Algorithm Based on Graph Theory for Space Weather Applications
Authors:
Wen Chen,
Ding Yuan,
Xueshang Feng,
Stefaan Poedts,
Zhengyang Zou,
Song Feng,
Yuxuan Zhu,
Tong Yin
Abstract:
Geomagnetically Induced Current (GIC) refers to the electromagnetic response of the Earth and its conductive modern infrastructure to space weather, and it poses a significant threat to high-voltage power grids designed for alternating-current operation. To assess the impact of space weather on the power grid, one needs to calculate the GIC on a national or continental scale. In this study, we developed a smart and parallelized GIC modelling algorithm, Graph GIC. This algorithm deploys a graph representing a power grid in a single-line diagram, in which substations/transformers act as nodes and transmission lines as edges. With these denotations, a power grid and its electric parameters are mathematically represented with an adjacency matrix and an admittance matrix. We used sparse matrix and parallelisation techniques to expedite the intensive computation in cases of large-scale power grids. Graph GIC was validated with a benchmark grid and applied to the GIC calculation of the 500 kV power grid of Guangdong, China, for which we conducted a preliminary analysis of the grid's susceptibility to geomagnetic storms. The Graph GIC algorithm has the advantage of an intuitive and highly scalable graph representation of a power grid at any scale. It achieves high-accuracy calculation and a speedup of about 18 times after parallelisation. This algorithm could be applied to assess the impact of space weather on a power grid up to continental scales and could be incorporated into global space weather modelling frameworks.
Submitted 29 October, 2024;
originally announced November 2024.
-
Can Efficient Fourier-Transform Techniques Favorably Impact on Broadband Computational Electromagnetism?
Authors:
Thomas G. Anderson,
Mark Lyon,
Tao Yin,
Oscar P. Bruno
Abstract:
In view of recently demonstrated joint use of novel Fourier-transform techniques and effective high-accuracy frequency domain solvers related to the Method of Moments, it is argued that a set of transformative innovations could be developed for the effective, accurate and efficient simulation of problems of wave propagation and scattering of broadband, time-dependent wavefields. This contribution aims to convey the character of these methods and to highlight their applicability in computational modeling of electromagnetic configurations across various fields of science and engineering.
Submitted 8 November, 2024;
originally announced November 2024.
-
Advancing Cyber-Attack Detection in Power Systems: A Comparative Study of Machine Learning and Graph Neural Network Approaches
Authors:
Tianzhixi Yin,
Syed Ahsan Raza Naqvi,
Sai Pushpak Nandanoori,
Soumya Kundu
Abstract:
This paper explores the detection and localization of cyber-attacks on time-series measurement data in power systems, comparing conventional machine learning (ML) methods such as k-means, deep learning methods such as autoencoders, and graph neural network (GNN)-based techniques. We assess the detection accuracy of these approaches and their potential to pinpoint the locations of specific sensor measurements under attack. Given the demonstrated success of GNNs in other time-series anomaly detection applications, we aim to evaluate their performance in the context of cyber-attacks on sensor measurements in power systems. Utilizing the IEEE 68-bus system, we simulated four types of false data attacks, including scaling attacks, additive attacks, and their combinations, to test the selected approaches. Our results indicate that GNN-based methods outperform k-means and the autoencoder in detection. Additionally, GNNs show promise in accurately localizing attacks in simple scenarios, although they still face challenges in more complex cases, especially those that involve combinations of scaling and additive attacks.
Submitted 4 November, 2024;
originally announced November 2024.
-
Aluminum Scandium Nitride as a Functional Material at 1000°C
Authors:
Venkateswarlu Gaddam,
Shaurya S. Dabas,
Jinghan Gao,
David J. Spry,
Garrett Baucom,
Nicholas G. Rudawski,
Tete Yin,
Ethan Angerhofer,
Philip G. Neudeck,
Honggyu Kim,
Philip X. -L. Feng,
Mark Sheplak,
Roozbeh Tabrizian
Abstract:
Aluminum scandium nitride (AlScN) has emerged as a highly promising material for high-temperature applications due to its robust piezoelectric, ferroelectric, and dielectric properties. This study investigates the behavior of Al0.7Sc0.3N thin films in extreme thermal environments, demonstrating functional stability up to 1000°C, making it suitable for use in aerospace, hypersonics, deep-well, and nuclear reactor systems. Tantalum silicide (TaSi2)/Al0.7Sc0.3N/TaSi2 capacitors were fabricated and characterized across a wide temperature range, revealing robust ferroelectric and dielectric properties, along with significant enhancement in piezoelectric performance. At 1000°C, the ferroelectric hysteresis loops showed a substantial reduction in coercive field from 4.3 MV/cm to 1.2 MV/cm, while the longitudinal piezoelectric coefficient increased nearly tenfold, reaching 75.1 pm/V at 800°C. Structural analysis via scanning and transmission electron microscopy confirmed the integrity of the TaSi2/Al0.7Sc0.3N interfaces, even after exposure to extreme temperatures. Furthermore, the electromechanical coupling coefficient was calculated to increase by over 500%, from 12.9% at room temperature to 82% at 700°C. These findings establish AlScN as a versatile material for high-temperature ferroelectric, piezoelectric, and dielectric applications, offering unprecedented thermal stability and functional enhancement.
Submitted 22 October, 2024;
originally announced October 2024.
-
Online Resynthesis of High-Level Collaborative Tasks for Robots with Changing Capabilities
Authors:
Amy Fang,
Tenny Yin,
Hadas Kress-Gazit
Abstract:
Given a collaborative high-level task and a team of heterogeneous robots and behaviors to satisfy it, this work focuses on the challenge of automatically, at runtime, adjusting the individual robot behaviors such that the task is still satisfied, when robots encounter changes to their abilities--either failures or additional actions they can perform. We consider tasks encoded in LTL^ψ and minimize global teaming reassignments (and as a result, local resynthesis) when robots' capabilities change. We also increase the expressivity of LTL^ψ by including additional types of constraints on the overall teaming assignment that the user can specify, such as the minimum number of robots required for each assignment. We demonstrate the framework in a simulated warehouse scenario.
Submitted 8 September, 2024;
originally announced September 2024.
-
Intermittent Semi-Working Mask: A New Masking Paradigm for LLMs
Authors:
HaoYuan Hu,
Mingcong Lu,
Di Luo,
XinYa Wu,
Jiangcai Zhu,
Taoye Yin,
Zheng Li,
Hao Wang,
Shusheng Zhang,
KeZun Zhang,
KaiLai Shao,
Chao Chen,
Feng Wang
Abstract:
Multi-turn dialogues and context-intensive tasks challenge Large Language Models (LLMs) to integrate long histories without sacrificing generation quality. Although prefix LLMs can better exploit historical context via bidirectional attention on prefix tokens, they are rarely used in practice because multi-turn training requires many duplicated triplets, and their bidirectional prefix prevents KV-cache reuse at inference time, driving up cost and latency. To retain the contextual understanding of the prefix mask while preserving the inference-time efficiency of the causal mask, we introduce Intermittent Semi-working Mask (ISM), a masking scheme that injects sparse bidirectional attention into the causal backbone. ISM alternates bidirectional attention over query segments with unidirectional attention over answer segments, enabling the synthesis of in-context while preserving global causality. This design eliminates triplet expansion during training and maintains KV-cache reuse during inference, yielding latency comparable to standard causal LLMs. ISM is architecture-agnostic and parameter-free, adding only minimal latency. Across extensive evaluations, ISM outperforms causal baselines not only on multi-turn dialogue, but also on context-intensive tasks like mathematical reasoning.
Submitted 17 February, 2026; v1 submitted 1 August, 2024;
originally announced August 2024.
-
Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems
Authors:
Yuepeng Chen,
Weiping Ding,
Hengrong Ju,
Jiashuang Huang,
Tao Yin
Abstract:
Feature selection is a vital technique in machine learning, as it can reduce computational complexity, improve model performance, and mitigate the risk of overfitting. However, the increasing complexity and dimensionality of datasets pose significant challenges in the selection of features. Focusing on these challenges, this paper proposes a cascaded two-stage feature clustering and selection algorithm for fuzzy decision systems. In the first stage, we reduce the search space by clustering relevant features and addressing inter-feature redundancy. In the second stage, a clustering-based sequential forward selection method that explores the global and local structure of data is presented. We propose a novel metric for assessing the significance of features, which considers both global separability and local consistency. Global separability measures the degree of intra-class cohesion and inter-class separation based on fuzzy membership, providing a comprehensive understanding of data separability. Meanwhile, local consistency leverages the fuzzy neighborhood rough set model to capture uncertainty and fuzziness in the data. The effectiveness of our proposed algorithm is evaluated through experiments conducted on 18 public datasets and a real-world schizophrenia dataset. The experimental results demonstrate our algorithm's superiority over benchmark algorithms in both classification accuracy and the number of selected features.
Submitted 21 July, 2024;
originally announced July 2024.
-
Model editing for distribution shifts in uranium oxide morphological analysis
Authors:
Davis Brown,
Cody Nizinski,
Madelyn Shapiro,
Corey Fallon,
Tianzhixi Yin,
Henry Kvinge,
Jonathan H. Tu
Abstract:
Deep learning still struggles with certain kinds of scientific data. Notably, pretraining data may not provide coverage of relevant distribution shifts (e.g., shifts induced via the use of different measurement instruments). We consider deep learning models trained to classify the synthesis conditions of uranium ore concentrates (UOCs) and show that model editing is particularly effective for improving generalization to distribution shifts common in this domain. In particular, model editing outperforms finetuning on two curated datasets comprising micrographs taken of U$_{3}$O$_{8}$ aged in humidity chambers and micrographs acquired with different scanning electron microscopes, respectively.
Submitted 22 July, 2024;
originally announced July 2024.
-
Continuous Execution of High-Level Collaborative Tasks for Heterogeneous Robot Teams
Authors:
Amy Fang,
Tenny Yin,
Jiawei Lin,
Hadas Kress-Gazit
Abstract:
We propose a control synthesis framework for a heterogeneous multi-robot system to satisfy collaborative tasks, where actions may take varying durations to complete. We encode tasks using the discrete logic LTL^ψ, which uses the concept of bindings to interleave robot actions and express information about the relationship between specific task requirements and robot assignments. We present a synthesis approach to automatically generate a teaming assignment and corresponding discrete behavior that is correct-by-construction for continuous execution, while also implementing synchronization policies to ensure collaborative portions of the task are satisfied. We demonstrate our approach on a physical multi-robot system.
Submitted 25 June, 2024;
originally announced June 2024.
-
Improved Distribution Matching Distillation for Fast Image Synthesis
Authors:
Tianwei Yin,
Michaël Gharbi,
Taesung Park,
Richard Zhang,
Eli Shechtman,
Fredo Durand,
William T. Freeman
Abstract:
Recent approaches have shown promise in distilling diffusion models into efficient one-step generators. Among them, Distribution Matching Distillation (DMD) produces one-step generators that match their teacher in distribution, without enforcing a one-to-one correspondence with the sampling trajectories of their teachers. However, to ensure stable training, DMD requires an additional regression loss computed using a large set of noise-image pairs generated by the teacher with many steps of a deterministic sampler. This is costly for large-scale text-to-image synthesis and limits the student's quality, tying it too closely to the teacher's original sampling paths. We introduce DMD2, a set of techniques that lift this limitation and improve DMD training. First, we eliminate the regression loss and the need for expensive dataset construction. We show that the resulting instability is due to the fake critic not estimating the distribution of generated samples accurately and propose a two time-scale update rule as a remedy. Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images. This lets us train the student model on real data, mitigating the imperfect real score estimation from the teacher model, and enhancing quality. Lastly, we modify the training procedure to enable multi-step sampling. We identify and address the training-inference input mismatch problem in this setting by simulating inference-time generator samples during training. Taken together, our improvements set new benchmarks in one-step image generation, with FID scores of 1.28 on ImageNet-64x64 and 8.35 on zero-shot COCO 2014, surpassing the original teacher despite a 500X reduction in inference cost. Further, we show our approach can generate megapixel images by distilling SDXL, demonstrating exceptional visual quality among few-step methods.
Submitted 24 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
PML-based boundary integral equation method for electromagnetic scattering problems in a layered-medium
Authors:
Gang Bao,
Wangtao Lu,
Tao Yin,
Lu Zhang
Abstract:
This paper proposes a new boundary integral equation (BIE) methodology based on the perfectly matched layer (PML) truncation technique for solving the electromagnetic scattering problems in a multi-layered medium. Instead of using the original PML stretched fields, artificial fields which are also equivalent to the solutions in the physical region are introduced. This significantly simplifies the study of the proposed methodology to derive the PML problem. Then some PML transformed layer potentials and the associated boundary integral operators (BIOs) are defined and the corresponding jump relations are shown. Under the assumption that the fields vanish on the PML boundary, the solution representations, as well as the related BIEs and regularization of the hyper-singular operators, in terms of the current density functions on the truncated interface, are derived. Numerical experiments are presented to demonstrate the efficiency and accuracy of the method.
Submitted 28 January, 2024;
originally announced January 2024.
-
On a Robin-type non-singular coupling scheme for solving the wave scattering problems
Authors:
Xiaojuan Liu,
Maojun Li,
Tao Yin
Abstract:
This paper studies a non-singular coupling scheme for solving the acoustic and elastic wave scattering problems. Its extension to the problems of the Laplace and Lamé equations and to the problem with a compactly supported inhomogeneity is also briefly discussed. Relying on the solution representation of the wave scattering problem, a Robin-type artificial boundary condition in terms of layer potentials whose kernels are non-singular is introduced to obtain a reduced problem on a bounded domain. The well-posedness of the reduced problems and the a priori error estimates of the corresponding finite element discretization are proved. Numerical examples are presented to demonstrate the accuracy and efficiency of the proposed method.
Submitted 24 December, 2023;
originally announced December 2023.
-
Helmholtz decomposition based windowed Green function methods for elastic scattering problems on a half-space
Authors:
Tao Yin,
Lu Zhang,
Weiying Zheng,
Xiaopeng Zhu
Abstract:
This paper proposes a new Helmholtz decomposition based windowed Green function (HD-WGF) method for solving the time-harmonic elastic scattering problems on a half-space with Dirichlet boundary conditions in both 2D and 3D. The Helmholtz decomposition is applied to separate the pressure and shear waves, which satisfy the Helmholtz and Helmholtz/Maxwell equations, respectively, and the corresponding boundary integral equations of type $(\mathbb{I}+\mathbb{T})\boldsymbol{\phi}=\boldsymbol{f}$, which couple these two waves on the unbounded surface, are derived based on the free-space fundamental solution of the Helmholtz equation. This approach avoids the treatment of the complex elastic displacement tensor and traction operator that are involved in the classical integral equation method for elastic problems. Then a smooth ``slow-rise'' windowing function is introduced to truncate the boundary integral equations and a ``correction'' strategy is proposed to ensure uniformly fast convergence for all incident angles of plane incidence. Numerical experiments for both two- and three-dimensional problems are presented to demonstrate the accuracy and efficiency of the proposed method.
Submitted 23 December, 2023;
originally announced December 2023.
-
Non-iterative Methods in Inhomogeneous Background Inverse Scattering Imaging Problem Assisted by Swin Transformer Network
Authors:
Naike Du,
Tiantian Yin,
Jing Wang,
Rencheng Song,
Kuiwen Xu,
Bingyuan Liang,
Sheng Sun,
Xiuzhu Ye
Abstract:
A deep learning-assisted inversion method is proposed to solve the inhomogeneous background imaging problem. Three non-iterative methods, namely the distorted-Born (DB) major current coefficients method, the DB modified Born approximation method, and the DB connection method, are introduced to address the inhomogeneous background inverse scattering problem. These methods retain the multiple scattering information by utilizing the major current obtained through singular value decomposition of the Green's function and the scattered field, without resorting to optimization techniques. As a result, the proposed methods offer improved reconstruction resolution and accuracy for unknown objects embedded in inhomogeneous backgrounds, surpassing the backpropagation scheme (BPS) and Born approximation (BA) method that disregard the multiple scattering effect. To further enhance the resolution and accuracy of the reconstruction, a Shifted-Window (Swin) transformer network is employed for capturing super-resolution information in the images. The attention mechanism incorporated in the shifted window facilitates global interactions between objects, thereby enhancing the performance of the inhomogeneous background imaging algorithm while reducing computational complexity. Moreover, an adaptive training method is proposed to enhance the generalization ability of the network. The effectiveness of the proposed methods is demonstrated through both synthetic data and experimental data. Notably, super-resolution imaging is achieved with quasi real-time speed, indicating promising application potential for the proposed algorithms.
Submitted 11 December, 2023;
originally announced December 2023.
-
One-step Diffusion with Distribution Matching Distillation
Authors:
Tianwei Yin,
Michaël Gharbi,
Richard Zhang,
Eli Shechtman,
Fredo Durand,
William T. Freeman,
Taesung Park
Abstract:
Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce that the one-step image generator matches the diffusion model at the distribution level by minimizing an approximate KL divergence whose gradient can be expressed as the difference between two score functions, one of the target distribution and the other of the synthetic distribution being produced by our one-step generator. The score functions are parameterized as two diffusion models trained separately on each distribution. Combined with a simple regression loss matching the large-scale structure of the multi-step diffusion outputs, our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k, comparable to Stable Diffusion but orders of magnitude faster. Utilizing FP16 inference, our model generates images at 20 FPS on modern hardware.
Submitted 4 October, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
A symmetric Gauss-Seidel method for the steady-state Boltzmann equation
Authors:
Tianai Yin,
Zhenning Cai,
Yanli Wang
Abstract:
We introduce numerical solvers for the steady-state Boltzmann equation based on the symmetric Gauss-Seidel (SGS) method. Due to the quadratic collision operator in the Boltzmann equation, the SGS method requires solving a nonlinear system on each grid cell, and we consider two methods, namely Newton's method and the fixed-point iteration, in our numerical tests. For small Knudsen numbers, our method has an efficiency between the classical source iteration and the modern generalized synthetic iterative scheme, and the complexity of its implementation is closer to the source iteration. A variety of numerical tests are carried out to demonstrate its performance, and it is concluded that the proposed method is suitable for applications with moderate to large Knudsen numbers.
Submitted 23 November, 2023;
originally announced November 2023.
-
Resilient Control of Networked Microgrids using Vertical Federated Reinforcement Learning: Designs and Real-Time Test-Bed Validations
Authors:
Sayak Mukherjee,
Ramij R. Hossain,
Sheik M. Mohiuddin,
Yuan Liu,
Wei Du,
Veronica Adetola,
Rohit A. Jinsiwale,
Qiuhua Huang,
Tianzhixi Yin,
Ankit Singhal
Abstract:
Improving the system-level resiliency of networked microgrids is increasingly important as the population of inverter-based resources (IBRs) grows. This paper (1) presents a resilient control design in the presence of adversarial cyber-events and proposes a novel federated reinforcement learning (Fed-RL) approach to tackle (a) model complexities and unknown dynamical behaviors of IBR devices and (b) privacy issues regarding data sharing in multi-party-owned networked grids, and (2) transfers learned controls from simulation to a hardware-in-the-loop test-bed, thereby bridging the gap between simulation and the real world. With these multi-pronged objectives, we first formulate a reinforcement learning (RL) training setup generating episodic trajectories with adversaries (attack signals) injected at the primary controllers of the grid-forming (GFM) inverters, where RL agents (or controllers) are trained to mitigate the injected attacks. For networked microgrids, the horizontal Fed-RL method involving distinct independent environments is not appropriate, leading us to develop a vertical variant, the Federated Soft Actor-Critic (FedSAC) algorithm, to grasp the interconnected dynamics of the networked microgrid. Next, utilizing the OpenAI Gym interface, we built a custom simulation set-up in the GridLAB-D/HELICS co-simulation platform, named Resilient RL Co-simulation (ResRLCoSIM), to train the RL agents with IEEE 123-bus benchmark test systems comprising 3 interconnected microgrids. Finally, the policies learned in simulation are transferred to the real-time hardware-in-the-loop test-bed set-up developed using the high-fidelity Hypersim platform. Experiments show that the simulator-trained RL controllers produce convincing results with the real-time test-bed set-up, validating the minimization of the sim-to-real gap.
Submitted 20 November, 2023;
originally announced November 2023.
-
Federated Learning with Reduced Information Leakage and Computation
Authors:
Tongxin Yin,
Xuwei Tan,
Xueru Zhang,
Mohammad Mahdi Khalili,
Mingyan Liu
Abstract:
Federated learning (FL) is a distributed learning paradigm that allows multiple decentralized clients to collaboratively learn a common model without sharing local data. Although local data is not exposed directly, privacy concerns nonetheless exist as clients' sensitive information can be inferred from intermediate computations. Moreover, such information leakage accumulates substantially over time as the same data is repeatedly used during the iterative learning process. As a result, it can be particularly difficult to balance the privacy-accuracy trade-off when designing privacy-preserving FL algorithms. This paper introduces Upcycled-FL, a simple yet effective strategy that applies a first-order approximation at every even round of model update. Under this strategy, half of the FL updates incur no information leakage and require much lower computational and transmission costs. We first conduct a theoretical analysis of the convergence (rate) of Upcycled-FL and then apply two perturbation mechanisms to preserve privacy. Extensive experiments on both synthetic and real-world data show that the Upcycled-FL strategy can be adapted to many existing FL frameworks and consistently improves the privacy-accuracy trade-off.
Submitted 1 October, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.