-
MedP-CLIP: Medical CLIP with Region-Aware Prompt Integration
Authors:
Jiahui Peng,
He Yao,
Jingwen Li,
Yanzhou Su,
Sibo Ju,
Yujie Lu,
Jin Ye,
Hongchun Lu,
Xue Li,
Lincheng Jiang,
Min Zhu,
Junlong Cheng
Abstract:
Contrastive Language-Image Pre-training (CLIP) has demonstrated outstanding performance in global image understanding and zero-shot transfer through large-scale text-image alignment. However, the core of medical image analysis often lies in the fine-grained understanding of specific anatomical structures or lesion regions. Therefore, precisely comprehending region-of-interest (RoI) information provided by medical professionals or perception models becomes crucial. To address this need, we propose MedP-CLIP, a region-aware medical vision-language model (VLM). MedP-CLIP innovatively integrates medical prior knowledge and designs a feature-level region prompt integration mechanism, enabling it to flexibly respond to various prompt forms (e.g., points, bounding boxes, masks) while maintaining global contextual awareness when focusing on local regions. We pre-train the model on a meticulously constructed large-scale dataset (containing over 6.4 million medical images and 97.3 million region-level annotations), equipping it with cross-disease and cross-modality fine-grained spatial semantic understanding capabilities. Experiments demonstrate that MedP-CLIP significantly outperforms baseline methods in various medical tasks, including zero-shot recognition, interactive segmentation, and empowering multimodal large language models. This model provides a scalable, plug-and-play visual backbone for medical AI, combining holistic image understanding with precise regional analysis.
Submitted 13 April, 2026;
originally announced April 2026.
-
Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control
Authors:
Siwei Ju,
Jan Tauberschmidt,
Oleg Arenz,
Peter van Vliet,
Jan Peters
Abstract:
Learning high-performance control policies that remain consistent with expert behavior is a fundamental challenge in robotics. Reinforcement learning can discover high-performing strategies but often departs from desirable human behavior, whereas imitation learning is limited by demonstration quality and struggles to improve beyond expert data. We propose a behavior-constrained reinforcement learning framework that improves beyond demonstrations while explicitly controlling deviation from expert behavior. Because expert-consistent behavior in dynamic control is inherently trajectory-level, we introduce a receding-horizon predictive mechanism that models short-term future trajectories and provides look-ahead rewards during training. To account for the natural variability of human behavior under disturbances and changing conditions, we further condition the policy on reference trajectories, allowing it to represent a distribution of expert-consistent behaviors rather than a single deterministic target. Empirically, we evaluate the approach in high-fidelity race car simulation using data from professional drivers, a domain characterized by extreme dynamics and narrow performance margins. The learned policies achieve competitive lap times while maintaining close alignment with expert driving behavior, outperforming baseline methods in both performance and imitation quality. Beyond standard benchmarks, we conduct human-grounded evaluation in a driver-in-the-loop simulator and show that the learned policies reproduce setup-dependent driving characteristics consistent with the feedback of top-class professional race drivers. These results demonstrate that our method enables learning high-performance control policies that are both optimal and behavior-consistent, and can serve as reliable surrogates for human decision-making in complex control systems.
Submitted 3 April, 2026;
originally announced April 2026.
-
MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning
Authors:
Kyeonghun Kim,
Hyeonseok Jung,
Youngung Han,
Junsu Lim,
YeonJu Jean,
Seongbin Park,
Eunseob Choi,
Hyunsu Go,
SeoYoung Ju,
Seohyoung Park,
Gyeongmin Kim,
MinJu Kwon,
KyungSeok Yuh,
Soo Yong Kim,
Ken Ying-Kai Liao,
Nam-Joon Kim,
Hyuk-Jae Lee
Abstract:
Training deep learning models for three-dimensional (3D) medical imaging, such as Computed Tomography (CT), is fundamentally challenged by the scarcity of labeled data. While pre-training on natural images is common, it results in a significant domain shift, limiting performance. Self-Supervised Learning (SSL) on unlabeled medical data has emerged as a powerful solution, but prominent frameworks often fail to exploit the inherent 3D nature of CT scans. These methods typically process 3D scans as a collection of independent 2D slices, an approach that fundamentally discards critical axial coherence and the 3D structural context. To address this limitation, we propose the Masked Autoencoder for Enhanced Self-supervised Medical Image Learning (MAESIL), a novel self-supervised learning framework designed to capture 3D structural information efficiently. The core innovation is the 'superpatch', a 3D chunk-based input unit that balances 3D context preservation with computational efficiency. Our framework partitions the volume into superpatches and employs a 3D masked autoencoder with a dual-masking strategy to learn comprehensive spatial representations. We validated our approach on three diverse large-scale public CT datasets. Our experimental results show that MAESIL demonstrates significant improvements over existing methods such as AE, VAE and VQ-VAE in key reconstruction metrics such as PSNR and SSIM. This establishes MAESIL as a robust and practical pre-training solution for 3D medical imaging tasks.
Submitted 1 April, 2026;
originally announced April 2026.
-
COTTA: Context-Aware Transfer Adaptation for Trajectory Prediction in Autonomous Driving
Authors:
Seohyoung Park,
Jaeyeol Lim,
Seoyoung Ju,
Kyeonghun Kim,
Nam-Joon Kim,
Hyuk-Jae Lee
Abstract:
Developing robust models to accurately predict the trajectories of surrounding agents is fundamental to autonomous driving safety. However, most public datasets, such as the Waymo Open Motion Dataset and Argoverse, are collected in Western road environments and do not reflect the unique traffic patterns, infrastructure, and driving behaviors of other regions, including South Korea. This domain discrepancy leads to performance degradation when state-of-the-art models trained on Western data are deployed in different geographic contexts. In this work, we investigate the adaptability of Query-Centric Trajectory Prediction (QCNet) when transferred from U.S.-based data to Korean road environments. Using a Korean autonomous driving dataset, we compare four training strategies: zero-shot transfer, training from scratch, full fine-tuning, and encoder freezing. Experimental results demonstrate that leveraging pretrained knowledge significantly improves prediction performance. Specifically, selectively fine-tuning the decoder while freezing the encoder yields the best trade-off between accuracy and training efficiency, reducing prediction error by over 66% compared to training from scratch. This study provides practical insights into effective transfer learning strategies for deploying trajectory prediction models in new geographic domains.
Submitted 31 March, 2026;
originally announced April 2026.
-
CIPHER: Counterfeit Image Pattern High-level Examination via Representation
Authors:
Kyeonghun Kim,
Youngung Han,
Seoyoung Ju,
Yeonju Jean,
YooHyun Kim,
Minseo Choi,
SuYeon Lim,
Kyungtae Park,
Seungwoo Baek,
Sieun Hyeon,
Nam-Joon Kim,
Hyuk-Jae Lee
Abstract:
The rapid progress of generative adversarial networks (GANs) and diffusion models has enabled the creation of synthetic faces that are increasingly difficult to distinguish from real images. This progress, however, has also amplified the risks of misinformation, fraud, and identity abuse, underscoring the urgent need for detectors that remain robust across diverse generative models. In this work, we introduce Counterfeit Image Pattern High-level Examination via Representation (CIPHER), a deepfake detection framework that systematically reuses and fine-tunes discriminators originally trained for image generation. By extracting scale-adaptive features from ProGAN discriminators and temporal-consistency features from diffusion models, CIPHER captures generation-agnostic artifacts that conventional detectors often overlook. Through extensive experiments across nine state-of-the-art generative models, CIPHER demonstrates superior cross-model detection performance, achieving up to 74.33% F1-score and outperforming existing ViT-based detectors by over 30% in F1-score on average. Notably, our approach maintains robust performance on challenging datasets where baseline methods fail, with up to 88% F1-score on CIFAKE compared to near-zero performance from conventional detectors. These results validate the effectiveness of discriminator reuse and cross-model fine-tuning, establishing CIPHER as a promising approach toward building more generalizable and robust deepfake detection systems in an era of rapidly evolving generative technologies.
Submitted 31 March, 2026;
originally announced March 2026.
-
FOSCU: Feasibility of Synthetic MRI Generation via Duo-Diffusion Models for Enhancement of 3D U-Nets in Hepatic Segmentation
Authors:
Youngung Han,
Kyeonghun Kim,
Seoyoung Ju,
Yeonju Jean,
Minkyung Cha,
Seohyoung Park,
Hyeonseok Jung,
Nam-Joon Kim,
Woo Kyoung Jeong,
Ken Ying-Kai Liao,
Hyuk-Jae Lee
Abstract:
Medical image segmentation faces fundamental challenges, including restricted access to clinical datasets through Picture Archiving and Communication Systems (PACS), costly annotation, and data shortage. These systemic barriers significantly impede the development of robust segmentation algorithms. To address these challenges, we propose FOSCU, which integrates Duo-Diffusion, a 3D latent diffusion model with ControlNet that simultaneously generates high-resolution, anatomically realistic synthetic MRI volumes and corresponding segmentation labels, and an enhanced 3D U-Net training pipeline. Duo-Diffusion employs segmentation-conditioned diffusion to ensure spatial consistency and precise anatomical detail in the generated data. Experimental evaluation on 720 abdominal MRI scans shows that models trained with combined real and synthetic data yield a mean Dice score gain of 0.67% over those using only real data, and achieve a 36.4% reduction in Fréchet Inception Distance (FID), reflecting enhanced image fidelity.
Submitted 31 March, 2026;
originally announced March 2026.
-
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
Authors:
Zhongying Deng,
Cheng Tang,
Ziyan Huang,
Jiashi Lin,
Ying Chen,
Junzhi Ning,
Chenglong Ma,
Jiyao Liu,
Wei Li,
Yinghao Zhu,
Shujian Gao,
Yanyan Huang,
Sibo Ju,
Yanzhou Su,
Pengcheng Chen,
Wenhao Tang,
Tianbin Li,
Haoyu Wang,
Yuanfeng Ji,
Hui Sun,
Shaobo Min,
Liang Peng,
Feilong Tang,
Haochen Xue,
Rulin Zhou
, et al. (102 additional authors not shown)
Abstract:
Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the growth of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, the curation and assembly of such datasets are highly challenging due to the reliance on clinical expertise and strict ethical and privacy constraints, resulting in a scarcity of large-scale unified medical datasets and hindering the development of powerful medical foundation models. In this work, we present the largest survey to date of medical image datasets, covering over 1,000 open-access datasets with a systematic catalog of their modalities, tasks, anatomies, annotations, limitations, and potential for integration. Our analysis exposes a landscape that is modest in scale, fragmented across narrowly scoped tasks, and unevenly distributed across organs and modalities, which in turn limits the utility of existing medical image datasets for developing versatile and robust medical foundation models. To turn fragmentation into scale, we propose a metadata-driven fusion paradigm (MDFP) that integrates public datasets with shared modalities or tasks, thereby transforming multiple small data silos into larger, more coherent resources. Building on MDFP, we release an interactive discovery portal that enables end-to-end, automated medical image dataset integration, and compile all surveyed datasets into a unified, structured table that clearly summarizes their key characteristics and provides reference links, offering the community an accessible and comprehensive repository. By charting the current terrain and offering a principled path to dataset consolidation, our survey provides a practical roadmap for scaling medical imaging corpora, supporting faster data discovery, more principled dataset creation, and more capable medical foundation models.
Submitted 28 March, 2026;
originally announced March 2026.
-
3D-LLDM: Label-Guided 3D Latent Diffusion Model for Improving High-Resolution Synthetic MR Imaging in Hepatic Structure Segmentation
Authors:
Kyeonghun Kim,
Jaehyeok Bae,
Youngung Han,
Joo Young Bae,
Seoyoung Ju,
Junsu Lim,
Gyeongmin Kim,
Nam-Joon Kim,
Woo Kyoung Jeong,
Ken Ying-Kai Liao,
Won Jae Lee,
Pa Hong,
Hyuk-Jae Lee
Abstract:
Deep learning and generative models are advancing rapidly, with synthetic data increasingly being integrated into training pipelines for downstream analysis tasks. However, in medical imaging, their adoption remains constrained by the scarcity of reliable annotated datasets. To address this limitation, we propose 3D-LLDM, a label-guided 3D latent diffusion model that generates high-quality synthetic magnetic resonance (MR) volumes with corresponding anatomical segmentation masks. Our approach uses hepatobiliary phase MR images enhanced with the Gd-EOB-DTPA contrast agent to derive structural masks for the liver, portal vein, hepatic vein, and hepatocellular carcinoma, which then guide volumetric synthesis through a ControlNet-based architecture. Trained on 720 real clinical hepatobiliary phase MR scans from Samsung Medical Center, 3D-LLDM achieves a Fréchet Inception Distance (FID) of 28.31, improving over GANs by 70.9% and over state-of-the-art diffusion baselines by 26.7%. When used for data augmentation, the synthetic volumes improve hepatocellular carcinoma segmentation by up to 11.153% Dice score across five CNN architectures.
Submitted 24 March, 2026;
originally announced March 2026.
-
ForestPrune: High-ratio Visual Token Compression for Video Multimodal Large Language Models via Spatial-Temporal Forest Modeling
Authors:
Shaobo Ju,
Baiyang Song,
Tao Chen,
Jiapeng Zhang,
Qiong Wu,
Chao Chang,
HuaiXi Wang,
Yiyi Zhou,
Rongrong Ji
Abstract:
Because it greatly reduces computation and memory overhead, token compression has become a research hot-spot for MLLMs and has achieved remarkable progress in image-language tasks. However, for video, existing methods still fall short of high-ratio token compression. We attribute this shortcoming to the insufficient modeling of temporal and continual video content, and propose a novel and training-free token pruning method for video MLLMs, termed ForestPrune, which achieves effective and high-ratio pruning via Spatial-temporal Forest Modeling. In practice, ForestPrune constructs token forests across video frames based on semantic, spatial and temporal constraints, capturing an overall understanding of videos. Afterwards, ForestPrune evaluates the importance of token trees and nodes based on tree depth and node roles, thereby obtaining a globally optimal pruning decision. To validate ForestPrune, we apply it to two representative video MLLMs, namely LLaVA-Video and LLaVA-OneVision, and conduct extensive experiments on a range of video benchmarks. The experimental results not only show its effectiveness for video MLLMs, e.g., retaining 95.8% average accuracy while reducing 90% of tokens for LLaVA-OneVision, but also show its superior performance and efficiency over the compared token compression methods, e.g., +10.1% accuracy on MLVU and -81.4% pruning time compared with FrameFusion on LLaVA-Video.
Submitted 12 April, 2026; v1 submitted 24 March, 2026;
originally announced March 2026.
-
SymCircuit: Bayesian Structure Inference for Tractable Probabilistic Circuits via Entropy-Regularized Reinforcement Learning
Authors:
Y. Sungtaek Ju
Abstract:
Probabilistic circuit (PC) structure learning is hampered by greedy algorithms that make irreversible, locally optimal decisions. We propose SymCircuit, which replaces greedy search with a learned generative policy trained via entropy-regularized reinforcement learning. Instantiating the RL-as-inference framework in the PC domain, we show the optimal policy is a tempered Bayesian posterior, recovering the exact posterior when the regularization temperature is set inversely proportional to the dataset size. The policy is implemented as SymFormer, a grammar-constrained autoregressive Transformer with tree-relative self-attention that guarantees valid circuits at every generation step. We introduce option-level REINFORCE, restricting gradient updates to structural decisions rather than all tokens, yielding an SNR (signal to noise ratio) improvement and >10 times sample efficiency gain on the NLTCS dataset. A three-layer uncertainty decomposition (structural via model averaging, parametric via the delta method, leaf via conjugate Dirichlet-Categorical propagation) is grounded in the multilinear polynomial structure of PC outputs. On NLTCS, SymCircuit closes 93% of the gap to LearnSPN; preliminary results on Plants (69 variables) suggest scalability.
Submitted 20 March, 2026;
originally announced March 2026.
-
SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation
Authors:
Yujie Lu,
Jingwen Li,
Sibo Ju,
Yanzhou Su,
He Yao,
Yisong Liu,
Min Zhu,
Junlong Cheng
Abstract:
Medical image segmentation is vital for clinical diagnosis and quantitative analysis, yet remains challenging due to the heterogeneity of imaging modalities and the high cost of pixel-level annotations. Although general interactive segmentation models like SAM have achieved remarkable progress, their transfer to medical imaging still faces two key bottlenecks: (i) the lack of adaptive mechanisms for modality- and anatomy-specific tasks, which limits generalization in out-of-distribution medical scenarios; and (ii) current medical adaptation methods fine-tune on large, heterogeneous datasets without selection, leading to noisy supervision, higher cost, and negative transfer. To address these issues, we propose SegMoTE, an efficient and adaptive framework for medical image segmentation. SegMoTE preserves SAM's original prompt interface, efficient inference, and zero-shot generalization while introducing only a small number of learnable parameters to dynamically adapt across modalities and tasks. In addition, we design a progressive prompt tokenization mechanism that enables fully automatic segmentation, significantly reducing annotation dependence. Trained on MedSeg-HQ, a curated dataset less than 1% the size of existing large-scale datasets, SegMoTE achieves SOTA performance across diverse imaging modalities and anatomical tasks. It represents the first efficient, robust, and scalable adaptation of general segmentation models to the medical domain under extremely low annotation cost, advancing the practical deployment of foundation vision models in clinical applications.
Submitted 22 February, 2026;
originally announced February 2026.
-
arXiv:2602.17708
[pdf]
physics.chem-ph
astro-ph.IM
cs.LG
physics.ao-ph
physics.comp-ph
physics.plasm-ph
Spectral Homogenization of the Radiative Transfer Equation via Low-Rank Tensor Train Decomposition
Authors:
Y. Sungtaek Ju
Abstract:
Radiative transfer in absorbing-scattering media requires solving a transport equation across a spectral domain with 10^5 - 10^6 molecular absorption lines. Line-by-line (LBL) computation is prohibitively expensive, while existing approximations sacrifice spectral fidelity. We show that the Young-measure homogenization framework produces solution tensors I that admit low-rank tensor-train (TT) decompositions whose bond dimensions remain bounded as the spectral resolution Ns increases. Using molecular line parameters from the HITRAN database for H2O and CO2, we demonstrate that: (i) the TT rank saturates at r = 8 (at tolerance e = 10^-6) from Ns = 16 to 4096, independent of single-scattering albedo, Henyey-Greenstein asymmetry, temperature, and pressure; (ii) quantized tensor-train (QTT) representations achieve sub-linear storage scaling; (iii) in a controlled comparison using identical opacity data and transport solver, the homogenized approach achieves over an order of magnitude lower L2 error than the correlated-k distribution at equal cost; and (iv) for atomic plasma opacity (aluminum at 60 eV, TOPS database), the TT rank saturates at r = 15 with fundamentally different spectral structure (bound-bound and bound-free transitions spanning 12 decades of dynamic range), confirming that rank boundedness is a property of the transport equation rather than any particular opacity source. These results establish that the spectral complexity of radiative transfer has a finite effective rank exploitable by tensor decomposition, complementing the spatial-angular compression achieved by existing TT and dynamical low-rank approaches.
Submitted 12 February, 2026;
originally announced February 2026.
-
SymPlex: A Structure-Aware Transformer for Symbolic PDE Solving
Authors:
Yesom Park,
Annie C. Lu,
Shao-Ching Huang,
Qiyang Hu,
Y. Sungtaek Ju,
Stanley Osher
Abstract:
We propose SymPlex, a reinforcement learning framework for discovering analytical symbolic solutions to partial differential equations (PDEs) without access to ground-truth expressions. SymPlex formulates symbolic PDE solving as tree-structured decision-making and optimizes candidate solutions using only the PDE and its boundary conditions. At its core is SymFormer, a structure-aware Transformer that models hierarchical symbolic dependencies via tree-relative self-attention and enforces syntactic validity through grammar-constrained autoregressive decoding, overcoming the limited expressivity of sequence-based generators. Unlike numerical and neural approaches that approximate solutions in discretized or implicit function spaces, SymPlex operates directly in symbolic expression space, enabling interpretable and human-readable solutions that naturally represent non-smooth behavior and explicit parametric dependence. Empirical results demonstrate exact recovery of non-smooth and parametric PDE solutions using deep learning-based symbolic methods.
Submitted 3 February, 2026;
originally announced February 2026.
-
MVP-LAM: Learning Action-Centric Latent Action via Cross-Viewpoint Reconstruction
Authors:
Jung Min Lee,
Dohyeok Lee,
Seokhun Ju,
Taehyun Cho,
Jin Woo Koo,
Li Zhao,
Sangwoo Hong,
Jungwoo Lee
Abstract:
Learning latent actions from diverse human videos enables scaling robot learning beyond embodiment-specific robot datasets, and these latent actions have recently been used as pseudo-action labels for vision-language-action (VLA) model pretraining. To make VLA pretraining effective, latent actions should contain information about the underlying agent's actions despite the absence of ground-truth labels. We propose Multi-ViewPoint Latent Action Model (MVP-LAM), which learns discrete latent actions that are highly informative about ground-truth actions from time-synchronized multi-view videos. MVP-LAM trains latent actions with a cross-viewpoint reconstruction objective, so that a latent action inferred from one view must explain the future in another view, reducing reliance on viewpoint-specific cues. On Bridge V2, MVP-LAM produces more action-centric latent actions, achieving higher mutual information with ground-truth actions and improved action prediction, including under out-of-distribution evaluation. Finally, pretraining VLAs with MVP-LAM latent actions improves downstream manipulation performance on the SIMPLER and LIBERO-Long benchmarks.
Submitted 3 February, 2026;
originally announced February 2026.
-
DecisionLLM: Large Language Models for Long Sequence Decision Exploration
Authors:
Xiaowei Lv,
Zhilin Zhang,
Yijun Li,
Yusen Huo,
Siyuan Ju,
Xuyan Li,
Chunxiang Hong,
Tianyu Wang,
Yongcai Wang,
Peng Sun,
Chuan Yu,
Jian Xu,
Bo Zheng
Abstract:
Long-sequence decision-making, which is usually addressed through reinforcement learning (RL), is a critical component for optimizing strategic operations in dynamic environments, such as real-time bidding in computational advertising. The Decision Transformer (DT) introduced a powerful paradigm by framing RL as an autoregressive sequence modeling problem. Concurrently, Large Language Models (LLMs) have demonstrated remarkable success in complex reasoning and planning tasks. This raises the question of whether LLMs, which share the same Transformer foundation but operate at a much larger scale, can unlock new levels of performance in long-horizon sequential decision-making problems. This work investigates the application of LLMs to offline decision-making tasks. A fundamental challenge in this domain is the LLMs' inherent inability to interpret continuous values, as they lack a native understanding of numerical magnitude and order when values are represented as text strings. To address this, we propose treating trajectories as a distinct modality. By learning to align trajectory data with natural language task descriptions, our model can autoregressively predict future decisions within a cohesive framework we term DecisionLLM. We establish a set of scaling laws governing this paradigm, demonstrating that performance hinges on three factors: model scale, data volume, and data quality. In offline experimental benchmarks and bidding scenarios, DecisionLLM achieves strong performance. Specifically, DecisionLLM-3B outperforms the traditional Decision Transformer (DT) by 69.4 on Maze2D umaze-v1 and by 0.085 on AuctionNet. It extends the AIGB paradigm and points to promising directions for future exploration in online bidding.
Submitted 15 January, 2026;
originally announced January 2026.
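As a rough illustration of the sequence-modeling view mentioned in the DecisionLLM abstract above, the sketch below computes returns-to-go and interleaves them with states and actions, which is how the original Decision Transformer is commonly described as laying out a trajectory. The function names and the toy trajectory are hypothetical, and the sketch does not attempt to reproduce DecisionLLM's trajectory-modality alignment.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Suffix sums of rewards: rtg[t] = rewards[t] + rewards[t+1] + ..."""
    rtg: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def to_sequence(
    states: Sequence[object],
    actions: Sequence[object],
    rewards: Sequence[float],
) -> List[Tuple[str, object]]:
    """Interleave (return-to-go, state, action) triples into one flat sequence."""
    seq: List[Tuple[str, object]] = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("rtg", g), ("state", s), ("action", a)])
    return seq


# Toy trajectory (hypothetical values, for illustration only).
toy_rewards = [1.0, 0.0, 2.0]
toy_states = ["s0", "s1", "s2"]
toy_actions = ["a0", "a1", "a2"]
toy_sequence = to_sequence(toy_states, toy_actions, toy_rewards)
```

An autoregressive model trained on such sequences would be asked to predict each action token from the tokens preceding it; how continuous values are embedded is exactly the difficulty the abstract raises and is left out here.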
-
Comparison of SCAN+U and r2SCAN+U for Charge Density Wave Instability and Lattice Dynamics in CuTe
Authors:
Seungha Ju,
Sooran Kim
Abstract:
Identifying an appropriate exchange-correlation functional and computational conditions is essential for explaining the fundamental physics of materials and predicting their properties. Here, we investigate the performance of the meta-GGA functionals SCAN and r2SCAN, with and without a Hubbard U, for describing the charge density wave (CDW) in the quasi-one-dimensional material CuTe. By examining the Te-Te bond modulation, phonon dispersions, and electronic structures, we identify clear differences in how the two functionals capture the structural and dynamical properties of the CDW formation. r2SCAN+U reproduces the experimentally observed Te-chain distortions in the CDW phase and the phonon soft mode at qCDW=(0.4, 0.0, 0.5) in the non-CDW phase, whereas SCAN exhibits unphysical phonon behavior. The atomic displacements of the soft mode agree well with the experimental Te modulation. Despite their similar electronic structures and optimized lattice constants, our results demonstrate that r2SCAN is a more suitable choice than SCAN for describing CDW formation and lattice dynamics in CuTe.
Submitted 15 January, 2026;
originally announced January 2026.
-
Optical echoes of light near a black hole
Authors:
Suting Ju,
Jingxuan Zhang,
Li-Gang Wang
Abstract:
The light deflection under a strong gravitational field, referred to as strong gravitational lensing, provides a powerful probe of spacetime geometry. In addition, laboratory analogue models are employed to study the effects of curved spacetime and explore the design of optical devices. Here, applying the framework of analogue gravity, we reveal the behavior of the optical echo from a pulsed point-like source near a black hole, which depends strongly on the interplay between the black hole's photon sphere and the source's duration. We model the Schwarzschild spacetime as a Flamm paraboloid and calculate the echo response using analytical geodesic solutions and the Huygens-Fresnel principle. In particular, when the spatial scale of the pulse duration is comparable to the photon sphere, continuous "echo tails" appear along bright interference fringes in the temporal response. Analysis in both the temporal and frequency domains reveals that these echo tails are a signature of resonance between the incoming pulse and the photon sphere. This work provides a wave-optics perspective on the interaction between dynamic sources and black holes, offering a tabletop window on strong gravitational lensing.
Submitted 5 January, 2026;
originally announced January 2026.
-
Uncertainty-Aware Flow Field Reconstruction Using SVGP Kolmogorov-Arnold Networks
Authors:
Y. Sungtaek Ju
Abstract:
Reconstructing time-resolved flow fields from temporally sparse velocimetry measurements is critical for characterizing many complex thermal-fluid systems. We introduce a machine learning framework for uncertainty-aware flow reconstruction using sparse variational Gaussian processes in the Kolmogorov-Arnold network topology (SVGP-KAN). This approach extends the classical foundations of Linear Stochastic Estimation (LSE) and Spectral Analysis Modal Methods (SAMM) while enabling principled epistemic uncertainty quantification. We perform a systematic comparison of our framework with the classical reconstruction methods as well as Kalman filtering. Using synthetic data from pulsed impingement jet flows, we assess performance across fractional PIV sampling rates ranging from 0.5% to 10%. Evaluation metrics include reconstruction error, generalization gap, structure preservation, and uncertainty calibration. Our SVGP-KAN methods achieve reconstruction accuracy comparable to established methods, while also providing well-calibrated uncertainty estimates that reliably indicate when and where predictions degrade. The results demonstrate a robust, data-driven framework for flow field reconstruction with meaningful uncertainty quantification and offer practical guidance for experimental design in periodic flows.
Submitted 26 December, 2025;
originally announced December 2025.
-
A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice
Authors:
Yaowei Bai,
Ruiheng Zhang,
Yu Lei,
Xuhua Duan,
Jingfeng Yao,
Shuguang Ju,
Chaoyang Wang,
Wei Yao,
Yiwan Guo,
Guilin Zhang,
Chao Wan,
Qian Yuan,
Lei Chen,
Wenjuan Tang,
Biqiang Zhu,
Xinggang Wang,
Tao Sun,
Wei Zhou,
Dacheng Tao,
Yongchao Xu,
Chuansheng Zheng,
Huangxuan Zhao,
Bo Du
Abstract:
A global shortage of radiologists has been exacerbated by the significant volume of chest X-ray workloads, particularly in primary care. Although multimodal large language models show promise, existing evaluations predominantly rely on automated metrics or retrospective analyses, lacking rigorous prospective clinical validation. Janus-Pro-CXR (1B), a chest X-ray interpretation system based on the DeepSeek Janus-Pro model, was developed and rigorously validated through a multicenter prospective trial (NCT07117266). Our system outperforms state-of-the-art X-ray report generation models in automated report generation, surpassing even larger-scale models including ChatGPT 4o (200B parameters), while demonstrating reliable detection of six clinically critical radiographic findings. Retrospective evaluation confirms significantly higher report accuracy than Janus-Pro and ChatGPT 4o. In prospective clinical deployment, AI assistance significantly improved report quality scores, reduced interpretation time by 18.3% (P < 0.001), and was preferred by a majority of experts in 54.3% of cases. Through lightweight architecture and domain-specific optimization, Janus-Pro-CXR improves diagnostic reliability and workflow efficiency, particularly in resource-constrained settings. The model architecture and implementation framework will be open-sourced to facilitate the clinical translation of AI-assisted radiology solutions.
Submitted 23 December, 2025;
originally announced December 2025.
-
Towards Effective Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval
Authors:
Tao Chen,
Shaobo Ju,
Qiong Wu,
Chenxin Fang,
Kun Zhang,
Jun Peng,
Hui Li,
Yiyi Zhou,
Rongrong Ji
Abstract:
Due to excessive memory overhead, most Multimodal Large Language Models (MLLMs) can only process videos with a limited number of frames. In this paper, we propose an effective and efficient paradigm to remedy this shortcoming, termed One-shot video-Clip based Retrieval-Augmented Generation (OneClip-RAG). Compared with existing video RAG methods, OneClip-RAG makes full use of the merits of video clips for augmented video understanding in terms of both knowledge integrity and semantic coherence. It is also equipped with a novel query-guided video chunking algorithm that unifies clip chunking and cross-modal retrieval in one processing step, avoiding redundant computations. To improve instruction following, we further propose a new dataset called SynLongVideo and design a progressive training regime for OneClip-RAG. OneClip-RAG is plugged into three recent MLLMs and validated on a set of long-video benchmarks. Experimental results show not only clear performance gains from OneClip-RAG over the base MLLMs, e.g., boosting Qwen3-VL 8B to the level of GPT-5 on MLVU, but also its superior efficiency in handling long videos, e.g., enabling LLaVA-Video to understand up to an hour of video in less than 1.2 minutes on a single 4090 GPU.
Submitted 9 April, 2026; v1 submitted 9 December, 2025;
originally announced December 2025.
-
Uncertainty Quantification for Scientific Machine Learning using Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN)
Authors:
Y. Sungtaek Ju
Abstract:
Kolmogorov-Arnold Networks have emerged as interpretable alternatives to traditional multi-layer perceptrons. However, standard implementations lack principled uncertainty quantification capabilities essential for many scientific applications. We present a framework integrating sparse variational Gaussian process inference with the Kolmogorov-Arnold topology, enabling scalable Bayesian inference with computational complexity quasi-linear in sample size. Through analytic moment matching, we propagate uncertainty through deep additive structures while maintaining interpretability. We use three example studies to demonstrate the framework's ability to distinguish aleatoric from epistemic uncertainty: calibration of heteroscedastic measurement noise in fluid flow reconstruction, quantification of prediction confidence degradation in multi-step forecasting of advection-diffusion dynamics, and out-of-distribution detection in convolutional autoencoders. These results suggest that Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KANs) are a promising architecture for uncertainty-aware learning in scientific machine learning.
Submitted 9 December, 2025; v1 submitted 4 December, 2025;
originally announced December 2025.
-
Scalable and Interpretable Scientific Discovery via Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN)
Authors:
Y. Sungtaek Ju
Abstract:
Kolmogorov-Arnold Networks (KANs) offer a promising alternative to Multi-Layer Perceptrons (MLPs) by placing learnable univariate functions on network edges, enhancing interpretability. However, standard KANs lack probabilistic outputs, limiting their utility in applications requiring uncertainty quantification. While recent Gaussian Process (GP) extensions to KANs address this, they rely on exact inference methods that scale cubically with data size $N$, restricting their application to smaller datasets. We introduce the Sparse Variational GP-KAN (SVGP-KAN), an architecture that integrates sparse variational inference with the KAN topology. By employing $M$ inducing points and analytic moment matching, our method reduces computational complexity from $O(N^3)$ to $O(NM^2)$, i.e., linear in sample size, enabling the application of probabilistic KANs to larger scientific datasets. Furthermore, we demonstrate that integrating a permutation-based importance analysis enables the network to function as a framework for structural identification, identifying relevant inputs and classifying functional relationships.
Submitted 28 November, 2025;
originally announced December 2025.
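To make the complexity claim in the SVGP-KAN abstract above concrete, the following sketch compares leading-order operation counts for $O(N^3)$ and $O(NM^2)$. It is arithmetic only: constant factors and lower-order terms are ignored, the function names are hypothetical, and the values of N and M are illustrative assumptions rather than figures from the paper.

```python
def exact_gp_cost(n: int) -> int:
    """Leading-order operation count for exact GP inference, O(N^3)."""
    return n ** 3


def sparse_gp_cost(n: int, m: int) -> int:
    """Leading-order operation count with M inducing points, O(N * M^2)."""
    return n * m ** 2


def speedup(n: int, m: int) -> float:
    """Ratio of the two leading-order counts, which simplifies to (N / M)^2."""
    return exact_gp_cost(n) / sparse_gp_cost(n, m)


# Illustrative sizes (assumptions, not taken from the paper).
N, M = 100_000, 100
ratio = speedup(N, M)

# With M held fixed, doubling N doubles the sparse cost (linear in N)
# but multiplies the exact cost by eight (cubic in N).
sparse_growth = sparse_gp_cost(2 * N, M) / sparse_gp_cost(N, M)
exact_growth = exact_gp_cost(2 * N) / exact_gp_cost(N)
```

The ratio depends only on N / M, so the benefit grows as the dataset gets larger relative to the number of inducing points.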
-
Learning Generalizable Visuomotor Policy through Dynamics-Alignment
Authors:
Dohyeok Lee,
Jung Min Lee,
Munkyung Kim,
Seokhun Ju,
Jin Woo Koo,
Kyungjae Lee,
Dohyeong Kim,
TaeHyun Cho,
Jungwoo Lee
Abstract:
Behavior cloning methods for robot learning suffer from poor generalization due to limited data support beyond expert demonstrations. Recent approaches leveraging video prediction models have shown promising results by learning rich spatiotemporal representations from large-scale datasets. However, these models learn action-agnostic dynamics that cannot distinguish between different control inputs, limiting their utility for precise manipulation tasks and requiring large pretraining datasets. We propose a Dynamics-Aligned Flow Matching Policy (DAP) that integrates dynamics prediction into policy learning. Our method introduces a novel architecture where policy and dynamics models provide mutual corrective feedback during action generation, enabling self-correction and improved generalization. Empirical validation demonstrates generalization performance superior to baseline methods on real-world robotic manipulation tasks, showing particular robustness in OOD scenarios including visual distractions and lighting variations.
Submitted 30 October, 2025;
originally announced October 2025.
-
Optimizing Cross-Domain Transfer for Universal Machine Learning Interatomic Potentials
Authors:
Jaesun Kim,
Jinmu You,
Yutack Park,
Yunsung Lim,
Yujin Kang,
Jisu Kim,
Haekwan Jeon,
Suyeon Ju,
Deokgi Hong,
Seung Yul Lee,
Saerom Choi,
Yongdeok Kim,
Jae W. Lee,
Seungwu Han
Abstract:
Accurate yet transferable machine-learning interatomic potentials (MLIPs) are essential for accelerating materials and chemical discovery. However, most universal MLIPs overfit to narrow datasets or computational protocols, limiting their reliability across chemical and functional domains. We introduce a transferable multi-domain training strategy that jointly optimizes universal and task-specific parameters through selective regularization, coupled with a domain-bridging set (DBS) that aligns potential-energy surfaces across datasets. Systematic ablation experiments show that small DBS fractions (0.1%) and targeted regularization synergistically enhance out-of-distribution generalization while preserving in-domain fidelity. Trained on fifteen open databases spanning molecules, crystals, and surfaces, our model, SevenNet-Omni, achieves state-of-the-art cross-domain accuracy, including adsorption-energy errors below 0.06 eV on metallic surfaces and 0.1 eV on metal-organic frameworks. Despite containing only 0.5% r$^2$SCAN data, SevenNet-Omni reproduces high-fidelity r$^2$SCAN energetics, demonstrating effective cross-functional transfer from large PBE datasets. This framework offers a scalable route toward universal, transferable MLIPs that bridge quantum-mechanical fidelities and chemical domains.
Submitted 6 November, 2025; v1 submitted 13 October, 2025;
originally announced October 2025.
-
CECT-Mamba: a Hierarchical Contrast-enhanced-aware Model for Pancreatic Tumor Subtyping from Multi-phase CECT
Authors:
Zhifang Gong,
Shuo Gao,
Ben Zhao,
Yingjing Xu,
Yijun Yang,
Shenghong Ju,
Guangquan Zhou
Abstract:
Contrast-enhanced computed tomography (CECT) is the primary imaging technique that provides valuable spatial-temporal information about lesions, enabling the accurate diagnosis and subclassification of pancreatic tumors. However, the high heterogeneity and variability of pancreatic tumors still pose substantial challenges for precise subtype diagnosis. Previous methods fail to effectively explore the contextual information across the multiple CECT phases commonly used in radiologists' diagnostic workflows, thereby limiting their performance. In this paper, we introduce, for the first time, an automatic way to combine multi-phase CECT data to discriminate between pancreatic tumor subtypes. The key is to use Mamba, with its promising learnability and simplicity, to encourage both temporal and spatial modeling from multi-phase CECT. Specifically, we propose a dual hierarchical contrast-enhanced-aware Mamba module incorporating two novel spatial and temporal sampling sequences to explore intra- and inter-phase contrast variations of lesions. A similarity-guided refinement module is also incorporated into the temporal scanning modeling to emphasize learning on local tumor regions with more obvious temporal variations. Moreover, we design a space complementary integrator and a multi-granularity fusion module to encode and aggregate semantics across different scales, achieving more efficient learning for subtyping pancreatic tumors. Experiments on an in-house dataset of 270 clinical cases achieve an accuracy of 97.4% and an AUC of 98.6% in distinguishing between pancreatic ductal adenocarcinoma (PDAC) and pancreatic neuroendocrine tumors (PNETs), demonstrating the method's potential as a more accurate and efficient tool.
Submitted 16 September, 2025;
originally announced September 2025.
-
From Bench to Bedside: A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice
Authors:
Yaowei Bai,
Ruiheng Zhang,
Yu Lei,
Jingfeng Yao,
Shuguang Ju,
Chaoyang Wang,
Wei Yao,
Yiwan Guo,
Guilin Zhang,
Chao Wan,
Qian Yuan,
Xuhua Duan,
Xinggang Wang,
Tao Sun,
Yongchao Xu,
Chuansheng Zheng,
Huangxuan Zhao,
Bo Du
Abstract:
A global shortage of radiologists has been exacerbated by the significant volume of chest X-ray workloads, particularly in primary care. Although multimodal large language models show promise, existing evaluations predominantly rely on automated metrics or retrospective analyses, lacking rigorous prospective clinical validation. Janus-Pro-CXR (1B), a chest X-ray interpretation system based on the DeepSeek Janus-Pro model, was developed and rigorously validated through a multicenter prospective trial (NCT06874647). Our system outperforms state-of-the-art X-ray report generation models in automated report generation, surpassing even larger-scale models including ChatGPT 4o (200B parameters), while demonstrating robust detection of eight clinically critical radiographic findings (area under the curve, AUC > 0.8). Retrospective evaluation confirms significantly higher report accuracy than Janus-Pro and ChatGPT 4o. In prospective clinical deployment, AI assistance significantly improved report quality scores (4.37 vs. 4.11, P < 0.001), reduced interpretation time by 18.5% (P < 0.001), and was preferred by a majority of experts (3 out of 5) in 52.7% of cases. Through lightweight architecture and domain-specific optimization, Janus-Pro-CXR improves diagnostic reliability and workflow efficiency, particularly in resource-constrained settings. The model architecture and implementation framework will be open-sourced to facilitate the clinical translation of AI-assisted radiology solutions.
Submitted 31 May, 2025;
originally announced July 2025.
-
Dynamics of thin film flows on a vertical fibre with vapor absorption
Authors:
Souradip Chattopadhyay,
Zihao Yu,
Y. Sungtaek Ju,
Hangjie Ji
Abstract:
Water vapor capture through free surface flows plays a crucial role in various industrial applications, such as liquid desiccant air conditioning systems, water harvesting, and dewatering. This paper studies the dynamics of a silicone liquid sorbent (also known as water-absorbing silicone oil) flowing down a vertical cylindrical fibre while absorbing water vapor. We propose a one-sided thin-film-type model for these dynamics, where the governing equations form a coupled system of nonlinear fourth-order partial differential equations for the liquid film thickness and oil concentration. The model incorporates gravity, surface tension, Marangoni effects induced by concentration gradients, and non-mass-conserving effects due to absorption flux. Interfacial instabilities, driven by the competition between mass-conserving and non-mass-conserving effects, are investigated via stability analysis. We numerically show that water absorption can lead to the formation of irregular wavy patterns and trigger droplet coalescence downstream. Systematic simulations further identify parameter ranges for the Marangoni number and absorption parameter that lead to the onset of droplet coalescence dynamics and regime transitions.
Submitted 28 May, 2025;
originally announced May 2025.
-
Enhancing LLMs' Reasoning-Intensive Multimedia Search Capabilities through Fine-Tuning and Reinforcement Learning
Authors:
Jinzheng Li,
Sibo Ju,
Yanzhou Su,
Hongguang Li,
Yiqing Shen
Abstract:
Existing search agents driven by large language models (LLMs) typically rely on prompt engineering to decompose user queries into search plans, limiting their effectiveness in complex scenarios requiring reasoning. Furthermore, they suffer from excessive token consumption due to Python-based search plan representations and inadequate integration of multimedia elements for both input processing and response generation. To address these challenges, we introduce SearchExpert, a training method for LLMs to improve their multimedia search capabilities in response to complex search queries. First, we reformulate the search plan in an efficient natural language representation to reduce token consumption, and propose supervised fine-tuning for searching (SFTS) to adapt the LLM to these representations, together with an automated dataset construction pipeline. Second, to improve reasoning-intensive search capabilities, we propose reinforcement learning from search feedback (RLSF), which uses the search results planned by the LLM as reward signals. Third, we propose a multimedia understanding and generation agent that enables the fine-tuned LLM to process visual input and produce visual output during inference. Finally, we establish an automated benchmark construction pipeline and a human evaluation framework. Our resulting benchmark, SearchExpertBench-25, comprises 200 multiple-choice questions spanning financial and international news scenarios that require reasoning in searching. Experiments demonstrate that SearchExpert outperforms the commercial LLM search method (Perplexity Pro) by 36.60% on the existing FinSearchBench-24 benchmark and 54.54% on our proposed SearchExpertBench-25. Human evaluations further confirm the superior readability.
Submitted 24 May, 2025;
originally announced May 2025.
-
Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Authors:
Taehyun Cho,
Seokhun Ju,
Seungyub Han,
Dohyeong Kim,
Kyungjae Lee,
Jungwoo Lee
Abstract:
To design rewards that align with human goals, Reinforcement Learning from Human Feedback (RLHF) has emerged as a prominent technique for learning reward functions from human preferences and optimizing policies via reinforcement learning algorithms. However, existing RLHF methods often misinterpret trajectories as being generated by an optimal policy, causing inaccurate likelihood estimation and suboptimal learning. Inspired by the Direct Preference Optimization framework, which directly learns an optimal policy without an explicit reward, we propose policy-labeled preference learning (PPL) to resolve likelihood mismatch issues by modeling human preferences with regret, which reflects behavior policy information. We also provide a contrastive KL regularization, derived from regret-based principles, to enhance RLHF in sequential decision making. Experiments in high-dimensional continuous control tasks demonstrate PPL's significant improvements in offline RLHF performance and its effectiveness in online settings.
Submitted 13 May, 2025; v1 submitted 6 May, 2025;
originally announced May 2025.
-
GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning
Authors:
Yanzhou Su,
Tianbin Li,
Jiyao Liu,
Chenglong Ma,
Junzhi Ning,
Cheng Tang,
Sibo Ju,
Jin Ye,
Pengcheng Chen,
Ming Hu,
Shixiang Tang,
Lihao Liu,
Bin Fu,
Wenqi Shao,
Xiaowei Hu,
Xiangwen Liao,
Yuanfeng Ji,
Junjun He
Abstract:
Recent advances in general medical AI have made significant strides, but existing models often lack the reasoning capabilities needed for complex medical decision-making. This paper presents GMAI-VL-R1, a multimodal medical reasoning model enhanced by reinforcement learning (RL) to improve its reasoning abilities. Through iterative training, GMAI-VL-R1 optimizes decision-making, significantly boosting diagnostic accuracy and clinical support. We also develop a reasoning data synthesis method, generating step-by-step reasoning data via rejection sampling, which further enhances the model's generalization. Experimental results show that after RL training, GMAI-VL-R1 excels in tasks such as medical image diagnosis and visual question answering. While the model demonstrates basic memorization with supervised fine-tuning, RL is crucial for true generalization. Our work establishes new evaluation benchmarks and paves the way for future advancements in medical reasoning models. Code, data, and model will be released at https://github.com/uni-medical/GMAI-VL-R1.
Submitted 2 April, 2025;
originally announced April 2025.
-
Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation
Authors:
Zhi Qin,
Qianhui Gui,
Mouxiao Bian,
Rui Wang,
Hong Ge,
Dandan Yao,
Ziying Sun,
Yuan Zhao,
Yu Zhang,
Hui Shi,
Dongdong Wang,
Chenxin Song,
Shenghong Ju,
Lihao Liu,
Junjun He,
Jie Xu,
Yuan-Cheng Wang
Abstract:
Medical imaging quality control (QC) is essential for accurate diagnosis, yet traditional QC methods remain labor-intensive and subjective. To address this challenge, in this study, we establish a standardized dataset and evaluation framework for medical imaging QC, systematically assessing large language models (LLMs) in image quality assessment and report standardization. Specifically, we first constructed and anonymized a dataset of 161 chest X-ray (CXR) radiographs and 219 CT reports for evaluation. Then, multiple LLMs, including Gemini 2.0-Flash, GPT-4o, and DeepSeek-R1, were evaluated based on recall, precision, and F1 score to detect technical errors and inconsistencies. Experimental results show that Gemini 2.0-Flash achieved a Macro F1 score of 90 in CXR tasks, demonstrating strong generalization but limited fine-grained performance. DeepSeek-R1 excelled in CT report auditing with a 62.23% recall rate, outperforming other models. However, its distilled variants performed poorly, while InternLM2.5-7B-chat exhibited the highest additional discovery rate, indicating broader but less precise error detection. These findings highlight the potential of LLMs in medical imaging QC, with DeepSeek-R1 and Gemini 2.0-Flash demonstrating superior performance.
Submitted 10 March, 2025;
originally announced March 2025.
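The medical imaging QC abstract above reports recall, precision, and (macro) F1. As a reminder of how these standard metrics are defined, here is a minimal sketch; the label names and toy predictions are hypothetical and are not drawn from the paper's dataset, and the paper's exact evaluation protocol may differ.

```python
from typing import List, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Per-class precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        scores.append(precision_recall_f1(tp, fp, fn)[2])
    return sum(scores) / len(scores)


# Toy example with hypothetical labels: one "ok" item is misflagged as "error".
toy_true = ["ok", "ok", "error", "error"]
toy_pred = ["ok", "error", "error", "error"]
toy_score = macro_f1(toy_true, toy_pred)
```

Macro averaging weights every class equally, so a rare error category counts as much as a common one, which is one reason a model can score well on macro F1 while still missing fine-grained cases.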
-
GoRA: Gradient-driven Adaptive Low Rank Adaptation
Authors:
Haonan He,
Peng Ye,
Yuchen Ren,
Yuan Yuan,
Luyang Zhou,
Shucun Ju,
Lei Chen
Abstract:
Low-Rank Adaptation (LoRA) is a crucial method for efficiently fine-tuning large language models (LLMs), with its effectiveness influenced by two key factors: rank selection and weight initialization. While numerous LoRA variants have been proposed to improve performance by addressing one of these aspects, they often compromise usability or computational efficiency. In this paper, we analyze and identify the core limitations of existing approaches and propose a novel framework--GoRA (Gradient-driven Adaptive Low Rank Adaptation)--that simultaneously adapts both the rank and initialization strategy within a unified framework. GoRA leverages gradient information during training to dynamically assign optimal ranks and initialize low-rank adapter weights in an adaptive manner. To our knowledge, GoRA is the first method that not only addresses the limitations of prior approaches--which often focus on either rank selection or initialization in isolation--but also unifies both aspects within a single framework, enabling more effective and efficient adaptation. Extensive experiments across various architectures and modalities show that GoRA consistently outperforms existing LoRA-based methods while preserving the efficiency of vanilla LoRA. For example, when fine-tuning Llama3.1-8B-Base for mathematical reasoning, GoRA achieves a 5.13-point improvement over standard LoRA and even outperforms full fine-tuning by 2.05 points under high-rank settings. Code is available at: https://github.com/hhnqqq/MyTransformers.
Submitted 24 October, 2025; v1 submitted 13 February, 2025;
originally announced February 2025.
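For context on the two factors the abstract above names (rank and initialization), here is a minimal sketch of the vanilla LoRA update only. It does not implement GoRA's gradient-driven rank allocation or initialization, which the abstract does not specify. The helper names are hypothetical; the assumed form is the standard one, W' = W + (alpha / r) · B · A, with B initialized to zero:

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank (rows of A)."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, rank):
    """Adapter parameter count: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# With B = 0 (the usual LoRA initialization) the adapted weight equals W.
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b_zero = [[0.0], [0.0]]
a = [[0.5, -0.5, 1.0]]
unchanged = lora_effective_weight(w, b_zero, a, alpha=2.0)
```

The rank sets the adapter's capacity and cost (`trainable_params` grows linearly in it), while the choice of initial `B` and `A` sets where training starts. These are the two knobs the abstract says GoRA adapts jointly.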
-
Application of pretrained universal machine-learning interatomic potential for physicochemical simulation of liquid electrolytes in Li-ion battery
Authors:
Suyeon Ju,
Jinmu You,
Gijin Kim,
Yutack Park,
Hyungmin An,
Seungwu Han
Abstract:
Achieving higher operational voltages, faster charging, and broader temperature ranges for Li-ion batteries necessitates advancements in electrolyte engineering. However, the complexity of optimizing combinations of solvents, salts, and additives has limited the effectiveness of both experimental and computational screening methods for liquid electrolytes. Recently, pretrained universal machine-learning interatomic potentials (MLIPs) have emerged as promising tools for computational exploration of complex chemical spaces with high accuracy and efficiency. In this study, we evaluated the performance of the state-of-the-art equivariant pretrained MLIP, SevenNet-0, in predicting key properties of liquid electrolytes, including solvation behavior, density, and ion transport. To assess its suitability for extensive material screening, we considered a dataset comprising 20 solvents. Although SevenNet-0 was predominantly trained on inorganic compounds, its predictions for the properties of liquid electrolytes showed good agreement with experimental and $\textit{ab initio}$ data. However, systematic errors were identified, particularly in the predicted density of liquid electrolytes. To address this limitation, we fine-tuned SevenNet-0, achieving improved accuracy at a significantly reduced computational cost compared to developing bespoke models. Analysis of the training set suggested that the model achieved its accuracy by generalizing across the chemical space rather than memorizing specific configurations. This work highlights the potential of SevenNet-0 as a powerful tool for future engineering of liquid electrolyte systems.
Submitted 9 January, 2025;
originally announced January 2025.
-
Magnetically tuned topological phase in graphene nanoribbon heterojunctions
Authors:
Wei-Jian Li,
Da-Fei Sun,
Sheng Ju,
Ai-Lei He,
Yuan Zhou
Abstract:
The interplay between topology and magnetism often triggers exotic quantum phases. Here, we report an accessible scheme to engineer robust $\mathbb{Z}_{2}$ topology by intrinsic magnetism, originating from the zigzag segment connecting two armchair segments of different widths, in one-dimensional graphene nanoribbon heterojunctions. Our first-principles and model simulations reveal that the emergent spin polarization substantially modifies the dimerization between junction states, forming a special SSH mechanism that depends on the magnetic configurations. Interestingly, the topological phase in the magnetic state is determined only by the width of the narrow armchair segment, in sharp contrast with that in the normal state. In addition, the emergent magnetism increases the bulk energy band gap by an order of magnitude relative to that in the nonmagnetic state. We also discuss the $\mathbb{Z}$ topology of the junction states and the termination dependence of the topological end states. Our results offer a new way to tune the topology in graphene nanoribbon heterostructures, providing a new platform for future one-dimensional topological devices and molecular-scale spintronics.
Submitted 1 December, 2024;
originally announced December 2024.
-
Off-Policy Selection for Initiating Human-Centric Experimental Design
Authors:
Ge Gao,
Xi Yang,
Qitong Gao,
Song Ju,
Miroslav Pajic,
Min Chi
Abstract:
In human-centric tasks such as healthcare and education, the heterogeneity among patients and students necessitates personalized treatments and instructional interventions. While reinforcement learning (RL) has been utilized in those tasks, off-policy selection (OPS) is pivotal to close the loop by evaluating and selecting policies offline, without online interactions, yet current OPS methods often overlook the heterogeneity among participants. Our work is centered on resolving a pivotal challenge in human-centric systems (HCSs): how to select a policy to deploy when a new participant joins the cohort, without having access to any prior offline data collected on that participant? We introduce First-Glance Off-Policy Selection (FPS), a novel approach that systematically addresses participant heterogeneity through sub-group segmentation and OPS criteria tailored to each sub-group. By grouping individuals with similar traits, FPS facilitates personalized policy selection aligned with the unique characteristics of each participant or group of participants. FPS is evaluated via two important but challenging applications, intelligent tutoring systems and a healthcare application for sepsis treatment and intervention. FPS shows significant advances in enhancing the learning outcomes of students and in-hospital care outcomes.
Submitted 25 October, 2024;
originally announced October 2024.
-
Momentum-Resolved Fingerprint of Mottness in Layer-Dimerized Nb$_3$Br$_8$
Authors:
Mihir Date,
Francesco Petocchi,
Yun Yen,
Jonas A. Krieger,
Banabir Pal,
Vicky Hasse,
Emily C. McFarlane,
Chris Körner,
Jiho Yoon,
Matthew D. Watson,
Vladimir N. Strocov,
Yuanfeng Xu,
Ilya Kostanovski,
Mazhar N. Ali,
Sailong Ju,
Nicholas C. Plumb,
Michael A. Sentef,
Georg Woltersdorf,
Michael Schüler,
Philipp Werner,
Claudia Felser,
Stuart S. P. Parkin,
Niels B. M. Schröter
Abstract:
In a well-ordered crystalline solid, insulating behaviour can arise from two mechanisms: electrons can either scatter off a periodic potential, thus forming band gaps that can lead to a band insulator, or they localize due to strong interactions, resulting in a Mott insulator. For an even number of electrons per unit cell, either band- or Mott-insulators can theoretically occur. However, unambiguously identifying an unconventional Mott-insulator with an even number of electrons experimentally has remained a longstanding challenge due to the lack of a momentum-resolved fingerprint. This challenge has recently become pressing for the layer-dimerized van der Waals compound Nb$_3$Br$_8$, which exhibits a puzzling magnetic field-free diode effect when used as a weak link in Josephson junctions, but has previously been considered to be a band-insulator. In this work, we present a unique momentum-resolved signature of a Mott-insulating phase in the spectral function of Nb$_3$Br$_8$: the top of the highest occupied band along the out-of-plane dimerization direction $k_z$ has a momentum space separation of $Δk_z=2π/d$, whereas the valence band maximum of a band insulator would be separated by less than $Δk_z=π/d$, where $d$ is the average spacing between the layers. As the strong electron correlations inherent in Mott insulators can lead to unconventional superconductivity, identifying Nb$_3$Br$_8$ as an unconventional Mott-insulator is crucial for understanding its apparent time-reversal symmetry breaking Josephson diode effect. Moreover, the momentum-resolved signature employed here could be used to detect quantum phase transitions between band- and Mott-insulating phases in van der Waals heterostructures, where interlayer interactions and correlations can be easily tuned to drive such transitions.
Submitted 21 October, 2024;
originally announced October 2024.
-
TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis
Authors:
Shiyu Wang,
Jiawei Li,
Xiaoming Shi,
Zhou Ye,
Baichuan Mo,
Wenze Lin,
Shengtong Ju,
Zhixuan Chu,
Ming Jin
Abstract:
Time series analysis plays a critical role in numerous applications, supporting tasks such as forecasting, classification, anomaly detection, and imputation. In this work, we present the time series pattern machine (TSPM), a model designed to excel in a broad range of time series tasks through powerful representation and pattern extraction capabilities. Traditional time series models often struggle to capture universal patterns, limiting their effectiveness across diverse tasks. To address this, we define multiple scales in the time domain and various resolutions in the frequency domain, employing various mixing strategies to extract intricate, task-adaptive time series patterns. Specifically, we introduce a general-purpose TSPM that processes multi-scale time series using (1) multi-resolution time imaging (MRTI), (2) time image decomposition (TID), (3) multi-scale mixing (MCM), and (4) multi-resolution mixing (MRM) to extract comprehensive temporal patterns. MRTI transforms multi-scale time series into multi-resolution time images, capturing patterns across both temporal and frequency domains. TID leverages dual-axis attention to extract seasonal and trend patterns, while MCM hierarchically aggregates these patterns across scales. MRM adaptively integrates all representations across resolutions. This method achieves state-of-the-art performance across 8 time series analytical tasks, consistently surpassing both general-purpose and task-specific models. Our work marks a promising step toward the next generation of TSPMs, paving the way for further advancements in time series analysis.
Submitted 19 May, 2025; v1 submitted 21 October, 2024;
originally announced October 2024.
-
Quantum-Confined Tunable Ferromagnetism on the Surface of a van der Waals Antiferromagnet NaCrTe2
Authors:
Yidian Li,
Xian Du,
Junjie Wang,
Runzhe Xu,
Wenxuan Zhao,
Kaiyi Zhai,
Jieyi Liu,
Houke Chen,
Yiheng Yang,
Nicolas C. Plumb,
Sailong Ju,
Ming Shi,
Zhongkai Liu,
Jiangang Guo,
Xiaolong Chen,
Yulin Chen,
Lexian Yang
Abstract:
The surface of three-dimensional materials provides an ideal and versatile platform to explore quantum-confined physics. Here, we systematically investigate the electronic structure of Na-intercalated CrTe2, a van der Waals antiferromagnet, using angle-resolved photoemission spectroscopy and ab-initio calculations. The measured band structure deviates from the calculation of bulk NaCrTe2 but agrees with that of ferromagnetic monolayer CrTe2. Consistently, we observe an unexpected exchange splitting of the band dispersions, persisting well above the Néel temperature of bulk NaCrTe2. We argue that NaCrTe2 features a quantum-confined 2D ferromagnetic state in the topmost surface layer due to strong ferromagnetic correlation in the CrTe2 layer. Moreover, the exchange splitting and the critical temperature can be controlled by surface doping of alkali-metal atoms, suggesting a feasible tunability of the surface ferromagnetism. Our work not only presents a simple platform to explore tunable 2D ferromagnetism but also provides important insights into the quantum-confined low-dimensional magnetic states.
Submitted 18 October, 2024;
originally announced October 2024.
-
Shadow Augmentation for Handwashing Action Recognition: from Synthetic to Real Datasets
Authors:
Shengtai Ju,
Amy R. Reibman
Abstract:
Video analytics systems designed for deployment in outdoor conditions can be vulnerable to many environmental changes, particularly changes in shadow. Existing works have shown that shadow and its introduced distribution shift can cause system performance to degrade sharply. In this paper, we explore mitigation strategies to shadow-induced breakdown points of an action recognition system, using the specific application of handwashing action recognition for improving food safety. Using synthetic data, we explore the optimal shadow attributes to be included when training an action recognition system in order to improve performance under different shadow conditions. Experimental results indicate that heavier and larger shadow is more effective at mitigating the breakdown points. Building upon this observation, we propose a shadow augmentation method to be applied to real-world data. Results demonstrate the effectiveness of the shadow augmentation method for model training and consistency of its effectiveness across different neural network architectures and datasets.
Submitted 4 October, 2024;
originally announced October 2024.
-
Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation
Authors:
Taehyun Cho,
Seungyub Han,
Seokhun Ju,
Dohyeong Kim,
Kyungjae Lee,
Jungwoo Lee
Abstract:
Distributional reinforcement learning improves performance by capturing environmental stochasticity, but a comprehensive theoretical understanding of its effectiveness remains elusive. In addition, the intractable element of the infinite dimensionality of distributions has been overlooked. In this paper, we present a regret analysis of distributional reinforcement learning with general value function approximation in a finite episodic Markov decision process setting. We first introduce a key notion of $\textit{Bellman unbiasedness}$ which is essential for exactly learnable and provably efficient distributional updates in an online manner. Among all types of statistical functionals for representing infinite-dimensional return distributions, our theoretical results demonstrate that only moment functionals can exactly capture the statistical information. Secondly, we propose a provably efficient algorithm, $\texttt{SF-LSVI}$, that achieves a tight regret bound of $\tilde{O}(d_E H^{\frac{3}{2}}\sqrt{K})$ where $H$ is the horizon, $K$ is the number of episodes, and $d_E$ is the eluder dimension of a function class.
Submitted 13 May, 2025; v1 submitted 30 July, 2024;
originally announced July 2024.
-
Exploring the Impact of Hand Pose and Shadow on Hand-washing Action Recognition
Authors:
Shengtai Ju,
Amy R. Reibman
Abstract:
In the real world, camera-based application systems can face many challenges, including environmental factors and distribution shift. In this paper, we investigate how pose and shadow impact a classifier's performance, using the specific application of handwashing action recognition. To accomplish this, we generate synthetic data with desired variations to introduce controlled distribution shift. Using our synthetic dataset, we define a classifier's breakdown points to be where the system's performance starts to degrade sharply, and we show these are heavily impacted by pose and shadow conditions. In particular, heavier and larger shadows create earlier breakdown points. Also, it is intriguing to observe model accuracy drop to almost zero with bigger changes in pose. Moreover, we propose a simple mitigation strategy for pose-induced breakdown points by utilizing additional training data from non-canonical poses. Results show that the optimal choices of additional training poses are those with moderate deviations from the canonical poses with 50-60 degrees of rotation.
Submitted 19 June, 2024;
originally announced July 2024.
-
High-throughput discovery of metal oxides with high thermoelectric performance via interpretable feature engineering on small data
Authors:
Shengluo Ma,
Yongchao Rao,
Xiang Huang,
Shenghong Ju
Abstract:
In this work, we propose a data-driven screening framework that combines interpretable machine learning with high-throughput calculations to identify a series of metal oxides that exhibit both high-temperature tolerance and high power factors. To address the weak generalization of models trained on small datasets of high-temperature power factors, we employ symbolic regression for feature creation, which enhances the robustness of the model while preserving the physical meaning of the features. 33 candidate metal oxides are finally targeted for high-temperature thermoelectric applications from a pool of 48,694 compounds in the Materials Project database. Boltzmann transport theory is used to calculate electrical transport properties at 1,000 K. The relaxation time is approximated by employing constant electron-phonon coupling based on the deformation potential theory. Considering band degeneracy, the electron group velocity is obtained using the momentum matrix element method, yielding 28 materials with power factors greater than 50 $μW cm^{-1} K^{-2}$. The proposed high-throughput framework is instrumental in the selection of metal oxides for high-temperature thermoelectric applications. Furthermore, our data-driven analysis and transport calculations suggest that metal oxides rich in elements such as cerium (Ce), tin (Sn), and lead (Pb) tend to exhibit high power factors at high temperatures.
Submitted 30 April, 2024;
originally announced April 2024.
-
Heuristic-enhanced Candidates Selection strategy for GPTs tackle Few-Shot Aspect-Based Sentiment Analysis
Authors:
Baoxing Jiang,
Yujie Wan,
Shenggen Ju
Abstract:
Few-Shot Aspect-Based Sentiment Analysis (FSABSA) is an indispensable and highly challenging task in natural language processing. However, methods based on Pre-trained Language Models (PLMs) struggle to accommodate multiple sub-tasks, and methods based on Generative Pre-trained Transformers (GPTs) perform poorly. To address these issues, the paper designs a Heuristic-enhanced Candidates Selection (HCS) strategy and further proposes an All in One (AiO) model based on it. The model works in two stages, combining the accuracy of PLMs with the generalization capability of GPTs. Specifically, in the first stage, a backbone model based on PLMs generates rough heuristic candidates for the input sentence. In the second stage, AiO leverages LLMs' contextual learning capabilities to generate precise predictions. The study conducted comprehensive comparative and ablation experiments on five benchmark datasets. The experimental results demonstrate that the proposed model adapts better to multiple sub-tasks and also outperforms methods that directly utilize GPTs.
Submitted 19 August, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
arXiv:2403.15887
cond-mat.soft
cond-mat.mtrl-sci
physics.app-ph
physics.chem-ph
physics.comp-ph
Tutorial: AI-assisted exploration and active design of polymers with high intrinsic thermal conductivity
Authors:
Xiang Huang,
Shenghong Ju
Abstract:
Designing polymers with high intrinsic thermal conductivity (TC) is critically important for the thermal management of organic electronics and photonics. However, this is a challenging task owing to the diversity of the chemical space and the barriers to advanced synthetic experiments/characterization techniques for polymers. In this Tutorial, the fundamentals and implementation of combining classical molecular dynamics simulation and machine learning (ML) for the development of polymers with high TC are comprehensively introduced. We begin by describing the core components of a universal ML framework, involving polymer datasets, property calculators, feature engineering and informatics algorithms. Then, the process of constructing interpretable regression algorithms for TC prediction is introduced, aiming to extract the underlying relationships between microstructures and TCs for polymers. We also explore the design of sequence-ordered polymers with high TC using lightweight and mainstream active learning algorithms. Lastly, we conclude by addressing the current limitations and suggesting potential avenues for future research on this topic.
Submitted 23 March, 2024;
originally announced March 2024.
-
Exploiting Emotion-Semantic Correlations for Empathetic Response Generation
Authors:
Zhou Yang,
Zhaochun Ren,
Yufeng Wang,
Xiaofei Zhu,
Zhihao Chen,
Tiecheng Cai,
Yunbing Wu,
Yisong Su,
Sibo Ju,
Xiangwen Liao
Abstract:
Empathetic response generation aims to generate empathetic responses by understanding the speaker's emotional feelings from the language of dialogue. Recent methods capture emotional words in the language of communicators and construct them as static vectors to perceive nuanced emotions. However, linguistic research has shown that emotional words in language are dynamic and are correlated with other grammatical semantic roles, i.e., words with semantic meanings. Previous methods overlook these two characteristics, which easily leads to misunderstandings of emotions and neglect of key semantics. To address this issue, we propose a dynamical Emotion-Semantic Correlation Model (ESCM) for empathetic dialogue generation tasks. ESCM constructs dynamic emotion-semantic vectors through the interaction of context and emotions. We introduce dependency trees to reflect the correlations between emotions and semantics. Based on dynamic emotion-semantic vectors and dependency trees, we propose a dynamic correlation graph convolutional network to guide the model in learning context meanings in dialogue and generating empathetic responses. Experimental results on the EMPATHETIC-DIALOGUES dataset show that ESCM understands semantics and emotions more accurately and expresses fluent and informative empathetic responses. Our analysis results also indicate that the correlations between emotions and semantics are frequently used in dialogues, which is of great significance for empathetic perception and expression.
Submitted 27 February, 2024;
originally announced February 2024.
-
AI-assisted inverse design of sequence-ordered high intrinsic thermal conductivity polymers
Authors:
Xiang Huang,
C. Y. Zhao,
Hong Wang,
Shenghong Ju
Abstract:
Artificial intelligence (AI) is shifting the polymer design paradigm from a traditional trial-and-error approach to a data-driven style. Achieving high thermal conductivity (TC) for intrinsic polymers is urgent because of their importance in the thermal management of many industrial applications such as microelectronic devices and integrated circuits. In this work, we propose a robust AI-assisted workflow for the inverse design of high TC polymers. Using 1144 polymers with known computational TCs, we construct a surrogate deep neural network model for TC prediction and extract a polymer-unit library with 32 sequences. Two state-of-the-art multi-objective optimization algorithms, the unified non-dominated sorting genetic algorithm III (U-NSGA-III) and q-noisy expected hypervolume improvement (qNEHVI), are employed for sequence-ordered polymer design with both high TC and synthetic possibility. For triblock polymer design, the results indicate that qNEHVI is capable of exploring a diversity of optimal polymers at the Pareto front, but the uncertainty in quasi-Monte Carlo sampling makes the trials costly. The performance of U-NSGA-III is affected by the initial random structures and usually falls into a locally optimal solution, but it takes fewer attempts with lower costs. 20 parallel U-NSGA-III runs are conducted to design the pentablock polymers with high TC, and half of the candidates among 1921 generated polymers achieve the targets (TC > 0.4 W/(mK) and SA < 3.0). Ultimately, we check the TC of 50 promising polymers through molecular dynamics simulations and reveal the intrinsic connections between microstructures and TCs. Our developed AI-assisted inverse design approach for polymers is flexible and universal, and can be extended to the design of polymers with other target properties.
Submitted 18 February, 2024;
originally announced February 2024.
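The abstract above frames polymer design as a two-objective problem (high TC together with synthetic possibility) and refers to optimal polymers "at the Pareto front". The sketch below shows only what a Pareto front means for two such objectives. It is not U-NSGA-III or qNEHVI; the candidate names and values are hypothetical, and treating SA as a score where lower is better is an assumption based on the stated target SA < 3.0:

```python
def pareto_front(candidates):
    """Return names of non-dominated candidates.

    Each candidate is (name, tc, sa); higher tc is better, lower sa is better.
    A candidate is dominated if another one is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for name, tc, sa in candidates:
        dominated = any(
            (o_tc >= tc and o_sa <= sa) and (o_tc > tc or o_sa < sa)
            for _, o_tc, o_sa in candidates
        )
        if not dominated:
            front.append(name)
    return front


def meets_targets(tc, sa, tc_min=0.4, sa_max=3.0):
    """Check the thresholds quoted in the abstract: TC > 0.4 W/(mK) and SA < 3.0."""
    return tc > tc_min and sa < sa_max


# Hypothetical candidates: (name, TC in W/(mK), SA score).
candidates = [
    ("A", 0.50, 3.5),
    ("B", 0.45, 2.8),
    ("C", 0.40, 2.9),
    ("D", 0.30, 2.0),
]
front = pareto_front(candidates)
```

A front member such as the hypothetical "A" can still miss the targets (its SA exceeds 3.0), which is why a threshold check like `meets_targets` is a separate step from finding the front.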
-
Tunable thermal conductivity of sustainable geopolymers by Si/Al ratio and moisture content: insights from atomistic simulations
Authors:
Wenkai Liu,
Shenghong Ju
Abstract:
In this work, the effects of Si/Al ratio and moisture content on thermal transport in sustainable geopolymers have been comprehensively investigated using molecular dynamics simulations. The thermal conductivity of geopolymer systems increases with increasing Si/Al ratio; the phonon vibration frequencies that contribute most to this increase lie in the 8-25 THz range, while the remaining frequency intervals contribute less. With increasing moisture content, the thermal conductivity of geopolymer systems first decreases, then increases, and finally tends to be stable, which is contrary to the changing trend of the porosity of the system. This is mainly because pores cause phonon scattering during thermal transport, which in turn affects the thermal conductivity of the system. When the moisture content is 5%, the thermal conductivity reaches a minimum value of about 1.103 W/(mK), which is 40.2% lower than the thermal conductivity of the system without water molecules. This work will help to enhance the physical-level understanding of the relationship between geopolymer structures and thermal transport properties.
Submitted 21 January, 2024;
originally announced January 2024.
-
Spectral Switches of Light in Curved Space
Authors:
Suting Ju,
Chenni Xu,
Li-Gang Wang
Abstract:
Acting as analog models of curved spacetime, surfaces of revolution used to explore novel optical effects are attracting great interest as a way to enhance our comprehension of the universe. It is of general interest to understand the spectral effect of light propagating over long distances in the universe. Here, we address the issue of how curved space affects the phenomenon of spectral switches, a sudden spectral change during propagation caused by the finite size of a light source. By using the point spread function of curved space under the paraxial approximation, the expression of the on-axis output spectrum is derived and calculated numerically. A theoretical way to find on-axis spectral switches is also derived, which interprets the effect of the spatial curvature of surfaces on spectral switches as a modification of the effective Fresnel number. We find that the spectral switches on surfaces with positive Gaussian curvature are closer to the source, compared with the flat surface case, while the effect is opposite on surfaces with negative Gaussian curvature. We also find that the spectral switches farther away from the light source are more sensitive to the change in Gaussian curvature. This work deepens our understanding of the properties of fully and partially coherent light propagating on two-dimensional curved space.
Submitted 19 January, 2024;
originally announced January 2024.
-
Disorder-dependent Li diffusion in $\mathrm{Li_6PS_5Cl}$ investigated by machine learning potential
Authors:
Jiho Lee,
Suyeon Ju,
Seungwoo Hwang,
Jinmu You,
Jisu Jung,
Youngho Kang,
Seungwu Han
Abstract:
Solid-state electrolytes with argyrodite structures, such as $\mathrm{Li_6PS_5Cl}$, have attracted considerable attention due to their superior safety compared to liquid electrolytes and higher ionic conductivity than other solid electrolytes. Although experimental efforts have been made to enhance conductivity by controlling the degree of disorder, the underlying diffusion mechanism is not yet fully understood. Moreover, existing theoretical analyses based on ab initio MD simulations have limitations in addressing various types of disorder at room temperature. In this study, we directly investigate Li-ion diffusion in $\mathrm{Li_6PS_5Cl}$ at 300 K using large-scale, long-term MD simulations empowered by machine learning potentials (MLPs). To ensure the convergence of conductivity values within an error range of 10%, we employ a 25 ns simulation using a $5\times5\times5$ supercell containing 6500 atoms. The computed Li-ion conductivity, activation energies, and equilibrium site occupancies align well with experimental observations. Notably, Li-ion conductivity peaks when Cl ions occupy 25% of the 4c sites, rather than at 50% where the disorder is maximized. This phenomenon is explained by the interplay between inter-cage and intra-cage jumps. By elucidating the key factors affecting Li-ion diffusion in $\mathrm{Li_6PS_5Cl}$, this work paves the way for optimizing ionic conductivity in the argyrodite family.
Submitted 30 October, 2023;
originally announced October 2023.
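The supercell size quoted in the abstract above can be cross-checked with a short calculation. This sketch assumes four formula units per conventional cubic cell (Z = 4), which is the usual value for cubic argyrodites but is not stated in the abstract; the function and variable names are illustrative only.

```python
from typing import Dict, Tuple

# Atoms per formula unit of Li6PS5Cl, read off the chemical formula
FORMULA: Dict[str, int] = {"Li": 6, "P": 1, "S": 5, "Cl": 1}

# Assumption (not in the abstract): Z = 4 formula units per conventional cell
FORMULA_UNITS_PER_CELL = 4


def supercell_atoms(formula: Dict[str, int], z: int, reps: Tuple[int, int, int]) -> int:
    """Total atom count in an nx x ny x nz supercell of the conventional cell."""
    atoms_per_formula_unit = sum(formula.values())
    nx, ny, nz = reps
    return atoms_per_formula_unit * z * nx * ny * nz


total_atoms = supercell_atoms(FORMULA, FORMULA_UNITS_PER_CELL, (5, 5, 5))
li_atoms = FORMULA["Li"] * FORMULA_UNITS_PER_CELL * 5 * 5 * 5
print(total_atoms, li_atoms)
```

Under that assumption, the total matches the 6500 atoms reported for the $5\times5\times5$ supercell.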
-
A positivity-preserving numerical method for a thin liquid film on a vertical cylindrical fiber
Authors:
Bohyun Kim,
Hangjie Ji,
Andrea L. Bertozzi,
Abolfazl Sadeghpour,
Y. Sungtaek Ju
Abstract:
When a thin liquid film flows down a vertical fiber, one can observe the complex and captivating interfacial dynamics of an unsteady flow. Such dynamics are applicable in various fluid experiments due to their high surface area-to-volume ratio. Recent studies verified that when the flow undergoes regime transitions, the magnitude of the film thickness changes dramatically, making numerical simulations challenging. In this paper, we present a computationally efficient numerical method that maintains the positivity of the film thickness and conserves the volume of the fluid on coarse meshes. A series of comparisons to laboratory experiments and previously proposed numerical methods supports the validity of our numerical method. We also prove that our method is second-order consistent in space and satisfies the entropy estimate.
Submitted 16 October, 2023;
originally announced October 2023.