-
Which bird does not have wings: Negative-constrained KGQA with Schema-guided Semantic Matching and Self-directed Refinement
Authors:
Midan Shim,
Seokju Hwang,
Kaehyun Um,
Kyong-Ho Lee
Abstract:
Large language models still struggle with faithfulness and hallucinations despite their remarkable reasoning abilities. In Knowledge Graph Question Answering (KGQA), semantic parsing-based approaches address these limitations by understanding constraints in a user's question and converting them into a logical form to execute on a knowledge graph. However, existing KGQA benchmarks and methods are biased toward positive and calculation constraints. Negative constraints are neglected, although they frequently appear in real-world questions. In this paper, we introduce a new task, NEgative-conSTrained (NEST) KGQA, where each question contains at least one negative constraint, and a corresponding dataset, NestKGQA. We also design PyLF, a Python-formatted logical form, since existing logical forms are ill-suited to expressing negation clearly while maintaining readability. Furthermore, NEST questions naturally contain multiple constraints. To mitigate their semantic complexity, we present a novel framework named CUCKOO, specialized for multiple-constraint questions and ensuring semantic executability. CUCKOO first generates a constraint-aware logical form draft and performs schema-guided semantic matching. It then selectively applies self-directed refinement only when executing improper logical forms yields an empty result, reducing cost while improving robustness. Experimental results demonstrate that CUCKOO consistently outperforms baselines on both conventional KGQA and NEST-KGQA benchmarks under few-shot settings.
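The abstract does not show PyLF's concrete syntax, but the core idea of a negative constraint in a Python-formatted logical form can be sketched as follows (purely illustrative; the toy knowledge graph and all function names below are hypothetical, not the paper's actual PyLF):

```python
# Toy knowledge graph: entity -> set of (relation, value) facts.
# (Hypothetical data for illustration; a real KG would be far larger.)
kg = {
    "kiwi": {("type", "bird")},
    "sparrow": {("type", "bird"), ("has_part", "wings")},
    "penguin": {("type", "bird"), ("has_part", "wings")},
}

def instances_of(type_name):
    """All entities of the given type."""
    return {e for e, facts in kg.items() if ("type", type_name) in facts}

def having(relation, value):
    """All entities for which the given fact holds."""
    return {e for e, facts in kg.items() if (relation, value) in facts}

# "Which bird does not have wings?" expressed as set difference,
# making the negative constraint explicit and executable:
answer = instances_of("bird") - having("has_part", "wings")
print(answer)  # {'kiwi'} in this toy graph
```

Expressing negation as set difference over executable Python keeps the constraint both machine-checkable and readable, which is the motivation the abstract gives for PyLF.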
Submitted 16 April, 2026;
originally announced April 2026.
-
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
Authors:
Kangsan Kim,
Minki Kang,
Taeil Kim,
Yanlai Yang,
Mengye Ren,
Sung Ju Hwang
Abstract:
Memory-based self-evolution has emerged as a promising paradigm for coding agents. However, existing approaches typically restrict memory utilization to homogeneous task domains, failing to leverage the shared infrastructural foundations, such as runtime environments and programming languages, that exist across diverse real-world coding problems. To address this limitation, we investigate \textbf{Memory Transfer Learning} (MTL) by harnessing a unified memory pool from heterogeneous domains. We evaluate performance across 6 coding benchmarks using four memory representations, ranging from concrete traces to abstract insights. Our experiments demonstrate that cross-domain memory improves average performance by 3.7\%, primarily by transferring meta-knowledge, such as validation routines, rather than task-specific code. Importantly, we find that abstraction dictates transferability; high-level insights generalize well, whereas low-level traces often induce negative transfer due to excessive specificity. Furthermore, we show that transfer effectiveness scales with the size of the memory pool, and memory can be transferred even between different models. Our work establishes empirical design principles for expanding memory utilization beyond single-domain silos. Project page: https://memorytransfer.github.io/
Submitted 15 April, 2026;
originally announced April 2026.
-
PR-MaGIC: Prompt Refinement Via Mask Decoder Gradient Flow For In-Context Segmentation
Authors:
Minjae Lee,
Sungwoo Hur,
Soojin Hwang,
Won Hwa Kim
Abstract:
Visual Foundation Models (VFMs) such as the Segment Anything Model (SAM) have significantly advanced broad use of image segmentation. However, SAM and its variants necessitate substantial manual effort for prompt generation and additional training for specific applications. Recent approaches address these limitations by integrating SAM into in-context (one/few shot) segmentation, enabling auto-prompting through semantic alignment between query and support images. Despite these efforts, they still generate sub-optimal prompts that degrade segmentation quality due to visual inconsistencies between support and query images. To tackle this limitation, we introduce PR-MaGIC (Prompt Refinement via Mask Decoder Gradient Flow for In-Context Segmentation), a training-free test-time framework that refines prompts via gradient flow derived from SAM's mask decoder. PR-MaGIC seamlessly integrates into in-context segmentation frameworks, being theoretically grounded yet practically stabilized through a simple top-1 selection strategy that ensures robust performance across samples. Extensive evaluations demonstrate that PR-MaGIC consistently improves segmentation quality across various benchmarks, effectively mitigating inadequate prompts without requiring additional training or architectural modifications.
Submitted 13 April, 2026;
originally announced April 2026.
-
DuET: Dual Execution for Test Output Prediction with Generated Code and Pseudocode
Authors:
Hojae Han,
Jaejin Kim,
Seung-won Hwang,
Yu Jin Kim,
Moontae Lee
Abstract:
This work addresses test output prediction, a key challenge in test case generation. To improve the reliability of outputs predicted by LLMs, prior approaches generate code first to ground predictions. One grounding strategy is direct execution of generated code, but even minor errors can cause failures. To address this, we introduce LLM-based pseudocode execution, which grounds prediction on more error-resilient pseudocode and simulates execution via LLM reasoning. We further propose DuET, a dual-execution framework that combines both approaches through functional majority voting. Our analysis shows the two approaches are complementary in overcoming their respective limitations: direct execution suffers from code errors, and pseudocode reasoning from hallucination. On LiveCodeBench, DuET achieves state-of-the-art performance, improving Pass@1 by 13.6 pp.
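The dual-execution idea admits a rough sketch (hypothetical: the paper's functional majority voting compares candidate outputs for functional equivalence, which is simplified here to exact-match frequency voting, and all names are illustrative):

```python
from collections import Counter

def majority_vote(candidates):
    """Return the most frequent non-None candidate output (ties: first seen)."""
    votes = Counter(c for c in candidates if c is not None)
    return votes.most_common(1)[0][0] if votes else None

# Direct execution can fail outright (None on code errors), while
# LLM-simulated pseudocode execution can hallucinate a wrong value;
# pooling both sets of candidates lets each cover the other's failures.
direct_outputs = ["[1, 2]", None, "[1, 2]"]
pseudo_outputs = ["[1, 2]", "[2, 1]"]
prediction = majority_vote(direct_outputs + pseudo_outputs)
print(prediction)  # [1, 2]
```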
Submitted 13 April, 2026;
originally announced April 2026.
-
Delaunay Canopy: Building Wireframe Reconstruction from Airborne LiDAR Point Clouds via Delaunay Graph
Authors:
Donghyun Kim,
Chanyoung Kim,
Youngjoong Kwon,
Seong Jae Hwang
Abstract:
Reconstructing building wireframe from airborne LiDAR point clouds yields a compact, topology-centric representation that enables structural understanding beyond dense meshes. Yet a key limitation persists: conventional methods have failed to achieve accurate wireframe reconstruction in regions afflicted by significant noise, sparsity, or internal corners. This failure stems from the inability to establish an adaptive search space to effectively leverage the rich 3D geometry of large, sparse building point clouds. In this work, we address this challenge with Delaunay Canopy, which utilizes the Delaunay graph as a geometric prior to define a geometrically adaptive search space. Central to our approach is Delaunay Graph Scoring, which not only reconstructs the underlying geometric manifold but also yields region-wise curvature signatures to robustly guide the reconstruction. Built on this foundation, our corner and wire selection modules leverage the Delaunay-induced prior to focus on highly probable elements, thereby shaping the search space and enabling accurate prediction even in previously intractable regions. Extensive experiments on the Building3D Tallinn city and entry-level datasets demonstrate state-of-the-art wireframe reconstruction, delivering accurate predictions across diverse and complex building geometries.
Submitted 2 April, 2026;
originally announced April 2026.
-
SAFE: Stepwise Atomic Feedback for Error correction in Multi-hop Reasoning
Authors:
Daeyong Kwon,
Soyoung Yoon,
Seung-won Hwang
Abstract:
Multi-hop QA benchmarks frequently reward Large Language Models (LLMs) for spurious correctness, masking ungrounded or flawed reasoning steps. To shift toward rigorous reasoning, we propose SAFE, a dynamic benchmarking framework that replaces the ungrounded Chain-of-Thought (CoT) with a strictly verifiable sequence of grounded entities. Our framework operates across two phases: (1) train-time verification, where we establish an atomic error taxonomy and a Knowledge Graph (KG)-grounded verification pipeline to eliminate noisy supervision in standard benchmarks, identifying up to 14% of instances as unanswerable, and (2) inference-time verification, where a feedback model trained on this verified dataset dynamically detects ungrounded steps in real-time. Experimental results demonstrate that SAFE not only exposes the critical flaws of existing benchmarks at train-time, but also significantly outperforms standard baselines, achieving an average accuracy gain of 8.4 pp while guaranteeing verifiable trajectories at inference-time.
Submitted 2 April, 2026;
originally announced April 2026.
-
Class-Distribution Guided Active Learning for 3D Occupancy Prediction in Autonomous Driving
Authors:
Wonjune Kim,
In-Jae Lee,
Sihwan Hwang,
Sanmin Kim,
Dongsuk Kum
Abstract:
3D occupancy prediction provides dense spatial understanding critical for safe autonomous driving. However, this task suffers from a severe class imbalance due to its volumetric representation, where safety-critical objects (bicycles, traffic cones, pedestrians) occupy minimal voxels compared to dominant backgrounds. Additionally, voxel-level annotation is costly, yet dedicating effort to dominant classes is inefficient. To address these challenges, we propose a class-distribution guided active learning framework for selecting training samples to annotate in autonomous driving datasets. Our approach combines three complementary criteria to select the training samples. Inter-sample diversity prioritizes samples whose predicted class distributions differ from those of the labeled set, intra-set diversity prevents redundant sampling within each acquisition cycle, and frequency-weighted uncertainty emphasizes rare classes by reweighting voxel-level entropy with inverse per-sample class proportions. We ensure evaluation validity by using a geographically disjoint train/validation split of Occ3D-nuScenes, which reduces train-validation overlap and mitigates potential map memorization. With only 42.4% labeled data, our framework reaches 26.62 mIoU, comparable to full supervision and outperforming active learning baselines at the same budget. We further validate generality on SemanticKITTI using a different architecture, demonstrating consistent effectiveness across datasets.
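The frequency-weighted uncertainty criterion can be sketched minimally (illustrative only; the authors' exact reweighting is not given in the abstract, and the function and class names below are hypothetical):

```python
import math

def frequency_weighted_uncertainty(voxel_probs, class_counts):
    """Average voxel-level entropy, reweighted by the inverse proportion
    of each voxel's most likely class so rare classes dominate the score."""
    total = sum(class_counts.values())
    inv_prop = {c: total / n for c, n in class_counts.items()}
    score = 0.0
    for probs in voxel_probs:  # one class distribution per voxel
        entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
        top = max(probs, key=probs.get)  # voxel's most likely class
        score += inv_prop.get(top, 1.0) * entropy
    return score / len(voxel_probs)

counts = {"road": 900, "pedestrian": 100}  # dominant vs. rare class
rare_sample = frequency_weighted_uncertainty(
    [{"pedestrian": 0.6, "road": 0.4}], counts)
common_sample = frequency_weighted_uncertainty(
    [{"road": 0.6, "pedestrian": 0.4}], counts)
# A sample uncertain about the rare class is prioritized for annotation.
print(rare_sample > common_sample)  # True
```

Both toy samples have identical entropy; only the inverse-frequency weight separates them, which is the point of the criterion.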
Submitted 28 March, 2026;
originally announced March 2026.
-
FEAST: Fully Connected Expressive Attention for Spatial Transcriptomics
Authors:
Taejin Jeong,
Joohyeok Kim,
Jinyeong Kim,
Chanyoung Kim,
Seong Jae Hwang
Abstract:
Spatial Transcriptomics (ST) provides spatially-resolved gene expression, offering crucial insights into tissue architecture and complex diseases. However, its prohibitive cost limits widespread adoption, leading to significant attention on inferring spatial gene expression from readily available whole slide images. While graph neural networks have been proposed to model interactions between tissue regions, their reliance on pre-defined sparse graphs prevents them from considering potentially interacting spot pairs, resulting in a structural limitation in capturing complex biological relationships. To address this, we propose FEAST (Fully connected Expressive Attention for Spatial Transcriptomics), an attention-based framework that models the tissue as a fully connected graph, enabling the consideration of all pairwise interactions. To better reflect biological interactions, we introduce negative-aware attention, which models both excitatory and inhibitory interactions, capturing essential negative relationships that standard attention often overlooks. Furthermore, to mitigate the information loss from truncated or ignored context in standard spot image extraction, we introduce an off-grid sampling strategy that gathers additional images from intermediate regions, allowing the model to capture a richer morphological context. Experiments on public ST datasets show that FEAST surpasses state-of-the-art methods in gene expression prediction while providing biologically plausible attention maps that clarify positive and negative interactions. Our code is available at https://github.com/starforTJ/FEAST.
Submitted 26 March, 2026;
originally announced March 2026.
-
ViKey: Enhancing Temporal Understanding in Videos via Visual Prompting
Authors:
Yeonkyung Lee,
Dayun Ju,
Youngmin Kim,
Seil Kang,
Seong Jae Hwang
Abstract:
Recent advancements in Video Large Language Models (VideoLLMs) have enabled strong performance across diverse multimodal video tasks. To reduce the high computational cost of processing dense video frames, efficiency-oriented methods such as frame selection have been widely adopted. While effective at minimizing redundancy, these methods often cause notable performance drops on tasks requiring temporal reasoning. Unlike humans, who can infer event progression from sparse visual cues, VideoLLMs frequently misinterpret temporal relations when intermediate frames are omitted. To address this limitation, we explore visual prompting (VP) as a lightweight yet effective way to enhance temporal understanding in VideoLLMs. Our analysis reveals that simply annotating each frame with explicit ordinal information helps the model perceive temporal continuity. This visual cue also supports frame-level referencing and mitigates positional ambiguity within a sparsely sampled sequence. Building on these insights, we introduce ViKey, a training-free framework that combines VP with a lightweight Keyword-Frame Mapping (KFM) module. KFM leverages frame indices as dictionary-like keys to link textual cues to the most relevant frames, providing explicit temporal anchors during inference. Despite its simplicity, our approach substantially improves temporal reasoning and, on some datasets, preserves dense-frame baseline performance with as few as 20% of frames.
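The Keyword-Frame Mapping idea can be sketched as a dictionary lookup (hypothetical: the abstract does not specify the matching mechanism, so naive substring matching over frame captions stands in for it here):

```python
def keyword_frame_map(keywords, frame_captions):
    """Link each textual cue to the index of the first frame whose caption
    mentions it, using frame indices as dictionary-like keys (illustrative)."""
    mapping = {}
    for kw in keywords:
        for idx, caption in frame_captions.items():
            if kw in caption:
                mapping[kw] = idx
                break
    return mapping

# Sparsely sampled frames keep their original ordinal indices, so the
# mapping yields explicit temporal anchors despite the omitted frames.
captions = {0: "a door opens", 8: "a person enters", 16: "the lights turn on"}
anchors = keyword_frame_map(["person", "lights"], captions)
print(anchors)  # {'person': 8, 'lights': 16}
```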
Submitted 24 March, 2026;
originally announced March 2026.
-
T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
Authors:
Hyomin Lee,
Sangwoo Park,
Yumin Choi,
Sohyun An,
Seanie Lee,
Sung Ju Hwang
Abstract:
While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protocol (MCP). To address this gap, we propose a trajectory-aware evolutionary search method, T-MAP, which leverages execution trajectories to guide the discovery of adversarial prompts. Our approach enables the automatic generation of attacks that not only bypass safety guardrails but also reliably realize harmful objectives through actual tool interactions. Empirical evaluations across diverse MCP environments demonstrate that T-MAP substantially outperforms baselines in attack realization rate (ARR) and remains effective against frontier models, including GPT-5.2, Gemini-3-Pro, Qwen3.5, and GLM-5, thereby revealing previously underexplored vulnerabilities in autonomous LLM agents.
Submitted 21 March, 2026;
originally announced March 2026.
-
SynSym: A Synthetic Data Generation Framework for Psychiatric Symptom Identification
Authors:
Migyeong Kang,
Jihyun Kim,
Hyolim Jeon,
Sunwoo Hwang,
Jihyun An,
Yonghoon Kim,
Haewoon Kwak,
Jisun An,
Jinyoung Han
Abstract:
Psychiatric symptom identification on social media aims to infer fine-grained mental health symptoms from user-generated posts, allowing a detailed understanding of users' mental states. However, the construction of large-scale symptom-level datasets remains challenging due to the resource-intensive nature of expert labeling and the lack of standardized annotation guidelines, which in turn limits the generalizability of models to identify diverse symptom expressions from user-generated text. To address these issues, we propose SynSym, a synthetic data generation framework for constructing generalizable datasets for symptom identification. Leveraging large language models (LLMs), SynSym constructs high-quality training samples by (1) expanding each symptom into sub-concepts to enhance the diversity of generated expressions, (2) producing synthetic expressions that reflect psychiatric symptoms in diverse linguistic styles, and (3) composing realistic multi-symptom expressions, informed by clinical co-occurrence patterns. We validate SynSym on three benchmark datasets covering different styles of depressive symptom expression. Experimental results demonstrate that models trained solely on the synthetic data generated by SynSym perform comparably to those trained on real data, and benefit further from additional fine-tuning with real data. These findings underscore the potential of synthetic data as an alternative resource to real-world annotations in psychiatric symptom modeling, and SynSym serves as a practical framework for generating clinically relevant and realistic symptom expressions.
Submitted 22 March, 2026;
originally announced March 2026.
-
MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model
Authors:
Youngwan Lee,
Soojin Jang,
Yoorhim Cho,
Seunghwan Lee,
Yong-Ju Lee,
Sung Ju Hwang
Abstract:
Spatial reasoning is foundational for Vision-Language Models (VLMs), particularly when deployed as Vision-Language-Action (VLA) agents in physical environments. However, existing benchmarks predominantly focus on elementary, single-hop relations, neglecting the multi-hop compositional reasoning and precise visual grounding essential for real-world scenarios. To address this, we introduce MultihopSpatial, offering three key contributions: (1) A comprehensive benchmark designed for multi-hop and compositional spatial reasoning, featuring 1- to 3-hop complex queries across diverse spatial perspectives. (2) Acc@50IoU, a complementary metric that simultaneously evaluates reasoning and visual grounding by requiring both answer selection and precise bounding box prediction - capabilities vital for robust VLA deployment. (3) MultihopSpatial-Train, a dedicated large-scale training corpus to foster spatial intelligence. Extensive evaluation of 37 state-of-the-art VLMs yields eight key insights, revealing that compositional spatial reasoning remains a formidable challenge. Finally, we demonstrate that reinforcement learning post-training on our corpus enhances both intrinsic VLM spatial reasoning and downstream embodied manipulation performance.
Submitted 19 March, 2026;
originally announced March 2026.
-
Anchoring and Rescaling Attention for Semantically Coherent Inbetweening
Authors:
Tae Eun Choi,
Sumin Shim,
Junhyeok Kim,
Seong Jae Hwang
Abstract:
Generative inbetweening (GI) seeks to synthesize realistic intermediate frames between the first and last keyframes, going beyond mere interpolation. As sequences become sparser and motions larger, previous GI models struggle, producing inconsistent frames with unstable pacing and semantic misalignment. Since GI involves fixed endpoints and numerous plausible paths, the task requires additional guidance from the keyframes and text to specify the intended path. Thus, we provide semantic and temporal guidance from the keyframes and text to each intermediate frame through Keyframe-anchored Attention Bias. We also better enforce frame consistency with Rescaled Temporal RoPE, which allows self-attention to attend to keyframes more faithfully. TGI-Bench, the first benchmark specifically designed for text-conditioned GI evaluation, enables challenge-targeted evaluation to analyze GI models. Without additional training, our method achieves state-of-the-art frame consistency, semantic fidelity, and pace stability for both short and long sequences across diverse challenges.
Submitted 18 March, 2026;
originally announced March 2026.
-
ODIN: Searching for LyC emission from Lyman-$α$ emitters at $z=4.5$ in the E-COSMOS and XMM-LSS fields
Authors:
Eunsuk Seo,
Hyunmi Song,
Lucia Guaita,
Kyoung-Soo Lee,
Eric Gawiser,
Robin Ciardullo,
Arjun Dey,
Seok-Jun Chang,
Nicole Firestone,
Stephen Gwyn,
Ho Seong Hwang,
Sungryong Hong,
Sang Hyeok Im,
Woong-Seob Jeong,
Jaehyun Lee,
Seong-Kook Lee,
Chanbom Park,
Vandana Ramakrishnan,
Marcin Sawicki,
Yujin Yang,
Ann Zabludoff
Abstract:
We investigated Lyman-continuum (LyC) emission from Lyman-$α$ emitters (LAEs) at $z=4.5$, identified in the One-hundred-deg$^2$ DECam Imaging in Narrowbands (ODIN) survey. Of the 7,498 LAEs (4,101 in COSMOS and 3,397 in XMM-LSS), we excluded LAEs that are either likely low-z objects or contaminated by neighboring sources. An additional background modeling process with thorough quality assessments leaves a final sample of 851 galaxies. We then performed forced photometry on $u/u^*$-band images from the CFHT large area $u$-band deep survey (CLAUDS) to measure their LyC fluxes. This represents the largest sample of $z=4.5$ LAEs searched for this purpose. Within this sample, we identified 12 `gold' and 39 `silver' LyC-emitting candidates, with LyC fluxes detected at $>3σ$ and between $2σ$ and $3σ$ significance, respectively, in the range of 5.16--55.29 nJy. No LyC signal is detected in the weighted mean stack of the final sample ($0.20 \pm 0.37$ nJy). Given the UVC magnitudes of LAEs in our sample, the expected LyC emission is likely below the detection limit even when stacking the full sample of ODIN LAEs. Nevertheless, having a large sample of LAEs remains valuable for identifying individual LyC leaker candidates. Among the gold and silver candidates, the LyC flux appears to correlate positively with UVC flux and negatively with Ly$α$ equivalent width, although the correlations are weak. A larger sample of LyC leakers will allow a more robust confirmation of these trends and provide better insights into their physical origins.
Submitted 16 March, 2026;
originally announced March 2026.
-
ES-Merging: Biological MLLM Merging via Embedding Space Signals
Authors:
Wonbin Lee,
Dongki Kim,
Sung Ju Hwang
Abstract:
Biological multimodal large language models (MLLMs) have emerged as powerful foundation models for scientific discovery. However, existing models are specialized to a single modality, limiting their ability to solve inherently cross-modal scientific problems. While model merging is an efficient method to combine the different modalities into a unified MLLM, existing methods rely on input-agnostic parameter space heuristics that fail to faithfully capture modality specialization. To overcome this limitation, we propose a representation-aware merging framework that estimates merging coefficients from embedding space signals. We first design a probe input that consists of different modality tokens and forward it through each specialized MLLM to obtain layer-wise embedding responses that reflect modality-specific representation changes. We then estimate complementary merging coefficients at two granularities from the embedding space: layer-wise coefficients from coarse-grained signals and element-wise coefficients from fine-grained signals, which are jointly combined for robust coefficient estimation. Experiments on interactive effect prediction benchmarks show that our method outperforms existing merging methods and even surpasses task-specific fine-tuned models, establishing that embedding space signals provide a principled and effective foundation for cross-modal MLLM merging.
Submitted 15 March, 2026;
originally announced March 2026.
-
New classification method for the dynamical state of galaxy clusters with a Gaussian mixture model
Authors:
Hyowon Kim,
Marco Canducci,
Rory Smith,
Peter Tino,
Yara Jaffe,
Ho Seong Hwang,
Jihye Shin,
Kyungwon Chun
Abstract:
Galaxy clusters are the largest gravitationally bound systems, and they continue their growth through mergers in a hierarchical ΛCDM Universe. The merger stage of a cluster can therefore be described as its dynamical state. Previous studies have investigated this phenomenon, but several limitations remain, including reliance on dichotomous classifications, constraints on the number of indicators used, absence of reliability estimates, and incompatibility of methods between observational and simulation studies. To overcome this, we developed an enhanced, observation-applicable classification method for cluster dynamical state using a Bayesian classifier with a class-conditional Gaussian mixture distribution model, trained on the N-cluster Run simulation data. The Bayesian classifier was designed for two merger stages (merger and relaxed) as well as three merger stages (recent merger, ancient merger, and relaxed) to provide a more detailed interpretation of the merger processes. Using a larger number of indicators yields better results, with their order of importance being: magnitude difference, center offset, sparsity, Kuiper V statistic, and mirror asymmetry. Additionally, our analyses show that a projected classifier (built in the 6D space but evaluated on lower-dimensional projections) consistently produces better outcomes than non-projected classifiers (i.e., classifiers built directly on the corresponding low-dimensional spaces), which means limited observational data can be used for classification with enhanced performance. Furthermore, the new classification method outperforms our previous research. This new method suggests a way of overcoming previous limitations and provides new insights by quantifying the reliability of dynamical state classification results.
Submitted 10 March, 2026;
originally announced March 2026.
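The classification machinery named in the abstract above, a Bayes classifier built on class-conditional Gaussian densities, can be sketched as follows. This is a minimal illustration only: it uses a single Gaussian per class rather than the paper's Gaussian mixture, a single indicator rather than the 6-D indicator space, and invented toy values for the two classes.

```python
import math
import statistics

class GaussianBayes1D:
    """Minimal 1-D class-conditional Gaussian Bayes classifier (one Gaussian
    per class instead of the paper's mixture, one indicator instead of the
    paper's 6-D space, purely for brevity).
    Posterior: p(class | x) is proportional to p(x | class) * p(class)."""

    def fit(self, x, y):
        self.params = {}
        for c in set(y):
            xs = [xi for xi, yi in zip(x, y) if yi == c]
            # (class mean, class std dev, class prior)
            self.params[c] = (statistics.mean(xs), statistics.stdev(xs), len(xs) / len(x))
        return self

    def predict_proba(self, xi):
        def like(mu, sd):
            # Gaussian density at xi
            return math.exp(-0.5 * ((xi - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        w = {c: like(mu, sd) * prior for c, (mu, sd, prior) in self.params.items()}
        z = sum(w.values())
        return {c: v / z for c, v in w.items()}

# Toy data for one dynamical-state indicator, e.g. center offset (invented values)
x = [0.1, 0.2, 0.15, 0.05, 0.8, 0.9, 0.85, 0.95]
y = ["relaxed"] * 4 + ["merger"] * 4
clf = GaussianBayes1D().fit(x, y)
post = clf.predict_proba(0.12)
```

Swapping in a mixture per class would replace each (mean, std) pair with a set of fitted mixture components; the Bayes rule itself is unchanged.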
-
ODIN: Spectroscopic Validation of Ly$α$-Emitting Galaxy Samples with DESI
Authors:
Ethan Pinarski,
Govind Ramgopal,
Nicole Firestone,
Kyoung-Soo Lee,
Eric Gawiser,
Arjun Dey,
A. Raichoor,
Francisco Valdes,
Robin Ciardullo,
Jessica N. Aguilar,
S. Ahlen,
D. Bianchi,
D. Brooks,
F. J. Castander,
M. Candela Cerdosino,
T. Claybaugh,
A. Cuceu,
K. S. Dawson,
A. de la Macorra,
P. Doel,
S. Ferraro,
A. Font-Ribera,
J. E. Forero-Romero,
E. Gaztañaga,
S. Gontcho A Gontcho
, et al. (43 additional authors not shown)
Abstract:
The One-hundred-deg^2 DECam Imaging in Narrowbands (ODIN) survey is conducting the widest-field deep narrow-band imaging of the equatorial and southern skies. ODIN uses three custom-built narrow-band (NB) filters that sample Lya-emitting galaxies (LAEs) within thin cosmic slices centered at z=2.4, 3.1, and 4.5. In this work, we utilize extensive DESI spectroscopy of ODIN-selected galaxies in the COSMOS and XMM-LSS fields to validate our LAE selection. Exposures of 2-4 hr with DESI yielded redshift confirmation of 3,075 ODIN LAE candidates with NB magnitudes brighter than 26 mag. Restricting to objects that yield high-confidence redshifts, the confirmation rates are (93, 96, 92)% at z=(2.4, 3.1, 4.5). The primary contaminants consist of active galactic nuclei at the expected Lya redshift range and lower redshifts (C IV, C III]), with the remainder being star-forming galaxies ([O II] and [O III]). We find minimal contamination from [O II] emitters in our sample (<~1%), implying that our REW>20 A narrow-band excess photometry requirement is sufficient to remove them.
Submitted 10 March, 2026;
originally announced March 2026.
-
MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents
Authors:
Kangsan Kim,
Yanlai Yang,
Suji Kim,
Woongyeong Yeo,
Youngwan Lee,
Mengye Ren,
Sung Ju Hwang
Abstract:
As embodied models become more powerful, humans will collaborate with multiple embodied AI agents at their workplace or home in the future. To ensure better communication between human users and the multi-agent system, it is crucial to interpret incoming information from agents in parallel and refer to the appropriate context for each query. Existing challenges include effectively compressing and communicating high volumes of individual sensory inputs in the form of video and correctly aggregating multiple egocentric videos to construct system-level memory. In this work, we first formally define a novel problem of understanding multiple long-horizon egocentric videos simultaneously collected from embodied agents. To facilitate research in this direction, we introduce MultiAgent-EgoQA (MA-EgoQA), a benchmark designed to systematically evaluate existing models in our scenario. MA-EgoQA provides 1.7k questions unique to multiple egocentric streams, spanning five categories: social interaction, task coordination, theory-of-mind, temporal reasoning, and environmental interaction. We further propose a simple baseline model for MA-EgoQA named EgoMAS, which leverages shared memory across embodied agents and agent-wise dynamic retrieval. Through comprehensive evaluation across diverse baselines and EgoMAS on MA-EgoQA, we find that current approaches are unable to effectively handle multiple egocentric streams, highlighting the need for future advances in system-level understanding across the agents. The code and benchmark are available at https://ma-egoqa.github.io.
Submitted 10 March, 2026; v1 submitted 10 March, 2026;
originally announced March 2026.
-
ODIN: Confirmation and 3D Reconstruction of Six Massive Protoclusters at Cosmic Noon
Authors:
Ashley Ortiz,
Vandana Ramakrishnan,
Kyoung-Soo Lee,
Arjun Dey,
Yucheng Guo,
Ethan Pinarski,
Anand Raichoor,
Francisco Valdes,
J. Aguilar,
Steven Ahlen,
Maria Celeste Artale,
Davide Bianchi,
August Bliese,
David Brooks,
Rebecca Canning,
Maria Cerdosino,
Todd Claybaugh,
Andrei Cuceu,
Axel de la Macorra,
Peter Doel,
Jaime Forero,
Eric Gawiser,
Enrique Gaztanaga,
Satya Gontcho,
Caryl Gronwall
, et al. (42 additional authors not shown)
Abstract:
Protoclusters represent sites of accelerated galaxy formation and extreme astrophysical activity characteristic of dense environments. Identifying massive protoclusters and mapping their spatial structures are therefore crucial first steps in understanding how the large-scale environment influences galaxy evolution. We combine wide-field Ly$α$ imaging from the ODIN survey with extensive DESI and ancillary spectroscopy across the extended COSMOS and XMM-LSS fields ($\approx$14 deg$^2$) to search for massive protoclusters. We confirm six systems at $z\approx 2.4$ and $z\approx 3.1$, reconstruct their three-dimensional structures, estimate descendant halo masses, and, for one structure at $z\approx 3.12$, demonstrate that overlapping narrowband filters ($NB497$ and $N501$) provide accurate redshift tomography for emission-line galaxies. One protocluster at $z\approx 2.45$ overlaps with one of the LATIS tomographic fields, enabling direct comparison between galaxy and H {\sc i} overdensities traced by Ly$α$ forest absorption. Another at $z\approx 3.12$ hosts a massive quiescent galaxy ($M_{\ast} \approx 1.2 \times 10^{11}M_\odot$), indicating early quenching in a dense environment. By comparing Ly$α$ emission properties across environments, we find that protocluster galaxies exhibit higher median line fluxes and a deficit of faint emitters relative to the field. The effect is strongest when both 2D and 3D density information are combined, indicating that galaxies in the densest protocluster cores are most affected by environmental processes. This effect is stronger at $z\approx3.1$ than at $z\approx2.4$, suggesting possible redshift evolution.
Submitted 11 March, 2026; v1 submitted 10 March, 2026;
originally announced March 2026.
-
DEO: Training-Free Direct Embedding Optimization for Negation-Aware Retrieval
Authors:
Taegyeong Lee,
Jiwon Park,
Seunghyun Hwang,
JooYoung Jang
Abstract:
Recent advances in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) have enabled diverse retrieval methods. However, existing retrieval methods often fail to accurately retrieve results for negation and exclusion queries. To address this limitation, prior approaches rely on embedding adaptation or fine-tuning, which introduce additional computational cost and deployment complexity. We propose Direct Embedding Optimization (DEO), a training-free method for negation-aware text and multimodal retrieval. DEO decomposes queries into positive and negative components and optimizes the query embedding with a contrastive objective. Without additional training data or model updates, DEO outperforms baselines on NegConstraint, with gains of +0.0738 nDCG@10 and +0.1028 MAP@100, while improving Recall@5 by +6% over OpenAI CLIP in multimodal retrieval. These results demonstrate the practicality of DEO for negation- and exclusion-aware retrieval in real-world settings.
Submitted 10 March, 2026;
originally announced March 2026.
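The abstract's central operation, splitting a query into positive and negative components and adjusting the query embedding without any training, can be illustrated with a toy sketch. The closed-form update below (subtract a scaled negative-component embedding, then re-normalize) is an assumption standing in for the paper's contrastive optimization; the function name, the weight `lam`, and the toy vectors are likewise invented.

```python
import math

def deo_query_embedding(e_pos, e_neg, lam=0.5):
    """Illustrative training-free update: push the query embedding toward the
    positive component and away from the negative one, then re-normalize.
    (The paper optimizes a contrastive objective; this closed form is a stand-in.)"""
    e = [p - lam * n for p, n in zip(e_pos, e_neg)]
    norm = math.sqrt(sum(x * x for x in e))
    return [x / norm for x in e]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy 3-d embeddings for a query like "animals that are not dogs"
e_pos = [1.0, 0.0, 0.0]   # embedding of the positive part ("animals")
e_neg = [0.6, 0.8, 0.0]   # embedding of the excluded part ("dogs")
q = deo_query_embedding(e_pos, e_neg)

docs = [[1.0, 0.0, 0.0],  # generic animal document
        [0.6, 0.8, 0.0]]  # dog document
scores = [dot(d, q) for d in docs]  # dog document now scores lower
```

In practice `e_pos` and `e_neg` would come from a text or CLIP-style encoder; the vectors here are arbitrary placeholders.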
-
Universal Family-Vicsek scaling in quantum gases far from equilibrium
Authors:
Kiryang Kwon,
Kazuya Fujimoto,
Junhyeok Hur,
Byungjin Lee,
Samgyu Hwang,
Sumin Kim,
Ryusuke Hamazaki,
Yuki Kawaguchi,
Jae-yoon Choi
Abstract:
Fluctuations in the growing surfaces of classical systems can exhibit universal scaling behavior, known as Family-Vicsek (FV) scaling. Although this phenomenon was originally discovered in classical stochastic models, recent theoretical studies have demonstrated the presence of FV scaling in quantum many-body systems as well. Here, we observe the universal FV scaling in a one-dimensional Bose gas in an optical lattice. By monitoring the fluctuations of particle number in half of the system, which corresponds to the surface roughness, we extract all scaling exponents and demonstrate that the entire relaxation, from the growth of quantum fluctuations to their saturation, is captured by a single universal scaling function. Our results demonstrate that universal scaling laws of classical surface growth extend to quantum many-body systems, establishing a unified framework for nonequilibrium universality across classical and quantum systems.
Submitted 9 March, 2026;
originally announced March 2026.
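For reference, the Family-Vicsek ansatz the abstract appeals to has the standard textbook form (not stated explicitly above), where $W$ is the surface roughness, $L$ the system size, $\alpha$ the roughness exponent, $\beta$ the growth exponent, and $z=\alpha/\beta$ the dynamic exponent:

```latex
W(L,t) = L^{\alpha}\, f\!\left(t/L^{z}\right), \qquad
f(u) \sim
\begin{cases}
u^{\beta}, & u \ll 1 \quad \text{(growth regime)} \\
\text{const}, & u \gg 1 \quad \text{(saturation)}
\end{cases},
\qquad z = \alpha/\beta .
```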
-
Keller-Segel-Navier-Stokes systems involving general sensitivities with Signal-Dependent Power-Law Decay
Authors:
Jaewook Ahn,
Sukjung Hwang
Abstract:
This paper investigates a two-dimensional Keller--Segel--Navier--Stokes system with a tensor-valued chemotactic sensitivity $S(x,n,c)$. Under a signal-dependent power-decay condition $|S(x,n,c)| \le s_0 (s_1+c)^{-γ}$, we establish the global existence and uniform-in-time boundedness of classical solutions for both fluid-coupled ($γ> 1/2$) and fluid-free ($γ> 0$) systems. The proof relies on a sequence of localized energy estimates, including the $L^{2}_{\rm loc}$-smallness of the weighted gradient of the signal concentration, to overcome the mathematical difficulties arising from signal production and fluid transport. Furthermore, under specific structural assumptions on the sensitivity tensor, we prove that solutions of the fluid-free system converge exponentially to the spatially homogeneous steady state. To this end, we establish an interpolation inequality involving the Hölder norm, which is of independent interest and seems to have broad applications.
Submitted 8 March, 2026;
originally announced March 2026.
-
A Closed-Loop CPR Training Glove with Integrated Tactile Sensing and Haptic Feedback
Authors:
Jaeyoung Moon,
Mingzhuo Ma,
Qifeng Yang,
Youjin Choi,
Seokhyun Hwang,
Samuel Burden,
Kyung-Joong Kim,
Yiyue Luo
Abstract:
Cardiopulmonary resuscitation (CPR) is a critical life-saving procedure, and effective training benefits from self-directed practice beyond instructor-led sessions. In this paper, we propose a closed-loop CPR training glove that integrates a high-resolution tactile sensing array and vibrotactile actuators for self-directed practice. The tactile sensing array measures distributed pressures across the palm and dorsum to enable real-time estimation of compression rate, force, and hand pose. Based on these estimations, the glove delivers immediate haptic feedback to guide the user toward proper CPR, reducing reliance on external audio-visual displays. We quantified the tactile sensor performance by measuring wide-range sensitivity (~0.85 over 0-600 N), computing hysteresis (56.04%), testing stability (11.05% drift over 300 cycles), and estimating global signal-to-noise ratio (18.90 +/- 2.41 dB at 600 N). Our closed-loop pipeline provides continuous modeling and feedback of key performance metrics essential for high-quality CPR. Our lightweight statistical models achieve >92% accuracy for force estimation and hand pose classification within sub-millisecond inference time. Our user study (N=8) showed that haptic feedback reduced visual distraction compared to audio-visual cues, though simplified patterns were required for reliable perception under dynamic load. These results highlight the feasibility of the proposed system and offer design insights for future haptic CPR self-training systems.
Submitted 5 March, 2026;
originally announced March 2026.
-
Plasmonic polaron in self-intercalated 1T-TiS2
Authors:
Byoung Ki Choi,
Woojin Choi,
Zhiyu Tao,
Ji-Eun Lee,
Sae Hee Ryu,
Seungrok Mun,
Hyobeom Lee,
Kyoungree Park,
Seha Lee,
Hayoon Im,
Yong Zhong,
Hyejin Ryu,
Min Jae Kim,
Sue Hyeon Hwang,
Xuetao Zhu,
Jiandong Guo,
Jong Mok Ok,
Jaekwang Lee,
Haeyong Kang,
Sungkyun Park,
Jonathan D. Denlinger,
Heung-Sik Kim,
Aaron Bostwick,
Zhi-Xun Shen,
Choongyu Hwang
, et al. (2 additional authors not shown)
Abstract:
Electron-boson coupling is central to a comprehensive understanding of the diverse physical phenomena emerging from many-body interactions. Yet less attention has been paid to how plasmons, collective bosonic modes of electron density oscillation, interact with conduction electrons and how external parameters can tune this interaction. Here, we present a clear display of composite quasiparticles stemming from electron-plasmon coupling, known as the plasmonic polaron, in self-intercalated 1T-TiS2, by using angle-resolved photoemission spectroscopy (ARPES), high-resolution electron energy loss spectroscopy (HR-EELS) and first-principles calculations. The single particle spectral function exhibits a distinctive plasmon-loss satellite with the same characteristic energy scale determined by HR-EELS measurements. The bosonic energy scale of plasmonic polaron is tunable by controlling charge carrier density and temperature, distinguishing itself from conventional polarons arising from electron-phonon interactions. Furthermore, we find that the dielectric screening strongly affects the formation of the plasmonic polaron states. Our findings provide direct spectroscopic evidence of plasmonic polarons and establish self-intercalated layered materials as a promising platform for studying, controlling, and harnessing plasmonic interactions in quantum materials.
Submitted 3 March, 2026;
originally announced March 2026.
-
Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers
Authors:
Youngjun Jun,
Seil Kang,
Woojung Han,
Seong Jae Hwang
Abstract:
Video Diffusion Transformers (DiTs) synthesize high-fidelity video from text descriptions involving motion. However, how Video DiTs convert motion words into video remains insufficiently understood. Furthermore, while prior studies on interpretable saliency maps primarily target objects, motion-related behavior in Video DiTs remains largely unexplored. In this paper, we investigate concrete motion features that specify when and which object moves for a given motion concept. First, for spatial localization, we introduce GramCol, which adaptively produces per-frame saliency maps for any text concept, including both motion and non-motion. Second, we propose a motion-feature selection algorithm to obtain an Interpretable Motion-Attentive Map (IMAP) that localizes motion spatially and temporally. Our method discovers concept saliency maps without the need for any gradient calculation or parameter update. Experimentally, our method shows outstanding localization capability on the motion localization task and zero-shot video semantic segmentation, providing interpretable and clearer saliency maps for both motion and non-motion concepts.
Submitted 9 March, 2026; v1 submitted 3 March, 2026;
originally announced March 2026.
-
Construction of infinite time bubble tower solutions to critical wave maps equation
Authors:
Seunghwan Hwang,
Kihyun Kim
Abstract:
We construct infinite time bubble tower solutions to the critical wave maps equation taking values in the two-sphere. More precisely, for any integers $k\geq3$ and $J\geq1$, we construct a solution that is global in one time direction, has $k$-corotational symmetry, and asymptotically decomposes into $J$-many concentric bubbles of alternating signs with asymptotically vanishing radiation. The scales of each bubble are of order $t^{-α_{j}}$ with $α_{j}=(\frac{k}{k-2})^{j-1}-1$. This shows the existence of multi-bubble solutions with an arbitrary number of bubbles in soliton resolution, provided that $k\geq3$, global existence in one time direction, and alternating signs are considered. Our proof is based on modulation analysis with the method of backward construction. The key new ingredient is a Morawetz-type functional that provides suitable monotonicity estimates for solutions around multi-bubble configurations.
Submitted 5 March, 2026; v1 submitted 2 March, 2026;
originally announced March 2026.
-
Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Authors:
Uichan Lee,
Jeonghyeon Kim,
Sangheum Hwang
Abstract:
Recent advances in text-to-image (T2I) diffusion models have seen rapid and widespread adoption. However, their powerful generative capabilities raise concerns about potential misuse for synthesizing harmful, private, or copyrighted content. To mitigate such risks, concept erasure techniques have emerged as a promising solution. Prior works have primarily focused on fine-tuning the denoising component (e.g., the U-Net backbone). However, recent causal tracing studies suggest that visual attribute information is localized in the early self-attention layers of the text encoder, indicating a potential alternative for concept erasing. Building on this insight, we conduct preliminary experiments and find that directly fine-tuning early layers can suppress target concepts but often degrades the generation quality of non-target concepts. To overcome this limitation, we propose High-Level Representation Misdirection (HiRM), which misdirects high-level semantic representations of target concepts in the text encoder toward designated vectors such as random directions or semantically defined directions (e.g., supercategories), while updating only early layers that contain causal states of visual attributes. Our decoupling strategy enables precise concept removal with minimal impact on unrelated concepts, as demonstrated by strong results on UnlearnCanvas and NSFW benchmarks across diverse targets (e.g., objects, styles, nudity). HiRM also preserves generative utility at low training cost, transfers to state-of-the-art architectures such as Flux without additional training, and shows synergistic effects with denoiser-based concept erasing methods.
Submitted 23 February, 2026;
originally announced February 2026.
-
Learning Adaptive Perturbation-Conditioned Contexts for Robust Transcriptional Response Prediction
Authors:
Yinhua Piao,
Hyomin Kim,
Seonghwan Kim,
Yunhak Oh,
Junhyeok Jeon,
Sang-Yeon Hwang,
Jaechang Lim,
Woo Youn Kim,
Chanyoung Park,
Sungsoo Ahn
Abstract:
Predicting high-dimensional transcriptional responses to genetic perturbations is challenging due to severe experimental noise and sparse gene-level effects. Existing methods often suffer from mean collapse, where high correlation is achieved by predicting global average expression rather than perturbation-specific responses, leading to many false positives and limited biological interpretability. Recent approaches incorporate biological knowledge graphs into perturbation models, but these graphs are typically treated as dense and static, which can propagate noise and obscure true perturbation signals. We propose AdaPert, a perturbation-conditioned framework that addresses mean collapse by explicitly modeling sparsity and biological structure. AdaPert learns perturbation-specific subgraphs from biological knowledge graphs and applies adaptive learning to separate true signals from noise. Across multiple genetic perturbation benchmarks, AdaPert consistently outperforms existing baselines and achieves substantial improvements on DEG-aware evaluation metrics, indicating more accurate recovery of perturbation-specific transcriptional changes.
Submitted 21 February, 2026;
originally announced February 2026.
-
Two-Stage Multiple Test Procedures Controlling False Discovery Rate with auxiliary variable and their Application to Set4Delta Mutant Data
Authors:
Seohwa Hwang,
Mark Louie Ramos,
DoHwan Park,
Junyong Park,
Johan Lim,
Erin Green
Abstract:
In this paper, we present novel methodologies that incorporate auxiliary variables for multiple hypothesis testing related to the main point of interest while effectively controlling the false discovery rate. When dealing with multiple tests concerning the primary variable of interest, researchers can use auxiliary variables to set preconditions for the significance of primary variables, thereby enhancing test efficacy. Depending on the auxiliary variable's role, we propose two approaches: one terminates testing of the primary variable if it does not meet predefined conditions, and the other adjusts the evaluation criteria based on the auxiliary variable. Employing the copula method, we elucidate the dependence between the auxiliary and primary variables by deriving their joint distribution from individual marginal distributions. Our numerical studies, compared with existing methods, demonstrate that the proposed methodologies effectively control the FDR and yield greater statistical power than previous approaches solely based on the primary variable. As an illustrative example, we apply our methods to the Set4$Δ$ mutant dataset. Our findings highlight the distinctions between our methodologies and traditional approaches, emphasizing the potential advantages of our methods in introducing the auxiliary variable for selecting more genes.
Submitted 20 February, 2026;
originally announced February 2026.
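For background on the FDR control discussed above, the classical Benjamini-Hochberg step-up procedure, a standard single-variable baseline that procedures of this kind build on, can be sketched in a few lines. The p-values below are invented for illustration; the paper's auxiliary-variable and copula machinery is not reproduced here.

```python
def benjamini_hochberg(pvals, alpha=0.10):
    """Classical BH step-up procedure: sort p-values, find the largest rank k
    with p_(k) <= k * alpha / m, and reject the k smallest. Returns the
    indices (into the original list) of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank  # keep the largest rank passing the step-up criterion
    return sorted(order[:k])

# Invented p-values; with alpha = 0.10 the six smallest are rejected
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals)
```

Note the step-up logic: p-values 0.039 and 0.041 miss their own thresholds (0.03, 0.04) but are still rejected because a larger rank (0.060 at rank 6, threshold 0.06) passes.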
-
Online FDR Controlling procedures for statistical SIS Model and its application to COVID19 data
Authors:
Seohwa Hwang,
Junyong Park
Abstract:
We propose an online false discovery rate (FDR) controlling method based on conditional local FDR (LIS), designed for infectious disease datasets that are discrete and exhibit complex dependencies. Unlike existing online FDR methods, which often assume independence or suffer from low statistical power in dependent settings, our approach effectively controls FDR while maintaining high detection power in realistic epidemic scenarios. For disease modeling, we establish a Dynamic Bayesian Network (DBN) structure within the Susceptible-Infected-Susceptible (SIS) model, a widely used epidemiological framework for infectious diseases. Our method requires no additional tuning parameters apart from the width of the sliding window, making it practical for real-time disease monitoring. From a statistical perspective, we prove that our method ensures valid FDR control under stationary and ergodic dependencies, extending online hypothesis testing to a broader range of dependent and discrete datasets. Additionally, our method achieves higher statistical power than existing approaches by leveraging LIS, which has been shown to be more powerful than traditional $p$-value-based methods. We validate our method through extensive simulations and real-world applications, including the analysis of infectious disease incidence data. Our results demonstrate that the proposed approach outperforms existing methods by achieving higher detection power while maintaining rigorous FDR control.
Submitted 20 February, 2026;
originally announced February 2026.
-
Box Thirding: Anytime Best Arm Identification under Insufficient Sampling
Authors:
Seohwa Hwang,
Junyong Park
Abstract:
We introduce Box Thirding (B3), a flexible and efficient algorithm for Best Arm Identification (BAI) under fixed-budget constraints. It is designed for both anytime BAI and scenarios with large N, where the number of arms is too large for exhaustive evaluation within a limited budget T. The algorithm employs an iterative ternary comparison: in each iteration, three arms are compared; the best-performing arm is explored further, the median is deferred for future comparisons, and the weakest is discarded. Even without prior knowledge of T, B3 achieves an epsilon-best arm misidentification probability comparable to Successive Halving (SH), which requires T as a predefined parameter, applied to a randomly selected subset of c0 arms that fit within the budget. Empirical results show that B3 outperforms existing methods under limited-budget constraints in terms of simple regret, as demonstrated on the New Yorker Cartoon Caption Contest dataset.
Submitted 20 February, 2026;
originally announced February 2026.
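The ternary comparison loop described in the abstract can be sketched as below. The queue management, the per-comparison pull budget, and the deterministic toy arms are assumptions made for illustration, not the paper's exact procedure or its guarantees.

```python
from collections import deque

def box_thirding(arms, total_budget, pulls_per_arm=10):
    """Illustrative sketch of the iterative ternary comparison: sample three
    arms, keep the best for further exploration, defer the median, and discard
    the worst. `arms` maps arm id -> reward function."""
    queue = deque(arms)
    spent, best = 0, None
    while len(queue) >= 3 and spent + 3 * pulls_per_arm <= total_budget:
        trio = [queue.popleft() for _ in range(3)]
        means = sorted(
            ((sum(arms[a]() for _ in range(pulls_per_arm)) / pulls_per_arm, a)
             for a in trio),
            reverse=True)
        spent += 3 * pulls_per_arm
        best, median = means[0][1], means[1][1]
        queue.appendleft(best)   # winner: explored further next round
        queue.append(median)     # median: deferred for a later comparison
        # weakest arm (means[2]) is discarded
    return best if best is not None else next(iter(arms))

# Deterministic toy arms (constant rewards) so the example is reproducible;
# real arms would draw from reward distributions
arms = {"a": lambda: 0.2, "b": lambda: 0.5, "c": lambda: 0.9}
winner = box_thirding(arms, total_budget=300)
```

With stochastic arms, each reward function would sample from its distribution; constants are used here only so the winning arm is certain.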
-
HookLens: Visual Analytics for Understanding React Hooks Structures
Authors:
Suyeon Hwang,
Minkyu Kweon,
Jeongmin Rhee,
Soohyun Lee,
Seokhyeon Park,
Seokweon Jung,
Hyeon Jeon,
Jinwook Seo
Abstract:
Maintaining and refactoring React web applications is challenging, as React code often becomes complex due to its core API called Hooks. For example, Hooks often lead developers to create complex dependencies among components, making code behavior unpredictable and reducing maintainability, i.e., anti-patterns. To address this challenge, we present HookLens, an interactive visual analytics system that helps developers understand how Hooks define dependencies and data flows between components. Informed by an iterative design process with experienced React developers, HookLens enables users to efficiently understand the structure and dependencies between components and to identify anti-patterns. A quantitative user study with 12 React developers demonstrates that HookLens significantly improves participants' accuracy in detecting anti-patterns compared to conventional code editors. Moreover, a comparative study with state-of-the-art LLM-based coding assistants confirms that these improvements even surpass the capabilities of such coding assistants on the same task.
Submitted 22 March, 2026; v1 submitted 19 February, 2026;
originally announced February 2026.
-
Selective Training for Large Vision Language Models via Visual Information Gain
Authors:
Seulbi Lee,
Sangheum Hwang
Abstract:
Large Vision Language Models (LVLMs) have achieved remarkable progress, yet they often suffer from language bias, producing answers without relying on visual evidence. While prior work attempts to mitigate this issue through decoding strategies, architectural modifications, or curated instruction data, these approaches typically lack a quantitative measure of how much individual training samples or tokens actually benefit from the image. In this work, we introduce Visual Information Gain (VIG), a perplexity-based metric that measures the reduction in prediction uncertainty provided by visual input. VIG enables fine-grained analysis at both sample and token levels, effectively highlighting visually grounded elements such as colors, spatial relations, and attributes. Leveraging this, we propose a VIG-guided selective training scheme that prioritizes high-VIG samples and tokens. This approach improves visual grounding and mitigates language bias, achieving superior performance with significantly reduced supervision by focusing exclusively on visually informative samples and tokens.
Submitted 19 February, 2026;
originally announced February 2026.
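The abstract's perplexity-based formulation suggests a simple per-token quantity; the sketch below is an illustrative guess at such a metric, not the paper's exact definition, and every name in it is hypothetical:

```python
def visual_information_gain(logp_with_image, logp_text_only):
    """Hypothetical per-token VIG: the log-likelihood gain the image
    provides on each ground-truth answer token,
    VIG_t = log p(y_t | x, v) - log p(y_t | x)."""
    return [lw - lt for lw, lt in zip(logp_with_image, logp_text_only)]

def sample_vig(logp_with_image, logp_text_only):
    """Sample-level VIG: mean per-token gain (a log-perplexity reduction)."""
    gains = visual_information_gain(logp_with_image, logp_text_only)
    return sum(gains) / len(gains)

# A visually grounded token (e.g. a color word) gains a lot from the image;
# a purely linguistic token gains nothing.
with_img  = [-0.2, -0.1, -0.3]   # log-probs given text + image
text_only = [-1.2, -0.1, -2.3]   # log-probs given text alone
print(sample_vig(with_img, text_only))  # ≈ 1.0
```

Selective training as described would then keep the samples and tokens with the highest such gains.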
-
Reinforced Fast Weights with Next-Sequence Prediction
Authors:
Hee Seung Hwang,
Xindi Wu,
Sanghyuk Chun,
Olga Russakovsky
Abstract:
Fast weight architectures offer a promising alternative to attention-based transformers for long-context modeling by maintaining constant memory overhead regardless of context length. However, their potential is limited by the next-token prediction (NTP) training paradigm. NTP optimizes single-token predictions and ignores semantic coherence across multiple tokens following a prefix. Consequently, fast weight models, which dynamically update their parameters to store contextual information, learn suboptimal representations that fail to capture long-range dependencies. We introduce REFINE (Reinforced Fast weIghts with Next sEquence prediction), a reinforcement learning framework that trains fast weight models under the next-sequence prediction (NSP) objective. REFINE selects informative token positions based on prediction entropy, generates multi-token rollouts, assigns self-supervised sequence-level rewards, and optimizes the model with group relative policy optimization (GRPO). REFINE is applicable throughout the training lifecycle of pre-trained language models: mid-training, post-training, and test-time training. Our experiments on LaCT-760M and DeltaNet-1.3B demonstrate that REFINE consistently outperforms supervised fine-tuning with NTP across needle-in-a-haystack retrieval, long-context question answering, and diverse tasks in LongBench. REFINE provides an effective and versatile framework for improving long-context modeling in fast weight architectures.
Submitted 18 February, 2026;
originally announced February 2026.
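One concrete piece of the pipeline is the entropy-based selection of rollout positions; a minimal sketch of that step alone, with hypothetical names, and with the multi-token rollouts, rewards, and GRPO update omitted:

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_rollout_positions(distributions, k):
    """Pick the k highest-entropy positions: the prefix points where
    the model is least certain, hence most informative to roll out from."""
    ranked = sorted(range(len(distributions)),
                    key=lambda i: token_entropy(distributions[i]),
                    reverse=True)
    return sorted(ranked[:k])

dists = [
    [0.97, 0.01, 0.01, 0.01],   # confident prediction
    [0.25, 0.25, 0.25, 0.25],   # maximally uncertain
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.30, 0.20, 0.10],
]
print(select_rollout_positions(dists, 2))  # → [1, 3]
```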
-
ASPEN: Spectral-Temporal Fusion for Cross-Subject Brain Decoding
Authors:
Megan Lee,
Seung Ha Hwang,
Inhyeok Choi,
Shreyas Darade,
Mengchun Zhang,
Kateryna Shapovalenko
Abstract:
Cross-subject generalization in EEG-based brain-computer interfaces (BCIs) remains challenging due to individual variability in neural signals. We investigate whether spectral representations offer more stable features for cross-subject transfer than temporal waveforms. Through correlation analyses across three EEG paradigms (SSVEP, P300, and Motor Imagery), we find that spectral features exhibit consistently higher cross-subject similarity than temporal signals. Motivated by this observation, we introduce ASPEN, a hybrid architecture that combines spectral and temporal feature streams via multiplicative fusion, requiring cross-modal agreement for features to propagate. Experiments across six benchmark datasets reveal that ASPEN is able to dynamically achieve the optimal spectral-temporal balance depending on the paradigm. ASPEN achieves the best unseen-subject accuracy on three of six datasets and competitive performance on others, demonstrating that multiplicative multimodal fusion enables effective cross-subject generalization.
Submitted 17 February, 2026;
originally announced February 2026.
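The multiplicative fusion the abstract describes (features propagate only with cross-modal agreement) is, in its barest form, an elementwise product of the two streams; the sketch below shows only that idea, with hypothetical names and the learned spectral and temporal encoders omitted:

```python
def multiplicative_fusion(spectral, temporal):
    """Elementwise product as a soft AND-gate: a fused feature is
    non-zero only where both the spectral and temporal streams are
    active, so either stream can veto the other."""
    return [s * t for s, t in zip(spectral, temporal)]

spec = [1.0, 0.0, 2.0]   # spectral-stream activations
temp = [3.0, 5.0, 0.0]   # temporal-stream activations
print(multiplicative_fusion(spec, temp))  # → [3.0, 0.0, 0.0]
```

Compared with additive fusion, the product suppresses features supported by only one modality, which is one way a model could adapt its spectral-temporal balance per paradigm.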
-
"It's More of a Lifestyle": Design Considerations for Supporting Everyday Practices in Community-Based Farming
Authors:
Minghe Lu,
Zhanming Chen,
May Sunmin Hwang,
Ji Youn Shin
Abstract:
Farming plays a significant role in the economy by supporting related industries such as food, retail, and local services. Community-based small farms, while offering unique social and cultural benefits, face persistent challenges, including limited access to formal education and underdeveloped infrastructure, which have been discussed in prior research. This study focuses on community-driven factors, such as workarounds for recording critical information and practices for passing down farming knowledge across generations. Through 11 semi-structured interviews with farmers from a small ethnic community, the Hmong, we explore how bonding social capital, rooted in close family and community ties, supports informal knowledge exchange and creates pathways to bridging and linking capital. These relationships help farmers connect to broader networks, resources, and institutions. Our findings highlight opportunities for designing technologies that support and strengthen existing support systems. We discuss how technologies should be designed to reflect the cultural values, unique practices, and intergenerational relationships embedded in community-based farms.
Submitted 13 February, 2026;
originally announced February 2026.
-
Evolution of submillimeter galaxies across cosmic-web environments
Authors:
Ankit Kumar,
M. Celeste Artale,
Antonio D. Montero-Dorta,
Lucia Guaita,
Joop Schaye,
Kyoung-Soo Lee,
Alexandra Pope,
Facundo Rodriguez,
Eric Gawiser,
Ho Seong Hwang,
Paulina Troncoso Iribarren,
Jaehyun Lee,
Seong-Kook Lee,
Changbom Park,
Yujin Yang
Abstract:
Submillimeter galaxies (SMGs) provide valuable insights into galaxy formation and evolution and are likely influenced by their cosmic environment. However, their rarity makes environmental trends difficult to establish. We use the FLAMINGO simulation, which simultaneously reproduces the redshift distribution and number counts of SMGs. We use DisPerSE to identify filamentary structures at $z=4$, 3, 2, 1.5, and 1. We define inner cluster-halo, outer cluster-halo, inner filament, outer filament, and void/wall environments at each redshift, considering the mass evolution of cluster-halos and the density evolution of filaments. For a fixed stellar-mass cut of $M_* \geq 10^{9}$ M$_{\odot}$, the fraction of SMGs in the inner cluster-halo environment declines from $\sim30\%$ at $z=4$ to $\sim3\%$ by $z=1$, and similar trends are observed in other environments. The abundance of SMGs within a cluster-halo increases with halo mass, mirroring the increase in the total galaxy population. Consequently, the ratio of SMG halo occupation to that of all galaxies is largely insensitive to halo mass, but varies with redshift. In contrast, the ratio of the halo occupation of non-SMGs to that of all galaxies declines with halo mass and shows little redshift evolution. We show that the central and satellite SMGs form two distinct populations in inner cluster-halos. SMGs occupy the metal-rich side of the metallicity distribution, but rarely attain the highest metallicities because ongoing enrichment is limited by gas depletion. The brightest SMGs (S$_{850} > 10$ mJy) are found exclusively in inner cluster-halos, highlighting a strong connection between SMG luminosity and environmental density. Our results show that SMGs dominate star formation in dense environments, contributing up to $80\%$ of the SFR in inner cluster-halos at $z=4$, but less than $50\%$ in low-density regions.
Submitted 12 February, 2026;
originally announced February 2026.
-
dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning
Authors:
Arnav Shah,
Junzhe Li,
Parsa Idehpour,
Adibvafa Fallahpour,
Brandon Wang,
Sukjun Hwang,
Bo Wang,
Patrick D. Hsu,
Hani Goodarzi,
Albert Gu
Abstract:
Genomic foundation models have the potential to decode DNA syntax, yet face a fundamental tradeoff in their input representation. Standard fixed-vocabulary tokenizers fragment biologically meaningful motifs such as codons and regulatory elements, while nucleotide-level models preserve biological coherence but incur prohibitive computational costs for long contexts. We introduce dnaHNet, a state-of-the-art tokenizer-free autoregressive model that segments and models genomic sequences end-to-end. Using a differentiable dynamic chunking mechanism, dnaHNet compresses raw nucleotides into latent tokens adaptively, balancing compression with predictive accuracy. Pretrained on prokaryotic genomes, dnaHNet outperforms leading architectures including StripedHyena2 in scaling and efficiency. This recursive chunking yields quadratic FLOP reductions, enabling $>3 \times$ inference speedup over Transformers. On zero-shot tasks, dnaHNet achieves superior performance in predicting protein variant fitness and gene essentiality, while automatically discovering hierarchical biological structures without supervision. These results establish dnaHNet as a scalable, interpretable framework for next-generation genomic modeling.
Submitted 9 April, 2026; v1 submitted 11 February, 2026;
originally announced February 2026.
-
A cavity-mediated reconfigurable coupling scheme for superconducting qubits
Authors:
Shinyoung Hwang,
Sangyeon Lee,
Eunjong Kim
Abstract:
Superconducting qubits have achieved remarkable progress in gate fidelity and coherence, yet their typical nearest-neighbor connectivity presents constraints for implementing complex quantum circuits. Here, we introduce a cavity-mediated coupling architecture in which a shared cavity mode, accessed through tunable qubit-cavity couplers, enables dynamically reconfigurable interactions between non-adjacent qubits. By selectively activating the couplers, we demonstrate that high-fidelity iSWAP and CZ gates can be performed within 50 ns with simulated coherent error below $10^{-4}$, while residual $ZZ$ interaction during idling remains below a few kilohertz. Extending to a four-qubit system, we also simulate gates between every qubit pair by selectively enabling the couplers with low qubit crosstalk. This approach provides a practical route toward enhanced interaction flexibility in superconducting quantum processors and may serve as a useful building block for devices that benefit from selective non-local coupling.
Submitted 9 February, 2026;
originally announced February 2026.
-
K-DRIFT Science Theme: Galaxies in the Faint Universe
Authors:
Woowon Byun,
Yongmin Yoon,
Jongwan Ko,
Yun Hee Lee,
Gain Lee,
Ho Seong Hwang,
Cristiano G. Sabiu,
Kwang-il Seon,
Kyungwon Chun,
Jihye Shin,
Jinsu Rhee,
Jae-Woo Kim,
Jaewon Yoo,
Jaehyun Lee,
Sang-Hyun Chun,
Hong Soo Park,
Soung-Chul Yang,
Sungryong Hong,
Jeehye Shin,
Hyowon Kim
Abstract:
Low-surface-brightness (LSB) structures serve as evidence of the intricate mass assembly of galaxies, and dedicatedly studying them promises to give us profound insights into the evolutionary history of galaxies. Furthermore, delving into the properties of star formation (SF) in the LSB regime can broaden our understanding of SF activity in regions characterized by low surface gas density, thereby shedding light on fundamental cosmic processes. However, systematic uncertainties may hamper the exploration of the LSB universe by limiting detectable SB levels. Indeed, despite dedicated advancements in telescope and observing techniques over decades, achieving ultra-deep photometric depths in optical wavelengths remains a formidable challenge. To overcome this challenge and explore the LSB universe that we have yet to see, we have been developing a novel telescope called K-DRIFT. This paper outlines the telescope's specification and describes various LSB features we aim for, explicitly focusing on nearby individual galaxies. To further advance the capabilities of the K-DRIFT survey, focused on LSB detection, we present several feasible research topics that utilize other survey data together and discuss the role of LSB observation in understanding the evolution of galaxies.
Submitted 9 February, 2026;
originally announced February 2026.
-
Progressive Multi-Agent Reasoning for Biological Perturbation Prediction
Authors:
Hyomin Kim,
Sang-Yeon Hwang,
Jaechang Lim,
Yinhua Piao,
Yunhak Oh,
Woo Youn Kim,
Chanyoung Park,
Sungsoo Ahn,
Junhyeok Jeon
Abstract:
Predicting gene regulation responses to biological perturbations requires reasoning about underlying biological causalities. While large language models (LLMs) show promise for such tasks, they are often overwhelmed by the entangled nature of high-dimensional perturbation results. Moreover, recent works have primarily focused on genetic perturbations in single-cell experiments, leaving bulk-cell chemical perturbations, which are central to drug discovery, largely unexplored. Motivated by this, we present LINCSQA, a novel benchmark for predicting target gene regulation under complex chemical perturbations in bulk-cell environments. We further propose PBio-Agent, a multi-agent framework that integrates difficulty-aware task sequencing with iterative knowledge refinement. Our key insight is that genes affected by the same perturbation share causal structure, allowing confidently predicted genes to contextualize more challenging cases. The framework employs specialized agents enriched with biological knowledge graphs, while a synthesis agent integrates outputs and specialized judges ensure logical coherence. PBio-Agent outperforms existing baselines on both LINCSQA and PerturbQA, enabling even smaller models to predict and explain complex biological processes without additional training.
Submitted 7 February, 2026;
originally announced February 2026.
-
First results from the search for an excess of $\bar{\nu}_{e}$ events in JSNS$^2$
Authors:
D. H. Lee,
S. Ajimura,
A. Antonakis,
M. Botran,
M. K. Cheoun,
J. H. Choi,
J. W. Choi,
J. Y. Choi,
T. Dodo,
H. Furuta,
J. H. Goh,
M. Harada,
S. Hasegawa,
Y. Hino,
T. Hiraiwa,
W. S. Hwang,
T. Iida,
E. Iwai,
S. Iwata,
H. I. Jang,
J. S. Jang,
M. C. Jang,
H. K. Jeon,
S. H. Jeon,
K. K. Joo
, et al. (57 additional authors not shown)
Abstract:
The JSNS$^2$ (J-PARC Sterile Neutrino Search at the J-PARC Spallation Neutron Source) experiment at the Material and Life Science Facility (MLF) of J-PARC is designed to directly test the excess of $\bar{\nu}_{e}$ events indicated by LSND (Liquid Scintillator Neutrino Detector). The combination of a short-pulsed proton beam and a gadolinium-loaded liquid scintillator provides an excellent signal-to-noise ratio. In this article, we report the first results of a direct test based on data collected in 2022. After applying all event selection criteria, two events are observed, consistent with the expected background of 2.3$\pm$0.4 events. No excess of $\bar{\nu}_{e}$ events is seen in this report; however, the expected number of events due to the LSND anomaly is 1.1$\pm$0.5, so this result is not yet conclusive. Data taking has been ongoing since 2021 and will continue in future runs. In addition, a new far detector has recently been constructed for the second phase experiment, JSNS$^2$-II, marking an important milestone toward forthcoming measurements.
Submitted 5 February, 2026;
originally announced February 2026.
-
DroneKey++: A Size Prior-free Method and New Benchmark for Drone 3D Pose Estimation from Sequential Images
Authors:
Seo-Bin Hwang,
Yeong-Jun Cho
Abstract:
Accurate 3D pose estimation of drones is essential for security and surveillance systems. However, existing methods often rely on prior drone information such as physical sizes or 3D meshes. At the same time, current datasets are small-scale, limited to single models, and collected under constrained environments, which makes reliable validation of generalization difficult. We present DroneKey++, a prior-free framework that jointly performs keypoint detection, drone classification, and 3D pose estimation. The framework employs a keypoint encoder for simultaneous keypoint detection and classification, and a pose decoder that estimates 3D pose using ray-based geometric reasoning and class embeddings. To address dataset limitations, we construct 6DroneSyn, a large-scale synthetic benchmark with over 50K images covering 7 drone models and 88 outdoor backgrounds, generated using 360-degree panoramic synthesis. Experiments show that DroneKey++ achieves MAE 17.34 deg and MedAE 17.1 deg for rotation, MAE 0.135 m and MedAE 0.242 m for translation, with inference speeds of 19.25 FPS (CPU) and 414.07 FPS (GPU), demonstrating both strong generalization across drone models and suitability for real-time applications. The dataset is publicly available.
Submitted 5 February, 2026;
originally announced February 2026.
-
GFlowPO: Generative Flow Network as a Language Model Prompt Optimizer
Authors:
Junmo Cho,
Suhan Kim,
Sangjune An,
Minsu Kim,
Dong Bok Lee,
Heejun Lee,
Sung Ju Hwang,
Hae Beom Lee
Abstract:
Finding effective prompts for language models (LMs) is critical yet notoriously difficult: the prompt space is combinatorially large, and rewards are sparse due to expensive target-LM evaluation. Moreover, existing RL-based prompt optimizers often rely on on-policy updates and a meta-prompt sampled from a fixed distribution, leading to poor sample efficiency. We propose GFlowPO, a probabilistic prompt optimization framework that casts prompt search as a posterior inference problem over latent prompts regularized by a meta-prompted reference-LM prior. In the first step, we fine-tune a lightweight prompt-LM with an off-policy Generative Flow Network (GFlowNet) objective, using a replay-based training policy that reuses past prompt evaluations to enable sample-efficient exploration. In the second step, we introduce Dynamic Memory Update (DMU), a training-free mechanism that updates the meta-prompt by injecting both (i) diverse prompts from a replay buffer and (ii) top-performing prompts from a small priority queue, thereby progressively concentrating the search process on high-reward regions. Across few-shot text classification, instruction induction benchmarks, and question answering tasks, GFlowPO consistently outperforms recent discrete prompt optimization baselines.
Submitted 3 February, 2026;
originally announced February 2026.
-
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
Authors:
Seanie Lee,
Sangwoo Park,
Yumin Choi,
Gyeongman Kim,
Minki Kang,
Jihun Yun,
Dongmin Park,
Jongho Park,
Sung Ju Hwang
Abstract:
Large reasoning models (LRMs) achieve remarkable performance by leveraging reinforcement learning (RL) on reasoning tasks to generate long chain-of-thought (CoT) reasoning. However, this over-optimization often prioritizes compliance, making models vulnerable to harmful prompts. To mitigate this safety degradation, recent approaches rely on external teacher distillation, yet this introduces a distributional discrepancy that degrades native reasoning. We propose ThinkSafe, a self-generated alignment framework that restores safety alignment without external teachers. Our key insight is that while compliance suppresses safety mechanisms, models often retain latent knowledge to identify harm. ThinkSafe unlocks this via lightweight refusal steering, guiding the model to generate in-distribution safety reasoning traces. Fine-tuning on these self-generated responses effectively realigns the model while minimizing distribution shift. Experiments on DeepSeek-R1-Distill and Qwen3 show ThinkSafe significantly improves safety while preserving reasoning proficiency. Notably, it achieves superior safety and comparable reasoning to GRPO, with significantly reduced computational cost. Code, models, and datasets are available at https://github.com/seanie12/ThinkSafe.git.
Submitted 30 January, 2026;
originally announced January 2026.
-
Projective reflection groups of finite covolume
Authors:
Balthazar Fléchelles,
Seunghoon Hwang
Abstract:
We show that the Coxeter polytopes that have finite volume in their Vinberg domains are exactly the quasiperfect Coxeter polytopes of negative type, i.e., the Coxeter polytopes that are contained in their properly convex Vinberg domain, with the exception of some vertices that are $C^1$ points of the boundary. As a corollary, we show that for reflection groups à la Vinberg, the Vinberg domain is the only invariant properly convex domain if and only if the action on the Vinberg domain has finite covolume and the dimension is at least 2.
Submitted 3 March, 2026; v1 submitted 29 January, 2026;
originally announced January 2026.
-
Can David Beat Goliath? On Multi-Hop Reasoning with Resource-Constrained Agents
Authors:
Hojae Han,
Heeyun Jung,
Jongyoon Kim,
Seung-won Hwang
Abstract:
While reinforcement learning (RL) has empowered multi-turn reasoning agents with retrieval and tools, existing successes largely depend on extensive on-policy rollouts in high-cost, high-accuracy regimes. Under realistic resource constraints that cannot support large models or dense explorations, however, small language model agents fall into a low-cost, low-accuracy regime, where limited rollout budgets lead to sparse exploration, sparse credit assignment, and unstable training. In this work, we challenge this trade-off and show that small language models can achieve strong multi-hop reasoning under resource constraints. We introduce DAVID-GRPO, a budget-efficient RL framework that (i) stabilizes early learning with minimal supervision, (ii) assigns retrieval credit based on evidence recall, and (iii) improves exploration by resampling truncated near-miss trajectories. Evaluated on agents up to 1.5B parameters trained on only four RTX 3090 GPUs, DAVID-GRPO consistently outperforms prior RL methods designed for large-scale settings on six multi-hop QA benchmarks. These results show that with the right inductive biases, small agents can achieve low training cost with high accuracy.
Submitted 29 January, 2026;
originally announced January 2026.
-
A redshift survey of the nearby galaxy cluster Abell 2199 : No upturn of the faint-end slope of galaxy luminosity function
Authors:
Jong-In Park,
Hyunmi Song,
Ho Seong Hwang
Abstract:
We determine the galaxy luminosity function of cluster galaxies in the nearby galaxy cluster Abell 2199 (A2199), focusing on the faint-end slope down to $M_r \sim -14.5$. To achieve this, we augment the existing dataset by adding redshift data from our deep MMT/Hectospec survey and from the Dark Energy Spectroscopic Instrument (DESI), significantly improving the spectroscopic completeness down to $r_{\mathrm{petro},0} = 20.8$ within the central $30^\prime$ region. The resulting luminosity function is well described by a Schechter function with a characteristic magnitude $M^* = -21.30 \pm 0.27$ and a faint-end slope $\alpha = -1.23 \pm 0.05$. This faint-end slope is consistent with those measured in the nearby Coma and Virgo clusters and in a cluster from the TNG50 cosmological simulation, and is slightly shallower than that of field galaxies. These findings indicate that the previously claimed steep faint-end upturn (with $\alpha \sim -2$) in nearby galaxy clusters is not supported. Instead, they suggest that environmental processes in dense cluster cores do not trigger the formation or survival of low-mass galaxies, thereby preventing a steep faint-end upturn in the luminosity function.
Submitted 23 March, 2026; v1 submitted 29 January, 2026;
originally announced January 2026.
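For reference, the Schechter function in its standard absolute-magnitude form can be evaluated directly; the sketch below plugs in the best-fit parameters quoted above, with the normalization phi_star left as a placeholder since it is not reported in the abstract:

```python
import math

def schechter(M, M_star=-21.30, alpha=-1.23, phi_star=1.0):
    """Schechter luminosity function in absolute-magnitude form:
    phi(M) = 0.4 ln(10) * phi* * x**(alpha + 1) * exp(-x),
    where x = 10**(-0.4 * (M - M*)). M_star and alpha are the best-fit
    values quoted above; phi_star is a placeholder normalization."""
    x = 10.0 ** (-0.4 * (M - M_star))
    return 0.4 * math.log(10.0) * phi_star * x ** (alpha + 1.0) * math.exp(-x)

# Deep in the faint end (x << 1), phi scales as x**(alpha + 1), so each
# magnitude fainter multiplies the counts by ~10**(-0.4 * (alpha + 1)).
print(schechter(-14.0) / schechter(-15.0))  # ≈ 1.24 for alpha = -1.23
```

With alpha closer to -2 the per-magnitude factor would approach 10**0.4 ≈ 2.5, the steep upturn the measurement rules out.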
-
oculomix: Hierarchical Sampling for Retinal-Based Systemic Disease Prediction
Authors:
Hyunmin Kim,
Yukun Zhou,
Rahul A. Jonas,
Lie Ju,
Sunjin Hwang,
Pearse A. Keane,
Siegfried K. Wagner
Abstract:
Oculomics - the concept of predicting systemic diseases, such as cardiovascular disease and dementia, through retinal imaging - has advanced rapidly due to the data efficiency of transformer-based foundation models like RETFound. Image-level mixed sample data augmentations, such as CutMix and MixUp, are frequently used for training transformers, yet these techniques perturb patient-specific attributes, such as medical comorbidity and clinical factors, since they only account for images and labels. To address this limitation, we propose a hierarchical sampling strategy, Oculomix, for mixed sample augmentations. Our method is based on two clinical priors. First (exam level), images acquired from the same patient at the same time point share the same attributes. Second (patient level), images acquired from the same patient at different time points have a soft temporal trend, as morbidity generally increases over time. Guided by these priors, our method constrains the mixing space to the patient and exam levels to better preserve patient-specific characteristics and leverages their hierarchical relationships. The proposed method is validated using ViT models on a five-year prediction of major adverse cardiovascular events (MACE) in a large ethnically diverse population (Alzeye). We show that Oculomix consistently outperforms image-level CutMix and MixUp by up to 3% in AUROC, demonstrating the necessity and value of the proposed method in oculomics.
Submitted 16 January, 2026;
originally announced January 2026.
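The hierarchical sampling constraint could be sketched as restricting the MixUp/CutMix partner pool by patient and exam identifiers; all names below are hypothetical, and the actual mixing of pixels and labels is omitted:

```python
import random

def pick_mix_partner(index, metadata, level="exam", rng=random):
    """Choose a mixing partner at the exam level (same patient, same
    visit) or patient level (same patient, any visit), so augmentation
    never blends images across different comorbidity profiles.
    metadata[i] = (patient_id, exam_id)."""
    patient, exam = metadata[index]
    if level == "exam":
        pool = [i for i, (p, e) in enumerate(metadata)
                if p == patient and e == exam and i != index]
    else:  # "patient": any exam from the same patient
        pool = [i for i, (p, _) in enumerate(metadata)
                if p == patient and i != index]
    return rng.choice(pool) if pool else index  # fall back to no mixing

meta = [("p1", "e1"), ("p1", "e1"), ("p1", "e2"), ("p2", "e1")]
print(pick_mix_partner(0, meta, level="exam"))  # → 1 (only same-exam image)
```

Patient-level mixing across visits is where the soft temporal trend mentioned in the abstract would come in, e.g. by weighting labels toward the later exam.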
-
TS-Debate: Multimodal Collaborative Debate for Zero-Shot Time Series Reasoning
Authors:
Patara Trirat,
Jin Myung Kwak,
Jay Heo,
Heejun Lee,
Sung Ju Hwang
Abstract:
Recent progress at the intersection of large language models (LLMs) and time series (TS) analysis has revealed both promise and fragility. While LLMs can reason over temporal structure given carefully engineered context, they often struggle with numeric fidelity, modality interference, and principled cross-modal integration. We present TS-Debate, a modality-specialized, collaborative multi-agent debate framework for zero-shot time series reasoning. TS-Debate assigns dedicated expert agents to textual context, visual patterns, and numerical signals, preceded by explicit domain knowledge elicitation, and coordinates their interaction via a structured debate protocol. Reviewer agents evaluate agent claims using a verification-conflict-calibration mechanism, supported by lightweight code execution and numerical lookup for programmatic verification. This architecture preserves modality fidelity, exposes conflicting evidence, and mitigates numeric hallucinations without task-specific fine-tuning. Across 20 tasks spanning three public benchmarks, TS-Debate achieves consistent and significant performance improvements over strong baselines, including standard multimodal debate in which all agents observe all inputs.
Submitted 26 January, 2026;
originally announced January 2026.