Showing 1–50 of 485 results for author: Cho, S

Searching in archive cs.
  1. arXiv:2604.12026  [pdf, ps, other]

    cs.LG q-bio.BM q-bio.QM

    TriFit: Trimodal Fusion with Protein Dynamics for Mutation Fitness Prediction

    Authors: Seungik Cho

    Abstract: Predicting the functional impact of single amino acid substitutions (SAVs) is central to understanding genetic disease and engineering therapeutic proteins. While protein language models and structure-based methods have achieved strong performance on this task, they systematically neglect protein dynamics; residue flexibility, correlated motions, and allosteric coupling are well-established determ…

    Submitted 13 April, 2026; originally announced April 2026.

  2. arXiv:2604.06938  [pdf, ps, other]

    cs.CV

    POS-ISP: Pipeline Optimization at the Sequence Level for Task-aware ISP

    Authors: Jiyun Won, Heemin Yang, Woohyeok Kim, Jungseul Ok, Sunghyun Cho

    Abstract: Recent work has explored optimizing image signal processing (ISP) pipelines for various tasks by composing predefined modules and adapting them to task-specific objectives. However, jointly optimizing module sequences and parameters remains challenging. Existing approaches rely on neural architecture search (NAS) or step-wise reinforcement learning (RL), but NAS suffers from a training-inference m…

    Submitted 8 April, 2026; originally announced April 2026.

  3. arXiv:2604.04971  [pdf, ps, other]

    cs.LG math.NA physics.comp-ph

    A Theory-guided Weighted $L^2$ Loss for solving the BGK model via Physics-informed neural networks

    Authors: Gyounghun Ko, Sung-Jun Son, Seung Yeon Cho, Myeong-Su Lee

    Abstract: While Physics-Informed Neural Networks offer a promising framework for solving partial differential equations, the standard $L^2$ loss formulation is fundamentally insufficient when applied to the Bhatnagar-Gross-Krook (BGK) model. Specifically, simply minimizing the standard loss does not guarantee accurate predictions of the macroscopic moments, causing the approximate solutions to fail in captu…

    Submitted 4 April, 2026; originally announced April 2026.

    Comments: 26 pages, 9 figures

    MSC Class: 68T07 (Primary) 82C40; 65M12; 65M70; 65M99 (Secondary)

  4. arXiv:2604.01666  [pdf, ps, other]

    cs.CV

    DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data

    Authors: Wonjoon Jin, Jiyun Won, Janghyeok Han, Qi Dai, Chong Luo, Seung-Hwan Baek, Sunghyun Cho

    Abstract: Despite recent progress, video diffusion models still struggle to synthesize realistic videos involving highly dynamic motions or requiring fine-grained motion controllability. A central limitation lies in the scarcity of such examples in commonly used training datasets. To address this, we introduce DynaVid, a video synthesis framework that leverages synthetic motion data in training, which is re…

    Submitted 2 April, 2026; originally announced April 2026.

    Comments: Accepted to CVPR 2026. Website: https://jinwonjoon.github.io/DynaVid/

  5. arXiv:2604.01411  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Test-Time Scaling Makes Overtraining Compute-Optimal

    Authors: Nicholas Roberts, Sungjun Cho, Zhiqi Gao, Tzu-Heng Huang, Albert Wu, Gabriel Orlanski, Avi Trost, Kelly Buchanan, Aws Albarghouthi, Frederic Sala

    Abstract: Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes…

    Submitted 1 April, 2026; originally announced April 2026.

  6. arXiv:2603.27176  [pdf, ps, other]

    cs.CV

    MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence

    Authors: Woohyeon Park, Jaeik Kim, Sunghwan Steve Cho, Pa Hong, Wookyoung Jeong, Yoojin Nam, Namjoon Kim, Ginny Y. Wong, Ka Chun Cheung, Jaeyoung Do

    Abstract: Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical Vision-Language Models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs. To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these three capabilities through a stage-wise framework. Fi…

    Submitted 28 March, 2026; originally announced March 2026.

    Journal ref: CVPR 2026

  7. arXiv:2603.23611  [pdf, ps, other]

    cs.SE cs.AI cs.CL cs.LG

    LLMORPH: Automated Metamorphic Testing of Large Language Models

    Authors: Steven Cho, Stefano Ruberto, Valerio Terragni

    Abstract: Automated testing is essential for evaluating and improving the reliability of Large Language Models (LLMs), yet the lack of automated oracles for verifying output correctness remains a key challenge. We present LLMORPH, an automated testing tool specifically designed for LLMs performing NLP tasks, which leverages Metamorphic Testing (MT) to uncover faulty behaviors without relying on human-labele…

    Submitted 24 March, 2026; originally announced March 2026.

    Comments: Accepted for publication in the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025). This arXiv version is the authors' accepted manuscript. DOI: 10.1109/ASE63991.2025.00385. Code: github.com/steven-b-cho/llmorph

    ACM Class: D.2.5; I.2.7

    Journal ref: 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025)

  8. arXiv:2603.22765  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    DALDALL: Data Augmentation for Lexical and Semantic Diverse in Legal Domain by leveraging LLM-Persona

    Authors: Janghyeok Choi, Jaewon Lee, Sungzoon Cho

    Abstract: Data scarcity remains a persistent challenge in low-resource domains. While existing data augmentation methods leverage the generative capabilities of large language models (LLMs) to produce large volumes of synthetic data, these approaches often prioritize quantity over quality and lack domain-specific strategies. In this work, we introduce DALDALL, a persona-based data augmentation framework tai…

    Submitted 23 March, 2026; originally announced March 2026.

  9. arXiv:2603.21784  [pdf, ps, other]

    cs.CV

    Dynamic Exposure Burst Image Restoration

    Authors: Woohyeok Kim, Jaesung Rim, Daeyeon Kim, Sunghyun Cho

    Abstract: Burst image restoration aims to reconstruct a high-quality image from burst images, which are typically captured using manually designed exposure settings. Although these exposure settings significantly influence the final restoration performance, the problem of finding optimal exposure settings has been overlooked. In this paper, we present Dynamic Exposure Burst Image Restoration (DEBIR), a nove…

    Submitted 23 March, 2026; originally announced March 2026.

  10. arXiv:2603.21466  [pdf, ps, other]

    cs.OS cs.DB

    GateANN: I/O-Efficient Filtered Vector Search on SSDs

    Authors: Nakyung Lee, Soobin Cho, Jiwoong Park, Gyuyeong Kim

    Abstract: We present GateANN, an I/O-efficient SSD-based graph ANNS system that supports filtered vector search on an unmodified graph index. Existing SSD-based systems either waste I/O by post-filtering, or require expensive filter-aware index rebuilds. GateANN avoids both by decoupling graph traversal from vector retrieval. Our key insight is that traversing a node requires only its neighbor list and an a…

    Submitted 26 March, 2026; v1 submitted 22 March, 2026; originally announced March 2026.

  11. arXiv:2603.13403  [pdf, ps, other]

    cs.CV

    Diabetic Retinopathy Grading with CLIP-based Ranking-Aware Adaptation: A Comparative Study on Fundus Image

    Authors: Sungjun Cho

    Abstract: Diabetic retinopathy (DR) is a leading cause of preventable blindness, and automated fundus image grading can play an important role in large-scale screening. In this work, we investigate three CLIP-based approaches for five-class DR severity grading: (1) a zero-shot baseline using prompt engineering, (2) a hybrid FCN-CLIP model augmented with CBAM attention, and (3) a ranking-aware prompting mode…

    Submitted 12 March, 2026; originally announced March 2026.

  12. arXiv:2603.12461  [pdf]

    cs.AR

    System-Technology Co-Optimization of Bitline Routing and Bonding Pathways in Monolithic 3D DRAM Architectures

    Authors: Kiseok Lee, Sungwon Cho, Seongkwang Lim, Suman Datta, Shimeng Yu

    Abstract: 3D DRAM has emerged as a promising approach for continued density scaling, but its viability is limited by routing and hybrid bonding constraints to the periphery, which may degrade sensing margin, latency, and array efficiency. With device characteristics and array parasitics extracted from TCAD, SPICE simulations are performed with peri logic in a CMOS-Bonded-Array (CBA). The analysis shows that the…

    Submitted 12 March, 2026; originally announced March 2026.

    Comments: 4 pages, 9 figures, 1 table

  13. arXiv:2603.09257  [pdf, ps, other]

    cs.LG stat.ML

    Transductive Generalization via Optimal Transport and Its Application to Graph Node Classification

    Authors: MoonJeong Park, Seungbeom Lee, Kyungmin Kim, Jaeseung Heo, Seunghyuk Cho, Shouheng Li, Sangdon Park, Dongwoo Kim

    Abstract: Many existing transductive bounds rely on classical complexity measures that are computationally intractable and often misaligned with empirical behavior. In this work, we establish new representation-based generalization bounds in a distribution-free transductive setting, where learned representations are dependent, and test features are accessible during training. We derive global and class-wise…

    Submitted 10 March, 2026; originally announced March 2026.

  14. arXiv:2603.06881  [pdf]

    cs.LG cs.AI physics.comp-ph

    Physics-informed AI Accelerated Retention Analysis of Ferroelectric Vertical NAND: From Day-Scale TCAD to Second-Scale Surrogate Model

    Authors: Gyujun Jeong, Sungwon Cho, Minji Shon, Namhoon Kim, Woohyun Hwang, Kwangyou Seo, Suhwan Lim, Wanki Kim, Daewon Ha, Prasanna Venkatesan, Kihang Youn, Ram Cherukuri, Yiyi Wang, Suman Datta, Asif Khan, Shimeng Yu

    Abstract: Ferroelectric field-effect transistor (FeFET)-based vertical NAND (Fe-VNAND) has emerged as a promising candidate to overcome z-scaling limitations with lower programming voltages. However, the data retention of 3D Fe-VNAND is hindered by the complex interaction between charge detrapping and ferroelectric depolarization. Developing optimized device designs requires exploring an extensive paramete…

    Submitted 13 April, 2026; v1 submitted 6 March, 2026; originally announced March 2026.

    Comments: 4 pages, 6 figures, to be published in ICMC (International Compact Modeling Conference)

  15. arXiv:2603.00423  [pdf, ps, other]

    cs.CV cs.AI

    An Interpretable Local Editing Model for Counterfactual Medical Image Generation

    Authors: Hyungi Min, Taeseung You, Hangyeul Lee, Yeongjae Cho, Sungzoon Cho

    Abstract: Counterfactual medical image generation has emerged as a critical tool for enhancing AI-driven systems in the medical domain by answering "what-if" questions. However, existing approaches face two fundamental limitations: First, they fail to prevent unintended modifications, resulting in collateral changes in demographic attributes when only disease features should be affected. Second, they lack interpr…

    Submitted 27 February, 2026; originally announced March 2026.

  16. arXiv:2602.24156  [pdf, ps, other]

    cs.RO

    Humanoid Robots as First Assistants in Endoscopic Surgery

    Authors: Sue Min Cho, Jan Emily Mangulabnan, Han Zhang, Zhekai Mao, Yufan He, Pengfei Guo, Daguang Xu, Gregory Hager, Masaru Ishii, Mathias Unberath

    Abstract: Humanoid robots have become a focal point of technological ambition, with claims of surgical capability within years in mainstream discourse. These projections are aspirational yet lack empirical grounding. To date, no humanoid has assisted a surgeon through an actual procedure, let alone performed one. The work described here breaks this new ground. Here we report a proof of concept in which a te…

    Submitted 27 February, 2026; originally announced February 2026.

  17. arXiv:2602.21760  [pdf, ps, other]

    cs.CV

    Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

    Authors: Euisoo Jung, Byunghyun Kim, Hyunjin Kim, Seonghye Cho, Jae-Gil Lee

    Abstract: Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Nevertheless, current diffusion acceleration methods based on distributed parallelism suffer from noticeable generation artifacts and fail to achieve substantial acceleration proportional to the number of GPUs. Therefore, we propose a hybrid paral…

    Submitted 25 February, 2026; originally announced February 2026.

  18. arXiv:2602.21565  [pdf, ps, other]

    cs.LG

    Training-free Composition of Pre-trained GFlowNets for Multi-Objective Generation

    Authors: Seokwon Yoon, Youngbin Choi, Seunghyuk Cho, Seungbeom Lee, MoonJeong Park, Dongwoo Kim

    Abstract: Generative Flow Networks (GFlowNets) learn to sample diverse candidates in proportion to a reward function, making them well-suited for scientific discovery, where exploring multiple promising solutions is crucial. Further extending GFlowNets to multi-objective settings has attracted growing interest since real-world applications often involve multiple, conflicting objectives. However, existing ap…

    Submitted 24 February, 2026; originally announced February 2026.

    Comments: 22 pages, 12 figures, 12 tables

  19. arXiv:2602.19124  [pdf, ps, other]

    cs.HC

    Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models

    Authors: Sieun Kim, Yeeun Jo, Sungmin Na, Hyunseung Lim, Eunchae Lee, Yu Min Choi, Soohyun Cho, Hwajung Hong

    Abstract: Red-teaming, where adversarial prompts are crafted to expose harmful behaviors and assess risks, offers a dynamic approach to surfacing underlying stereotypical bias in large language models. Because such subtle harms are best recognized by those with lived experience, involving targets of stereotyping as red-teamers is essential. However, critical challenges remain in leveraging their lived exper…

    Submitted 22 February, 2026; originally announced February 2026.

    Comments: 20 pages, 4 tables, 3 figures. Accepted to CHI 2026, April 13-17, 2026, Barcelona, Spain

  20. arXiv:2602.19101  [pdf, ps, other]

    cs.CL cs.AI

    Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models

    Authors: Seong Hah Cho, Junyi Li, Anna Leshinskaya

    Abstract: Value alignment of Large Language Models (LLMs) requires us to empirically measure these models' actual, acquired representation of value. Among the characteristics of value representation in humans is that they distinguish among values of different kinds. We investigate whether LLMs likewise distinguish three different kinds of good: moral, grammatical, and economic. By probing model behavior, emb…

    Submitted 22 February, 2026; originally announced February 2026.

  21. arXiv:2602.18813  [pdf, ps, other]

    cs.RO cs.LG

    Habilis-$β$: A Fast-Motion and Long-Lasting On-Device Vision-Language-Action Model

    Authors: Tommoro Robotics, :, Jesoon Kang, Taegeon Park, Jisu An, Soo Min Kimm, Jaejoon Kim, Jinu Pahk, Byungju Kim, Junseok Lee, Namheon Baek, Sungwan Ha, Hojun Baek, Eduardo Ayerve Cruz, Wontae Kim, Junghyeon Choi, Yousuk Lee, Joonmo Han, Sunghyun Cho, Sunghyun Kwon, Soyoung Lee, Jun Ki Lee, Seung-Joon Yi, Byoung-Tak Zhang, Theo Taeyeong Kim

    Abstract: We introduce Habilis-$β$, a fast-motion and long-lasting on-device vision-language-action (VLA) model designed for real-world deployment. Current VLA evaluation remains largely confined to single-trial success rates under curated resets, which fails to capture the fast-motion and long-lasting capabilities essential for practical operation. To address this, we introduce the Productivity-Reliability…

    Submitted 21 February, 2026; originally announced February 2026.

  22. arXiv:2602.18721  [pdf, ps, other]

    cs.CL eess.AS

    ReHear: Iterative Pseudo-Label Refinement for Semi-Supervised Speech Recognition via Audio Large Language Models

    Authors: Zefang Liu, Chenyang Zhu, Sangwoo Cho, Shi-Xiong Zhang

    Abstract: Semi-supervised learning in automatic speech recognition (ASR) typically relies on pseudo-labeling, which often suffers from confirmation bias and error accumulation due to noisy supervision. To address this limitation, we propose ReHear, a framework for iterative pseudo-label refinement that integrates an instruction-tuned, audio-aware large language model (LLM) into the self-training loop. Unlik…

    Submitted 21 February, 2026; originally announced February 2026.

  23. arXiv:2602.17855  [pdf, ps, other]

    eess.IV cs.CV cs.LG

    TopoGate: Quality-Aware Topology-Stabilized Gated Fusion for Longitudinal Low-Dose CT New-Lesion Prediction

    Authors: Seungik Cho

    Abstract: Longitudinal low-dose CT follow-ups vary in noise, reconstruction kernels, and registration quality. These differences destabilize subtraction images and can trigger false new lesion alarms. We present TopoGate, a lightweight model that combines the follow-up appearance view with the subtraction view and controls their influence through a learned, quality-aware gate. The gate is driven by three ca…

    Submitted 19 February, 2026; originally announced February 2026.

  24. arXiv:2602.16626  [pdf, ps, other]

    cs.LG cs.AI q-bio.NC

    A Systematic Evaluation of Sample-Level Tokenization Strategies for MEG Foundation Models

    Authors: SungJun Cho, Chetan Gohil, Rukuang Huang, Oiwi Parker Jones, Mark W. Woolrich

    Abstract: Recent success in natural language processing has motivated growing interest in large-scale foundation models for neuroimaging data. Such models often require discretization of continuous neural time series data, a process referred to as 'tokenization'. However, the impact of different tokenization strategies for neural data is currently poorly understood. In this work, we present a systematic eva…

    Submitted 18 February, 2026; originally announced February 2026.

    Comments: 15 pages, 10 figures, 1 table

  25. arXiv:2602.10437  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Control Reinforcement Learning: Interpretable Token-Level Steering of LLMs via Sparse Autoencoder Features

    Authors: Seonglae Cho, Zekun Wu, Adriano Koshiyama

    Abstract: Sparse autoencoders (SAEs) decompose language model activations into interpretable features, but existing methods reveal only which features activate, not which change model outputs when amplified. We introduce Control Reinforcement Learning (CRL), which trains a policy to select SAE features for steering at each token, producing interpretable intervention logs: the learned policy identifies featu…

    Submitted 11 February, 2026; v1 submitted 10 February, 2026; originally announced February 2026.

  26. arXiv:2602.09642  [pdf, ps, other]

    cs.CL cs.AI

    MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering

    Authors: Sieun Hyeon, Jusang Oh, Sunghwan Steve Cho, Jaeyoung Do

    Abstract: Recent advances in Large Language Models (LLMs) have significantly improved table understanding tasks such as Table Question Answering (TableQA), yet challenges remain in ensuring reliability, scalability, and efficiency, especially in resource-constrained or privacy-sensitive environments. In this paper, we introduce MATA, a multi-agent TableQA framework that leverages multiple complementary reas…

    Submitted 10 February, 2026; originally announced February 2026.

  27. arXiv:2602.08159  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    The Confidence Manifold: Geometric Structure of Correctness Representations in Language Models

    Authors: Seonglae Cho, Zekun Wu, Kleyton Da Costa, Adriano Koshiyama

    Abstract: When a language model asserts that "the capital of Australia is Sydney," does it know this is wrong? We characterize the geometry of correctness representations across 9 models from 5 architecture families. The structure is simple: the discriminative signal occupies 3-8 dimensions, performance degrades with additional dimensions, and no nonlinear classifier improves over linear separation. Centroi…

    Submitted 8 February, 2026; originally announced February 2026.

  28. arXiv:2602.08149  [pdf, ps, other]

    cs.CL cs.AI

    DIAL-SUMMER: A Structured Evaluation Framework of Hierarchical Errors in Dialogue Summaries

    Authors: Sahana Ramnath, Nima Chitsazan, Mingyang Zhou, Chia-Hsuan Lee, Shi-Xiong Zhang, Stephen Rawls, Sambit Sahu, Sangwoo Cho, Xiang Ren, Genta Indra Winata, Akshaj Kumar Veldanda

    Abstract: Dialogues are a predominant mode of communication for humans, and it is immensely helpful to have automatically generated summaries of them (e.g., to revise key points discussed in a meeting, to review conversations between customer agents and product users). Prior works on dialogue summary evaluation largely ignore the complexities specific to this task: (i) shift in structure, from multiple spea…

    Submitted 8 February, 2026; originally announced February 2026.

  29. arXiv:2602.05754  [pdf, ps, other]

    cs.DC cs.AI

    TimelyFreeze: Adaptive Parameter Freezing Mechanism for Pipeline Parallelism

    Authors: Seonghye Cho, Jaemin Han, Hyunjin Kim, Euisoo Jung, Jae-Gil Lee

    Abstract: Pipeline parallelism enables training models that exceed single-device memory, but practical throughput remains limited by pipeline bubbles. Although parameter freezing can improve training throughput by adaptively skipping backward computation, existing methods often over-freeze parameters, resulting in unnecessary accuracy degradation. To address this issue, we propose TimelyFreeze, which models…

    Submitted 6 February, 2026; v1 submitted 5 February, 2026; originally announced February 2026.

  30. arXiv:2602.04208  [pdf, ps, other]

    cs.RO cs.AI cs.LG

    SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models

    Authors: Hyeonbeom Choi, Daechul Ahn, Youhan Lee, Taewook Kang, Seongwon Cho, Jonghyun Choi

    Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robotic control, with test-time scaling (TTS) gaining attention to enhance robustness beyond training. However, existing TTS methods for VLAs require additional training, verifiers, and multiple forward passes, making them impractical for deployment. Moreover, they intervene only at action decoding while k…

    Submitted 3 February, 2026; originally announced February 2026.

    Comments: 20 pages, 8 figures

  31. arXiv:2602.01602  [pdf, ps, other]

    cs.IT

    Spectral-Aligned Pruning for Universal Error-Correcting Code Transformers

    Authors: Sanghyeon Cho, Taewoo Park, Seong-Joon Park, Dae-Young Yun, Hee-Youl Kwak, Sang-Hyo Kim, Yongjune Kim

    Abstract: Recently, the Foundation Error Correction Code Transformer (FECCT) has emerged as a promising universal channel decoder, achieving competitive decoding performance across diverse code families by relying on a single shared model backbone, optionally followed by code-specific retraining. Despite this flexibility, the high computational complexity and large parameter footprint of transformer-based d…

    Submitted 4 February, 2026; v1 submitted 1 February, 2026; originally announced February 2026.

  32. arXiv:2602.00007  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    PPoGA: Predictive Plan-on-Graph with Action for Knowledge Graph Question Answering

    Authors: MinGyu Jeon, SuWan Cho, JaeYoung Shu

    Abstract: Large Language Models (LLMs) augmented with Knowledge Graphs (KGs) have advanced complex question answering, yet they often remain susceptible to failure when their initial high-level reasoning plan is flawed. This limitation, analogous to cognitive functional fixedness, prevents agents from restructuring their approach, leading them to pursue unworkable solutions. To address this, we propose PPoG…

    Submitted 22 November, 2025; originally announced February 2026.

  33. arXiv:2601.19096  [pdf, ps, other]

    cs.CL

    PsyProbe: Proactive and Interpretable Dialogue through User State Modeling for Exploratory Counseling

    Authors: Sohhyung Park, Hyunji Kang, Sungzoon Cho, Dongil Kim

    Abstract: Recent advances in large language models have enabled mental health dialogue systems, yet existing approaches remain predominantly reactive, lacking systematic user state modeling for proactive therapeutic exploration. We introduce PsyProbe, a dialogue system designed for the exploration phase of counseling that systematically tracks user psychological states through the PPPPPI framework (Presenti…

    Submitted 26 January, 2026; originally announced January 2026.

    Comments: In Findings of the Association for Computational Linguistics: EACL 2026

  34. arXiv:2601.16645  [pdf, ps, other]

    cs.CV

    Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss

    Authors: Minsu Gong, Nuri Ryu, Jungseul Ok, Sunghyun Cho

    Abstract: Recent advances in image editing leverage latent diffusion models (LDMs) for versatile, text-prompt-driven edits across diverse tasks. Yet, maintaining pixel-level edge structures, crucial for tasks such as photorealistic style transfer or image tone adjustment, remains a challenge for latent-diffusion-based editing. To overcome this limitation, we propose a novel Structure Preservation Loss (SPL…

    Submitted 23 January, 2026; originally announced January 2026.

    Comments: Accepted to WACV 2026

  35. arXiv:2601.14560  [pdf, ps, other]

    cs.CL

    Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education

    Authors: Unggi Lee, Jiyeong Bae, Jaehyeon Park, Haeun Park, Taejun Park, Younghoon Jeon, Sungmin Cho, Junbo Koh, Yeil Jeong, Gyeonggeon Lee

    Abstract: Large language models (LLMs) are increasingly deployed as intelligent tutoring systems, yet research on optimizing LLMs specifically for educational contexts remains limited. Recent works have proposed reinforcement learning approaches for training LLM tutors, but these methods focus solely on optimizing visible responses while neglecting the model's internal thinking process. We introduce Pedagog…

    Submitted 20 January, 2026; originally announced January 2026.

  36. arXiv:2601.13882  [pdf, ps, other]

    cs.CL

    OpenLearnLM Benchmark: A Unified Framework for Evaluating Knowledge, Skill, and Attitude in Educational Large Language Models

    Authors: Unggi Lee, Sookbun Lee, Heungsoo Choi, Jinseo Lee, Haeun Park, Younghoon Jeon, Sungmin Cho, Minju Kang, Junbo Koh, Jiyeong Bae, Minwoo Nam, Juyeon Eun, Yeonji Jung, Yeil Jeong

    Abstract: Large Language Models are increasingly deployed as educational tools, yet existing benchmarks focus on narrow skills and lack grounding in learning sciences. We introduce OpenLearnLM Benchmark, a theory-grounded framework evaluating LLMs across three dimensions derived from educational assessment theory: Knowledge (curriculum-aligned content and pedagogical understanding), Skills (scenario-based c…

    Submitted 20 January, 2026; originally announced January 2026.

  37. arXiv:2601.09200  [pdf, ps, other]

    cs.CL cs.AI

    A.X K1 Technical Report

    Authors: Sung Jun Cheon, Jaekyung Cho, Seongho Choi, Hyunjun Eun, Seokhwan Jo, Jaehyun Jun, Minsoo Kang, Jin Kim, Jiwon Kim, Minsang Kim, Seungsik Kim, Sungwan Kim, Tae Yoon Kim, Youngrang Kim, Hyeongmun Lee, Sangyeol Lee, Sungeun Lee, Youngsoon Lee, Yujin Lee, Seongmin Ok, Chanyong Park, Hyewoong Park, Junyoung Park, Hyunho Yang, Subin Yi, et al. (35 additional authors not shown)

    Abstract: We introduce A.X K1, a 519B-parameter Mixture-of-Experts (MoE) language model trained from scratch. Our design leverages scaling laws to optimize training configurations and vocabulary size under fixed computational budgets. A.X K1 is pre-trained on a corpus of approximately 10T tokens, curated by a multi-stage data processing pipeline. Designed to bridge the gap between reasoning capability and i…

    Submitted 10 February, 2026; v1 submitted 14 January, 2026; originally announced January 2026.

  38. arXiv:2601.08682  [pdf, ps, other]

    cs.CL cs.AI

    Lessons from the Field: An Adaptable Lifecycle Approach to Applied Dialogue Summarization

    Authors: Kushal Chawla, Chenyang Zhu, Pengshan Cai, Sangwoo Cho, Scott Novotney, Ayushman Singh, Jonah Lewis, Keasha Safewright, Alfy Samuel, Erin Babinsky, Shi-Xiong Zhang, Sambit Sahu

    Abstract: Summarization of multi-party dialogues is a critical capability in industry, enhancing knowledge transfer and operational effectiveness across many domains. However, automatically generating high-quality summaries is challenging, as the ideal summary must satisfy a set of complex, multi-faceted requirements. While summarization has received immense attention in research, prior work has primarily u…

    Submitted 13 January, 2026; originally announced January 2026.

    Comments: EACL 2026 Industry Track

  39. arXiv:2601.00369  [pdf, ps, other]

    cs.CV

    BHaRNet: Reliability-Aware Body-Hand Modality Expertized Networks for Fine-grained Skeleton Action Recognition

    Authors: Seungyeon Cho, Tae-kyun Kim

    Abstract: Skeleton-based human action recognition (HAR) has achieved remarkable progress with graph-based architectures. However, most existing methods remain body-centric, focusing on large-scale motions while neglecting subtle hand articulations that are crucial for fine-grained recognition. This work presents a probabilistic dual-stream framework that unifies reliability modeling and multi-modal integrat…

    Submitted 1 January, 2026; originally announced January 2026.

    Comments: 16 pages; 8 figures. Extension of previous conference paper. Project page: https://github.com/VinnyCSY/BHaRNet

  40. arXiv:2512.20781  [pdf, ps, other]

    cs.IR

    Soft Filtering: Guiding Zero-shot Composed Image Retrieval with Prescriptive and Proscriptive Constraints

    Authors: Youjin Jung, Seongwoo Cho, Hyun-seok Min, Sungchul Choi

    Abstract: Composed Image Retrieval (CIR) aims to find a target image that aligns with user intent, expressed through a reference image and a modification text. While Zero-shot CIR (ZS-CIR) methods sidestep the need for labeled training data by leveraging pretrained vision-language models, they often rely on a single fused query that merges all descriptive cues of what the user wants, tending to dilute key i…

    Submitted 23 December, 2025; originally announced December 2025.

    Comments: Accepted to AAAI 2026 Workshop on New Frontiers in Information Retrieval

  41. arXiv:2512.08979  [pdf, ps, other]

    cs.CV cs.AI

    What Happens When: Learning Temporal Orders of Events in Videos

    Authors: Daechul Ahn, Yura Choi, Hyeonbeom Choi, Seongwon Cho, San Kim, Jonghyun Choi

    Abstract: Video Large Multimodal Models (VLMMs) have shown impressive performance in video understanding, yet their ability to accurately capture the temporal order of multiple events remains underexplored. Through comprehensive experiments, we interestingly observe that models perform very well on the existing benchmarks even when video frames are scrambled. This implies that VLMMs may not necessarily rely on…

    Submitted 5 December, 2025; originally announced December 2025.

    Comments: WACV 2026

  42. arXiv:2512.02006  [pdf, ps, other]

    cs.CV

    MV-TAP: Tracking Any Point in Multi-View Videos

    Authors: Jahyeok Koo, Inès Hyeonsu Kim, Mungyeom Kim, Junghyun Park, Seohyun Park, Jaeyeong Kim, Jung Yi, Seokju Cho, Seungryong Kim

    Abstract: Multi-view camera systems enable rich observations of complex real-world scenes, and understanding dynamic objects in multi-view settings has become central to various applications. In this work, we present MV-TAP, a novel point tracker that tracks points across multi-view videos of dynamic scenes by leveraging cross-view information. MV-TAP utilizes camera geometry and a cross-view attention mech…

    Submitted 1 December, 2025; originally announced December 2025.

    Comments: Project Page: https://cvlab-kaist.github.io/MV-TAP/

  43. arXiv:2511.22364  [pdf, ps, other]

    cs.RO cs.AI

    BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands

    Authors: Seongwon Cho, Daechul Ahn, Donghyun Shin, Hyeonbeom Choi, San Kim, Jonghyun Choi

    Abstract: Open-vocabulary mobile manipulation (OVMM) requires robots to follow language instructions, navigate, and manipulate while updating their world representation under dynamic environmental changes. However, most prior approaches update their world representation only at discrete update points such as navigation targets, waypoints, or the end of an action step, leaving robots blind between updates an…

    Submitted 14 April, 2026; v1 submitted 27 November, 2025; originally announced November 2025.

    Comments: 12 pages, 8 figures

  44. arXiv:2511.20686  [pdf, ps, other]

    cs.AI cs.CY cs.LG

    AssurAI: Experience with Constructing Korean Socio-cultural Datasets to Discover Potential Risks of Generative AI

    Authors: Chae-Gyun Lim, Seung-Ho Han, EunYoung Byun, Jeongyun Han, Soohyun Cho, Eojin Joo, Heehyeon Kim, Sieun Kim, Juhoon Lee, Hyunsoo Lee, Dongkun Lee, Jonghwan Hyeon, Yechan Hwang, Young-Jun Lee, Kyeongryul Lee, Minhyeong An, Hyunjun Ahn, Jeongwoo Son, Junho Park, Donggyu Yoon, Taehyung Kim, Jeemin Kim, Dasom Choi, Kwangyoung Lee, Hyunseung Lim , et al. (29 additional authors not shown)

    Abstract: The rapid evolution of generative AI necessitates robust safety evaluations. However, current safety datasets are predominantly English-centric, failing to capture specific risks in non-English, socio-cultural contexts such as Korean, and are often limited to the text modality. To address this gap, we introduce AssurAI, a new quality-controlled Korean multimodal dataset for evaluating the safety o…

    Submitted 20 November, 2025; originally announced November 2025.

    Comments: 16 pages, HuggingFace: https://huggingface.co/datasets/TTA01/AssurAI

  45. arXiv:2511.20285  [pdf, ps, other]

    cs.AI

    Schema Matching on Graph: Iterative Graph Exploration for Efficient and Explainable Data Integration

    Authors: Mingyu Jeon, Jaeyoung Suh, Suwan Cho

    Abstract: Schema matching is a critical task in data integration, particularly in the medical domain where disparate Electronic Health Record (EHR) systems must be aligned to standard models like OMOP CDM. While Large Language Models (LLMs) have shown promise in schema matching, they suffer from hallucination and lack of up-to-date domain knowledge. Knowledge Graphs (KGs) offer a solution by providing struc…

    Submitted 1 December, 2025; v1 submitted 25 November, 2025; originally announced November 2025.

  46. arXiv:2511.17990  [pdf, ps, other]

    cs.AI cs.GT

    How Far Can LLMs Emulate Human Behavior?: A Strategic Analysis via the Buy-and-Sell Negotiation Game

    Authors: Mingyu Jeon, Jaeyoung Suh, Suwan Cho, Dohyeon Kim

    Abstract: With the rapid advancement of Large Language Models (LLMs), recent studies have drawn attention to their potential for handling not only simple question-answer tasks but also more complex conversational abilities and performing human-like behavioral imitations. In particular, there is considerable interest in how accurately LLMs can reproduce real human emotions and behaviors, as well as whether s…

    Submitted 22 November, 2025; originally announced November 2025.

  47. arXiv:2511.13010  [pdf, ps, other]

    cs.LG cs.AI

    Are Graph Transformers Necessary? Efficient Long-Range Message Passing with Fractal Nodes in MPNNs

    Authors: Jeongwhan Choi, Seungjun Park, Sumin Park, Sung-Bae Cho, Noseong Park

    Abstract: Graph Neural Networks (GNNs) have emerged as powerful tools for learning on graph-structured data, but often struggle to balance local and global information. While graph Transformers aim to address this by enabling long-range interactions, they often overlook the inherent locality and efficiency of Message Passing Neural Networks (MPNNs). We propose a new concept called fractal nodes, inspired by…

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: Accepted to AAAI 2026 for oral presentation. This is the extended version, including the appendix

  48. arXiv:2511.10850  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs

    Authors: Stefan Horoi, Sangwoo Cho, Supriyo Chakraborty, Shi-Xiong Zhang, Sambit Sahu, Guy Wolf, Genta Indra Winata

    Abstract: Task arithmetic is a powerful technique for transferring skills between Large Language Models (LLMs), but it often suffers from negative interference when models have diverged during training. We address this limitation by first aligning the models' parameter spaces, leveraging the inherent permutation, rotation, and scaling symmetries of Transformer architectures. We adapt parameter space alignme…

    Submitted 13 November, 2025; originally announced November 2025.

  49. arXiv:2511.09871  [pdf, ps, other]

    cs.LG cs.AI

    Expandable and Differentiable Dual Memories with Orthogonal Regularization for Exemplar-free Continual Learning

    Authors: Hyung-Jun Moon, Sung-Bae Cho

    Abstract: Conventional continual learning methods force neural networks to process sequential tasks in isolation, preventing them from leveraging useful inter-task relationships and causing them to repeatedly relearn similar features or overly differentiate them. To address this problem, we propose a fully differentiable, exemplar-free expandable method composed of two complementary memories: one learns common f…

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: To appear in AAAI 2026 (The 40th AAAI Conference on Artificial Intelligence)

  50. arXiv:2511.08181  [pdf, ps, other]

    cs.IR cs.AI

    MARC: Multimodal and Multi-Task Agentic Retrieval-Augmented Generation for Cold-Start Recommender System

    Authors: Seung Hwan Cho, Yujin Yang, Danik Baeck, Minjoo Kim, Young-Min Kim, Heejung Lee, Sangjin Park

    Abstract: Recommender systems (RS) are actively studied to mitigate cold-start limitations, either by leveraging modality information or by introducing agent concepts built on the strong reasoning capabilities of Large Language Models (LLMs). Meanwhile, food and beverage recommender systems have traditionally used knowledge graph and ontology concepts due to the domain's unique data attri…

    Submitted 15 November, 2025; v1 submitted 11 November, 2025; originally announced November 2025.

    Comments: 13 pages, 2 figures, Accepted at RDGENAI at CIKM 2025 workshop