
Showing 1–50 of 314 results for author: Lee, I

  1. arXiv:2604.08510

    cs.CL

    What do Language Models Learn and When? The Implicit Curriculum Hypothesis

    Authors: Emmy Liu, Kaiser Sun, Millicent Li, Isabelle Lee, Lindia Tjuatja, Jen-tse Huang, Graham Neubig

    Abstract: Large language models (LLMs) can perform remarkably complex tasks, yet the fine-grained details of how these capabilities emerge during pretraining remain poorly understood. Scaling laws on validation loss tell us how much a model improves with additional compute, but not what skills it acquires in which order. To remedy this, we propose the Implicit Curriculum Hypothesis: pretraining follows a co…

    Submitted 9 April, 2026; originally announced April 2026.

  2. arXiv:2604.03616

    cs.CL

    The Format Tax

    Authors: Ivan Yee Lee, Loris D'Antoni, Taylor Berg-Kirkpatrick

    Abstract: Asking a large language model to respond in JSON should be a formatting choice, not a capability tax. Yet we find that structured output requirements -- JSON, XML, LaTeX, Markdown -- substantially degrade reasoning and writing performance across open-weight models. The research response has focused on constrained decoding, but sampling bias accounts for only a fraction of the degradation. The domi…

    Submitted 4 April, 2026; originally announced April 2026.

  3. arXiv:2603.29616

    cs.CV

    Video-Oasis: Rethinking Evaluation of Video Understanding

    Authors: Geuntaek Lim, Minho Shim, Sungjune Park, Jaeyun Lee, Inwoong Lee, Taeoh Kim, Dongyoon Wee, Yukyung Choi

    Abstract: The inherent complexity of video understanding makes it difficult to attribute whether performance gains stem from visual perception, linguistic reasoning, or knowledge priors. While many benchmarks have emerged to assess high-level reasoning, the essential criteria that constitute video understanding remain largely overlooked. Instead of introducing yet another benchmark, we take a step back to r…

    Submitted 31 March, 2026; originally announced March 2026.

  4. Class-Distribution Guided Active Learning for 3D Occupancy Prediction in Autonomous Driving

    Authors: Wonjune Kim, In-Jae Lee, Sihwan Hwang, Sanmin Kim, Dongsuk Kum

    Abstract: 3D occupancy prediction provides dense spatial understanding critical for safe autonomous driving. However, this task suffers from a severe class imbalance due to its volumetric representation, where safety-critical objects (bicycles, traffic cones, pedestrians) occupy minimal voxels compared to dominant backgrounds. Additionally, voxel-level annotation is costly, yet dedicating effort to dominant…

    Submitted 28 March, 2026; originally announced March 2026.

    Comments: IEEE RA-L 2026

    Journal ref: IEEE Robotics and Automation Letters (2026)

  5. arXiv:2603.25796

    stat.ML cs.AI cs.LG math.ST

    Beyond identifiability: Learning causal representations with few environments and finite samples

    Authors: Inbeom Lee, Tongtong Jin, Bryon Aragam

    Abstract: We provide explicit, finite-sample guarantees for learning causal representations from data with a sublinear number of environments. Causal representation learning seeks to provide a rigorous foundation for the general representation learning problem by bridging causal models with latent factor models in order to learn interpretable representations with causal semantics. Despite a blossoming theo…

    Submitted 26 March, 2026; originally announced March 2026.

  6. arXiv:2603.23934

    cs.CV cs.AI

    Revealing Multi-View Hallucination in Large Vision-Language Models

    Authors: Wooje Park, Insu Lee, Soohyun Kim, Jaeyun Jang, Minyoung Noh, Kyuhong Shim, Byonghyo Shim

    Abstract: Large vision-language models (LVLMs) are increasingly being applied to multi-view image inputs captured from diverse viewpoints. However, despite this growing use, current LVLMs often confuse or mismatch visual information originating from different instances or viewpoints, a phenomenon we term multi-view hallucination. To systematically analyze this problem, we construct MVH-Bench, a benchmark co…

    Submitted 25 March, 2026; originally announced March 2026.

  7. arXiv:2603.19615

    cs.SD cs.AI cs.CL

    CAF-Score: Calibrating CLAP with LALMs for Reference-free Audio Captioning Evaluation

    Authors: Insung Lee, Taeyoung Jeong, Haejun Yoo, Du-Seong Chang, Myoung-Wan Koo

    Abstract: While Large Audio-Language Models (LALMs) have advanced audio captioning, robust evaluation remains difficult. Reference-based metrics are expensive and often fail to assess acoustic fidelity, while Contrastive Language-Audio Pretraining (CLAP)-based approaches frequently overlook syntactic errors and fine-grained details. We propose CAF-Score, a reference-free metric that calibrates CLAP's coarse…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: A condensed version of this work has been submitted to Interspeech 2026. Section 10 is an extended analysis added in this version

  8. arXiv:2603.14271

    cs.CV

    Toward Clinically Ready Foundation Models in Medical Image Analysis: Adaptation Mechanisms and Deployment Trade-offs

    Authors: Karma Phuntsho, Abdullah, Kyungmi Lee, Ickjai Lee, Euijoon Ahn

    Abstract: Foundation models (FMs) have demonstrated strong transferability across medical imaging tasks, yet their clinical utility depends critically on how pretrained representations are adapted to domain-specific data, supervision regimes, and deployment constraints. Prior surveys primarily emphasize architectural advances and application coverage, while the mechanisms of adaptation and their implication…

    Submitted 15 March, 2026; originally announced March 2026.

  9. arXiv:2603.10976

    cs.ET cs.CY

    Report for NSF Workshop on Algorithm-Hardware Co-design for Medical Applications

    Authors: Peipei Zhou, Zheng Dong, Insup Lee, Aidong Zhang, Robert Dick, Majid Sarrafzadeh, Xiaodong Wu, Weisong Shi, Zhuoping Yang, Jingtong Hu, Yiyu Shi

    Abstract: This report summarizes the discussions and recommendations from the NSF Workshop on Algorithm-Hardware Co-design for Medical Applications, held on September 26-27, 2024, in Pittsburgh, PA. The workshop assembled an interdisciplinary cohort of researchers, clinicians, and industry leaders to examine foundational challenges and develop a strategic roadmap for algorithm-hardware co-design in medical…

    Submitted 11 March, 2026; originally announced March 2026.

  10. arXiv:2603.10961

    cs.LG

    Bio-Inspired Self-Supervised Learning for Wrist-worn IMU Signals

    Authors: Prithviraj Tarale, Kiet Chu, Abhishek Varghese, Kai-Chun Liu, Maxwell A Xu, Mohit Iyyer, Sunghoon I. Lee

    Abstract: Wearable accelerometers have enabled large-scale health and wellness monitoring, yet learning robust human-activity representations has been constrained by the scarcity of labeled data. While self-supervised learning offers a potential remedy, existing approaches treat sensor streams as unstructured time series, overlooking the underlying biological structure of human movement, a factor we argue i…

    Submitted 11 March, 2026; originally announced March 2026.

  11. arXiv:2603.07060

    cs.RO cs.HC eess.SY

    GuideTWSI: A Diverse Tactile Walking Surface Indicator Dataset from Synthetic and Real-World Images for Blind and Low-Vision Navigation

    Authors: Hochul Hwang, Soowan Yang, Anh N. H. Nguyen, Parth Goel, Krisha Adhikari, Sunghoon I. Lee, Joydeep Biswas, Nicholas A. Giudice, Donghyun Kim

    Abstract: Tactile Walking Surface Indicators (TWSIs) are safety-critical landmarks that blind and low-vision (BLV) pedestrians use to locate crossings and hazard zones. From our observation sessions with BLV guide dog handlers, trainers, and an O&M specialist, we confirmed the critical importance of reliable and accurate TWSI segmentation for navigation assistance of BLV individuals. Achieving such reliabil…

    Submitted 7 March, 2026; originally announced March 2026.

  12. arXiv:2602.24235

    cs.RO cs.AI

    SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems

    Authors: Jialiang Fan, Weizhe Xu, Mengyu Liu, Oleg Sokolsky, Insup Lee, Fanxin Kong

    Abstract: Safety-critical task planning in robotic systems remains challenging: classical planners suffer from poor scalability, Reinforcement Learning (RL)-based methods generalize poorly, and base Large Language Models (LLMs) cannot guarantee safety. To address this gap, we propose safety-generalizable large language models, named SafeGen-LLM. SafeGen-LLM can not only enhance the safety satisfaction of ta…

    Submitted 27 February, 2026; originally announced February 2026.

    Comments: 12 pages, 6 figures

  13. arXiv:2602.17955

    cs.SE

    Mining Type Constructs Using Patterns in AI-Generated Code

    Authors: Imgyeong Lee, Tayyib Ul Hassan, Abram Hindle

    Abstract: Artificial Intelligence (AI) increasingly automates various parts of the software development tasks. Although AI has enhanced the productivity of development tasks, it remains unstudied whether AI essentially outperforms humans in type-related programming tasks, such as employing type constructs properly for type safety, during its tasks. Moreover, there is no systematic study that evaluates wheth…

    Submitted 19 February, 2026; originally announced February 2026.

  14. arXiv:2602.15339

    eess.IV cs.AI cs.CV

    Benchmarking Self-Supervised Models for Cardiac Ultrasound View Classification

    Authors: Youssef Megahed, Salma I. Megahed, Robin Ducharme, Inok Lee, Adrian D. C. Chan, Mark C. Walker, Steven Hawken

    Abstract: Reliable interpretation of cardiac ultrasound images is essential for accurate clinical diagnosis and assessment. Self-supervised learning has shown promise in medical imaging by leveraging large unlabelled datasets to learn meaningful representations. In this study, we evaluate and compare two self-supervised learning frameworks, USF-MAE, developed by our team, and MoCo v3, on the recently introd…

    Submitted 16 February, 2026; originally announced February 2026.

    Comments: 10 pages, 3 figures, 3 tables

  15. arXiv:2602.08336

    cs.CL cs.CV

    From Reasoning to Pixels: Benchmarking the Alignment Gap in Unified Multimodal Models

    Authors: Cheng Yang, Chufan Shi, Bo Shui, Yaokang Wu, Muzi Tao, Huijuan Wang, Ivan Yee Lee, Yong Liu, Xuezhe Ma, Taylor Berg-Kirkpatrick

    Abstract: Unified multimodal models (UMMs) aim to integrate multimodal understanding and generation within a unified architecture, yet it remains unclear to what extent their representations are truly aligned across modalities. To investigate this question, we use reasoning-guided image generation as a diagnostic task, where models produce textual reasoning first and then generate images. We introduce UReas…

    Submitted 7 April, 2026; v1 submitted 9 February, 2026; originally announced February 2026.

    Comments: Project page: https://ureason.github.io

  16. arXiv:2602.04536

    cs.LG

    Forget to Generalize: Iterative Adaptation for Generalization in Federated Learning

    Authors: Abdulrahman Alotaibi, Irene Tenison, Miriam Kim, Isaac Lee, Lalana Kagal

    Abstract: The Web is naturally heterogeneous with user devices, geographic regions, browsing patterns, and contexts all leading to highly diverse, unique datasets. Federated Learning (FL) is an important paradigm for the Web because it enables privacy-preserving, collaborative machine learning across diverse user devices, web services and clients without needing to centralize sensitive data. However, its pe…

    Submitted 4 February, 2026; originally announced February 2026.

  17. arXiv:2601.16211

    cs.CV cs.AI

    Why Can't I Open My Drawer? Mitigating Object-Driven Shortcuts in Zero-Shot Compositional Action Recognition

    Authors: Geo Ahn, Inwoong Lee, Taeoh Kim, Minho Shim, Dongyoon Wee, Jinwoo Choi

    Abstract: Zero-Shot Compositional Action Recognition (ZS-CAR) requires recognizing novel verb-object combinations composed of previously observed primitives. In this work, we tackle a key failure mode: models predict verbs via object-driven shortcuts (i.e., relying on the labeled object class) rather than temporal evidence. We argue that sparse compositional supervision and verb-object learning asymmetry ca…

    Submitted 7 April, 2026; v1 submitted 22 January, 2026; originally announced January 2026.

    Comments: The code is available at https://github.com/KHU-VLL/RCORE

  18. Deep Learning Based Facial Retargeting Using Local Patches

    Authors: Yeonsoo Choi, Inyup Lee, Sihun Cha, Seonghyeon Kim, Sunjin Jung, Junyong Noh

    Abstract: In the era of digital animation, the quest to produce lifelike facial animations for virtual characters has led to the development of various retargeting methods. While the retargeting facial motion between models of similar shapes has been very successful, challenges arise when the retargeting is performed on stylized or exaggerated 3D characters that deviate significantly from human facial struc…

    Submitted 13 January, 2026; originally announced January 2026.

    Comments: Eurographics 25

    Journal ref: Computer Graphics Forum 2024

  19. arXiv:2512.23365

    cs.CV

    SpatialMosaic: A Multiview VLM Dataset for Partial Visibility

    Authors: Kanghee Lee, Injae Lee, Minseok Kwak, Jungi Hong, Kwonyoung Ryu, Jaesik Park

    Abstract: The rapid progress of Multimodal Large Language Models (MLLMs) has unlocked the potential for enhanced 3D scene understanding and spatial reasoning. A recent line of work explores learning spatial reasoning directly from multi-view images, enabling MLLMs to understand 3D scenes without explicit 3D reconstructions. Nevertheless, key challenges that frequently arise in real-world environments, such…

    Submitted 9 April, 2026; v1 submitted 29 December, 2025; originally announced December 2025.

  20. arXiv:2512.23227

    cs.CV cs.AI

    Anomaly Detection by Effectively Leveraging Synthetic Images

    Authors: Sungho Kang, Hyunkyu Park, Yeonho Lee, Hanbyul Lee, Mijoo Jeong, YeongHyeon Park, Injae Lee, Juneho Yi

    Abstract: Anomaly detection plays a vital role in industrial manufacturing. Due to the scarcity of real defect images, unsupervised approaches that rely solely on normal images have been extensively studied. Recently, diffusion-based generative models brought attention to training data synthesis as an alternative solution. In this work, we focus on a strategy to effectively leverage synthetic images to maxi…

    Submitted 29 December, 2025; originally announced December 2025.

  21. arXiv:2512.22730

    cs.CV cs.AI eess.IV

    Improved cystic hygroma detection from prenatal imaging using ultrasound-specific self-supervised representation learning

    Authors: Youssef Megahed, Robin Ducharme, Inok Lee, Inbal Willner, Adrian D. C. Chan, Mark Walker, Steven Hawken

    Abstract: Cystic hygroma is a high-risk prenatal ultrasound finding that portends high rates of chromosomal abnormalities, structural malformations, and adverse pregnancy outcomes. Automated detection can increase reproducibility and support scalable early screening programs, but supervised deep learning methods are limited by small labelled datasets. This study assesses whether ultrasound-specific self-sup…

    Submitted 18 January, 2026; v1 submitted 27 December, 2025; originally announced December 2025.

    Comments: 13 pages, 6 figures, 2 tables

  22. arXiv:2512.20117

    cs.CV cs.SD eess.AS

    DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation

    Authors: Jingqi Tian, Yiheng Du, Haoji Zhang, Yuji Wang, Isaac Ning Lee, Xulong Bai, Tianrui Zhu, Jingxuan Niu, Yansong Tang

    Abstract: Audio-Visual Segmentation (AVS) aims to localize sound-producing objects at the pixel level by jointly leveraging auditory and visual information. However, existing methods often suffer from multi-source entanglement and audio-visual misalignment, which lead to biases toward louder or larger objects while overlooking weaker, smaller, or co-occurring sources. To address these challenges, we propose…

    Submitted 23 December, 2025; originally announced December 2025.

    Comments: https://trilarflagz.github.io/DDAVS-page/

  23. arXiv:2512.13945

    cs.LG

    Pattern-Guided Diffusion Models

    Authors: Vivian Lin, Kuk Jin Jang, Wenwen Si, Insup Lee

    Abstract: Diffusion models have shown promise in forecasting future data from multivariate time series. However, few existing methods account for recurring structures, or patterns, that appear within the data. We present Pattern-Guided Diffusion Models (PGDM), which leverage inherent patterns within temporal data for forecasting future time steps. PGDM first extracts patterns using archetypal analysis and e…

    Submitted 15 December, 2025; originally announced December 2025.

    Comments: Under review

  24. arXiv:2512.13434

    eess.IV cs.CV

    Self-Supervised Ultrasound Representation Learning for Renal Anomaly Prediction in Prenatal Imaging

    Authors: Youssef Megahed, Inok Lee, Robin Ducharme, Kevin Dick, Adrian D. C. Chan, Steven Hawken, Mark C. Walker

    Abstract: Prenatal ultrasound is the cornerstone for detecting congenital anomalies of the kidneys and urinary tract, but diagnosis is limited by operator dependence and suboptimal imaging conditions. We sought to assess the performance of a self-supervised ultrasound foundation model for automated fetal renal anomaly classification using a curated dataset of 969 two-dimensional ultrasound images. A pretrai…

    Submitted 15 December, 2025; originally announced December 2025.

    Comments: 14 pages, 8 figures, 4 tables

  25. arXiv:2512.06147

    cs.RO cs.CV cs.HC

    GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind Travelers

    Authors: Hochul Hwang, Soowan Yang, Jahir Sadik Monon, Nicholas A Giudice, Sunghoon Ivan Lee, Joydeep Biswas, Donghyun Kim

    Abstract: While commendable progress has been made in user-centric research on mobile assistive systems for blind and low-vision (BLV) individuals, references that directly inform robot navigation design remain rare. To bridge this gap, we conducted a comprehensive human study involving interviews with 26 guide dog handlers, four white cane users, nine guide dog trainers, and one O&M trainer, along with 15…

    Submitted 5 December, 2025; originally announced December 2025.

  26. arXiv:2512.04750

    cs.IT

    Robust Precoding Designs of RSMA for Multiuser MIMO Systems

    Authors: Wentao Zhou, Yijie Mao, Di Zhang, Mérouane Debbah, Inkyu Lee

    Abstract: Rate-splitting multiple access (RSMA) has been studied for multiuser multiple-input multiple-output (MU-MIMO) systems, especially in the presence of imperfect channel state information (CSI) at the transmitter. However, its precoding designs that maximize the sum rate normally have high computational complexity. To implement an efficient RSMA scheme for the MU-MIMO system, in this work, we propose a…

    Submitted 4 December, 2025; originally announced December 2025.

    Comments: This work has been accepted for publication in IEEE Transactions on Wireless Communications

  27. arXiv:2512.03643

    cs.CV cs.CL cs.LG

    Optical Context Compression Is Just (Bad) Autoencoding

    Authors: Ivan Yee Lee, Cheng Yang, Taylor Berg-Kirkpatrick

    Abstract: DeepSeek-OCR shows that rendered text can be reconstructed from a small number of vision tokens, sparking excitement about using vision as a compression medium for long textual contexts. But this pipeline requires rendering token embeddings to pixels and compressing from there -- discarding learned representations in favor of an image the vision encoder must then recover from. We ask whether this…

    Submitted 4 April, 2026; v1 submitted 3 December, 2025; originally announced December 2025.

  28. arXiv:2512.01352

    cs.CV

    OpenBox: Annotate Any Bounding Boxes in 3D

    Authors: In-Jae Lee, Mungyeom Kim, Kwonyoung Ryu, Pierre Musacchio, Jaesik Park

    Abstract: Unsupervised and open-vocabulary 3D object detection has recently gained attention, particularly in autonomous driving, where reducing annotation costs and recognizing unseen objects are critical for both safety and scalability. However, most existing approaches uniformly annotate 3D bounding boxes, ignore objects' physical states, and require multiple self-training iterations for annotation refin…

    Submitted 1 December, 2025; originally announced December 2025.

    Comments: Accepted by NeurIPS 2025

  29. arXiv:2512.00677

    cs.CV cs.AI

    Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer

    Authors: Dong In Lee, Hyungjun Doh, Seunggeun Chi, Runlin Duan, Sangpil Kim, Karthik Ramani

    Abstract: Recent progress in 4D representations, such as Dynamic NeRF and 4D Gaussian Splatting (4DGS), has enabled dynamic 4D scene reconstruction. However, text-driven 4D scene editing remains under-explored due to the challenge of ensuring both multi-view and temporal consistency across space and time during editing. Existing studies rely on 2D diffusion models that edit frames independently, often causi…

    Submitted 29 November, 2025; originally announced December 2025.

    Comments: 4D Scene Editing

  30. arXiv:2512.00138

    cs.AR cs.CV eess.IV

    Ternary-Input Binary-Weight CNN Accelerator Design for Miniature Object Classification System with Query-Driven Spatial DVS

    Authors: Yuyang Li, Swasthik Muloor, Jack Laudati, Nickolas Dematteis, Yidam Park, Hana Kim, Nathan Chang, Inhee Lee

    Abstract: Miniature imaging systems are essential for space-constrained applications but are limited by memory and power constraints. While machine learning can reduce data size by extracting key features, its high energy demands often exceed the capacity of small batteries. This paper presents a CNN hardware accelerator optimized for object classification in miniature imaging systems. It processes data fro…

    Submitted 28 November, 2025; originally announced December 2025.

    Comments: 6 pages, 12 figures, 2 tables

  31. arXiv:2511.22307

    cs.AI cs.LG

    Enhanced Conditional Generation of Double Perovskite by Knowledge-Guided Language Model Feedback

    Authors: Inhyo Lee, Junhyeong Lee, Jongwon Park, KyungTae Lim, Seunghwa Ryu

    Abstract: Double perovskites (DPs) are promising candidates for sustainable energy technologies due to their compositional tunability and compatibility with low-energy fabrication, yet their vast design space poses a major challenge for conditional materials discovery. This work introduces a multi-agent, text gradient-driven framework that performs DP composition generation under natural-language conditions…

    Submitted 2 December, 2025; v1 submitted 27 November, 2025; originally announced November 2025.

  32. arXiv:2511.22228

    cs.CV cs.AI cs.LG

    3D-Consistent Multi-View Editing by Correspondence Guidance

    Authors: Josef Bengtson, David Nilsson, Dong In Lee, Yaroslava Lochman, Fredrik Kahl

    Abstract: Recent advancements in diffusion and flow models have greatly improved text-based image editing, yet methods that edit images independently often produce geometrically and photometrically inconsistent results across different views of the same scene. Such inconsistencies are particularly problematic for editing of 3D representations such as NeRFs or Gaussian splat models. We propose a training-fre…

    Submitted 20 March, 2026; v1 submitted 27 November, 2025; originally announced November 2025.

    Comments: Added experiments with FLUX.1 editing method

  33. arXiv:2511.20446

    cs.CV

    Learning to Generate Human-Human-Object Interactions from Textual Descriptions

    Authors: Jeonghyeon Na, Sangwon Baik, Inhee Lee, Junyoung Lee, Hanbyul Joo

    Abstract: The way humans interact with each other, including interpersonal distances, spatial configuration, and motion, varies significantly across different situations. To enable machines to understand such complex, context-dependent behaviors, it is essential to model multiple people in relation to the surrounding scene context. In this paper, we present a novel research problem to model the correlations…

    Submitted 24 December, 2025; v1 submitted 25 November, 2025; originally announced November 2025.

    Comments: Project Page: https://tlb-miss.github.io/hhoi/

  34. arXiv:2511.14400

    cs.ET cs.PF

    PIM or CXL-PIM? Understanding Architectural Trade-offs Through Large-Scale Benchmarking

    Authors: I-Ting Lee, Bao-Kai Wang, Liang-Chi Chen, Wen Sheng Lim, Da-Wei Chang, Yu-Ming Chang, Chieng-Chung Ho

    Abstract: Processing-in-memory (PIM) reduces data movement by executing near memory, but our large-scale characterization on real PIM hardware shows that end-to-end performance is often limited by disjoint host and device address spaces that force explicit staging transfers. In contrast, CXL-PIM provides a unified address space and cache-coherent access at the cost of higher access latency. These opposing i…

    Submitted 18 November, 2025; v1 submitted 18 November, 2025; originally announced November 2025.

  35. arXiv:2511.11828

    cs.LG cs.AI

    Conformal Constrained Policy Optimization for Cost-Effective LLM Agents

    Authors: Wenwen Si, Sooyong Jang, Insup Lee, Osbert Bastani

    Abstract: While large language models (LLMs) have recently made tremendous progress towards solving challenging AI problems, they have done so at increasingly steep computational and API costs. We propose a novel strategy where we combine multiple LLM models with varying cost/accuracy tradeoffs in an agentic manner, where models and tools are run in sequence as determined by an orchestration model to minimi…

    Submitted 22 March, 2026; v1 submitted 14 November, 2025; originally announced November 2025.

  36. arXiv:2511.11472

    cs.LG

    Quantifying and Improving Adaptivity in Conformal Prediction through Input Transformations

    Authors: Sooyong Jang, Insup Lee

    Abstract: Conformal prediction constructs a set of labels instead of a single point prediction, while providing a probabilistic coverage guarantee. Beyond the coverage guarantee, adaptiveness to example difficulty is an important property. It means that the method should produce larger prediction sets for more difficult examples, and smaller ones for easier examples. Existing evaluation methods for adaptive…

    Submitted 16 November, 2025; v1 submitted 14 November, 2025; originally announced November 2025.

  37. arXiv:2511.07827

    eess.IV cs.CV

    Deep Learning Analysis of Prenatal Ultrasound for Identification of Ventriculomegaly

    Authors: Youssef Megahed, Inok Lee, Robin Ducharme, Aylin Erman, Olivier X. Miguel, Kevin Dick, Adrian D. C. Chan, Steven Hawken, Mark Walker, Felipe Moretti

    Abstract: The proposed study aimed to develop a deep learning model capable of detecting ventriculomegaly on prenatal ultrasound images. Ventriculomegaly is a prenatal condition characterized by dilated cerebral ventricles of the fetal brain and is important to diagnose early, as it can be associated with an increased risk for fetal aneuploidies and/or underlying genetic syndromes. An Ultrasound Self-Superv…

    Submitted 20 November, 2025; v1 submitted 10 November, 2025; originally announced November 2025.

    Comments: 13 pages, 7 figures, 3 tables

  38. arXiv:2511.06458

    cs.SD cs.AI cs.LG eess.AS

    EchoMark: Perceptual Acoustic Environment Transfer with Watermark-Embedded Room Impulse Response

    Authors: Chenpei Huang, Lingfeng Yao, Kyu In Lee, Lan Emily Zhang, Xun Chen, Miao Pan

    Abstract: Acoustic Environment Matching (AEM) is the task of transferring clean audio into a target acoustic environment, enabling engaging applications such as audio dubbing and auditory immersive virtual reality (VR). Recovering a similar room impulse response (RIR) directly from reverberant speech offers a more accessible and flexible AEM solution. However, this capability also introduces vulnerabilities of…

    Submitted 31 March, 2026; v1 submitted 9 November, 2025; originally announced November 2025.

  39. arXiv:2511.06195

    cs.HC cs.AI

    AI as intermediary in modern-day ritual: An immersive, interactive production of the roller disco musical Xanadu at UCLA

    Authors: Mira Winick, Naisha Agarwal, Chiheb Boussema, Ingrid Lee, Camilo Vargas, Jeff Burke

    Abstract: Interfaces for contemporary large language, generative media, and perception AI models are often engineered for single user interaction. We investigate ritual as a design scaffold for developing collaborative, multi-user human-AI engagement. We consider the specific case of an immersive staging of the musical Xanadu performed at UCLA in Spring 2025. During a two-week run, over five hundred audienc…

    Submitted 8 November, 2025; originally announced November 2025.

  40. arXiv:2511.01284

    cs.CV cs.AI

    Adaptation of Foundation Models for Medical Image Analysis: Strategies, Challenges, and Future Directions

    Authors: Karma Phuntsho, Abdullah, Kyungmi Lee, Ickjai Lee, Euijoon Ahn

    Abstract: Foundation models (FMs) have emerged as a transformative paradigm in medical image analysis, offering the potential to provide generalizable, task-agnostic solutions across a wide range of clinical tasks and imaging modalities. Their capacity to learn transferable representations from large-scale data has the potential to address the limitations of conventional task-specific models. However, adapt…

    Submitted 3 November, 2025; originally announced November 2025.

  41. arXiv:2510.26095

    cs.IR cs.CL

    ORBIT -- Open Recommendation Benchmark for Reproducible Research with Hidden Tests

    Authors: Jingyuan He, Jiongnan Liu, Vishan Vishesh Oberoi, Bolin Wu, Mahima Jagadeesh Patel, Kangrui Mao, Chuning Shi, I-Ta Lee, Arnold Overwijk, Chenyan Xiong

    Abstract: Recommender systems are among the most impactful AI applications, interacting with billions of users every day, guiding them to relevant products, services, or information tailored to their preferences. However, the research and development of recommender systems are hindered by existing datasets that fail to capture realistic user behaviors and inconsistent evaluation settings that lead to ambigu…

    Submitted 29 October, 2025; originally announced October 2025.

    Comments: Accepted to NeurIPS 2025 Datasets & Benchmarks track

  42. arXiv:2510.18880

    cs.HC cs.CL cs.CY

    Towards Better Health Conversations: The Benefits of Context-seeking

    Authors: Rory Sayres, Yuexing Hao, Abbi Ward, Amy Wang, Beverly Freeman, Serena Zhan, Diego Ardila, Jimmy Li, I-Ching Lee, Anna Iurchenko, Siyi Kou, Kartikeya Badola, Jimmy Hu, Bhawesh Kumar, Keith Johnson, Supriya Vijay, Justin Krogue, Avinatan Hassidim, Yossi Matias, Dale R. Webster, Sunny Virmani, Yun Liu, Quang Duong, Mike Schaekermann

    Abstract: Navigating health questions can be daunting in the modern information landscape. Large language models (LLMs) may provide tailored, accessible information, but also risk being inaccurate, biased or misleading. We present insights from 4 mixed-methods studies (total N=163), examining how people interact with LLMs for their own health questions. Qualitative studies revealed the importance of context…

    Submitted 13 September, 2025; originally announced October 2025.

  43. arXiv:2510.13915  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Readability $\ne$ Learnability: Rethinking the Role of Simplicity in Training Small Language Models

    Authors: Ivan Lee, Taylor Berg-Kirkpatrick

    Abstract: Recent studies suggest that very small language models (SLMs) can generate surprisingly coherent text when trained on simplified, child-directed corpora such as TinyStories. These findings have been interpreted as evidence that readability -- characterized by accessible vocabulary, familiar narrative structure, and simple syntax -- plays a key role in enabling such capabilities to emerge. In this…

    Submitted 15 October, 2025; originally announced October 2025.

    Comments: Accepted to COLM 2025 (Spotlight)

  44. arXiv:2510.03515  [pdf, ps, other]

    cs.LG

    RAPID: An Efficient Reinforcement Learning Algorithm for Small Language Models

    Authors: Lianghuan Huang, Sagnik Anupam, Insup Lee, Shuo Li, Osbert Bastani

    Abstract: Reinforcement learning (RL) has emerged as a promising strategy for finetuning small language models (SLMs) to solve targeted tasks such as math and coding. However, RL algorithms tend to be resource-intensive, taking a significant amount of time to train. We propose RAPID, a novel RL algorithm that can substantially reduce the running time of RL. Our key insight is that RL tends to be costly due…

    Submitted 3 October, 2025; originally announced October 2025.

  45. arXiv:2510.02563  [pdf, ps, other]

    cs.CR cs.HC

    Who's Wearing? Ear Canal Biometric Key Extraction for User Authentication on Wireless Earbuds

    Authors: Chenpei Huang, Lingfeng Yao, Hui Zhong, Kyu In Lee, Lan Zhang, Xiaoyong Yuan, Tomoaki Ohtsuki, Miao Pan

    Abstract: Ear canal scanning/sensing (ECS) has emerged as a novel biometric authentication method for mobile devices paired with wireless earbuds. Existing studies have demonstrated the uniqueness of ear canals by training and testing machine learning classifiers on ECS data. However, implementing practical ECS-based authentication requires preventing raw biometric data leakage and designing computationally…

    Submitted 2 October, 2025; originally announced October 2025.

  46. arXiv:2509.05785  [pdf, ps, other]

    cs.CV

    CRAB: Camera-Radar Fusion for Reducing Depth Ambiguity in Backward Projection based View Transformation

    Authors: In-Jae Lee, Sihwan Hwang, Youngseok Kim, Wonjune Kim, Sanmin Kim, Dongsuk Kum

    Abstract: Recently, camera-radar fusion-based 3D object detection methods in bird's eye view (BEV) have gained attention due to the complementary characteristics and cost-effectiveness of these sensors. Previous approaches using forward projection struggle with sparse BEV feature generation, while those employing backward projection overlook depth ambiguity, leading to false positives. In this paper, to add…

    Submitted 6 September, 2025; originally announced September 2025.

    Comments: Accepted by ICRA 2025

  47. arXiv:2508.02062  [pdf, ps, other]

    cs.RO cs.AI

    RICL: Adding In-Context Adaptability to Pre-Trained Vision-Language-Action Models

    Authors: Kaustubh Sridhar, Souradeep Dutta, Dinesh Jayaraman, Insup Lee

    Abstract: Multi-task ``vision-language-action'' (VLA) models have recently demonstrated increasing promise as generalist foundation models for robotics, achieving non-trivial performance out of the box on new tasks in new environments. However, for such models to be truly useful, an end user must have easy means to teach them to improve. For language and vision models, the emergent ability to perform in-con…

    Submitted 4 August, 2025; originally announced August 2025.

    Comments: Conference on Robot Learning 2025 (CoRL 2025), 17 pages

  48. arXiv:2508.00438  [pdf, ps, other]

    eess.IV cs.CV

    Diffusion-Based User-Guided Data Augmentation for Coronary Stenosis Detection

    Authors: Sumin Seo, In Kyu Lee, Hyun-Woo Kim, Jaesik Min, Chung-Hwan Jung

    Abstract: Coronary stenosis is a major risk factor for ischemic heart events leading to increased mortality, and medical treatments for this condition require meticulous, labor-intensive analysis. Coronary angiography provides critical visual cues for assessing stenosis, supporting clinicians in making informed decisions for diagnosis and treatment. Recent advances in deep learning have shown great potentia…

    Submitted 1 August, 2025; originally announced August 2025.

    Comments: Accepted at MICCAI 2025. Dataset available at https://github.com/medipixel/DiGDA

  49. arXiv:2507.21756  [pdf, ps, other]

    cs.CV cs.AI

    LiteFat: Lightweight Spatio-Temporal Graph Learning for Real-Time Driver Fatigue Detection

    Authors: Jing Ren, Suyu Ma, Hong Jia, Xiwei Xu, Ivan Lee, Haytham Fayek, Xiaodong Li, Feng Xia

    Abstract: Detecting driver fatigue is critical for road safety, as drowsy driving remains a leading cause of traffic accidents. Many existing solutions rely on computationally demanding deep learning models, which result in high latency and are unsuitable for embedded robotic devices with limited resources (such as intelligent vehicles/cars) where rapid detection is necessary to prevent accidents. This pape…

    Submitted 13 August, 2025; v1 submitted 29 July, 2025; originally announced July 2025.

    Comments: 8 pages, 4 figures

  50. arXiv:2507.21384  [pdf]

    cs.RO eess.SY

    Projecting the New Body: How Body Image Evolves During Learning to Walk with a Wearable Robot

    Authors: I-Chieh Lee, He Huang

    Abstract: Advances in wearable robotics challenge the traditional definition of human motor systems, as wearable robots redefine body structure, movement capability, and perception of their own bodies. We measured gait performance and perceived body images via Selected Coefficient of Perceived Motion, SCoMo, after each training session. Based on human motor learning theory extended to wearer-robot systems,…

    Submitted 28 July, 2025; originally announced July 2025.