Showing 1–50 of 192 results for author: Yi, K

Searching in archive cs.
  1. arXiv:2603.25342  [pdf, ps, other]

    cs.LG

    From Intent to Evidence: A Categorical Approach for Structural Evaluation of Deep Research Agents

    Authors: Shuoling Liu, Zhiquan Tan, Kun Yi, Hui Wu, Yihan Li, Jiangpeng Yan, Liyuan Chen, Kai Chen, Qiang Yang

    Abstract: Although deep research agents (DRAs) have emerged as a promising paradigm for complex information synthesis, their evaluation remains constrained by ad hoc empirical benchmarks. These heuristic approaches do not rigorously model agent behavior or adequately stress-test long-horizon synthesis and ambiguity resolution. To bridge this gap, we formalize DRA behavior through the lens of category theory…

    Submitted 26 March, 2026; originally announced March 2026.

  2. arXiv:2603.22572  [pdf, ps, other]

    cs.CV

    FullCircle: Effortless 3D Reconstruction from Casual 360$^\circ$ Captures

    Authors: Yalda Foroutan, Ipek Oztas, Daniel Rebain, Aysegul Dundar, Kwang Moo Yi, Lily Goli, Andrea Tagliasacchi

    Abstract: Radiance fields have emerged as powerful tools for 3D scene reconstruction. However, casual capture remains challenging due to the narrow field of view of perspective cameras, which limits viewpoint coverage and feature correspondences necessary for reliable camera calibration and reconstruction. While commercially available 360$^\circ$ cameras offer significantly broader coverage than perspective…

    Submitted 23 March, 2026; originally announced March 2026.

  3. arXiv:2603.19672  [pdf, ps, other]

    cs.CV

    Making Video Models Adhere to User Intent with Minor Adjustments

    Authors: Daniel Ajisafe, Eric Hedlin, Helge Rhodin, Kwang Moo Yi

    Abstract: With the recent drastic advancements in text-to-video diffusion models, controlling their generations has drawn interest. A popular way for control is through bounding boxes or layouts. However, enforcing adherence to these control inputs is still an open problem. In this work, we show that by slightly adjusting user-provided bounding boxes we can improve both the quality of generations and the ad…

    Submitted 20 March, 2026; originally announced March 2026.

    Comments: Project page and code: https://ubc-vision.github.io/MinorAdjustVideo/docs/webpage/index.html

  4. arXiv:2602.21637  [pdf, ps, other]

    cs.CV

    CARE: A Molecular-Guided Foundation Model with Adaptive Region Modeling for Whole Slide Image Analysis

    Authors: Di Zhang, Zhangpeng Gong, Xiaobo Pang, Jiashuai Liu, Junbo Lu, Hao Cui, Jiusong Ge, Zhi Zeng, Kai Yi, Yinghua Li, Si Liu, Tingsong Yu, Haoran Wang, Mireia Crispin-Ortuzar, Weimiao Yu, Chen Li, Zeyu Gao

    Abstract: Foundation models have recently achieved impressive success in computational pathology, demonstrating strong generalization across diverse histopathology tasks. However, existing models overlook the heterogeneous and non-uniform organization of pathological regions of interest (ROIs) because they rely on natural image backbones not tailored for tissue morphology. Consequently, they often fail to c…

    Submitted 17 March, 2026; v1 submitted 25 February, 2026; originally announced February 2026.

    Comments: Accepted to CVPR 2026

  5. arXiv:2602.19068  [pdf, ps, other]

    cs.LG

    TimeRadar: A Domain-Rotatable Foundation Model for Time Series Anomaly Detection

    Authors: Hui He, Hezhe Qiao, Yutong Chen, Kun Yi, Guansong Pang

    Abstract: Current time series foundation models (TSFMs) primarily focus on learning prevalent and regular patterns within a predefined time or frequency domain to enable supervised downstream tasks (e.g., forecasting). Consequently, they are often ineffective for inherently unsupervised downstream tasks, such as time series anomaly detection (TSAD), which aims to identify rare, irregular patterns. This limit…

    Submitted 22 February, 2026; originally announced February 2026.

  6. arXiv:2601.10866  [pdf, ps, other]

    cs.CR

    Adaptive Privacy Budgeting

    Authors: Yuting Liang, Ke Yi

    Abstract: We study the problem of adaptive privacy budgeting under generalized differential privacy. Consider the setting where each user $i\in [n]$ holds a tuple $x_i\in U:=U_1\times \dotsb \times U_T$, where $x_i(l)\in U_l$ represents the $l$-th component of their data. For every $l\in [T]$ (or a subset), an untrusted analyst wishes to compute some $f_l(x_1(l),\dots,x_n(l))$, while respecting the privacy…

    Submitted 15 January, 2026; originally announced January 2026.

  7. arXiv:2511.19985  [pdf, ps, other]

    cs.CV

    SONIC: Spectral Optimization of Noise for Inpainting with Consistency

    Authors: Seungyeon Baek, Erqun Dong, Shadan Namazifard, Mark J. Matthews, Kwang Moo Yi

    Abstract: We propose a novel training-free method for inpainting with off-the-shelf text-to-image models. While guidance-based methods in theory allow generic models to be used for inverse problems such as inpainting, in practice, their effectiveness is limited, leading to the necessity of specialized inpainting-specific models. In this work, we argue that the missing ingredient for training-free inpainting…

    Submitted 26 November, 2025; v1 submitted 25 November, 2025; originally announced November 2025.

  8. arXiv:2511.10218  [pdf, ps, other]

    cs.AI

    MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion

    Authors: Haolong Xiang, Peisi Wang, Xiaolong Xu, Kun Yi, Xuyun Zhang, Quanzheng Sheng, Amin Beheshti, Wei Fan

    Abstract: With rapid urbanization in the modern era, traffic signals from various sensors have been playing a significant role in monitoring the states of cities, which provides a strong foundation in ensuring safe travel, reducing traffic congestion and optimizing urban mobility. Most existing methods for traffic signal modeling often rely on the original data modality, i.e., numerical direct readings from…

    Submitted 16 November, 2025; v1 submitted 13 November, 2025; originally announced November 2025.

  9. arXiv:2510.19710  [pdf, ps, other]

    cs.LG

    SEMPO: Lightweight Foundation Models for Time Series Forecasting

    Authors: Hui He, Kun Yi, Yuanchi Ma, Qi Zhang, Zhendong Niu, Guansong Pang

    Abstract: The recent boom of large pre-trained models witnesses remarkable success in developing foundation models (FMs) for time series forecasting. Despite impressive performance across diverse downstream forecasting tasks, existing time series FMs possess massive network architectures and require substantial pre-training on large-scale datasets, which significantly hinders their deployment in resource-co…

    Submitted 22 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025

  10. arXiv:2510.05399  [pdf, ps, other]

    cs.LG astro-ph.SR cs.AI

    Comparing LSTM-Based Sequence-to-Sequence Forecasting Strategies for 24-Hour Solar Proton Flux Profiles Using GOES Data

    Authors: Kangwoo Yi, Bo Shen, Qin Li, Haimin Wang, Yong-Jae Moon, Jaewon Lee, Hwanhee Lee

    Abstract: Solar Proton Events (SPEs) cause significant radiation hazards to satellites, astronauts, and technological systems. Accurate forecasting of their proton flux time profiles is crucial for early warnings and mitigation. This paper explores deep learning sequence-to-sequence (seq2seq) models based on Long Short-Term Memory networks to predict 24-hour proton flux profiles following SPE onsets. We use…

    Submitted 6 October, 2025; originally announced October 2025.

    Comments: 7 pages; accepted as a workshop paper at ICDM 2025

  11. arXiv:2509.20447  [pdf, ps, other]

    astro-ph.EP astro-ph.IM cs.LG

    Neural Networks as Surrogate Solvers for Time-Dependent Accretion Disk Dynamics

    Authors: Shunyuan Mao, Weiqi Wang, Sifan Wang, Ruobing Dong, Lu Lu, Kwang Moo Yi, Paris Perdikaris, Andrea Isella, Sébastien Fabbro, Lile Wang

    Abstract: Accretion disks are ubiquitous in astrophysics, appearing in diverse environments from planet-forming systems to X-ray binaries and active galactic nuclei. Traditionally, modeling their dynamics requires computationally intensive (magneto)hydrodynamic simulations. Recently, Physics-Informed Neural Networks (PINNs) have emerged as a promising alternative. This approach trains neural networks direct…

    Submitted 24 September, 2025; originally announced September 2025.

    Comments: Accepted to Astrophysical Journal Letters; associated animations are available at https://doi.org/10.6084/m9.figshare.30192904

  12. arXiv:2509.08233  [pdf, ps, other]

    cs.LG cs.AI

    Strategies for Improving Communication Efficiency in Distributed and Federated Learning: Compression, Local Training, and Personalization

    Authors: Kai Yi

    Abstract: Distributed and federated learning are essential paradigms for training models across decentralized data sources while preserving privacy, yet communication overhead remains a major bottleneck. This dissertation explores strategies to improve communication efficiency, focusing on model compression, local training, and personalization. We establish a unified framework for biased and unbiased compre…

    Submitted 9 September, 2025; originally announced September 2025.

    Comments: PhD Dissertation

  13. arXiv:2508.16884  [pdf, ps, other]

    cs.CV cs.NE

    A Lightweight Convolution and Vision Transformer integrated model with Multi-scale Self-attention Mechanism

    Authors: Yi Zhang, Lingxiao Wei, Bowei Zhang, Ziwei Liu, Kai Yi, Shu Hu

    Abstract: Vision Transformer (ViT) has prevailed in computer vision tasks due to its strong long-range dependency modelling ability. However, its large model size and weak local feature modeling ability hinder its application in real scenarios. To balance computation efficiency and performance in downstream vision tasks, we propose an efficient ViT model with sparse attention (dubbed SAEViT…

    Submitted 11 September, 2025; v1 submitted 22 August, 2025; originally announced August 2025.

  14. arXiv:2508.08749  [pdf, ps, other]

    cs.CR cs.DB

    Approximate DBSCAN under Differential Privacy

    Authors: Yuan Qiu, Ke Yi

    Abstract: This paper revisits the DBSCAN problem under differential privacy (DP). Existing DP-DBSCAN algorithms aim at publishing the cluster labels of the input points. However, we show that both empirically and theoretically, this approach cannot offer any utility in the published results. We therefore propose an alternative definition of DP-DBSCAN based on the notion of spans. We argue that publishing th…

    Submitted 13 August, 2025; v1 submitted 12 August, 2025; originally announced August 2025.

  15. arXiv:2507.14156  [pdf, ps, other]

    q-bio.BM cs.AI

    All-atom inverse protein folding through discrete flow matching

    Authors: Kai Yi, Kiarash Jamali, Sjors H. W. Scheres

    Abstract: The recent breakthrough of AlphaFold3 in modeling complex biomolecular interactions, including those between proteins and ligands, nucleotides, or metal ions, creates new opportunities for protein design. In so-called inverse protein folding, the objective is to find a sequence of amino acids that adopts a target protein structure. Many inverse folding methods struggle to predict sequences for com…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: ICML2025

  16. arXiv:2507.05234  [pdf, ps, other]

    cs.PL cs.SE

    React-tRace: A Semantics for Understanding React Hooks

    Authors: Jay Lee, Joongwon Ahn, Kwangkeun Yi

    Abstract: React has become the most widely used web front-end framework, enabling the creation of user interfaces in a declarative and compositional manner. Hooks are a set of APIs that manage side effects in function components in React. However, their semantics are often seen as opaque to developers, leading to UI bugs. We introduce React-tRace, a formalization of the semantics of the essence of React Hoo…

    Submitted 21 August, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: To be published in OOPSLA2 2025

  17. arXiv:2505.18505  [pdf, ps, other]

    cs.LG cs.AI

    How Particle System Theory Enhances Hypergraph Message Passing

    Authors: Yixuan Ma, Kai Yi, Pietro Lio, Shi Jin, Yu Guang Wang

    Abstract: Hypergraphs effectively model higher-order relationships in natural phenomena, capturing complex interactions beyond pairwise connections. We introduce a novel hypergraph message passing framework inspired by interacting particle systems, where hyperedges act as fields inducing shared node dynamics. By incorporating attraction, repulsion, and Allen-Cahn forcing terms, particles of varying classes…

    Submitted 24 May, 2025; originally announced May 2025.

  18. arXiv:2504.03279  [pdf, other]

    cs.DB

    Yannakakis+: Practical Acyclic Query Evaluation with Theoretical Guarantees

    Authors: Qichen Wang, Bingnan Chen, Binyang Dai, Ke Yi, Feifei Li, Liang Lin

    Abstract: Acyclic conjunctive queries form the backbone of most analytical workloads, and have been extensively studied in the literature from both theoretical and practical angles. However, there is still a large divide between theory and practice. While the 40-year-old Yannakakis algorithm has strong theoretical running time guarantees, it has not been adopted in real systems due to its high hidden consta…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Technical report for the SIGMOD 2025 paper

    ACM Class: H.2.4

  19. arXiv:2503.24366  [pdf, other]

    cs.CV cs.GR

    StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting

    Authors: Shakiba Kheradmand, Delio Vicini, George Kopanas, Dmitry Lagun, Kwang Moo Yi, Mark Matthews, Andrea Tagliasacchi

    Abstract: 3D Gaussian splatting (3DGS) is a popular radiance field method, with many application-specific extensions. Most variants rely on the same core algorithm: depth-sorting of Gaussian splats then rasterizing in primitive order. This ensures correct alpha compositing, but can cause rendering artifacts due to built-in approximations. Moreover, for a fixed representation, sorted rendering offers little…

    Submitted 31 March, 2025; originally announced March 2025.

  20. arXiv:2503.10256  [pdf, ps, other]

    cs.CV

    ROODI: Reconstructing Occluded Objects with Denoising Inpainters

    Authors: Yeonjin Chang, Erqun Dong, Seunghyeon Seo, Nojun Kwak, Kwang Moo Yi

    Abstract: While the quality of novel-view images has improved dramatically with 3D Gaussian Splatting, extracting specific objects from scenes remains challenging. Isolating individual 3D Gaussian primitives for each object and handling occlusions in scenes remains far from being solved. We propose a novel object extraction method based on two key principles: (1) object-centric reconstruction through remova…

    Submitted 9 August, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: Project page: https://yeonjin-chang.github.io/ROODI/

  21. arXiv:2502.12534  [pdf, other]

    cs.CV

    NoKSR: Kernel-Free Neural Surface Reconstruction via Point Cloud Serialization

    Authors: Zhen Li, Weiwei Sun, Shrisudhan Govindarajan, Shaobo Xia, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi

    Abstract: We present a novel approach to large-scale point cloud surface reconstruction by developing an efficient framework that converts an irregular point cloud into a signed distance field (SDF). Our backbone builds upon recent transformer-based architectures (i.e., PointTransformerV3), that serializes the point cloud into a locality-preserving sequence of tokens. We efficiently predict the SDF value at…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Project page: see https://theialab.github.io/noksr/

  22. arXiv:2502.04515  [pdf, other]

    cs.LG cs.AI

    MedGNN: Towards Multi-resolution Spatiotemporal Graph Learning for Medical Time Series Classification

    Authors: Wei Fan, Jingru Fei, Dingyu Guo, Kun Yi, Xiaozhuang Song, Haolong Xiang, Hangting Ye, Min Li

    Abstract: Medical time series has been playing a vital role in real-world healthcare systems as valuable information in monitoring health conditions of patients. Accurate classification for medical time series, e.g., Electrocardiography (ECG) signals, can help for early detection and diagnosis. Traditional methods towards medical time series classification rely on handcrafted feature extraction and statisti…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: Accepted by WWW 2025

  23. arXiv:2502.01157  [pdf, other]

    cs.CV

    Radiant Foam: Real-Time Differentiable Ray Tracing

    Authors: Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi

    Abstract: Research on differentiable scene representations is consistently moving towards more efficient, real-time models. Recently, this has led to the popularization of splatting methods, which eschew the traditional ray-based rendering of radiance fields in favor of rasterization. This has yielded a significant improvement in rendering speeds due to the efficiency of rasterization algorithms and hardwar…

    Submitted 3 February, 2025; originally announced February 2025.

  24. arXiv:2501.18980  [pdf, other]

    cs.LG cs.AI

    Symmetric Pruning of Large Language Models

    Authors: Kai Yi, Peter Richtárik

    Abstract: Popular post-training pruning methods such as Wanda and RIA are known for their simple, yet effective, designs that have shown exceptional empirical performance. Wanda optimizes performance through calibrated activations during pruning, while RIA emphasizes the relative, rather than absolute, importance of weight elements. Despite their practical success, a thorough theoretical foundation explaini…

    Submitted 31 January, 2025; originally announced January 2025.

  25. arXiv:2501.17216  [pdf, other]

    cs.LG

    Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting

    Authors: Jingru Fei, Kun Yi, Wei Fan, Qi Zhang, Zhendong Niu

    Abstract: We propose an energy amplification technique to address the issue that existing models easily overlook low-energy components in time series forecasting. This technique comprises an energy amplification block and an energy restoration block. The energy amplification block enhances the energy of low-energy components to improve the model's learning efficiency for these components, while the energy r…

    Submitted 22 February, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

    Comments: Accepted by AAAI 2025

  26. arXiv:2501.01648  [pdf, other]

    cs.CV cs.MM

    Dual Mutual Learning Network with Global-local Awareness for RGB-D Salient Object Detection

    Authors: Kang Yi, Haoran Tang, Yumeng Li, Jing Xu, Jun Zhang

    Abstract: RGB-D salient object detection (SOD), aiming to highlight prominent regions of a given scene by jointly modeling RGB and depth information, is one of the challenging pixel-level prediction tasks. Recently, the dual-attention mechanism has been applied to this area due to its ability to strengthen the detection process. However, most existing methods directly fuse attentional cross-modality feature…

    Submitted 3 January, 2025; originally announced January 2025.

  27. arXiv:2412.19104  [pdf, other]

    cs.CV cs.LG

    Improving Generative Pre-Training: An In-depth Study of Masked Image Modeling and Denoising Models

    Authors: Hyesong Choi, Daeun Kim, Sungmin Cha, Kwang Moo Yi, Dongbo Min

    Abstract: In this work, we dive deep into the impact of additive noise in pre-training deep networks. While various methods have attempted to use additive noise inspired by the success of latent denoising diffusion models, when used in combination with masked image modeling, their gains have been marginal when it comes to recognition tasks. We thus investigate why this would be the case, in an attempt to fi…

    Submitted 26 December, 2024; originally announced December 2024.

  28. arXiv:2412.17040  [pdf, other]

    cs.LG

    HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories

    Authors: Eric Hedlin, Munawar Hayat, Fatih Porikli, Kwang Moo Yi, Shweta Mahajan

    Abstract: To efficiently adapt large models or to train generative models of neural representations, Hypernetworks have drawn interest. While hypernetworks work well, training them is cumbersome, and often requires ground truth optimized weights for each sample. However, obtaining each of these weights is a training problem of its own: one needs to train, e.g., adaptation weights or even an entire neural fie…

    Submitted 19 May, 2025; v1 submitted 22 December, 2024; originally announced December 2024.

  29. arXiv:2412.07469  [pdf, ps, other]

    stat.ML cs.LG

    Score-matching-based Structure Learning for Temporal Data on Networks

    Authors: Hao Chen, Kai Yi

    Abstract: Causal discovery is a crucial initial step in establishing causality from empirical data and background knowledge. Numerous algorithms have been developed for this purpose. Among them, the score-matching method has demonstrated superior performance across various evaluation metrics, particularly for the commonly encountered Additive Nonlinear Causal Models. However, current score-matching-based al…

    Submitted 12 April, 2026; v1 submitted 10 December, 2024; originally announced December 2024.

  30. arXiv:2412.04867  [pdf, other]

    cs.CV

    MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects

    Authors: Lei Fan, Dongdong Fan, Zhiguang Hu, Yiwen Ding, Donglin Di, Kai Yi, Maurice Pagnucco, Yang Song

    Abstract: We present MANTA, a visual-text anomaly detection dataset for tiny objects. The visual component comprises over 137.3K images across 38 object categories spanning five typical domains, of which 8.6K images are labeled as anomalous with pixel-level annotations. Each image is captured from five distinct viewpoints to ensure comprehensive object coverage. The text component consists of two subsets: D…

    Submitted 6 December, 2024; originally announced December 2024.

    Comments: https://grainnet.github.io/MANTA

  31. arXiv:2411.01623  [pdf, other]

    cs.LG cs.AI eess.SP

    FilterNet: Harnessing Frequency Filters for Time Series Forecasting

    Authors: Kun Yi, Jingru Fei, Qi Zhang, Hui He, Shufeng Hao, Defu Lian, Wei Fan

    Abstract: While numerous forecasters have been proposed using different network architectures, the Transformer-based models have state-of-the-art performance in time series forecasting. However, forecasters based on Transformers are still suffering from vulnerability to high-frequency signals, efficiency in computation, and bottleneck in full-spectrum utilization, which essentially are the cornerstones for…

    Submitted 4 November, 2024; v1 submitted 3 November, 2024; originally announced November 2024.

    Comments: Accepted by NeurIPS 2024

  32. arXiv:2409.20361  [pdf, other]

    cs.LG cs.AI

    Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference

    Authors: Ke Yi, Zengke Liu, Jianwei Zhang, Chengyuan Li, Tong Zhang, Junyang Lin, Jingren Zhou

    Abstract: Large language models have demonstrated promising capabilities upon scaling up parameters. However, serving large language models incurs substantial computation and memory movement costs due to their large scale. Quantization methods have been employed to reduce service costs and latency. Nevertheless, outliers in activations hinder the development of INT4 weight-activation quantization. Existing…

    Submitted 11 November, 2024; v1 submitted 30 September, 2024; originally announced September 2024.

  33. arXiv:2409.17228  [pdf, other]

    astro-ph.EP cs.AI cs.LG

    Disk2Planet: A Robust and Automated Machine Learning Tool for Parameter Inference in Disk-Planet Systems

    Authors: Shunyuan Mao, Ruobing Dong, Kwang Moo Yi, Lu Lu, Sifan Wang, Paris Perdikaris

    Abstract: We introduce Disk2Planet, a machine learning-based tool to infer key parameters in disk-planet systems from observed protoplanetary disk structures. Disk2Planet takes as input the disk structures in the form of two-dimensional density and velocity maps, and outputs disk and planet properties, that is, the Shakura–Sunyaev viscosity, the disk aspect ratio, the planet–star mass ratio, and the plane…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: Accepted to ApJ

  34. arXiv:2409.12377  [pdf, other]

    eess.IV cs.CV

    Fundus image enhancement through direct diffusion bridges

    Authors: Sehui Kim, Hyungjin Chung, Se Hie Park, Eui-Sang Chung, Kayoung Yi, Jong Chul Ye

    Abstract: We propose FD3, a fundus image enhancement method based on direct diffusion bridges, which can cope with a wide range of complex degradations, including haze, blur, noise, and shadow. We first propose a synthetic forward model through a human feedback loop with board-certified ophthalmologists for maximal quality improvement of low-quality in-vivo images. Using the proposed forward model, we train…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: Published at IEEE JBHI. 12 pages, 10 figures. Code and Data: https://github.com/heeheee888/FD3

  35. arXiv:2409.06104  [pdf, other]

    cs.CV

    LSE-NeRF: Learning Sensor Modeling Errors for Deblured Neural Radiance Fields with RGB-Event Stereo

    Authors: Wei Zhi Tang, Daniel Rebain, Kostantinos G. Derpanis, Kwang Moo Yi

    Abstract: We present a method for reconstructing a clear Neural Radiance Field (NeRF) even with fast camera motions. To address blur artifacts, we leverage both (blurry) RGB images and event camera data captured in a binocular configuration. Importantly, when reconstructing our clear NeRF, we consider the camera modeling imperfections that arise from the simple pinhole camera model as learned embeddings for…

    Submitted 9 September, 2024; originally announced September 2024.

  36. arXiv:2409.06030  [pdf, other]

    cs.GR cs.CV

    NESI: Shape Representation via Neural Explicit Surface Intersection

    Authors: Congyi Zhang, Jinfan Yang, Eric Hedlin, Suzuran Takikawa, Nicholas Vining, Kwang Moo Yi, Wenping Wang, Alla Sheffer

    Abstract: Compressed representations of 3D shapes that are compact, accurate, and can be processed efficiently directly in compressed form, are extremely useful for digital media applications. Recent approaches in this space focus on learned implicit or parametric representations. While implicits are well suited for tasks such as in-out queries, they lack natural 2D parameterization, complicating tasks such…

    Submitted 9 September, 2024; originally announced September 2024.

  37. arXiv:2409.05334  [pdf, other]

    cs.CV

    Lagrangian Hashing for Compressed Neural Field Representations

    Authors: Shrisudhan Govindarajan, Zeno Sambugaro, Akhmedkhan Shabanov, Towaki Takikawa, Daniel Rebain, Weiwei Sun, Nicola Conci, Kwang Moo Yi, Andrea Tagliasacchi

    Abstract: We present Lagrangian Hashing, a representation for neural fields combining the characteristics of fast training NeRF methods that rely on Eulerian grids (i.e., InstantNGP), with those that employ points equipped with features as a way to represent information (e.g. 3D Gaussian Splatting or PointNeRF). We achieve this by incorporating a point-based representation into the high-resolution layers of…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: Project page: https://theialab.github.io/laghashes/

  38. arXiv:2408.02687  [pdf, other]

    cs.CV

    Compositional Physical Reasoning of Objects and Events from Videos

    Authors: Zhenfang Chen, Shilong Dong, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

    Abstract: Understanding and reasoning about objects' physical properties in the natural world is a fundamental challenge in artificial intelligence. While some properties like colors and shapes can be directly observed, others, such as mass and electric charge, are hidden from the objects' visual appearance. This paper addresses the unique challenge of inferring these hidden physical properties from objects…

    Submitted 26 May, 2025; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted by TPAMI 2025. arXiv admin note: text overlap with arXiv:2205.01089

  39. arXiv:2407.13194  [pdf, other]

    cs.LG cs.AI

    Robust Multivariate Time Series Forecasting against Intra- and Inter-Series Transitional Shift

    Authors: Hui He, Qi Zhang, Kun Yi, Xiaojun Xue, Shoujin Wang, Liang Hu, Longbing Cao

    Abstract: The non-stationary nature of real-world Multivariate Time Series (MTS) data presents forecasting models with a formidable challenge of the time-variant distribution of time series, referred to as distribution shift. Existing studies on the distribution shift mostly adhere to adaptive normalization techniques for alleviating temporal mean and covariance shifts or time-variant modeling for capturing…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 19 pages, 11 figures

    MSC Class: 68Txx ACM Class: I.2.6

  40. arXiv:2407.00502  [pdf, other]

    cs.LG cs.AI

    Deep Frequency Derivative Learning for Non-stationary Time Series Forecasting

    Authors: Wei Fan, Kun Yi, Hangting Ye, Zhiyuan Ning, Qi Zhang, Ning An

    Abstract: While most time series are non-stationary, it is inevitable for models to face the distribution shift issue in time series forecasting. Existing solutions manipulate statistical measures (usually mean and std.) to adjust time series distribution. However, these operations can be theoretically seen as the transformation towards zero frequency component of the spectrum which cannot reveal full distr…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI 2024

  41. arXiv:2406.05343  [pdf, other]

    cs.AI cs.CL

    M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark

    Authors: Wei Song, Yadong Li, Jianhua Xu, Guowei Wu, Lingfeng Ming, Kexin Yi, Weihua Luo, Houyi Li, Yi Du, Fangda Guo, Kaicheng Yu

    Abstract: As recent multi-modality large language models (MLLMs) have shown formidable proficiency on various complex tasks, there has been increasing attention on debating whether these models could eventually mirror human intelligence. However, existing benchmarks focus mainly on evaluating task performance alone, such as the accuracy of identifying the attribute of an object. Combining well-developed…

    Submitted 14 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  42. arXiv:2406.01460  [pdf, other]

    cs.CV cs.AI

    MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

    Authors: Yu Zhang, Qi Zhang, Zixuan Gong, Yiwei Shi, Yepeng Liu, Duoqian Miao, Yang Liu, Ke Liu, Kun Yi, Wei Fan, Liang Hu, Changwei Wang

    Abstract: Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, leading to rapid advancements in multimodal studies. However, CLIP faces a notable challenge in terms of inefficient data utilization. It relies on a single contrastive supervision for each image-text pair during representation learning, disregarding a substantial amount of valuable information that could offer richer s…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  43. arXiv:2406.01115  [pdf, other]

    cs.LG

    Cohort Squeeze: Beyond a Single Communication Round per Cohort in Cross-Device Federated Learning

    Authors: Kai Yi, Timur Kharisov, Igor Sokolov, Peter Richtárik

    Abstract: Virtually all federated learning (FL) methods, including FedAvg, operate in the following manner: i) an orchestrating server sends the current model parameters to a cohort of clients selected via certain rule, ii) these clients then independently perform a local training procedure (e.g., via SGD or Adam) using their own training data, and iii) the resulting models are shipped to the server for agg…

    Submitted 3 June, 2024; originally announced June 2024.

  44. arXiv:2405.20623  [pdf, other]

    cs.LG math.OC

    Sparse-ProxSkip: Accelerated Sparse-to-Sparse Training in Federated Learning

    Authors: Georg Meinhardt, Kai Yi, Laurent Condat, Peter Richtárik

    Abstract: In Federated Learning (FL), both client resource constraints and communication costs pose major problems for training large models. In the centralized setting, sparse training addresses resource constraints, while in the distributed setting, local training addresses communication costs. Recent work has shown that local training provably improves communication complexity through acceleration. In th…

    Submitted 28 February, 2025; v1 submitted 31 May, 2024; originally announced May 2024.

  45. arXiv:2405.20202  [pdf, other]

    cs.AI

    One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments

    Authors: Ke Yi, Yuhui Xu, Heng Chang, Chen Tang, Yuan Meng, Tong Zhang, Jia Li

    Abstract: Large Language Models (LLMs) have advanced rapidly but face significant memory demands. While quantization has shown promise for LLMs, current methods typically require lengthy training to alleviate the performance degradation from quantization loss. However, deploying LLMs across diverse scenarios with different resource constraints, e.g., servers and personal computers, requires repeated trainin…

    Submitted 30 May, 2024; originally announced May 2024.

  46. arXiv:2405.14852  [pdf, other]

    cs.LG

    PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

    Authors: Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtarik

    Abstract: There has been significant interest in "extreme" compression of large language models (LLMs), i.e., to 1-2 bits per parameter, which allows such models to be executed efficiently on resource-constrained devices. Existing work focused on improved one-shot quantization techniques and weight representations; yet, purely post-training approaches are reaching diminishing returns in terms of the accurac…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint

  47. arXiv:2405.06307  [pdf, other]

    cs.CR

    Smooth Sensitivity for Geo-Privacy

    Authors: Yuting Liang, Ke Yi

    Abstract: Suppose each user $i$ holds a private value $x_i$ in some metric space $(U, \mathrm{dist})$, and an untrusted data analyst wishes to compute $\sum_i f(x_i)$ for some function $f : U \rightarrow \mathbb{R}$ by asking each user to send in a privatized $f(x_i)$. This is a fundamental problem in privacy-preserving population analytics, and the local model of differential privacy (LDP) is the predomina…

    Submitted 10 May, 2024; originally announced May 2024.

  48. arXiv:2405.01258  [pdf, other]

    cs.CV cs.RO eess.IV

    Towards Consistent Object Detection via LiDAR-Camera Synergy

    Authors: Kai Luo, Hao Wu, Kefu Yi, Kailun Yang, Wei Hao, Rongdong Hu

    Abstract: As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Integrating the two most common types of sensory data, images, and point clouds, can enhance detection accuracy. Currently, there is no existing model capable of detecting an object's position in both point clouds and images while also determining their corresponding relati…

    Submitted 9 August, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE SMC 2024. The source code will be made publicly available at https://github.com/xifen523/COD

  49. arXiv:2404.14396  [pdf, other]

    cs.CV

    SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

    Authors: Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan

    Abstract: The rapid evolution of multimodal foundation model has demonstrated significant progresses in vision-language understanding and generation, e.g., our previous work SEED-LLaMA. However, there remains a gap between its capability and the real-world applicability, primarily due to the model's limited capacity to effectively respond to various user instructions and interact with diverse visual data. I…

    Submitted 2 March, 2025; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: We added benchmark results (without updating models) and ablation study in this version. Project released at: https://github.com/AILab-CVC/SEED-X

  50. arXiv:2404.13282  [pdf, other]

    cs.CV cs.MM

    Wills Aligner: Multi-Subject Collaborative Brain Visual Decoding

    Authors: Guangyin Bao, Qi Zhang, Zixuan Gong, Jialei Zhou, Wei Fan, Kun Yi, Usman Naseem, Liang Hu, Duoqian Miao

    Abstract: Decoding visual information from human brain activity has seen remarkable advancements in recent research. However, the diversity in cortical parcellation and fMRI patterns across individuals has prompted the development of deep learning models tailored to each subject. The personalization limits the broader applicability of brain visual decoding in real-world scenarios. To address this issue, we…

    Submitted 16 December, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

    Comments: AAAI 2025, 16 pages