Showing 1–50 of 221 results for author: Hsu, D

Searching in archive cs.
  1. arXiv:2603.23208

    cs.LG

    A One-Inclusion Graph Approach to Multi-Group Learning

    Authors: Noah Bergam, Samuel Deng, Daniel Hsu

    Abstract: We prove the tightest-known upper bounds on the sample complexity of multi-group learning. Our algorithm extends the one-inclusion graph prediction strategy using a generalization of bipartite $b$-matching. In the group-realizable setting, we provide a lower bound confirming that our algorithm's $\log n / n$ convergence rate is optimal in general. If one relaxes the learning objective such that th…

    Submitted 24 March, 2026; originally announced March 2026.

  2. Navigation beyond Wayfinding: Robots Collaborating with Visually Impaired Users for Environmental Interactions

    Authors: Shaojun Cai, Nuwan Janaka, Ashwin Ram, Janidu Shehan, Yingjia Wan, Kotaro Hara, David Hsu

    Abstract: Robotic guidance systems have shown promise in supporting blind and visually impaired (BVI) individuals with wayfinding and obstacle avoidance. However, most existing systems assume a clear path and do not support a critical aspect of navigation - environmental interactions that require manipulating objects to enable movement. These interactions are challenging for a human-robot pair because they…

    Submitted 15 March, 2026; originally announced March 2026.

    Comments: Accepted to ACM/IEEE HRI 2026, 10 pages, 6 figures

  3. arXiv:2603.11126

    cs.MA cs.CL

    Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion

    Authors: Yuanhong Wu, Djallel Bouneffouf, D. Frank Hsu

    Abstract: Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: 5 pages, 3 figures, accepted to 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

  4. arXiv:2603.10916

    cs.LG

    NCAA Bracket Prediction Using Machine Learning and Combinatorial Fusion Analysis

    Authors: Yuanhong Wu, Isaiah Smith, Tushar Marwah, Michael Schroeter, Mohamed Rahouti, D. Frank Hsu

    Abstract: Machine learning models have demonstrated remarkable success in sports prediction in the past years, often treating sports prediction as a classification task within the field. This paper introduces new perspectives for analyzing sports data to predict outcomes more accurately. We leverage rankings to generate team rankings for the 2024 dataset using Combinatorial Fusion Analysis (CFA), a new para…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: 8 pages, 4 figures, Published in Proceedings of the 2024 IEEE Cyber Science and Technology Congress (CyberSciTech)

  5. arXiv:2603.10049

    cs.LG cs.AI

    InFusionLayer: a CFA-based ensemble tool to generate new classifiers for learning and modeling

    Authors: Eric Roginek, Jingyan Xu, D. Frank Hsu

    Abstract: Ensemble learning is a well-established body of methods for machine learning to enhance predictive performance by combining multiple algorithms/models. Combinatorial Fusion Analysis (CFA) has provided method and practice for combining multiple scoring systems, using rank-score characteristic (RSC) function and cognitive diversity (CD), including ensemble method and model fusion. However, there is…

    Submitted 9 March, 2026; originally announced March 2026.

    Comments: 8 pages, 4 figures, 3 tables; Accepted to 2024 IEEE International Conference on Tools with Artificial Intelligence (IEEE ICTAI)

  6. arXiv:2603.07319

    cs.LG

    ShakyPrepend: A Multi-Group Learner with Improved Sample Complexity

    Authors: Lujing Zhang, Daniel Hsu, Sivaraman Balakrishnan

    Abstract: Multi-group learning is a learning task that focuses on controlling predictors' conditional losses over specified subgroups. We propose ShakyPrepend, a method that leverages tools inspired by differential privacy to obtain improved theoretical guarantees over existing approaches. Through numerical experiments, we demonstrate that ShakyPrepend adapts to both group structure and spatial heterogeneit…

    Submitted 7 March, 2026; originally announced March 2026.

    Comments: 29 pages, 10 figures, submitted to ICML2026

  7. arXiv:2602.12686

    cs.RO

    SignScene: Visual Sign Grounding for Mapless Navigation

    Authors: Nicky Zimmerman, Joel Loo, Benjamin Koh, Zishuo Wang, David Hsu

    Abstract: Navigational signs enable humans to navigate unfamiliar environments without maps. This work studies how robots can similarly exploit signs for mapless navigation in the open world. A central challenge lies in interpreting signs: real-world signs are diverse and complex, and their abstract semantic contents need to be grounded in the local 3D scene. We formalize this as sign grounding, the problem…

    Submitted 13 February, 2026; originally announced February 2026.

    Comments: Under review for a conference

  8. arXiv:2602.11168

    cs.CL cs.AI

    Enhancing SDG-Text Classification with Combinatorial Fusion Analysis and Generative AI

    Authors: Jingyan Xu, Marcelo L. LaFleur, Christina Schweikert, D. Frank Hsu

    Abstract: Natural Language Processing (NLP) techniques such as text classification and topic discovery are very useful in many application areas including information retrieval, knowledge discovery, policy formulation, and decision-making. However, it remains a challenging problem in cases where the categories are unavailable, difficult to differentiate, or are interrelated. Social analysis with human conte…

    Submitted 18 January, 2026; originally announced February 2026.

    Comments: 8 pages, 8 figures, 4 tables; Accepted to 2025 IEEE International Conference on Pervasive Intelligence and Computing (PICom 2025)

  9. arXiv:2602.10603

    cs.LG

    dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning

    Authors: Arnav Shah, Junzhe Li, Parsa Idehpour, Adibvafa Fallahpour, Brandon Wang, Sukjun Hwang, Bo Wang, Patrick D. Hsu, Hani Goodarzi, Albert Gu

    Abstract: Genomic foundation models have the potential to decode DNA syntax, yet face a fundamental tradeoff in their input representation. Standard fixed-vocabulary tokenizers fragment biologically meaningful motifs such as codons and regulatory elements, while nucleotide-level models preserve biological coherence but incur prohibitive computational costs for long contexts. We introduce dnaHNet, a state-of…

    Submitted 13 February, 2026; v1 submitted 11 February, 2026; originally announced February 2026.

  10. arXiv:2602.09002

    cs.RO cs.AI

    From Obstacles to Etiquette: Robot Social Navigation with VLM-Informed Path Selection

    Authors: Zilin Fang, Anxing Xiao, David Hsu, Gim Hee Lee

    Abstract: Navigating socially in human environments requires more than satisfying geometric constraints, as collision-free paths may still interfere with ongoing activities or conflict with social norms. Addressing this challenge calls for analyzing interactions between agents and incorporating common-sense reasoning into planning. This paper presents a social robot navigation framework that integrates geom…

    Submitted 9 February, 2026; originally announced February 2026.

    Comments: Accepted to IEEE Robotics and Automation Letters (RA-L)

  11. arXiv:2602.00037

    q-fin.ST cs.AI cs.CE cs.LG

    Bitcoin Price Prediction using Machine Learning and Combinatorial Fusion Analysis

    Authors: Yuanhong Wu, Wei Ye, Jingyan Xu, D. Frank Hsu

    Abstract: In this work, we propose to apply a new model fusion and learning paradigm, known as Combinatorial Fusion Analysis (CFA), to the field of Bitcoin price prediction. Price prediction of financial products has always been a big topic in finance, as the successful prediction of the price can yield significant profit. Every machine learning model has its own strengths and weaknesses, which hinders progress…

    Submitted 8 March, 2026; v1 submitted 18 January, 2026; originally announced February 2026.

    Comments: 8 pages, 5 figures, 3 tables; Accepted to 2025 IEEE Conference on Artificial Intelligence (IEEE CAI)

  12. arXiv:2601.16922

    cs.LG stat.ML

    Group-realizable multi-group learning by minimizing empirical risk

    Authors: Navid Ardeshir, Samuel Deng, Daniel Hsu, Jingwen Liu

    Abstract: The sample complexity of multi-group learning is shown to improve in the group-realizable setting over the agnostic setting, even when the family of groups is infinite so long as it has finite VC dimension. The improved sample complexity is obtained by empirical risk minimization over the class of group-realizable concepts, which itself could have infinite VC dimension. Implementing this approach…

    Submitted 23 January, 2026; originally announced January 2026.

  13. arXiv:2601.12684

    cs.CE

    A Model Fusion Approach for Enhancing Credit Approval Decision Making

    Authors: Yuanhong Wu, Jingyan Xu, Wei Ye, Christina Schweikert, D. Frank Hsu

    Abstract: Credit default poses significant challenges to financial institutions and consumers, resulting in substantial financial losses and diminished trust. As such, credit default risk management has been a critical topic in the financial industry. In this paper, we present Combinatorial Fusion Analysis (CFA), a model fusion framework, that combines multiple machine learning algorithms to detect and pred…

    Submitted 18 January, 2026; originally announced January 2026.

    Comments: 7 pages, 3 figures, 2 tables; Accepted to 2025 IEEE International Conference on AI x Business (AIxB 2025)

  14. arXiv:2601.08128

    cs.AI

    Embedded AI Companion System on Edge Devices

    Authors: Rahul Gupta, Stephen D. H. Hsu

    Abstract: Computational resource constraints on edge devices make it difficult to develop a fully embedded AI companion system with a satisfactory user experience. AI companion and memory systems detailed in existing literature cannot be directly used in such an environment due to lack of compute resources and latency concerns. In this paper, we propose a memory paradigm that alternates between active and i…

    Submitted 12 January, 2026; originally announced January 2026.

    Comments: 30 pages, 7 figures

  15. arXiv:2601.03099

    cs.LG econ.EM stat.ML

    Time-Aware Synthetic Control

    Authors: Saeyoung Rho, Cyrus Illick, Samhitha Narasipura, Alberto Abadie, Daniel Hsu, Vishal Misra

    Abstract: The synthetic control (SC) framework is widely used for observational causal inference with time-series panel data. SC has been successful in diverse applications, but existing methods typically treat the ordering of pre-intervention time indices as interchangeable. This invariance means they may not fully take advantage of temporal structure when strong trends are present. We propose Time-Aware Synt…

    Submitted 6 January, 2026; originally announced January 2026.

  16. arXiv:2512.17978

    q-bio.NC cs.LG cs.SD

    MEGState: Phoneme Decoding from Magnetoencephalography Signals

    Authors: Shuntaro Suzuki, Chia-Chun Dan Hsu, Yu Tsao, Komei Sugiura

    Abstract: Decoding linguistically meaningful representations from non-invasive neural recordings remains a central challenge in neural speech decoding. Among available neuroimaging modalities, magnetoencephalography (MEG) provides a safe and repeatable means of mapping speech-related cortical dynamics, yet its low signal-to-noise ratio and high temporal dimensionality continue to hinder robust decoding. In…

    Submitted 19 December, 2025; originally announced December 2025.

    Comments: Accepted for presentation at LibriBrain Competition, NeurIPS 2025

  17. arXiv:2512.01242

    cs.CV cs.AI cs.CL

    Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation

    Authors: Zirui Zhao, Boye Niu, David Hsu, Wee Sun Lee

    Abstract: We study abstract visual composition, in which identity is primarily determined by the spatial configuration and relations among a small set of geometric primitives (e.g., parts, symmetry, topology). They are invariant primarily to texture and photorealistic detail. Composing such structures from fixed components under geometric constraints and vague goal specification (such as text) is non-trivia…

    Submitted 15 January, 2026; v1 submitted 30 November, 2025; originally announced December 2025.

  18. arXiv:2511.05936

    cs.RO cs.AI

    10 Open Challenges Steering the Future of Vision-Language-Action Models

    Authors: Soujanya Poria, Navonil Majumder, Chia-Yu Hung, Amir Ali Bagherzadeh, Chuan Li, Kenneth Kwok, Ziwei Wang, Cheston Tan, Jiajun Wu, David Hsu

    Abstract: Due to their ability to follow natural language instructions, vision-language-action (VLA) models are increasingly prevalent in the embodied AI arena, following the widespread success of their precursors -- LLMs and VLMs. In this paper, we discuss 10 principal milestones in the ongoing development of VLA models -- multimodality, reasoning, data, evaluation, cross-robot action generalization, effic…

    Submitted 8 November, 2025; originally announced November 2025.

    Comments: AAAI 2026 (Senior Track)

  19. arXiv:2510.27638

    cs.LG

    Panprediction: Optimal Predictions for Any Downstream Task and Loss

    Authors: Sivaraman Balakrishnan, Nika Haghtalab, Daniel Hsu, Brian Lee, Eric Zhao

    Abstract: Supervised learning is classically formulated as training a model to minimize a fixed loss function over a fixed distribution, or task. However, an emerging paradigm instead views model training as extracting enough information from data so that the model can be used to minimize many losses on many downstream tasks. We formalize a mathematical framework for this paradigm, which we call panpredicti…

    Submitted 31 October, 2025; originally announced October 2025.

    Comments: 25 pages

  20. arXiv:2510.27014

    cs.LG

    Enhancing Sentiment Classification with Machine Learning and Combinatorial Fusion

    Authors: Sean Patten, Pin-Yu Chen, Christina Schweikert, D. Frank Hsu

    Abstract: This paper presents a novel approach to sentiment classification using the application of Combinatorial Fusion Analysis (CFA) to integrate an ensemble of diverse machine learning models, achieving state-of-the-art accuracy on the IMDB sentiment analysis dataset of 97.072\%. CFA leverages the concept of cognitive diversity, which utilizes rank-score characteristic functions to quantify the dissimil…

    Submitted 30 October, 2025; originally announced October 2025.

    Comments: IEEE PICom 2025

  21. arXiv:2510.24680

    cs.RO

    Fare: Failure Resilience in Learned Visual Navigation Control

    Authors: Zishuo Wang, Joel Loo, David Hsu

    Abstract: While imitation learning (IL) enables effective visual navigation, IL policies are prone to unpredictable failures in out-of-distribution (OOD) scenarios. We advance the notion of failure-resilient policies, which not only detect failures but also recover from them automatically. Failure recognition that identifies the factors causing failure is key to informing recovery: e.g. pinpointing image re…

    Submitted 28 October, 2025; originally announced October 2025.

  22. arXiv:2510.24380

    cs.LG

    APEX: Approximate-but-exhaustive search for ultra-large combinatorial synthesis libraries

    Authors: Aryan Pedawi, Jordi Silvestre-Ryan, Bradley Worley, Darren J Hsu, Kushal S Shah, Elias Stehle, Jingrong Zhang, Izhar Wallach

    Abstract: Make-on-demand combinatorial synthesis libraries (CSLs) like Enamine REAL have significantly enabled drug discovery efforts. However, their large size presents a challenge for virtual screening, where the goal is to identify the top compounds in a library according to a computational objective (e.g., optimizing docking score) subject to computational constraints under a limited computational budge…

    Submitted 28 October, 2025; originally announced October 2025.

  23. arXiv:2510.16609

    cs.LG cs.AI cs.CC cs.DS

    Prior Knowledge Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods

    Authors: Avrim Blum, Daniel Hsu, Cyrus Rashtchian, Donya Saless

    Abstract: Test-time augmentation, such as Retrieval-Augmented Generation (RAG) or tool use, critically depends on an interplay between a model's parametric knowledge and externally retrieved information. However, the theoretical underpinnings of this relationship remain poorly understood. Specifically, it is not clear how much pre-training knowledge is required to answer queries with a small number of augme…

    Submitted 2 April, 2026; v1 submitted 18 October, 2025; originally announced October 2025.

  24. arXiv:2510.16334

    cs.IR cs.CL

    Investigating the Association Between Text-Based Indications of Foodborne Illness from Yelp Reviews and New York City Health Inspection Outcomes (2023)

    Authors: Eden Shaveet, Crystal Su, Daniel Hsu, Luis Gravano

    Abstract: Foodborne illnesses are gastrointestinal conditions caused by consuming contaminated food. Restaurants are critical venues to investigate outbreaks because they share sourcing, preparation, and distribution of foods. Public reporting of illness via formal channels is limited, whereas social media platforms host abundant user-generated content that can provide timely public health signals. This pap…

    Submitted 17 October, 2025; originally announced October 2025.

    Comments: Presented as a poster at Data Science Day 2024

  25. arXiv:2510.10778

    cs.RO

    Asset-Centric Metric-Semantic Maps of Indoor Environments

    Authors: Christopher D. Hsu, Pratik Chaudhari

    Abstract: Large Language Models (LLMs) can help robots reason about abstract task specifications. This requires augmenting classical representations of the environment used by robots, such as point-clouds and meshes, with natural language-based priors. There are a number of approaches to do so in the existing literature. While some navigation frameworks leverage scene-level semantics at the expense of objec…

    Submitted 10 March, 2026; v1 submitted 12 October, 2025; originally announced October 2025.

    Comments: 9 pages, 8 figures, 3 tables

  26. arXiv:2509.09001

    cs.LG

    Fast attention mechanisms: a tale of parallelism

    Authors: Jingwen Liu, Hantao Yu, Clayton Sanford, Alexandr Andoni, Daniel Hsu

    Abstract: Transformers have the representational capacity to simulate Massively Parallel Computation (MPC) algorithms, but they suffer from quadratic time complexity, which severely limits their scalability. We introduce an efficient attention mechanism called Approximate Nearest Neighbor Attention (ANNA) with sub-quadratic time complexity. We prove that ANNA-transformers (1) retain the expressive power pre…

    Submitted 10 September, 2025; originally announced September 2025.

  27. arXiv:2508.18606

    cs.RO

    SignLoc: Robust Localization using Navigation Signs and Public Maps

    Authors: Nicky Zimmerman, Joel Loo, Ayush Agrawal, David Hsu

    Abstract: Navigation signs and maps, such as floor plans and street maps, are widely available and serve as ubiquitous aids for way-finding in human environments. Yet, they are rarely used by robot systems. This paper presents SignLoc, a global localization method that leverages navigation signs to localize the robot on publicly available maps -- specifically floor plans and OpenStreetMap (OSM) graphs -- wi…

    Submitted 29 August, 2025; v1 submitted 25 August, 2025; originally announced August 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  28. arXiv:2508.13534

    cs.RO cs.AI cs.CV

    MimicFunc: Imitating Tool Manipulation from a Single Human Video via Functional Correspondence

    Authors: Chao Tang, Anxing Xiao, Yuhong Deng, Tianrun Hu, Wenlong Dong, Hanbo Zhang, David Hsu, Hong Zhang

    Abstract: Imitating tool manipulation from human videos offers an intuitive approach to teaching robots, while also providing a promising and scalable alternative to labor-intensive teleoperation data collection for visuomotor policy learning. While humans can mimic tool manipulation behavior by observing others perform a task just once and effortlessly transfer the skill to diverse tools for functionally e…

    Submitted 19 August, 2025; originally announced August 2025.

    Comments: Accepted to CoRL 2025

  29. arXiv:2508.13346

    cs.LG math.ST

    Dimension lower bounds for linear approaches to function approximation

    Authors: Daniel Hsu

    Abstract: This short note presents a linear algebraic approach to proving dimension lower bounds for linear methods that solve $L^2$ function approximation problems. The basic argument has appeared in the literature before (e.g., Barron, 1993) for establishing lower bounds on Kolmogorov $n$-widths. The argument is applied to give sample size lower bounds for kernel methods.

    Submitted 18 August, 2025; originally announced August 2025.

    Comments: First appeared on author's homepage in August 2021 https://www.cs.columbia.edu/~djhsu/papers/dimension-argument.pdf

  30. Open Scene Graphs for Open-World Object-Goal Navigation

    Authors: Joel Loo, Zhanxin Wu, David Hsu

    Abstract: How can we build general-purpose robot systems for open-world semantic navigation, e.g., searching a novel environment for a target object specified in natural language? To tackle this challenge, we introduce OSG Navigator, a modular system composed of foundation models, for open-world Object-Goal Navigation (ObjectNav). Foundation models provide enormous semantic knowledge about the world, but st…

    Submitted 6 August, 2025; originally announced August 2025.

    Comments: In IJRR Special Issue: Foundation Models and Neuro-symbolic AI for Robotics. Journal extension to arXiv:2407.02473

    Report number: OSG-02

  31. arXiv:2508.02093

    cs.AI

    "Stack It Up!": 3D Stable Structure Generation from 2D Hand-drawn Sketch

    Authors: Yiqing Xu, Linfeng Li, Cunjun Yu, David Hsu

    Abstract: Imagine a child sketching the Eiffel Tower and asking a robot to bring it to life. Today's robot manipulation systems can't act on such sketches directly; they require precise 3D block poses as goals, which in turn demand structural analysis and expert tools like CAD. We present StackItUp, a system that enables non-experts to specify complex 3D structures using only 2D front-view hand-drawn sketche…

    Submitted 4 August, 2025; originally announced August 2025.

    Comments: Accepted to CoRL 2025

  32. arXiv:2508.02068

    cs.RO

    "Set It Up": Functional Object Arrangement with Compositional Generative Models (Journal Version)

    Authors: Yiqing Xu, Jiayuan Mao, Linfeng Li, Yilun Du, Tomas Lozáno-Pérez, Leslie Pack Kaelbling, David Hsu

    Abstract: Functional object arrangement (FORM) is the task of arranging objects to fulfill a function, e.g., "set up a dining table for two". One key challenge here is that the instructions for FORM are often under-specified and do not explicitly specify the desired object goal poses. This paper presents SetItUp, a neuro-symbolic framework that learns to specify the goal poses of objects from a few training…

    Submitted 7 August, 2025; v1 submitted 4 August, 2025; originally announced August 2025.

    Comments: This is the journal version accepted to the International Journal of Robotics Research (IJRR). It extends our prior work presented at Robotics: Science and Systems (RSS) 2024, with a new compositional program induction pipeline from natural language, and expanded evaluations on personalized bookshelf and bedroom furniture layout tasks

  33. arXiv:2507.19983

    cs.RO cs.AI

    CLASP: General-Purpose Clothes Manipulation with Semantic Keypoints

    Authors: Yuhong Deng, Chao Tang, Cunjun Yu, Linfeng Li, David Hsu

    Abstract: Clothes manipulation, such as folding or hanging, is a critical capability for home service robots. Despite recent advances, most existing methods remain limited to specific clothes types and tasks, due to the complex, high-dimensional geometry of clothes. This paper presents CLothes mAnipulation with Semantic keyPoints (CLASP), which aims at general-purpose clothes manipulation over diverse cloth…

    Submitted 17 October, 2025; v1 submitted 26 July, 2025; originally announced July 2025.

  34. Progressive Sentences: Combining the Benefits of Word and Sentence Learning

    Authors: Nuwan Janaka, Shengdong Zhao, Ashwin Ram, Ruoxin Sun, Sherisse Tan Jing Wen, Danae Li, David Hsu

    Abstract: The rapid evolution of lightweight consumer augmented reality (AR) smart glasses (a.k.a. optical see-through head-mounted displays) offers novel opportunities for learning, particularly through their unique capability to deliver multimodal information in just-in-time, micro-learning scenarios. This research investigates how such devices can support mobile second-language acquisition by presenting…

    Submitted 20 July, 2025; originally announced July 2025.

    Comments: 12 pages, 4 figures, 4 tables

  35. arXiv:2507.06261

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3410 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 19 December, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  36. arXiv:2507.03254

    cs.AI

    CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs

    Authors: Bruce Yang, Xinfeng He, Huan Gao, Yifan Cao, Xiaofan Li, David Hsu

    Abstract: Effective prompt design is essential for improving the planning capabilities of large language model (LLM)-driven agents. However, existing structured prompting strategies are typically limited to single-agent, plan-only settings, and often evaluate performance solely based on task accuracy - overlooking critical factors such as token efficiency, modularity, and scalability in multi-agent environm…

    Submitted 3 July, 2025; originally announced July 2025.

  37. arXiv:2506.02556

    cs.RO

    Sign Language: Towards Sign Understanding for Robot Autonomy

    Authors: Ayush Agrawal, Joel Loo, Nicky Zimmerman, David Hsu

    Abstract: Navigational signs are common aids for human wayfinding and scene understanding, but are underutilized by robots. We argue that they benefit robot navigation and scene understanding, by directly encoding privileged information on actions, spatial regions, and relations. Interpreting signs in open-world settings remains a challenge owing to the complexity of scenes and signs, but recent advances in…

    Submitted 16 September, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  38. arXiv:2505.23683

    cs.LG

    Learning Compositional Functions with Transformers from Easy-to-Hard Data

    Authors: Zixuan Wang, Eshaan Nichani, Alberto Bietti, Alex Damian, Daniel Hsu, Jason D. Lee, Denny Wu

    Abstract: Transformer-based language models have demonstrated impressive capabilities across a range of complex reasoning tasks. Prior theoretical work exploring the expressive power of transformers has shown that they can efficiently perform multi-step reasoning tasks involving parallelizable computations. However, the learnability of such constructions, particularly the conditions on the data distribution…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: COLT 2025

  39. arXiv:2505.20424

    cs.RO cs.AI eess.SY

    Robot Operation of Home Appliances by Reading User Manuals

    Authors: Jian Zhang, Hanbo Zhang, Anxing Xiao, David Hsu

    Abstract: Operating home appliances, among the most common tools in every household, is a critical capability for assistive home robots. This paper presents ApBot, a robot system that operates novel household appliances by "reading" their user manuals. ApBot faces multiple challenges: (i) infer goal-conditioned partial policies from their unstructured, textual descriptions in a user manual document, (ii) gr…

    Submitted 23 July, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  40. arXiv:2504.10700

    cs.DC cs.AI

    Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACE

    Authors: Jesun Firoz, Franco Pellegrini, Mario Geiger, Darren Hsu, Jenna A. Bilbrey, Han-Yi Chou, Maximilian Stadler, Markus Hoehnerbach, Tingyu Wang, Dejun Lin, Emine Kucukbenli, Henry W. Sprueill, Ilyes Batatia, Sotiris S. Xantheas, MalSoon Lee, Chris Mundy, Gabor Csanyi, Justin S. Smith, Ponnuswamy Sadayappan, Sutanay Choudhury

    Abstract: Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operating on 3D molecular graph structures are becoming indispensable tools for computational chemists and materials scientists. These models facilitate the understanding of matter and the discovery of new molecules and materials. In contrast to GNNs operating on large homogeneous graphs, GNNs used by CFMs process a la…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted at The 34th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2025)

  41. arXiv:2504.05426

    stat.ML cs.LG stat.ME

    Survey on Algorithms for multi-index models

    Authors: Joan Bruna, Daniel Hsu

    Abstract: We review the literature on algorithms for estimating the index space in a multi-index model. The primary focus is on computationally efficient (polynomial-time) algorithms in Gaussian space, the assumptions under which consistency is guaranteed by these methods, and their sample complexity. In many cases, a gap is observed between the sample complexity of the best known computationally efficient…

    Submitted 13 June, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  42. arXiv:2503.01868  [pdf, other]

    cs.LG cs.AI cs.CL cs.DC

    Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale

    Authors: Jerome Ku, Eric Nguyen, David W. Romero, Garyk Brixi, Brandon Yang, Anton Vorontsov, Ali Taghibakhshi, Amy X. Lu, Dave P. Burke, Greg Brockman, Stefano Massaroli, Christopher Ré, Patrick D. Hsu, Brian L. Hie, Stefano Ermon, Michael Poli

    Abstract: We introduce convolutional multi-hybrid architectures, with a design grounded on two simple observations. First, operators in hybrid models can be tailored to token manipulation tasks such as in-context recall, multi-token recall, and compression, with input-dependent convolutions and attention offering complementary performance. Second, co-designing convolution operators and hardware-aware algori…

    Submitted 25 February, 2025; originally announced March 2025.

  43. arXiv:2502.11744  [pdf, other]

    cs.RO cs.CV

    FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation

    Authors: Chao Tang, Anxing Xiao, Yuhong Deng, Tianrun Hu, Wenlong Dong, Hanbo Zhang, David Hsu, Hong Zhang

    Abstract: Learning tool use from a single human demonstration video offers a highly intuitive and efficient approach to robot teaching. While humans can effortlessly generalize a demonstrated tool manipulation skill to diverse tools that support the same function (e.g., pouring with a mug versus a teapot), current one-shot imitation learning (OSIL) methods struggle to achieve this. A key challenge lies in e…

    Submitted 21 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  44. arXiv:2502.00314  [pdf, other]

    eess.IV cs.CV

    A Study on the Performance of U-Net Modifications in Retroperitoneal Tumor Segmentation

    Authors: Moein Heidari, Ehsan Khodapanah Aghdam, Alexander Manzella, Daniel Hsu, Rebecca Scalabrino, Wenjin Chen, David J. Foran, Ilker Hacihaliloglu

    Abstract: The retroperitoneum hosts a variety of tumors, including rare benign and malignant types, which pose diagnostic and treatment challenges due to their infrequency and proximity to vital structures. Estimating tumor volume is difficult due to their irregular shapes, and manual segmentation is time-consuming. Automatic segmentation using U-Net and its variants, incorporating Vision Transformer (ViT)…

    Submitted 31 January, 2025; originally announced February 2025.

    Comments: Accepted for presentation at the 2025 SPIE Medical Imaging Conference

  45. AiGet: Transforming Everyday Moments into Hidden Knowledge Discovery with AI Assistance on Smart Glasses

    Authors: Runze Cai, Nuwan Janaka, Hyeongcheol Kim, Yang Chen, Shengdong Zhao, Yun Huang, David Hsu

    Abstract: Unlike the free exploration of childhood, the demands of daily life reduce our motivation to explore our surroundings, leading to missed opportunities for informal learning. Traditional tools for knowledge acquisition are reactive, relying on user initiative and limiting their ability to uncover hidden interests. Through formative studies, we introduce AiGet, a proactive AI assistant integrated wi…

    Submitted 24 February, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: CHI Conference on Human Factors in Computing Systems (CHI '25), April 26-May 01, 2025, Yokohama, Japan

    ACM Class: I.2.10; H.5.1; H.5.2

  46. arXiv:2411.10548  [pdf, ps, other]

    cs.LG q-bio.BM

    BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

    Authors: Peter St. John, Dejun Lin, Polina Binder, Malcolm Greaves, Vega Shah, John St. John, Adrian Lange, Patrick Hsu, Rajesh Illango, Arvind Ramanathan, Anima Anandkumar, David H Brookes, Akosua Busia, Abhishaike Mahajan, Stephen Malina, Neha Prasad, Sam Sinai, Lindsay Edwards, Thomas Gaudelet, Cristian Regep, Martin Steinegger, Burkhard Rost, Alexander Brace, Kyle Hippe, Luca Naef, et al. (68 additional authors not shown)

    Abstract: Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational bio…

    Submitted 8 September, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

  47. arXiv:2411.08798  [pdf, other]

    cs.LG cs.NE math.AG

    Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence

    Authors: Berfin Şimşek, Amire Bendjeddou, Daniel Hsu

    Abstract: This work focuses on the gradient flow dynamics of a neural network model that uses correlation loss to approximate a multi-index function on high-dimensional standard Gaussian data. Specifically, the multi-index function we consider is a sum of neurons $f^*(x) = \sum_{j=1}^k \sigma^*(v_j^T x)$ where $v_1, \dots, v_k$ are unit vectors, and $\sigma^*$ lacks the first and second Hermite polynomials in…

    Submitted 10 March, 2025; v1 submitted 13 November, 2024; originally announced November 2024.

    Comments: 22 pages, to be presented at AISTATS 2025

  48. arXiv:2410.19471  [pdf, other]

    cs.LG cs.AI

    Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization

    Authors: Ryan Park, Darren J. Hsu, C. Brian Roland, Maria Korshunova, Chen Tessler, Shie Mannor, Olivia Viessmann, Bruno Trentini

    Abstract: Inverse folding models play an important role in structure-based design by predicting amino acid sequences that fold into desired reference structures. Models like ProteinMPNN, a message-passing encoder-decoder model, are trained to reliably produce new sequences from a reference structure. However, when applied to peptides, these models are prone to generating repetitive sequences that do not fol…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: Preprint. 10 pages plus appendices

  49. arXiv:2409.20548  [pdf, other]

    cs.RO cs.AI cs.HC

    Robi Butler: Multimodal Remote Interaction with a Household Robot Assistant

    Authors: Anxing Xiao, Nuwan Janaka, Tianrun Hu, Anshul Gupta, Kaixin Li, Cunjun Yu, David Hsu

    Abstract: Imagine a future when we can Zoom-call a robot to manage household chores remotely. This work takes one step in this direction. Robi Butler is a new household robot assistant that enables seamless multimodal remote interaction. It allows the human user to monitor its environment from a first-person view, issue voice or text commands, and specify target objects through hand-pointing gestures. At it…

    Submitted 10 March, 2025; v1 submitted 30 September, 2024; originally announced September 2024.

    Comments: Accepted to ICRA 2025

  50. arXiv:2409.17725  [pdf, ps, other]

    cs.RO

    Differentiable Contact Dynamics for Stable Object Placement Under Geometric Uncertainties

    Authors: Linfeng Li, Gang Yang, Lin Shao, David Hsu

    Abstract: From serving a cup of coffee to positioning mechanical parts during assembly, stable object placement is a crucial skill for future robots. It becomes particularly challenging under geometric uncertainties, e.g., when the object pose or shape is not known accurately. This work leverages a differentiable simulation model of contact dynamics to tackle this challenge. We derive a novel gradient that…

    Submitted 30 November, 2025; v1 submitted 26 September, 2024; originally announced September 2024.