Showing 1–50 of 77 results for author: Shu, T

  1. arXiv:2512.10046  [pdf, ps, other]

    cs.AI

    SimWorld-Robotics: Synthesizing Photorealistic and Dynamic Urban Environments for Multimodal Robot Navigation and Collaboration

    Authors: Yan Zhuang, Jiawei Ren, Xiaokang Ye, Jianzhi Shen, Ruixuan Zhang, Tianai Yue, Muhammad Faayez, Xuhong He, Ziqiao Ma, Lianhui Qin, Zhiting Hu, Tianmin Shu

    Abstract: Recent advances in foundation models have shown promising results in developing generalist robotics that can perform diverse tasks in open-ended scenarios given multimodal inputs. However, current work has been mainly focused on indoor, household scenarios. In this work, we present SimWorld-Robotics (SWR), a simulation platform for embodied AI in large-scale, photorealistic urban environments. Bui…

    Submitted 10 December, 2025; originally announced December 2025.

    Comments: Conference: NeurIPS 2025 (main)

  2. arXiv:2512.01078  [pdf, ps, other]

    cs.AI

    SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds

    Authors: Jiawei Ren, Yan Zhuang, Xiaokang Ye, Lingjun Mao, Xuhong He, Jianzhi Shen, Mrinaal Dogra, Yiming Liang, Ruixuan Zhang, Tianai Yue, Yiqing Yang, Eric Liu, Ryan Wu, Kevin Benavente, Rajiv Mandya Nagaraju, Muhammad Faayez, Xiyan Zhang, Dhruv Vivek Sharma, Xianrui Zhong, Ziqiao Ma, Tianmin Shu, Zhiting Hu, Lianhui Qin

    Abstract: While LLM/VLM-powered AI agents have advanced rapidly in math, coding, and computer use, their applications in complex physical and social environments remain challenging. Building agents that can survive and thrive in the real world (for example, by autonomously earning income or running a business) requires massive-scale interaction, reasoning, training, and evaluation across diverse embodied sc…

    Submitted 30 November, 2025; originally announced December 2025.

  3. arXiv:2510.18135  [pdf, ps, other]

    cs.CV

    World-in-World: World Models in a Closed-Loop World

    Authors: Jiahan Zhang, Muqing Jiang, Nanru Dai, Taiming Lu, Arda Uzunoglu, Shunchi Zhang, Yana Wei, Jiahao Wang, Vishal M. Patel, Paul Pu Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, Jieneng Chen

    Abstract: Generative world models (WMs) can now simulate worlds with striking visual realism, which naturally raises the question of whether they can endow embodied agents with predictive perception for decision making. Progress on this question has been limited by fragmented evaluation: most existing benchmarks adopt open-loop protocols that emphasize visual quality in isolation, leaving the core issue of…

    Submitted 20 October, 2025; originally announced October 2025.

    Comments: Code is at https://github.com/World-In-World/world-in-world

  4. arXiv:2509.25208  [pdf, ps, other]

    cs.LG physics.ao-ph

    DPSformer: A long-tail-aware model for improving heavy rainfall prediction

    Authors: Zenghui Huang, Ting Shu, Zhonglei Wang, Yang Lu, Yan Yan, Wei Zhong, Hanzi Wang

    Abstract: Accurate and timely forecasting of heavy rainfall remains a critical challenge for modern society. Precipitation exhibits a highly imbalanced distribution: most observations record no or light rain, while heavy rainfall events are rare. Such an imbalanced distribution obstructs deep learning models from effectively predicting heavy rainfall events. To address this challenge, we treat rainfall fore…

    Submitted 20 September, 2025; originally announced September 2025.

  5. arXiv:2509.25137  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    The Era of Real-World Human Interaction: RL from User Conversations

    Authors: Chuanyang Jin, Jing Xu, Bo Liu, Leitian Tao, Olga Golovneva, Tianmin Shu, Wenting Zhao, Xian Li, Jason Weston

    Abstract: We posit that to achieve continual model improvement and multifaceted alignment, future models must learn from natural human interaction. Current conversational models are aligned using pre-annotated, expert-generated human feedback. In this work, we introduce Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations. We develop two c…

    Submitted 29 September, 2025; originally announced September 2025.

  6. arXiv:2509.05091  [pdf, ps, other]

    cs.AI cs.MA

    ProToM: Promoting Prosocial Behaviour via Theory of Mind-Informed Feedback

    Authors: Matteo Bortoletto, Yichao Zhou, Lance Ying, Tianmin Shu, Andreas Bulling

    Abstract: While humans are inherently social creatures, the challenge of identifying when and how to assist and collaborate with others - particularly when pursuing independent goals - can hinder cooperation. To address this challenge, we aim to develop an AI system that provides useful feedback to promote prosocial behaviour - actions that benefit others, even when not directly aligned with one's own goals…

    Submitted 5 September, 2025; originally announced September 2025.

    Comments: Website at https://www.matteobortoletto.org/protom/

  7. arXiv:2507.22933  [pdf, ps, other]

    cs.CL cs.AI

    Augmented Vision-Language Models: A Systematic Review

    Authors: Anthony C Davis, Burhan Sadiq, Tianmin Shu, Chien-Ming Huang

    Abstract: Recent advances in visual-language machine learning models have demonstrated exceptional ability to use natural language and understand visual scenes by training on large, unstructured datasets. However, this training paradigm cannot produce interpretable explanations for its outputs, requires retraining to integrate new information, is highly resource-intensive, and struggles with certain forms o…

    Submitted 24 July, 2025; originally announced July 2025.

  8. arXiv:2506.21876  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation

    Authors: Qiyue Gao, Xinyu Pi, Kevin Liu, Junrong Chen, Ruolan Yang, Xinqi Huang, Xinyu Fang, Lu Sun, Gautham Kishore, Bo Ai, Stone Tao, Mengyang Liu, Jiaxi Yang, Chao-Jung Lai, Chuanyang Jin, Jiannan Xiang, Benhao Huang, Zeming Chen, David Danks, Hao Su, Tianmin Shu, Ziqiao Ma, Lianhui Qin, Zhiting Hu

    Abstract: Internal world models (WMs) enable agents to understand the world's state and predict transitions, serving as the basis for advanced deliberative reasoning. Recent large Vision-Language Models (VLMs), such as OpenAI o3, GPT-4o and Gemini, exhibit potential as general-purpose WMs. While the latest studies have evaluated and shown limitations in specific capabilities such as visual understanding, a…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: ACL 2025 (Findings)

  9. arXiv:2505.21652  [pdf, ps, other]

    cs.RO cs.AI

    PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation

    Authors: Yifan Yin, Zhengtao Han, Shivam Aarya, Jianxin Wang, Shuhang Xu, Jiawei Peng, Angtian Wang, Alan Yuille, Tianmin Shu

    Abstract: Fine-grained robot manipulation, such as lifting and rotating a bottle to display the label on the cap, requires robust reasoning about object parts and their relationships with intended tasks. Despite recent advances in training general-purpose robot manipulation policies guided by language instructions, there is a notable lack of large-scale datasets for fine-grained manipulation tasks with part…

    Submitted 16 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  10. arXiv:2505.15626  [pdf, ps, other]

    cs.LG stat.ML

    Direct Preference Optimization for Adaptive Concept-based Explanations

    Authors: Jacopo Teneggi, Zhenzhen Wang, Paul H. Yi, Tianmin Shu, Jeremias Sulam

    Abstract: Concept-based explanation methods aim at making machine learning models more transparent by finding the most important semantic features of an input (e.g., colors, patterns, shapes) for a given prediction task. However, these methods generally ignore the communicative context of explanations, such as the preferences of a listener. For example, medical doctors understand explanations in terms of cl…

    Submitted 1 October, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  11. arXiv:2505.03798  [pdf, other]

    cs.LG cs.AI

    Position: Foundation Models Need Digital Twin Representations

    Authors: Yiqing Shen, Hao Ding, Lalithkumar Seenivasan, Tianmin Shu, Mathias Unberath

    Abstract: Current foundation models (FMs) rely on token representations that directly fragment continuous real-world multimodal data into discrete tokens. They limit FMs to learning real-world knowledge and relationships purely through statistical correlation rather than leveraging explicit domain knowledge. Consequently, current FMs struggle with maintaining semantic coherence across modalities, capturing…

    Submitted 1 May, 2025; originally announced May 2025.

  12. arXiv:2504.10445  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.LG

    RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users

    Authors: Suyu Ye, Haojun Shi, Darren Shih, Hyokun Yun, Tanya Roosta, Tianmin Shu

    Abstract: To achieve successful assistance with long-horizon web-based tasks, AI agents must be able to sequentially follow real-world user instructions over a long period. Unlike existing web-based agent benchmarks, sequential instruction following in the real world poses significant challenges beyond performing a single, clearly defined task. For instance, real-world human instructions can be ambiguous, r…

    Submitted 1 December, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: Project Website: https://scai.cs.jhu.edu/projects/RealWebAssist/ Code: https://github.com/SCAI-JHU/RealWebAssist

  13. arXiv:2504.03510  [pdf, other]

    cs.CV

    FADConv: A Frequency-Aware Dynamic Convolution for Farmland Non-agriculturalization Identification and Segmentation

    Authors: Tan Shu, Li Shen

    Abstract: Cropland non-agriculturalization refers to the conversion of arable land into non-agricultural uses such as forests, residential areas, and construction sites. This phenomenon not only directly leads to the loss of cropland resources but also poses systemic threats to food security and agricultural sustainability. Accurate identification of cropland and non-cropland areas is crucial for detecting…

    Submitted 4 April, 2025; originally announced April 2025.

  14. arXiv:2502.20502  [pdf, other]

    cs.AI

    On Benchmarking Human-Like Intelligence in Machines

    Authors: Lance Ying, Katherine M. Collins, Lionel Wong, Ilia Sucholutsky, Ryan Liu, Adrian Weller, Tianmin Shu, Thomas L. Griffiths, Joshua B. Tenenbaum

    Abstract: Recent benchmark studies have claimed that AI has approached or even surpassed human-level performances on various cognitive tasks. However, this position paper argues that current AI evaluation paradigms are insufficient for assessing human-like cognitive capabilities. We identify a set of key shortcomings: a lack of human-validated labels, inadequate representation of human response variability…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 18 pages, 5 figures

  15. arXiv:2502.15676  [pdf, ps, other]

    cs.AI cs.CL

    AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling

    Authors: Zhining Zhang, Chuanyang Jin, Mung Yao Jia, Shunchi Zhang, Tianmin Shu

    Abstract: Theory of Mind (ToM), the ability to understand people's minds based on their behavior, is key to developing socially intelligent agents. Current approaches to ToM reasoning either rely on prompting Large Language Models (LLMs), which are prone to systematic errors, or use handcrafted, rigid agent models for model-based inference, which are more robust but fail to generalize across domains. In thi…

    Submitted 29 June, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: 39 pages, 10 figures, 13 tables. Website at https://chuanyangjin.com/AutoToM/

  16. arXiv:2412.09624  [pdf, other]

    cs.CV cs.RO

    GenEx: Generating an Explorable World

    Authors: Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille, Jieneng Chen

    Abstract: Understanding, navigating, and exploring the 3D physical real world has long been a central challenge in the development of artificial intelligence. In this work, we take a step toward this goal by introducing GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination that forms priors (expectations) about the surrounding environments. GenEx genera…

    Submitted 20 January, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: Website: GenEx.world

  17. arXiv:2411.11844  [pdf, ps, other]

    cs.CV cs.RO

    Generative World Explorer

    Authors: Taiming Lu, Tianmin Shu, Alan Yuille, Daniel Khashabi, Jieneng Chen

    Abstract: Planning with partial observation is a central challenge in embodied AI. A majority of prior works have tackled this challenge by developing agents that physically explore their environment to update their beliefs about the world state. In contrast, humans can $\textit{imagine}$ unseen parts of the world through a mental exploration and $\textit{revise}$ their beliefs with imagined observations. S…

    Submitted 8 September, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

    Comments: Website: generative-world-explorer.github.io

  18. arXiv:2411.04987  [pdf, other]

    cs.AI cs.LG cs.RO

    Few-Shot Task Learning through Inverse Generative Modeling

    Authors: Aviv Netanyahu, Yilun Du, Antonia Bronars, Jyothish Pari, Joshua Tenenbaum, Tianmin Shu, Pulkit Agrawal

    Abstract: Learning the intents of an agent, defined by its goals or motion style, is often extremely challenging from just a few examples. We refer to this problem as task concept learning and present our approach, Few-Shot Task Learning through Inverse Generative Modeling (FTL-IGM), which learns new task concepts by leveraging invertible neural generative models. The core idea is to pretrain a generative m…

    Submitted 13 January, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: Added acknowledgment

  19. arXiv:2411.01796  [pdf, ps, other]

    cs.AI cs.HC cs.RO

    Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge

    Authors: Weihua Du, Qiushi Lyu, Jiaming Shan, Zhenting Qi, Hongxin Zhang, Sunli Chen, Andi Peng, Tianmin Shu, Kwonjoon Lee, Behzad Dariush, Chuang Gan

    Abstract: We introduce Constrained Human-AI Cooperation (CHAIC), an inclusive embodied social intelligence challenge designed to test social perception and cooperation in embodied agents. In CHAIC, the goal is for an embodied agent equipped with egocentric observations to assist a human who may be operating under physical constraints -- e.g., unable to reach high places or confined to a wheelchair -- in per…

    Submitted 11 June, 2025; v1 submitted 3 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024 Dataset and Benchmark Track. The first two authors contributed equally. Project Website at https://umass-embodied-agi.github.io/CHAIC/

  20. arXiv:2410.10260  [pdf, other]

    cs.CV

    Slide-based Graph Collaborative Training for Histopathology Whole Slide Image Analysis

    Authors: Jun Shi, Tong Shu, Zhiguo Jiang, Wei Wang, Haibo Wu, Yushan Zheng

    Abstract: The development of computational pathology lies in the consensus that pathological characteristics of tumors are significant guidance for cancer diagnostics. Most existing research focuses on the inner-contextual information within each WSI yet ignores the possible inter-correlations between slides. As the development of tumors is a continuous process involving a series of histological, morphologi…

    Submitted 14 October, 2024; originally announced October 2024.

  21. arXiv:2409.10849  [pdf, ps, other]

    cs.RO cs.AI cs.HC cs.MA

    Pragmatic Embodied Spoken Instruction Following in Human-Robot Collaboration with Theory of Mind

    Authors: Lance Ying, Xinyi Li, Shivam Aarya, Yizirui Fang, Yifan Yin, Jason Xinyu Liu, Stefanie Tellex, Joshua B. Tenenbaum, Tianmin Shu

    Abstract: Spoken language instructions are ubiquitous in agent collaboration. However, in real-world human-robot collaboration, following human spoken instructions can be challenging due to various speaker and environmental factors, such as background noise or mispronunciation. When faced with noisy auditory inputs, humans can leverage the collaborative context in the embodied environment to interpret noisy…

    Submitted 6 October, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: 8 pages, 7 figures

  22. arXiv:2408.12574  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

    Authors: Haojun Shi, Suyu Ye, Xinyu Fang, Chuanyang Jin, Leyla Isik, Yen-Ling Kuo, Tianmin Shu

    Abstract: Understanding people's social interactions in complex real-world scenarios often relies on intricate mental reasoning. To truly understand how and why people interact with one another, we must infer the underlying mental states that give rise to the social interactions, i.e., Theory of Mind reasoning in multi-agent interactions. Additionally, social interactions are often multi-modal -- we can wat…

    Submitted 23 January, 2025; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: AAAI-25 (Oral). Project website: https://scai.cs.jhu.edu/projects/MuMA-ToM/ Code: https://github.com/SCAI-JHU/MuMA-ToM

  23. arXiv:2407.08968  [pdf, other]

    cs.CV

    SlideGCD: Slide-based Graph Collaborative Training with Knowledge Distillation for Whole Slide Image Classification

    Authors: Tong Shu, Jun Shi, Dongdong Sun, Zhiguo Jiang, Yushan Zheng

    Abstract: Existing WSI analysis methods lie on the consensus that histopathological characteristics of tumors are significant guidance for cancer diagnostics. Particularly, as the evolution of cancers is a continuous process, the correlations and differences across various stages, anatomical locations and patients should be taken into account. However, recent research mainly focuses on the inner-contextual…

    Submitted 19 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted for MICCAI 2024

  24. arXiv:2406.18924  [pdf, other]

    cs.AI cs.LG cs.RO

    Learning Pareto Set for Multi-Objective Continuous Robot Control

    Authors: Tianye Shu, Ke Shang, Cheng Gong, Yang Nan, Hisao Ishibuchi

    Abstract: For a control problem with multiple conflicting objectives, there exists a set of Pareto-optimal policies called the Pareto set instead of a single optimal policy. When a multi-objective control problem is continuous and complex, traditional multi-objective reinforcement learning (MORL) algorithms search for many Pareto-optimal deep policies to approximate the Pareto set, which is quite resource-c…

    Submitted 27 June, 2024; originally announced June 2024.

  25. arXiv:2405.14769  [pdf, other]

    cs.LG cs.CL

    Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input

    Authors: Andi Peng, Yuying Sun, Tianmin Shu, David Abel

    Abstract: Humans use social context to specify preferences over behaviors, i.e. their reward functions. Yet, algorithms for inferring reward models from preference data do not take this social learning view into account. Inspired by pragmatic human communication, we study how to extract fine-grained data regarding why an example is preferred that is useful for learning more accurate reward models. We propos…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  26. arXiv:2404.10775  [pdf, other]

    cs.CV cs.AI cs.MA

    COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

    Authors: Hongxin Zhang, Zeyuan Wang, Qiushi Lyu, Zheyuan Zhang, Sunli Chen, Tianmin Shu, Behzad Dariush, Kwonjoon Lee, Yilun Du, Chuang Gan

    Abstract: In this paper, we investigate the problem of embodied multi-agent cooperation, where decentralized agents must cooperate given only egocentric views of the world. To effectively plan in this setting, in contrast to learning world dynamics in a single-agent scenario, we must simulate world dynamics conditioned on an arbitrary number of agents' actions given only partial egocentric visual observatio…

    Submitted 15 April, 2025; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Published at ICLR 2025. 24 pages. The first three authors contributed equally

  27. arXiv:2404.00282  [pdf, other]

    cs.LG cs.AI cs.CL cs.RO

    Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods

    Authors: Yuji Cao, Huan Zhao, Yuheng Cheng, Ting Shu, Yue Chen, Guolong Liu, Gaoqi Liang, Junhua Zhao, Jinyue Yan, Yun Li

    Abstract: With extensive pre-trained knowledge and high-level general capabilities, large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and high-level task planning. In this survey, we provide a comprehensive review of the existing literature in LLM-enhanced RL and summarize its characteristics compared t…

    Submitted 29 October, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: 22 pages (including bibliography), 6 figures

  28. arXiv:2403.11075  [pdf, other]

    cs.HC cs.AI cs.MA

    GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment

    Authors: Lance Ying, Kunal Jha, Shivam Aarya, Joshua B. Tenenbaum, Antonio Torralba, Tianmin Shu

    Abstract: Verbal communication plays a crucial role in human cooperation, particularly when the partners only have incomplete information about the task, environment, and each other's mental state. In this paper, we propose a novel cooperative communication framework, Goal-Oriented Mental Alignment (GOMA). GOMA formulates verbal communication as a planning problem that minimizes the misalignment between the…

    Submitted 14 January, 2025; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures

  29. arXiv:2401.08743  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    MMToM-QA: Multimodal Theory of Mind Question Answering

    Authors: Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu

    Abstract: Theory of Mind (ToM), the ability to understand people's mental states, is an essential ingredient for developing machines with human-level social intelligence. Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding. However, existing ToM benchmarks use unimodal datasets - either video or text. Human ToM, on the other hand, is more than v…

    Submitted 15 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: ACL 2024. 26 pages, 11 figures, 7 tables

  30. arXiv:2401.01493  [pdf, other]

    cs.LG cs.AI cs.CR

    Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient Framework

    Authors: Shengchao Chen, Ting Shu, Huan Zhao, Jiahao Wang, Sufen Ren, Lina Yang

    Abstract: Remote Sensing Target Fine-grained Classification (TFGC) is of great significance in both military and civilian fields. Due to location differences, growth in data size, and centralized server storage constraints, these data are usually stored under different databases across regions/countries. However, privacy laws and national security concerns constrain researchers from accessing these sensitiv…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Under Review, 23 pages, 3 figures, 12 tables

  31. arXiv:2312.05230  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.RO

    Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning

    Authors: Zhiting Hu, Tianmin Shu

    Abstract: Despite their tremendous success in many applications, large language models often fall short of consistent reasoning and planning in various (language, embodied, and social) scenarios, due to inherent limitations in their inference, learning, and modeling capabilities. In this position paper, we present a new perspective of machine reasoning, LAW, that connects the concepts of Language models, Ag…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Position paper. Accompanying NeurIPS2023 Tutorial: https://sites.google.com/view/neurips2023law/home

  32. arXiv:2308.14242  [pdf]

    cs.AI cs.CL

    The Cultural Psychology of Large Language Models: Is ChatGPT a Holistic or Analytic Thinker?

    Authors: Chuanyang Jin, Songyang Zhang, Tianmin Shu, Zhihan Cui

    Abstract: The prevalent use of Large Language Models (LLMs) has necessitated studying their mental models, yielding noteworthy theoretical and practical implications. Current research has demonstrated that state-of-the-art LLMs, such as ChatGPT, exhibit certain theory of mind capabilities and possess relatively stable Big Five and/or MBTI personality traits. In addition, cognitive process features form an e…

    Submitted 27 August, 2023; originally announced August 2023.

  33. arXiv:2308.11071  [pdf, other]

    cs.AI cs.LG cs.MA cs.RO

    Neural Amortized Inference for Nested Multi-agent Reasoning

    Authors: Kunal Jha, Tuan Anh Le, Chuanyang Jin, Yen-Ling Kuo, Joshua B. Tenenbaum, Tianmin Shu

    Abstract: Multi-agent interactions, such as communication, teaching, and bluffing, often rely on higher-order social inference, i.e., understanding how others infer oneself. Such intricate reasoning can be effectively modeled through nested multi-agent reasoning. Nonetheless, the computational complexity escalates exponentially with each level of reasoning, posing a significant challenge. However, humans ef…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: 8 pages, 10 figures

  34. arXiv:2308.02242  [pdf, ps, other]

    cs.NI

    Countering Eavesdroppers with Meta-learning-based Cooperative Ambient Backscatter Communications

    Authors: Nam H. Chu, Nguyen Van Huynh, Diep N. Nguyen, Dinh Thai Hoang, Shimin Gong, Tao Shu, Eryk Dutkiewicz, Khoa T. Phan

    Abstract: This article introduces a novel lightweight framework using ambient backscattering communications to counter eavesdroppers. In particular, our framework divides an original message into two parts: (i) the active-transmit message transmitted by the transmitter using conventional RF signals and (ii) the backscatter message transmitted by an ambient backscatter tag that backscatters upon the active s…

    Submitted 4 August, 2023; originally announced August 2023.

  35. arXiv:2307.06333  [pdf, other]

    cs.LG cs.AI cs.HC cs.RO

    Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation

    Authors: Andi Peng, Aviv Netanyahu, Mark Ho, Tianmin Shu, Andreea Bobu, Julie Shah, Pulkit Agrawal

    Abstract: Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences a…

    Submitted 13 July, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: International Conference on Machine Learning (ICML) 2023

  36. arXiv:2307.02485  [pdf, other]

    cs.AI cs.CL cs.CV cs.RO

    Building Cooperative Embodied Agents Modularly with Large Language Models

    Authors: Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan

    Abstract: In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments. While previous research either presupposes a cost-free communication channel or relies on a centralized controller with shared observations, we harness the commonsense knowledge, re…

    Submitted 17 February, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: ICLR24. The first two authors contributed equally

  37. arXiv:2305.10626  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Models Meet World Models: Embodied Experiences Enhance Language Models

    Authors: Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, Zhiting Hu

    Abstract: While large language models (LMs) have shown remarkable capabilities across numerous tasks, they often struggle with simple reasoning and planning in physical environments, such as understanding object permanence or planning household activities. The limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills. In this paper, we propose…

    Submitted 28 October, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  38. MASK-CNN-Transformer For Real-Time Multi-Label Weather Recognition

    Authors: Shengchao Chen, Ting Shu, Huan Zhao, Yuan Yan Tang

    Abstract: Weather recognition is an essential support for many practical life applications, including traffic safety, environment, and meteorology. However, many existing related works cannot comprehensively describe weather conditions due to their complex co-occurrence dependencies. This paper proposes a novel multi-label weather recognition model considering these dependencies. The proposed model called M…

    Submitted 19 August, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: Have been accepted. Appears in Knowledge-Based Systems, see https://www.sciencedirect.com/science/article/pii/S0950705123006317

    Journal ref: Knowledge-Based Systems, 110881 (2023)

  39. TempEE: Temporal-Spatial Parallel Transformer for Radar Echo Extrapolation Beyond Auto-Regression

    Authors: Shengchao Chen, Ting Shu, Huan Zhao, Guo Zhong, Xunlai Chen

    Abstract: Meteorological radar reflectivity data (i.e. radar echo) significantly influences precipitation prediction. It can facilitate accurate and expeditious forecasting of short-term heavy rainfall bypassing the need for complex Numerical Weather Prediction (NWP) models. In comparison to conventional models, Deep Learning (DL)-based radar echo extrapolation algorithms exhibit higher effectiveness and ef…

    Submitted 14 September, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Have been accepted by IEEE Transactions on Geoscience and Remote Sensing, see https://ieeexplore.ieee.org/document/10238744

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing 61, 5108914 (2023)

  40. arXiv:2303.10421  [pdf, other]

    cs.CV cs.AI

    Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos

    Authors: Tao Shu, Xinke Wang, Ruotong Wang, Chuang Chen, Yixin Zhang, Xiao Sun

    Abstract: The continuous improvement of human-computer interaction technology makes it possible to compute emotions. In this paper, we introduce our submission to the CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW). Sentiment analysis in human-computer interaction should, as far as possible, start with multiple dimensions, fill in the single imperfect emotion channel, and finally dete…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: 5 pages, 1 figure

  41. arXiv:2302.13445  [pdf, ps, other]

    cs.NI cs.DC cs.LG

    Dynamic Resource Allocation for Metaverse Applications with Deep Reinforcement Learning

    Authors: Nam H. Chu, Diep N. Nguyen, Dinh Thai Hoang, Khoa T. Phan, Eryk Dutkiewicz, Dusit Niyato, Tao Shu

    Abstract: This work proposes a novel framework to dynamically and effectively manage and allocate different types of resources for Metaverse applications, which are forecasted to demand massive resources of various types that have never been seen before. Specifically, by studying functions of Metaverse applications, we first propose an effective solution to divide applications into groups, namely MetaInstan…

    Submitted 26 February, 2023; originally announced February 2023.

    Comments: To be published in the Proceedings of the IEEE WCNC 2023

  42. arXiv:2301.05223  [pdf, other]

    cs.RO cs.AI cs.LG cs.MA

    NOPA: Neurally-guided Online Probabilistic Assistance for Building Socially Intelligent Home Assistants

    Authors: Xavier Puig, Tianmin Shu, Joshua B. Tenenbaum, Antonio Torralba

    Abstract: In this work, we study how to build socially intelligent robots to assist people in their homes. In particular, we focus on assistance with online goal inference, where robots must simultaneously infer humans' goals and how to help them achieve those goals. Prior assistance methods either lack the adaptivity to adjust helping strategies (i.e., when and how to help) in response to uncertainty about…

    Submitted 12 January, 2023; originally announced January 2023.

    Comments: Project website: https://www.tshu.io/online_watch_and_help. Code: https://github.com/xavierpuigf/online_watch_and_help

  43. arXiv:2212.02951  [pdf, other]

    cs.AI

    State Space Closure: Revisiting Endless Online Level Generation via Reinforcement Learning

    Authors: Ziqi Wang, Tianye Shu, Jialin Liu

    Abstract: In this paper, we revisit endless online level generation with the recently proposed experience-driven procedural content generation via reinforcement learning (EDRL) framework. Inspired by an observation that EDRL tends to generate recurrent patterns, we formulate a notion of state space closure which makes any stochastic state that may appear in an infinite-horizon online generation process ca…

    Submitted 24 March, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted by the IEEE Transactions on Games

    ACM Class: I.2.6

  44. arXiv:2211.15339  [pdf, other]

    cs.LG cs.AI cs.RO

    Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning

    Authors: Aviv Netanyahu, Tianmin Shu, Joshua Tenenbaum, Pulkit Agrawal

    Abstract: In this work, we consider one-shot imitation learning for object rearrangement tasks, where an AI agent needs to watch a single expert demonstration and learn to perform the same task in different environments. To achieve a strong generalization, the AI agent must infer the spatial goal specification for the task. However, there can be multiple goal specifications that fit the given demonstration.…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: ICML 2022, the first two authors contributed equally, project page https://www.tshu.io/GEM

  45. arXiv:2210.03022  [pdf, other]

    cs.AI cs.LG

    Stateful active facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning

    Authors: Dianbo Liu, Vedant Shah, Oussama Boussif, Cristian Meo, Anirudh Goyal, Tianmin Shu, Michael Mozer, Nicolas Heess, Yoshua Bengio

    Abstract: In cooperative multi-agent reinforcement learning, a team of agents works together to achieve a common goal. Different environments or tasks may require varying degrees of coordination among agents in order to achieve the goal in an optimal way. The nature of coordination will depend on the properties of the environment -- its spatial layout, distribution of obstacles, dynamics, etc. We term this…

    Submitted 6 October, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Published at ICLR 2023

  46. arXiv:2209.10255  [pdf, other]

    cs.SE

    PTSG: a test generation tool based on extended finite state machine

    Authors: Zhijie Pan, Ting Shu, Zuohua Ding

    Abstract: The Extended Finite State Machine (EFSM) is one of the most popular modeling approaches for model-based testing. However, EFSM-based test case generation is susceptible to the infeasible (inexecutable) path problem, which stems from the conflict of predicates (guards) between transitions in the path. Therefore, in order to derive feasible test cases, a test generation algorithm needs to dynamicall…

    Submitted 21 September, 2022; originally announced September 2022.

  47. arXiv:2209.03100  [pdf, other]

    cs.NE

    Effects of Archive Size on Computation Time and Solution Quality for Multi-Objective Optimization

    Authors: Tianye Shu, Ke Shang, Hisao Ishibuchi, Yang Nan

    Abstract: An unbounded external archive has been used to store all nondominated solutions found by an evolutionary multi-objective optimization algorithm in some studies. It has been shown that a selected solution subset from the stored solutions is often better than the final population. However, the use of the unbounded archive is not always realistic. When the number of examined solutions is huge, we mus…

    Submitted 2 November, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

  48. arXiv:2207.12327  [pdf, other]

    cs.AI cs.LG

    Technical Report: Assisting Backdoor Federated Learning with Whole Population Knowledge Alignment

    Authors: Tian Liu, Xueyang Hu, Tao Shu

    Abstract: Due to the distributed nature of Federated Learning (FL), researchers have uncovered that FL is vulnerable to backdoor attacks, which aim at injecting a sub-task into the FL without corrupting the performance of the main task. Single-shot backdoor attack achieves high accuracy on both the main task and backdoor sub-task when injected at the FL model convergence. However, the early-injected single-…

    Submitted 25 July, 2022; originally announced July 2022.

  49. arXiv:2205.12548  [pdf, other]

    cs.CL cs.LG

    RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning

    Authors: Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, Yihan Wang, Han Guo, Tianmin Shu, Meng Song, Eric P. Xing, Zhiting Hu

    Abstract: Prompting has shown impressive success in enabling large pretrained language models (LMs) to perform diverse NLP tasks, especially when only few downstream data are available. Automatically finding the optimal prompt for each task, however, is challenging. Most existing work resorts to tuning soft prompts (e.g., embeddings), which fall short of interpretability, reusability across LMs, and applicab…

    Submitted 22 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022 Camera Ready. Code available at https://github.com/mingkaid/rl-prompt

  50. arXiv:2205.11087  [pdf, ps, other]

    cs.NI cs.DC

    MetaSlicing: A Novel Resource Allocation Framework for Metaverse

    Authors: Nam H. Chu, Dinh Thai Hoang, Diep N. Nguyen, Khoa T. Phan, Eryk Dutkiewicz, Dusit Niyato, Tao Shu

    Abstract: Creating and maintaining the Metaverse requires enormous resources that have never been seen before, especially computing resources for intensive data processing to support the Extended Reality, enormous storage resources, and massive networking resources for maintaining ultra high-speed and low-latency connections. Therefore, this work aims to propose a novel framework, namely MetaSlicing, that c…

    Submitted 26 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Revised figures, fixed typos