
Showing 1–18 of 18 results for author: Shieh, M Q

Searching in archive cs.
  1. arXiv:2604.04759  [pdf, ps, other]

    cs.CR cs.AI cs.CL

    Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

    Authors: Zijun Wang, Haoqin Tu, Letian Zhang, Hardy Chen, Juncheng Wu, Xiangyan Liu, Zhenlong Yuan, Tianyu Pang, Michael Qizhe Shieh, Fengze Liu, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie

    Abstract: OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap,…

    Submitted 6 April, 2026; originally announced April 2026.

  2. arXiv:2603.15432  [pdf, ps, other]

    cs.CV

    Gym-V: A Unified Vision Environment System for Agentic Vision Research

    Authors: Fanqing Meng, Lingxiao Du, Jiawei Gu, Jiaqi Liao, Linjie Li, Zijian Wu, Xiangyan Liu, Ziqi Zhao, Mengkang Hu, Zichen Liu, Jiaheng Zhang, Michael Qizhe Shieh

    Abstract: As agentic systems increasingly rely on reinforcement learning from verifiable rewards, standardized ``gym'' infrastructure has become essential for rapid iteration, reproducibility, and fair comparison. Vision agents lack such infrastructure, limiting systematic study of what drives their learning and where current models fall short. We introduce \textbf{Gym-V}, a unified platform of 179 procedur…

    Submitted 8 April, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

  3. arXiv:2603.08068  [pdf, ps, other]

    cs.AI

    In-Context Reinforcement Learning for Tool Use in Large Language Models

    Authors: Yaoqi Ye, Yiran Zhao, Keyu Duan, Zeyu Zheng, Kenji Kawaguchi, Cihang Xie, Michael Qizhe Shieh

    Abstract: While large language models (LLMs) exhibit strong reasoning abilities, their performance on complex tasks is often constrained by the limitations of their internal knowledge. A compelling approach to overcome this challenge is to augment these models with external tools -- such as Python interpreters for mathematical computations or search engines for retrieving factual information. However, enabl…

    Submitted 9 March, 2026; originally announced March 2026.

  4. arXiv:2603.08059  [pdf, ps, other]

    cs.CV cs.AI

    ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning

    Authors: Yiran Zhao, Yaoqi Ye, Xiang Liu, Michael Qizhe Shieh, Trung Bui

    Abstract: With the rapid advancement of commercial multi-modal models, image editing has garnered significant attention due to its widespread applicability in daily life. Despite impressive progress, existing image editing systems, particularly closed-source or proprietary models, often struggle with complex, indirect, or multi-step user instructions. These limitations hinder their ability to perform nuance…

    Submitted 9 March, 2026; originally announced March 2026.

  5. arXiv:2603.02146  [pdf, ps, other]

    cs.CL

    LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards

    Authors: Guanzheng Chen, Michael Qizhe Shieh, Lidong Bing

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Large Language Models (LLMs) by optimizing them against factual outcomes. However, this paradigm falters in long-context scenarios, as its reliance on internal parametric knowledge is ill-suited for tasks requiring contextual grounding--the ability to find and reason over externally provi…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: ICLR 2026

  6. arXiv:2602.04919  [pdf, ps, other]

    cs.LG

    Gradually Compacting Large Language Models for Reasoning Like a Boiling Frog

    Authors: Yiran Zhao, Shengyang Zhou, Zijian Wu, Tongyan Hu, Yuhui Xu, Rengan Dou, Kenji Kawaguchi, Shafiq Joty, Junnan Li, Michael Qizhe Shieh

    Abstract: Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, but their substantial size often demands significant computational resources. To reduce resource consumption and accelerate inference, it is essential to eliminate redundant parameters without compromising performance. However, conventional pruning methods that directly remove such parameters often lead to a dramatic…

    Submitted 4 February, 2026; originally announced February 2026.

  7. arXiv:2511.03276  [pdf, ps, other]

    cs.LG

    Diffusion Language Models are Super Data Learners

    Authors: Jinjie Ni, Qian Liu, Longxu Dou, Chao Du, Zili Wang, Hang Yan, Tianyu Pang, Michael Qizhe Shieh

    Abstract: Under strictly controlled pre-training settings, we observe a Crossover: when unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The crossover shifts later with more or higher-quality data, earlier with larger models, and persists across dense and sparse architectures. We attribute the gains to three compounding fac…

    Submitted 5 November, 2025; originally announced November 2025.

  8. arXiv:2510.27492  [pdf, ps, other]

    cs.CV

    ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

    Authors: Jiawei Gu, Yunzhuo Hao, Huichen Will Wang, Linjie Li, Michael Qizhe Shieh, Yejin Choi, Ranjay Krishna, Yu Cheng

    Abstract: Multimodal reasoning requires iterative coordination between language and vision, yet it remains unclear what constitutes a meaningful interleaved chain of thought. We posit that text and image thoughts should function as complementary rather than isomorphic modalities that mutually advance reasoning. Guided by this principle, we build ThinkMorph, a unified model fine-tuned on approximately 24K hi…

    Submitted 28 February, 2026; v1 submitted 30 October, 2025; originally announced October 2025.

    Comments: project page: https://thinkmorph.github.io/

  9. arXiv:2510.03280  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Training Optimal Large Diffusion Language Models

    Authors: Jinjie Ni, Qian Liu, Chao Du, Longxu Dou, Hang Yan, Zili Wang, Tianyu Pang, Michael Qizhe Shieh

    Abstract: We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), encompassing both compute-constrained and data-constrained regimes, and studying the key modeling and optimization designs. Quokka is a good friend of Chinchilla and provides wider scopes. We hope the results would bring short-term practical guidance in DLM training and long-term inspirations for the whole…

    Submitted 5 November, 2025; v1 submitted 28 September, 2025; originally announced October 2025.

  10. arXiv:2509.24002  [pdf, ps, other]

    cs.CL cs.AI

    MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

    Authors: Zijian Wu, Xiangyan Liu, Xinyuan Zhang, Lingjun Chen, Fanqing Meng, Lingxiao Du, Yiran Zhao, Fanshi Zhang, Yaoqi Ye, Jiawei Wang, Zirui Wang, Jinjie Ni, Yufan Yang, Arvin Xu, Michael Qizhe Shieh

    Abstract: MCP standardizes how LLMs interact with external systems, forming the foundation for general agents. However, existing MCP benchmarks remain narrow in scope: they focus on read-heavy tasks or tasks with limited interaction depth, and fail to capture the complexity and realism of real-world workflows. To address this gap, we propose MCPMark, a benchmark designed to evaluate MCP use in a more realis…

    Submitted 28 September, 2025; originally announced September 2025.

    Comments: 42 pages, 27 figures, 10 tables

  11. arXiv:2506.09890  [pdf, ps, other]

    cs.CL cs.AI

    The Emergence of Abstract Thought in Large Language Models Beyond Any Language

    Authors: Yuxin Chen, Yiran Zhao, Yang Zhang, An Zhang, Kenji Kawaguchi, Shafiq Joty, Junnan Li, Tat-Seng Chua, Michael Qizhe Shieh, Wenxuan Zhang

    Abstract: As large language models (LLMs) continue to advance, their capacity to function effectively across a diverse range of languages has shown marked improvement. Preliminary studies observe that the hidden activations of LLMs often resemble English, even when responding to non-English prompts. This has led to the widespread assumption that LLMs may "think" in English. However, more recent results show…

    Submitted 11 June, 2025; originally announced June 2025.

  12. arXiv:2506.02096  [pdf, ps, other]

    cs.LG cs.CL cs.CV

    SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

    Authors: Zijian Wu, Jinjie Ni, Xiangyan Liu, Zichen Liu, Hang Yan, Michael Qizhe Shieh

    Abstract: Vision-language models (VLMs) trained via reinforcement learning with verifiable reward (RLVR) have shown notable progress in scaling test-time compute effectively. In this work, we investigate how synthesized RL data can further improve RLVR. To this end, we propose \textbf{SynthRL}-a scalable and guaranteed pipeline for automatic data scaling in reasoning-oriented RL training. SynthRL comprises…

    Submitted 2 June, 2025; originally announced June 2025.

  13. arXiv:2504.13055  [pdf, ps, other]

    cs.CV

    NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

    Authors: Xiangyan Liu, Jinjie Ni, Zijian Wu, Chao Du, Longxu Dou, Haonan Wang, Tianyu Pang, Michael Qizhe Shieh

    Abstract: Recent advances in reinforcement learning (RL) have strengthened the reasoning capabilities of vision-language models (VLMs). However, enhancing policy exploration to better scale test-time compute remains largely underexplored. In addition, VLMs continue to struggle with imperfect visual perception, which in turn affects the subsequent reasoning process. We introduce NoisyRollout, a simple yet ef…

    Submitted 31 October, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: NeurIPS 2025

  14. arXiv:2504.10559  [pdf, other]

    cs.LG cs.AI

    Efficient Process Reward Model Training via Active Learning

    Authors: Keyu Duan, Zichen Liu, Xin Mao, Tianyu Pang, Changyu Chen, Qiguang Chen, Michael Qizhe Shieh, Longxu Dou

    Abstract: Process Reward Models (PRMs) provide step-level supervision to large language models (LLMs), but scaling up training data annotation remains challenging for both humans and LLMs. To address this limitation, we propose an active learning approach, ActPRM, which proactively selects the most uncertain samples for training, substantially reducing labeling costs. During training, we use the PRM to esti…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 15 pages, 4 figures

  15. arXiv:2503.16194  [pdf, other]

    cs.CV

    Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction

    Authors: Ziyao Guo, Kaipeng Zhang, Michael Qizhe Shieh

    Abstract: Autoregressive models have shown remarkable success in image generation by adapting sequential prediction techniques from language modeling. However, applying these approaches to images requires discretizing continuous pixel data through vector quantization methods like VQ-VAE. To alleviate the quantization errors that existed in VQ-VAE, recent works tend to use larger codebooks. However, this wil…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Work in progress

  16. arXiv:2503.01926  [pdf, ps, other]

    cs.CL cs.AI

    Unnatural Languages Are Not Bugs but Features for LLMs

    Authors: Keyu Duan, Yiran Zhao, Zhili Feng, Jinjie Ni, Tianyu Pang, Qian Liu, Tianle Cai, Longxu Dou, Kenji Kawaguchi, Anirudh Goyal, J. Zico Kolter, Michael Qizhe Shieh

    Abstract: Large Language Models (LLMs) have been observed to process non-human-readable text sequences, such as jailbreak prompts, often viewed as a bug for aligned LLMs. In this work, we present a systematic investigation challenging this perception, demonstrating that unnatural languages - strings that appear incomprehensible to humans but maintain semantic meanings for LLMs - contain latent features usab…

    Submitted 3 June, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

  17. arXiv:2502.20330  [pdf, ps, other]

    cs.CL

    RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding

    Authors: Guanzheng Chen, Qilong Feng, Jinjie Ni, Xin Li, Michael Qizhe Shieh

    Abstract: The emergence of long-context large language models (LLMs) offers a promising alternative to traditional retrieval-augmented generation (RAG) for processing extensive documents. However, the computational overhead of long-context inference presents significant efficiency challenges. While Speculative Decoding (SD) traditionally accelerates inference using smaller draft models, its effectiveness di…

    Submitted 22 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: ICML 2025 Spotlight

  18. arXiv:2502.13922  [pdf, other]

    cs.CL cs.LG

    LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

    Authors: Guanzheng Chen, Xin Li, Michael Qizhe Shieh, Lidong Bing

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities through pretraining and alignment. However, superior short-context LLMs may underperform in long-context scenarios due to insufficient long-context alignment. This alignment process remains challenging due to the impracticality of human annotation for extended contexts and the difficulty in balancing short- and long-context per…

    Submitted 1 March, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: ICLR 2025