Computer Science > Robotics

arXiv:2510.19495 (cs)

[Submitted on 22 Oct 2025 (v1), last revised 25 Oct 2025 (this version, v2)]

Title:Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning

Authors:Kevin Huang, Rosario Scalise, Cleah Winston, Ayush Agrawal, Yunchu Zhang, Rohan Baijal, Markus Grotz, Byron Boots, Benjamin Burchfiel, Masha Itkina, Paarth Shah, Abhishek Gupta

View PDF HTML (experimental)

Abstract:Imitation learning has proven effective for training robots to perform complex tasks from expert human demonstrations. However, it remains limited by its reliance on high-quality, task-specific data, restricting adaptability to the diverse range of real-world object configurations and scenarios. In contrast, non-expert data -- such as play data, suboptimal demonstrations, partial task completions, or rollouts from suboptimal policies -- can offer broader coverage and lower collection costs. However, conventional imitation learning approaches fail to utilize this data effectively. To address these challenges, we posit that with right design decisions, offline reinforcement learning can be used as a tool to harness non-expert data to enhance the performance of imitation learning policies. We show that while standard offline RL approaches can be ineffective at actually leveraging non-expert data under the sparse data coverage settings typically encountered in the real world, simple algorithmic modifications can allow for the utilization of this data, without significant additional assumptions. Our approach shows that broadening the support of the policy distribution can allow imitation algorithms augmented by offline RL to solve tasks robustly, showing considerably enhanced recovery and generalization behavior. In manipulation tasks, these innovations significantly increase the range of initial conditions where learned policies are successful when non-expert data is incorporated. Moreover, we show that these methods are able to leverage all collected data, including partial or suboptimal demonstrations, to bolster task-directed policy performance. This underscores the importance of algorithmic techniques for using non-expert data for robust policy learning in robotics. Website: this https URL

Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2510.19495 [cs.RO]
	(or arXiv:2510.19495v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2510.19495

Submission history

From: Kevin Huang [view email]
[v1] Wed, 22 Oct 2025 11:43:39 UTC (9,927 KB)
[v2] Sat, 25 Oct 2025 01:18:43 UTC (9,927 KB)

Computer Science > Robotics

Title:Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Robotics

Title:Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators