
Showing 1–39 of 39 results for author: Zakharov, S

Searching in archive cs.
  1. arXiv:2512.16881  [pdf, ps, other]

    cs.RO cs.LG

    PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies

    Authors: Arhan Jain, Mingtong Zhang, Kanav Arora, William Chen, Marcel Torne, Muhammad Zubair Irshad, Sergey Zakharov, Yue Wang, Sergey Levine, Chelsea Finn, Wei-Chiu Ma, Dhruv Shah, Abhishek Gupta, Karl Pertsch

    Abstract: A significant challenge for robot learning research is our ability to accurately measure and compare the performance of robot policies. Benchmarking in robotics is historically challenging due to the stochasticity, reproducibility, and time-consuming nature of real-world rollouts. This challenge is exacerbated for recent generalist policies, which have to be evaluated across a wide variety of scene…

    Submitted 18 December, 2025; originally announced December 2025.

    Comments: Website: https://polaris-evals.github.io/

  2. arXiv:2509.22970  [pdf, ps, other]

    cs.RO cs.CV cs.LG

    Robot Learning from Any Images

    Authors: Siheng Zhao, Jiageng Mao, Wei Chow, Zeyu Shangguan, Tianheng Shi, Rong Xue, Yuxi Zheng, Yijia Weng, Yang You, Daniel Seita, Leonidas Guibas, Sergey Zakharov, Vitor Guizilini, Yue Wang

    Abstract: We introduce RoLA, a framework that transforms any in-the-wild image into an interactive, physics-enabled robotic environment. Unlike previous methods, RoLA operates directly on a single image without requiring additional hardware or digital assets. Our framework democratizes robotic data generation by producing massive visuomotor robotic demonstrations within minutes from a wide range of image so…

    Submitted 8 October, 2025; v1 submitted 26 September, 2025; originally announced September 2025.

    Comments: CoRL 2025 camera ready

  3. arXiv:2508.03669  [pdf, ps, other]

    cs.CV cs.RO

    OmniShape: Zero-Shot Multi-Hypothesis Shape and Pose Estimation in the Real World

    Authors: Katherine Liu, Sergey Zakharov, Dian Chen, Takuya Ikeda, Greg Shakhnarovich, Adrien Gaidon, Rares Ambrus

    Abstract: We would like to estimate the pose and full shape of an object from a single observation, without assuming a known 3D model or category. In this work, we propose OmniShape, the first method of its kind to enable probabilistic pose and shape estimation. OmniShape is based on the key insight that shape completion can be decoupled into two multi-modal distributions: one capturing how measurements proje…

    Submitted 5 August, 2025; originally announced August 2025.

    Comments: 8 pages, 5 figures. This version has typo fixes on top of the version published at ICRA 2025

  4. arXiv:2507.05331  [pdf, ps, other]

    cs.RO

    A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation

    Authors: TRI LBM Team, Jose Barreiros, Andrew Beaulieu, Aditya Bhat, Rick Cory, Eric Cousineau, Hongkai Dai, Ching-Hsin Fang, Kunimatsu Hashimoto, Muhammad Zubair Irshad, Masha Itkina, Naveen Kuppuswamy, Kuan-Hui Lee, Katherine Liu, Dale McConachie, Ian McMahon, Haruki Nishimura, Calder Phillips-Grafflin, Charles Richter, Paarth Shah, Krishnan Srinivasan, Blake Wulfe, Chen Xu, Mengchao Zhang, Alex Alspach, et al. (57 additional authors not shown)

    Abstract: Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. While these models have garnere…

    Submitted 7 July, 2025; originally announced July 2025.

  5. arXiv:2505.11905  [pdf, other]

    cs.CV cs.RO

    GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity

    Authors: Takuya Ikeda, Sergey Zakharov, Muhammad Zubair Irshad, Istvan Balazs Opra, Shun Iwase, Dian Chen, Mark Tjersland, Robert Lee, Alexandre Dilly, Rares Ambrus, Koichi Nishiwaki

    Abstract: We present a novel method for 6-DoF object tracking and high-quality 3D reconstruction from monocular RGBD video. Existing methods, while achieving impressive results, often struggle with complex objects, particularly those exhibiting symmetry, intricate geometry or complex appearance. To bridge these gaps, we introduce an adaptive method that combines 3D Gaussian Splatting, hybrid geometry/appear…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: Main paper: 10 pages, 9 figures. Supplementary material: 10 pages, 27 figures

  6. arXiv:2505.04831  [pdf, ps, other]

    cs.RO cs.GR cs.LG

    Steerable Scene Generation with Post Training and Inference-Time Search

    Authors: Nicholas Pfaff, Hongkai Dai, Sergey Zakharov, Shun Iwase, Russ Tedrake

    Abstract: Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of downstream tasks. However, scenes that satisfy strict task requirements, such as high-clutter environments with plausible spatial arrangement, are rare and costly to curate manually. Instead, we generate large-scale scene data using procedural models that approximate realistic environments for robotic…

    Submitted 26 August, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

    Comments: Project website: https://steerable-scene-generation.github.io/

  7. arXiv:2504.10857  [pdf, other]

    cs.RO cs.CV

    ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping

    Authors: Shun Iwase, Zubair Irshad, Katherine Liu, Vitor Guizilini, Robert Lee, Takuya Ikeda, Ayako Amma, Koichi Nishiwaki, Kris Kitani, Rares Ambrus, Sergey Zakharov

    Abstract: Robotic grasping is a cornerstone capability of embodied systems. Many methods directly output grasps from partial information without modeling the geometry of the scene, leading to suboptimal motion and even collisions. To address these issues, we introduce ZeroGrasp, a novel framework that simultaneously performs 3D reconstruction and grasp pose prediction in near real-time. A key insight of our…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: Published at CVPR 2025, Webpage: https://sh8.io/#/zerograsp

  8. arXiv:2411.07326  [pdf, other]

    cs.CV

    $SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation

    Authors: Yinshuang Xu, Dian Chen, Katherine Liu, Sergey Zakharov, Rares Ambrus, Kostas Daniilidis, Vitor Guizilini

    Abstract: Incorporating inductive bias by embedding geometric entities (such as rays) as input has proven successful in multi-view learning. However, the methods adopting this technique typically lack equivariance, which is crucial for effective 3D learning. Equivariance serves as a valuable inductive prior, aiding in the generation of robust multi-view features for 3D scene understanding. In this paper, we…

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: Accepted at NeurIPS 2024

  9. arXiv:2409.03685  [pdf, ps, other]

    cs.RO cs.AI cs.CV cs.LG

    View-Invariant Policy Learning via Zero-Shot Novel View Synthesis

    Authors: Stephen Tian, Blake Wulfe, Kyle Sargent, Katherine Liu, Sergey Zakharov, Vitor Guizilini, Jiajun Wu

    Abstract: Large-scale visuomotor policy learning is a promising approach toward developing generalizable manipulation systems. Yet, policies that can be deployed on diverse embodiments, environments, and observational modalities remain elusive. In this work, we investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: obs…

    Submitted 31 May, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted to CoRL 2024

  10. arXiv:2409.03061  [pdf, other]

    cs.CV cs.GR cs.RO

    Incorporating dense metric depth into neural 3D representations for view synthesis and relighting

    Authors: Arkadeep Narayan Chaudhury, Igor Vasiljevic, Sergey Zakharov, Vitor Guizilini, Rares Ambrus, Srinivasa Narasimhan, Christopher G. Atkeson

    Abstract: Synthesizing accurate geometry and photo-realistic appearance of small scenes is an active area of research with compelling use cases in gaming, virtual reality, robotic-manipulation, autonomous driving, convenient product capture, and consumer-level photography. When applying scene geometry and appearance estimation techniques to robotics, we found that the narrow cone of possible viewpoints due…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Project webpage: https://stereomfc.github.io

  11. arXiv:2406.04309  [pdf, other]

    cs.CV cs.GR cs.LG cs.MM

    ReFiNe: Recursive Field Networks for Cross-modal Multi-scene Representation

    Authors: Sergey Zakharov, Katherine Liu, Adrien Gaidon, Rares Ambrus

    Abstract: The common trade-offs of state-of-the-art methods for multi-shape representation (a single model "packing" multiple objects) involve trading modeling accuracy against memory and storage. We show how to encode multiple shapes represented as continuous neural fields with a higher degree of precision than previously possible and with low memory usage. Key to our approach is a recursive hierarchical f…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: SIGGRAPH 2024. Project Page: https://zakharos.github.io/projects/refine/

  12. arXiv:2404.01300  [pdf, other]

    cs.CV cs.AI cs.LG

    NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

    Authors: Muhammad Zubair Irshad, Sergey Zakharov, Vitor Guizilini, Adrien Gaidon, Zsolt Kira, Rares Ambrus

    Abstract: Neural fields excel in computer vision and robotics due to their ability to understand the 3D visual world such as inferring semantics, geometry, and dynamics. Given the capabilities of neural fields in densely representing a 3D scene from 2D images, we ask the question: Can we scale their self-supervised pretraining, specifically using masked autoencoders, to generate effective 3D representations…

    Submitted 18 July, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to ECCV 2024. Project Page: https://nerf-mae.github.io/

  13. arXiv:2403.14628  [pdf, other]

    cs.CV

    Zero-Shot Multi-Object Scene Completion

    Authors: Shun Iwase, Katherine Liu, Vitor Guizilini, Adrien Gaidon, Kris Kitani, Rares Ambrus, Sergey Zakharov

    Abstract: We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image. Despite notable advancements in single-object 3D shape completion, high-quality reconstructions in highly cluttered real-world multi-object scenes remain a challenge. To address this issue, we propose OctMAE, an architecture that leverages an Octree U…

    Submitted 30 August, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Published at ECCV 2024, Webpage: https://sh8.io/#/oct_mae

  14. arXiv:2402.12647  [pdf, other]

    cs.CV cs.RO

    DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation

    Authors: Takuya Ikeda, Sergey Zakharov, Tianyi Ko, Muhammad Zubair Irshad, Robert Lee, Katherine Liu, Rares Ambrus, Koichi Nishiwaki

    Abstract: This paper addresses the challenging problem of category-level pose estimation. Current state-of-the-art methods for this task face challenges when dealing with symmetric objects and when attempting to generalize to new environments solely through synthetic data training. In this work, we address these challenges by proposing a probabilistic model that relies on diffusion to estimate dense canonic…

    Submitted 5 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 8 pages. 9 figures. This work has been submitted to the IEEE for possible publication

  15. arXiv:2310.12974  [pdf, other]

    cs.CV cs.RO

    FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects

    Authors: Mayank Lunayach, Sergey Zakharov, Dian Chen, Rares Ambrus, Zsolt Kira, Muhammad Zubair Irshad

    Abstract: In this work, we address the challenging task of 3D object recognition without the reliance on real-world 3D labeled data. Our goal is to predict the 3D shape, size, and 6D pose of objects within a single RGB-D image, operating at the category level and eliminating the need for CAD models during inference. While existing self-supervised methods have made strides in this field, they often suffer fr…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Project page: https://fsd6d.github.io

  16. arXiv:2308.12967  [pdf, other]

    cs.CV cs.AI cs.LG

    NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes

    Authors: Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Vitor Guizilini, Thomas Kollar, Adrien Gaidon, Zsolt Kira, Rares Ambrus

    Abstract: Recent implicit neural representations have shown great results for novel view synthesis. However, existing methods require expensive per-scene optimization from many views, hence limiting their application to real-world unbounded urban settings where the objects of interest or backgrounds are observed from very few views. To mitigate this challenge, we introduce a new approach called NeO 360, Neur…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted to International Conference on Computer Vision (ICCV), 2023. Project page: https://zubair-irshad.github.io/projects/neo360.html

  17. arXiv:2306.08748  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Multi-Object Manipulation via Object-Centric Neural Scattering Functions

    Authors: Stephen Tian, Yancheng Cai, Hong-Xing Yu, Sergey Zakharov, Katherine Liu, Adrien Gaidon, Yunzhu Li, Jiajun Wu

    Abstract: Learned visual dynamics models have proven effective for robotic manipulation tasks. Yet, it remains unclear how best to represent scenes involving multi-object interactions. Current methods decompose a scene into discrete objects, but they struggle with precise modeling and manipulation amid challenging lighting conditions as they only encode appearance tied with specific illuminations. In this w…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: First two authors contributed equally. Accepted at CVPR 2023. Project page: https://s-tian.github.io/projects/actionosf/

  18. arXiv:2304.02797  [pdf, other]

    cs.CV

    DeLiRa: Self-Supervised Depth, Light, and Radiance Fields

    Authors: Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Sergey Zakharov, Vincent Sitzmann, Adrien Gaidon

    Abstract: Differentiable volumetric rendering is a powerful paradigm for 3D reconstruction and novel view synthesis. However, standard volume rendering approaches struggle with degenerate geometries in the case of limited viewpoint diversity, a common scenario in robotics applications. In this work, we propose to use the multi-view photometric objective from the self-supervised depth estimation literature a…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Project page: https://sites.google.com/view/tri-delira

  19. CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects

    Authors: Nick Heppert, Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Rares Andrei Ambrus, Jeannette Bohg, Abhinav Valada, Thomas Kollar

    Abstract: We present CARTO, a novel approach for reconstructing multiple articulated objects from a single stereo RGB observation. We use implicit object-centric representations and learn a single geometry and articulation decoder for multiple object categories. Despite training on multiple categories, our decoder achieves a comparable reconstruction accuracy to methods that train bespoke decoders separatel…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: 20 pages, 11 figures, accepted at CVPR 2023

  20. arXiv:2303.11328  [pdf, other]

    cs.CV cs.GR cs.RO

    Zero-1-to-3: Zero-shot One Image to 3D Object

    Authors: Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, Carl Vondrick

    Abstract: We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. To perform novel view synthesis in this under-constrained setting, we capitalize on the geometric priors that large-scale diffusion models learn about natural images. Our conditional diffusion model uses a synthetic dataset to learn controls of the relative camera viewpoint, which al…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Website: https://zero123.cs.columbia.edu/

  21. arXiv:2212.06193  [pdf, other]

    cs.CV cs.GR cs.RO

    ROAD: Learning an Implicit Recursive Octree Auto-Decoder to Efficiently Encode 3D Shapes

    Authors: Sergey Zakharov, Rares Ambrus, Katherine Liu, Adrien Gaidon

    Abstract: Compact and accurate representations of 3D shapes are central to many perception and robotics tasks. State-of-the-art learning-based methods can reconstruct single objects but scale poorly to large datasets. We present a novel recursive implicit representation to efficiently and accurately encode large datasets of complex 3D shapes by recursively traversing an implicit octree in latent space. Our…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Accepted to Conference on Robot Learning (CoRL), 2022

  22. arXiv:2210.12682  [pdf, other]

    cs.CV cs.RO

    Photo-realistic Neural Domain Randomization

    Authors: Sergey Zakharov, Rares Ambrus, Vitor Guizilini, Wadim Kehl, Adrien Gaidon

    Abstract: Synthetic data is a scalable alternative to manual supervision, but it requires overcoming the sim-to-real domain gap. This discrepancy between virtual and real worlds is addressed by two seemingly opposed approaches: improving the realism of simulation or foregoing realism entirely via domain randomization. In this paper, we show that the recent progress in neural rendering enables a new unified…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2022

  23. arXiv:2207.13691  [pdf, other]

    cs.CV cs.LG cs.RO

    ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization

    Authors: Muhammad Zubair Irshad, Sergey Zakharov, Rares Ambrus, Thomas Kollar, Zsolt Kira, Adrien Gaidon

    Abstract: Our method studies the complex task of object-centric 3D understanding from a single RGB-D observation. As it is an ill-posed problem, existing methods suffer from low performance for both 3D shape and 6D pose and size estimation in complex multi-object scenarios with occlusions. We present ShAPO, a method for joint multi-object detection, 3D textured reconstruction, 6D object pose and size estima…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2022

  24. arXiv:2207.11232  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Neural Groundplans: Persistent Neural Scene Representations from a Single Image

    Authors: Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann

    Abstract: We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene. Motivated by the bird's-eye-view (BEV) representation commonly used in vision and robotics, we propose conditional neural groundplans, ground-aligned 2D feature grids, as persistent a…

    Submitted 9 April, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: Project page: https://prafullsharma.net/neural_groundplans/

  25. arXiv:2207.05856  [pdf, other]

    cs.CV

    SpOT: Spatiotemporal Modeling for 3D Object Tracking

    Authors: Colton Stearns, Davis Rempe, Jie Li, Rares Ambrus, Sergey Zakharov, Vitor Guizilini, Yanchao Yang, Leonidas J Guibas

    Abstract: 3D multi-object tracking aims to uniquely and consistently identify all mobile entities through time. Despite the rich spatiotemporal information available in this setting, current 3D tracking methods primarily rely on abstracted information and limited history, e.g. single-frame object bounding boxes. In this work, we develop a holistic representation of traffic scenes that leverages both spatial…

    Submitted 12 July, 2022; originally announced July 2022.

  26. Multi-View Object Pose Refinement With Differentiable Renderer

    Authors: Ivan Shugurov, Ivan Pavlov, Sergey Zakharov, Slobodan Ilic

    Abstract: This paper introduces a novel multi-view 6 DoF object pose refinement approach focusing on improving methods trained on synthetic data. It is based on the DPOD detector, which produces dense 2D-3D correspondences between the model vertices and the image pixels in each frame. We have opted for the use of multiple frames with known relative camera transformations, as it allows introduction of geomet…

    Submitted 6 July, 2022; originally announced July 2022.

    Journal ref: IEEE Robotics and Automation Letters, 2021

  27. DPODv2: Dense Correspondence-Based 6 DoF Pose Estimation

    Authors: Ivan Shugurov, Sergey Zakharov, Slobodan Ilic

    Abstract: We propose a three-stage 6 DoF object detection method called DPODv2 (Dense Pose Object Detector) that relies on dense correspondences. We combine a 2D object detector with a dense correspondence estimation network and a multi-view pose refinement method to estimate a full 6 DoF pose. Unlike other deep learning methods that are typically restricted to monocular RGB images, we propose a unified dee…

    Submitted 6 July, 2022; originally announced July 2022.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 2021

  28. arXiv:2205.03923  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.MM

    Unsupervised Discovery and Composition of Object Light Fields

    Authors: Cameron Smith, Hong-Xing Yu, Sergey Zakharov, Fredo Durand, Joshua B. Tenenbaum, Jiajun Wu, Vincent Sitzmann

    Abstract: Neural scene representations, both continuous and discrete, have recently emerged as a powerful new paradigm for 3D scene understanding. Recent efforts have tackled unsupervised discovery of object-centric neural scene representations. However, the high cost of ray-marching, exacerbated by the fact that each object representation has to be ray-marched separately, leads to insufficiently sampled ra…

    Submitted 15 July, 2023; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: Project website: https://cameronosmith.github.io/colf. TMLR 2023

  29. arXiv:2204.07616  [pdf, other]

    cs.CV

    Multi-Frame Self-Supervised Depth with Transformers

    Authors: Vitor Guizilini, Rares Ambrus, Dian Chen, Sergey Zakharov, Adrien Gaidon

    Abstract: Multi-frame depth estimation improves over single-frame approaches by also leveraging geometric relationships between images via feature matching, in addition to learning appearance-based features. In this paper we revisit feature matching for self-supervised monocular depth estimation, and propose a novel transformer architecture for cost volume generation. We use depth-discretized epipolar sampl…

    Submitted 10 June, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022 (correct project page)

  30. arXiv:2012.04585  [pdf, other]

    cs.CL cs.SI

    Discourse Parsing of Contentious, Non-Convergent Online Discussions

    Authors: Stepan Zakharov, Omri Hadar, Tovit Hakak, Dina Grossman, Yifat Ben-David Kolikant, Oren Tsur

    Abstract: Online discourse is often perceived as polarized and unproductive. While some conversational discourse parsing frameworks are available, they do not naturally lend themselves to the analysis of contentious and polarizing discussions. Inspired by the Bakhtinian theory of Dialogism, we propose a novel theoretical and computational framework, better suited for non-convergent discussions. We redefine…

    Submitted 8 December, 2020; originally announced December 2020.

  31. arXiv:1911.11288  [pdf, other]

    cs.CV

    Autolabeling 3D Objects with Differentiable Rendering of SDF Shape Priors

    Authors: Sergey Zakharov, Wadim Kehl, Arjun Bhargava, Adrien Gaidon

    Abstract: We present an automatic annotation pipeline to recover 9D cuboids and 3D shapes from pre-trained off-the-shelf 2D detectors and sparse LIDAR data. Our autolabeling method solves an ill-posed inverse problem by considering learned shape priors and optimizing geometric and physical parameters. To address this challenging problem, we apply a novel differentiable shape renderer to signed distance fiel…

    Submitted 2 April, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: CVPR 2020 (Oral). 8 pages + supplementary material. The first two authors contributed equally to this work

  32. 3D Object Instance Recognition and Pose Estimation Using Triplet Loss with Dynamic Margin

    Authors: Sergey Zakharov, Wadim Kehl, Benjamin Planche, Andreas Hutter, Slobodan Ilic

    Abstract: In this paper, we address the problem of 3D object instance recognition and pose estimation of localized objects in cluttered environments using convolutional neural networks. Inspired by the descriptor learning approach of Wohlhart et al., we propose a method that introduces the dynamic margin in the manifold learning triplet loss function. Such a loss function is designed to map images of differ…

    Submitted 9 April, 2019; originally announced April 2019.

    Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 552-559. IEEE, 2017

  33. arXiv:1904.03167  [pdf, other]

    cs.CV cs.RO

    HomebrewedDB: RGB-D Dataset for 6D Pose Estimation of 3D Objects

    Authors: Roman Kaskman, Sergey Zakharov, Ivan Shugurov, Slobodan Ilic

    Abstract: Among the most important prerequisites for creating and evaluating 6D object pose detectors are datasets with labeled 6D poses. With the advent of deep learning, demand for such datasets is growing continuously. Despite the fact that some of them exist, they are scarce and typically have restricted setups, such as a single object per sequence, or they focus on specific object types, such as textureless…

    Submitted 30 September, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: ICCVW 2019

  34. arXiv:1904.02750  [pdf, other]

    cs.CV

    DeceptionNet: Network-Driven Domain Randomization

    Authors: Sergey Zakharov, Wadim Kehl, Slobodan Ilic

    Abstract: We present a novel approach to tackle domain adaptation between synthetic and real data. Instead of employing "blind" domain randomization, i.e., augmenting synthetic renderings with random backgrounds or changing illumination and colorization, we leverage the task network as its own adversarial guide toward useful augmentations that maximize the uncertainty of the output. To this end, we design…

    Submitted 20 August, 2019; v1 submitted 4 April, 2019; originally announced April 2019.

    Comments: ICCV 2019

  35. arXiv:1902.11020  [pdf, other]

    cs.CV cs.RO

    DPOD: 6D Pose Object Detector and Refiner

    Authors: Sergey Zakharov, Ivan Shugurov, Slobodan Ilic

    Abstract: In this paper we present a novel deep learning method for 3D object detection and 6D pose estimation from RGB images. Our method, named DPOD (Dense Pose Object Detector), estimates dense multi-class 2D-3D correspondence maps between an input image and available 3D models. Given the correspondences, a 6DoF pose is computed via PnP and RANSAC. An additional RGB pose refinement of the initial pose es…

    Submitted 20 August, 2019; v1 submitted 28 February, 2019; originally announced February 2019.

    Comments: ICCV 2019. 8 pages + supplementary material + references. The first two authors contributed equally to this work

  36. arXiv:1810.04158  [pdf, other]

    cs.CV

    Seeing Beyond Appearance - Mapping Real Images into Geometrical Domains for Unsupervised CAD-based Recognition

    Authors: Benjamin Planche, Sergey Zakharov, Ziyan Wu, Andreas Hutter, Harald Kosch, Slobodan Ilic

    Abstract: While convolutional neural networks are dominating the field of computer vision, one usually does not have access to the large amount of domain-relevant data needed for their training. It thus became common to use available synthetic samples along domain adaptation schemes to prepare algorithms for the target domain. Tackling this problem from a different angle, we introduce a pipeline to map unse…

    Submitted 9 October, 2018; originally announced October 2018.

    Comments: paper + supplementary material; previous work: "Keep it Unreal: Bridging the Realism Gap for 2.5D Recognition with Geometry Priors Only"

  37. When Regression Meets Manifold Learning for Object Recognition and Pose Estimation

    Authors: Mai Bui, Sergey Zakharov, Shadi Albarqouni, Slobodan Ilic, Nassir Navab

    Abstract: In this work, we propose a method for object recognition and pose estimation from depth images using convolutional neural networks. Previous methods addressing this problem rely on manifold learning to learn low dimensional viewpoint descriptors and employ them in a nearest neighbor search on an estimated descriptor space. In comparison, we create an efficient multi-task learning framework combinin…

    Submitted 16 May, 2018; originally announced May 2018.

    Journal ref: 2018 IEEE International Conference on Robotics and Automation (ICRA)

  38. arXiv:1804.09113  [pdf, other]

    cs.CV

    Keep it Unreal: Bridging the Realism Gap for 2.5D Recognition with Geometry Priors Only

    Authors: Sergey Zakharov, Benjamin Planche, Ziyan Wu, Andreas Hutter, Harald Kosch, Slobodan Ilic

    Abstract: With the increasing availability of large databases of 3D CAD models, depth-based recognition methods can be trained on an uncountable number of synthetically rendered images. However, discrepancies with the real data acquired from various depth sensors still noticeably impede progress. Previous works adopted unsupervised approaches to generate more realistic depth data, but they all require real…

    Submitted 24 May, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

    Comments: 10 pages + supplementary material + references. The first two authors contributed equally to this work

  39. arXiv:1702.08558  [pdf, other]

    cs.CV

    DepthSynth: Real-Time Realistic Synthetic Data Generation from CAD Models for 2.5D Recognition

    Authors: Benjamin Planche, Ziyan Wu, Kai Ma, Shanhui Sun, Stefan Kluckner, Terrence Chen, Andreas Hutter, Sergey Zakharov, Harald Kosch, Jan Ernst

    Abstract: Recent progress in computer vision has been dominated by deep neural networks trained over large amounts of labeled data. Collecting such datasets is however a tedious, often impossible task; hence a surge in approaches relying solely on synthetic data for their training. For depth images however, discrepancies with real scans still noticeably affect the end performance. We thus propose an end-to-…

    Submitted 28 November, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: International Conference on 3D Vision 2017