Robustness of Embodied Point Navigation Agents

Abstract

We take a step towards robust embodied AI by analyzing the performance of two successful Habitat Challenge 2021 agents under different visual corruptions (low lighting, blur, noise, etc.) and robot dynamics corruptions (noisy egomotion). Overall, the agents underperformed under corruption. However, the VO2021 agent handled multiple corruptions with ease, as its authors deliberately tackled robustness in their model. For specific corruptions, we concur with observations from the literature that there is still a long way to go to recover the performance lost to corruptions, warranting more research on the robustness of embodied AI.

Corruptions

We adopt RGB corruptions from ImageNet-C, applying the 8 RGB corruptions shown below, each illustrated on a clean image of Vincent van Gogh's room. Depth noise is modeled by the Redwood Depth Noise Model. Besides visual corruptions, we add dynamics corruptions by changing the robot's noise parameters. All robot movements are noisy: for example, turning the robot to the right by 30° inevitably introduces some forward-backward motion and does not turn the robot by exactly 30°.
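As a rough illustration of the two corruption families described above, the following minimal sketch applies a speckle-style (multiplicative Gaussian) noise to an RGB frame and perturbs an in-place turn. It is not the code used in the study: the function names, the `scale` value, the standard deviations inside `noisy_turn`, and the sign convention for the turn angle are all assumptions chosen for illustration. The `noise_multiplier` argument only mimics the idea of scaling robot noise (for example, doubling it).

```python
import numpy as np


def speckle_noise(image, scale=0.2, rng=None):
    """Multiplicative Gaussian noise on a uint8 HxWx3 image: x + x * n, n ~ N(0, scale^2).

    `scale` is an illustrative value, not a setting taken from the study.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = image.astype(np.float64) / 255.0
    noisy = x + x * rng.normal(loc=0.0, scale=scale, size=x.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)


def noisy_turn(pose, angle_deg, noise_multiplier=1.0, rng=None):
    """Turn in place by `angle_deg` with rotational error and forward-backward drift.

    pose is (x, y, heading_rad). The base standard deviations below are
    hypothetical placeholders, not measured robot parameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y, heading = pose
    rot_std = np.deg2rad(2.0) * noise_multiplier   # assumed rotational noise
    trans_std = 0.01 * noise_multiplier            # assumed drift in meters
    new_heading = heading + np.deg2rad(angle_deg) + rng.normal(0.0, rot_std)
    drift = rng.normal(0.0, trans_std)             # along the old heading
    return (x + drift * np.cos(heading), y + drift * np.sin(heading), new_heading)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = np.full((4, 4, 3), 128, dtype=np.uint8)
    print(speckle_noise(frame, rng=rng)[0, 0])
    # A 30 degree turn with doubled noise rarely lands on exactly 30 degrees.
    print(noisy_turn((0.0, 0.0, 0.0), -30.0, noise_multiplier=2.0, rng=rng))
```

Setting `noise_multiplier=0.0` recovers an exact turn with no drift, which makes the function easy to sanity-check against a noiseless controller.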

Overall, we evaluate 42 corruption configurations: 5 with dynamics corruptions only, 17 with visual corruptions only, and 20 that combine the two. Including more configurations would spend additional computational resources in the hope of finding more meaningful observations.

Defocus Blur

Color Jitter

Spatter

Speckle Noise

Low Lighting

Fixed Narrower HFOV

Motion Blur

Random Affine

Clean

Results

Agent performance decreases overall in corrupted environments, as measured by Success Rate, SPL, SoftSPL, and Distance to Goal (see the figures below). Some corruptions make all agents fail: color jitter, lower horizontal field of view, random image shifts, and doubled robot noise.
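For readers unfamiliar with the path-efficiency metrics, here is a minimal sketch of SPL and SoftSPL as they are commonly defined in the point-navigation literature. The page itself does not spell out the formulas, so the definitions below are an assumption about what is being reported, and the function and argument names are hypothetical. SPL weights each successful episode by the ratio of shortest-path length to the path actually taken; SoftSPL replaces the binary success with the fraction of the initial distance to goal that the agent covered.

```python
def spl(successes, shortest, taken):
    """Success weighted by Path Length, averaged over episodes.

    successes: 0/1 flags; shortest: shortest-path lengths (> 0);
    taken: lengths of the paths the agent actually followed.
    """
    total = 0.0
    for s, l, p in zip(successes, shortest, taken):
        total += float(s) * l / max(p, l)
    return total / len(successes)


def soft_spl(final_dists, shortest, taken):
    """SoftSPL: binary success is replaced by progress towards the goal.

    final_dists: distance to goal at episode end. Progress is clipped at 0
    so that ending farther away than the start does not go negative.
    """
    total = 0.0
    for d_end, l, p in zip(final_dists, shortest, taken):
        progress = max(0.0, 1.0 - d_end / l)
        total += progress * l / max(p, l)
    return total / len(shortest)


if __name__ == "__main__":
    # Episode 1: success, but the path was twice the shortest one.
    # Episode 2: failure, stopped halfway along an efficient path.
    print(spl([1, 0], [10.0, 10.0], [20.0, 10.0]))
    print(soft_spl([0.0, 5.0], [10.0, 10.0], [20.0, 10.0]))
```

In the toy example, the second episode contributes nothing to SPL but still earns partial credit under SoftSPL, which is why the two metrics can diverge when corrupted agents stop short of the goal.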

The VO2021 agent is reasonably robust under multiple corruptions: low lighting, motion blur, defocus blur, noiseless RGB, and speckle noise. We attribute this to the design of its visual odometry module, which was built to tackle robustness. The baselines perform poorly under all conditions and are not useful for comparison. The only case in which either baseline did better was when the UCU MLab agent failed under moderate and severe speckle noise.

Row identifier: 01, 02, ..., 06, 07, ..., 23, 24, ..., 43
Corruption category: V: Visual corruption; D: Dynamics corruption; VD: Visual and Dynamics corruptions combined
Corruption: CJ: Color Jitter; DB: Defocus Blur; DN: Depth Noise; HFOV: Lower HFOV (50 degrees); L: Low Lighting; Lite: LoCoBot-Lite robot; MB: MoveBase PyRobot controller; PC: PyRobot Controller; PNM: PyRobot Noise Multiplier; RS: Random Shift; S: Spatter; SN: Speckle Noise

BibTeX

@inproceedings{rajivc2023robustness,
  title={Robustness of Embodied Point Navigation Agents},
  author={Raji{\v{c}}, Frano},
  booktitle={Computer Vision--ECCV 2022 Workshops: Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part VI},
  pages={193--204},
  year={2023},
  organization={Springer}
}