We take a step towards robust embodied AI by analyzing how two successful Habitat Challenge 2021 agents perform under different visual corruptions (low lighting, blur, noise, etc.) and robot dynamics corruptions (noisy egomotion). Both agents underperformed overall in corrupted environments. However, the VO2021 agent handled multiple corruptions with ease, as its authors deliberately tackled robustness in their model. For specific corruptions, we concur with observations from the literature that there is still a long way to go to recover the performance lost to corruptions, which warrants more research on the robustness of embodied AI.
Our RGB corruptions draw on ImageNet-C: we add the 8 RGB corruptions shown below, as applied to a clean image of Vincent van Gogh's room. Depth noise is modeled with the Redwood Depth Noise Model. Besides visual corruptions, we add dynamics corruptions by changing the robot's noise parameters. All robot movements are noisy: for example, turning the robot to the right by 30° will inevitably introduce some forward-backward motion and will not turn the robot by exactly 30°.
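To make the two kinds of corruption concrete, here is a minimal NumPy sketch of one visual corruption (multiplicative speckle noise), a simple low-lighting corruption, and a noisy turn action. This is an illustration under stated assumptions, not the code used in the paper: the function names, the noise scales, and the Gaussian noise model for the turn are all hypothetical, and the actual experiments rely on the ImageNet-C corruptions and the simulator's robot noise models.

```python
import numpy as np


def speckle_noise(image, scale=0.2, rng=None):
    """Multiplicative Gaussian noise on a uint8 RGB image of shape (H, W, 3).

    `scale` is a hypothetical severity knob, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = image.astype(np.float64) / 255.0
    noisy = x + x * rng.normal(0.0, scale, size=x.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255.0).round().astype(np.uint8)


def low_lighting(image, factor=0.4):
    """Darken a uint8 image by scaling intensities (assumed lighting model)."""
    x = image.astype(np.float64) * factor
    return np.clip(x, 0.0, 255.0).round().astype(np.uint8)


def noisy_turn(pose, angle_deg, noise_multiplier=1.0, rng=None):
    """Apply a turn to pose = (x, y, heading_rad) with illustrative noise.

    The commanded angle is perturbed, and a small drift along the current
    heading is added, mimicking the forward-backward motion described above.
    The standard deviations (2 degrees, 1 cm) are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y, heading = pose
    angle = np.deg2rad(angle_deg) + rng.normal(0.0, np.deg2rad(2.0) * noise_multiplier)
    drift = rng.normal(0.0, 0.01 * noise_multiplier)  # metres along the heading
    x += drift * np.cos(heading)
    y += drift * np.sin(heading)
    return (x, y, heading + angle)
```

Scaling `noise_multiplier` in this sketch plays the role of a noise multiplier setting: a value of 0 gives an ideal turn, and larger values make both the rotation error and the drift grow.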
Overall, we use the corruptions in 42 different configurations: 5 configurations have only dynamics corruptions, 17 have only visual corruptions, and 20 combine the two. Adding more configurations would cost more computation in exchange for only the hope of finding more meaningful observations.
Agent performance decreases overall in corrupted environments, as measured by Success Rate, SPL, SoftSPL, and Distance to Goal (see the figures below). Some corruptions make all agents fail: color jitter, lower horizontal field of view, random image shifts, and doubled robot noise.
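For readers unfamiliar with these metrics, the sketch below computes them from per-episode statistics using their standard definitions: SPL weights success by the ratio of the shortest-path length to the path the agent actually took, and SoftSPL replaces the binary success with the fraction of the initial distance covered. The function and argument names are hypothetical, and this is not the evaluation code used in the paper.

```python
import numpy as np


def navigation_metrics(success, shortest_path, agent_path, final_dist):
    """Average PointNav metrics over episodes.

    success:       1 if the episode succeeded, else 0
    shortest_path: geodesic distance from start to goal (metres)
    agent_path:    length of the path the agent actually travelled (metres)
    final_dist:    geodesic distance to the goal when the episode ended (metres)
    """
    success = np.asarray(success, dtype=np.float64)
    shortest = np.asarray(shortest_path, dtype=np.float64)
    travelled = np.asarray(agent_path, dtype=np.float64)
    final = np.asarray(final_dist, dtype=np.float64)

    # Path efficiency: 1.0 for an optimal path, smaller for detours.
    efficiency = shortest / np.maximum(travelled, shortest)
    # Soft success: fraction of the initial distance covered, floored at 0.
    progress = np.maximum(0.0, 1.0 - final / shortest)

    return {
        "success_rate": float(success.mean()),
        "spl": float((success * efficiency).mean()),
        "soft_spl": float((progress * efficiency).mean()),
        "distance_to_goal": float(final.mean()),
    }
```

As a usage example, take two episodes with a 10 m shortest path each: one succeeds after travelling 20 m, the other stops 5 m from the goal after travelling 5 m. The function then returns a success rate of 0.5, an SPL of 0.25, a SoftSPL of 0.5, and a mean distance to goal of 2.5 m.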
The VO2021 agent is reasonably robust under multiple corruptions: low lighting, motion blur, defocus blur, noiseless RGB, and speckle noise. We attribute this to the way its visual odometry module was designed to tackle robustness. The baselines perform poorly under all conditions and are not useful for comparison. The only case in which either baseline did better was when the UCU MLab agent failed under moderate and severe speckle noise.
Row identifier:
01
02, ..., 06
07, ..., 23
24, ..., 43
Corruption category:
V: Visual corruption
D: Dynamics corruption
VD: Visual and dynamics corruptions combined
Corruption:
CJ: Color Jitter
DB: Defocus Blur
DN: Depth Noise
HFOV: Lower HFOV (50 degrees)
L: Low lighting
Lite: LoCoBot-Lite robot
MB: MoveBase PyRobot controller
PC: PyRobot Controller
PNM: PyRobot Noise Multiplier
RS: Random Shift
S: Spatter
SN: Speckle Noise
@inproceedings{rajivc2023robustness,
title={Robustness of Embodied Point Navigation Agents},
author={Raji{\v{c}}, Frano},
booktitle={Computer Vision--ECCV 2022 Workshops: Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part VI},
pages={193--204},
year={2023},
organization={Springer}
}