THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation (2024-02-13T00:00:00.000000Z)