HILO: A Large-Scale Heterogeneous Object Dataset for Benchmarking Robotic Grasping Approaches

Accepted and Presented at ICARA 2025

Xinchao Song, Sean Banerjee, Natasha Kholgade Banerjee

Terascale All sensing Research Studio



Abstract

Robust object manipulation is essential for robotics applications in real-world environments, especially when handling diverse and complex everyday objects. To facilitate this research, we present HILO, a large-scale dataset of 253 everyday objects and 288 diverse scenes. HILO bridges a crucial gap in existing manipulation datasets through its heterogeneity and dual-resolution approach, combining HIgh-resolution individual object scans with LOw-resolution scans of cluttered scenes. This provides both the precise geometric data needed for grasp planning and realistic environmental context. The dataset's comprehensive representations enable rigorous benchmarking of robotic grasping algorithms. Our evaluation of three leading grasping algorithms (Contact-GraspNet, GraspNet Baseline, and DexNet 4.0) reveals critical trade-offs between grasp quantity and quality, demonstrating the dataset's value in advancing robotic grasping research. HILO's rich object diversity and dual-resolution methodology provide a foundation for developing more versatile robotic systems capable of reliable real-world manipulation.


HIgh-Resolution Models

Example high-resolution models from the HILO dataset.
The HILO dataset comprises high-resolution (HI) scanned models of 253 diverse everyday objects across seven categories: toys, food and drink items, cooking utensils, tools, mugs and containers, general household items, and office supplies. All of these items are readily available from retail or online vendors.

The number of objects and the average mass, vertex count, face count, volume, and surface area per category of the HILO dataset.

| Category | #Objects | Mass (g) | Vertices | Faces | Volume (cm³) | Surface Area (cm²) |
|---|---|---|---|---|---|---|
| Toys | 33 | 134 | 1,146,901 | 2,294,540 | 416 | 439 |
| Food/Drink | 33 | 296 | 1,190,135 | 2,380,509 | 455 | 375 |
| Cooking | 38 | 301 | 937,467 | 1,897,075 | 327 | 756 |
| Tools | 40 | 164 | 749,976 | 1,500,821 | 125 | 285 |
| Mugs/Containers | 35 | 329 | 1,210,220 | 2,420,620 | 923 | 685 |
| Household | 39 | 359 | 1,044,337 | 2,089,883 | 775 | 638 |
| Office | 35 | 193 | 1,074,618 | 2,154,697 | 377 | 698 |
| Total | 253 | 254 | 1,041,806 | 2,088,164 | 485 | 556 |
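As a rough consistency check on the table, the sketch below recomputes the Total row's object count and mass from the per-category rows. It assumes the Total row reports the summed object count and an object-weighted average; that reading is an assumption, not something stated here. The recomputed mean mass comes out at roughly 254.8 g, close to the listed 254 g, with the small gap plausibly due to rounding of the per-category averages.

```python
# Per-category (object count, average mass in grams), copied from the table.
categories = {
    "Toys": (33, 134),
    "Food/Drink": (33, 296),
    "Cooking": (38, 301),
    "Tools": (40, 164),
    "Mugs/Containers": (35, 329),
    "Household": (39, 359),
    "Office": (35, 193),
}

# Total number of objects across all categories.
total_objects = sum(count for count, _ in categories.values())

# Object-weighted mean mass (assumed interpretation of the Total row).
weighted_mass = sum(count * mass for count, mass in categories.values()) / total_objects

print(total_objects, round(weighted_mass, 1))
```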


LOw-Resolution Scenes

The HILO dataset contains 288 low-resolution cluttered scenes from 72 groups of 10 objects randomly selected from the 253 objects. Each scene contains:
Example low-resolution scenes from the HILO dataset.
The HILO dataset contains 32,256 RGBD images captured from diverse viewpoints.
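A quick arithmetic check relates the counts above. Assuming scenes are spread evenly across object groups and images evenly across scenes (an assumption; the even split is not stated here), the numbers divide without remainder:

```python
num_scenes = 288      # low-resolution cluttered scenes
num_groups = 72       # groups of 10 objects
num_images = 32_256   # RGBD images in total

# divmod returns (quotient, remainder); a zero remainder means an even split is possible.
scenes_per_group, scene_remainder = divmod(num_scenes, num_groups)
images_per_scene, image_remainder = divmod(num_images, num_scenes)

print(scenes_per_group, images_per_scene)
```

Under that assumption, each group of 10 objects yields 4 scenes, and each scene is covered by 112 RGBD images.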


A GIF animation of an example point cloud generated from the masked, undistorted RGBD image of a low-resolution scene and aligned with each high-resolution mesh.
An example point cloud generated from the masked, undistorted RGBD image of a low-resolution scene and aligned with the high-resolution meshes using the corresponding object transformation annotations.
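The alignment described in this caption can be sketched in two steps: back-project the masked depth pixels into a camera-frame point cloud, then move each high-resolution mesh into the same frame with its transformation annotation. This is a minimal NumPy sketch under stated assumptions, not the dataset's own tooling: it assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, depth in metric units, and annotations given as 4x4 homogeneous object-to-camera matrices. The function names and the toy inputs are hypothetical.

```python
import numpy as np

def backproject_masked_depth(depth, mask, fx, fy, cx, cy):
    """Back-project masked, valid pixels of an undistorted depth image
    into an (N, 3) array of camera-frame points (pinhole model)."""
    v, u = np.nonzero(mask & (depth > 0))  # pixel rows (v) and columns (u)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def transform_vertices(vertices, T):
    """Apply a 4x4 homogeneous transform T to an (N, 3) vertex array."""
    return vertices @ T[:3, :3].T + T[:3, 3]

# Toy inputs (hypothetical): a 4x4 depth image with a 2x2 patch at 0.5 depth.
depth = np.zeros((4, 4))
depth[1:3, 1:3] = 0.5
mask = depth > 0
points = backproject_masked_depth(depth, mask, fx=2.0, fy=2.0, cx=1.5, cy=1.5)

# Hypothetical annotation: pure translation of 0.5 along the camera z-axis.
T = np.eye(4)
T[:3, 3] = [0.0, 0.0, 0.5]
mesh_vertices = np.array([[0.0, 0.0, 0.0]])  # stand-in for a scanned mesh
aligned = transform_vertices(mesh_vertices, T)
```

With real data, `points` and the transformed vertices of every object in the scene would share the camera frame, so they can be overlaid directly to inspect the quality of the annotations.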