To study the impact of the long-tailed open world on multi-modal large language models
(MLLMs), we construct a dataset called OpenMMlo (Open Multi-Modal Long-tailed dataset)
by extending three open-source datasets: ImageNet-LT, iNaturalist 2018, and
Places-LT. ImageNet-LT contains 115.8K samples from 1,000 classes, with between 5 and
1,280 samples per class. It also includes 18K images for out-of-distribution (OOD)
detection. Places-LT has 184.5K samples from 365 classes, with between 5 and 4,980
samples per class. iNaturalist 2018 is a large-scale species dataset collected in the
natural world, with 437.5K samples from 8,142 classes. We use InstructBLIP to generate
a caption for each image, using the prompt "What does this picture describe?
Please describe in detail its size, location, color, and its relationship to the
surroundings."
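
The captioning step can be sketched roughly as follows. This is a minimal illustration, not the exact pipeline used to build OpenMMlo: it assumes the Hugging Face transformers implementation of InstructBLIP, and the checkpoint name, the max_new_tokens value, and the helper names load_captioner and caption_image are hypothetical choices that the text above does not specify. Only the prompt is taken from the description.

```python
# Prompt quoted from the dataset description.
PROMPT = (
    "What does this picture describe? Please describe in detail its size, "
    "location, color, and its relationship to the surroundings."
)

# Assumption: the text does not name a checkpoint; this is one public
# InstructBLIP checkpoint on the Hugging Face Hub.
MODEL_NAME = "Salesforce/instructblip-vicuna-7b"


def load_captioner(model_name=MODEL_NAME):
    """Load the processor and model once (hypothetical helper)."""
    # Imported lazily so this module can be imported without the heavy
    # dependencies being installed.
    import torch
    from transformers import (
        InstructBlipForConditionalGeneration,
        InstructBlipProcessor,
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = InstructBlipProcessor.from_pretrained(model_name)
    model = InstructBlipForConditionalGeneration.from_pretrained(model_name)
    model = model.to(device).eval()
    return processor, model, device


def caption_image(image_path, processor, model, device, max_new_tokens=128):
    """Generate one caption for one image (hypothetical helper)."""
    import torch
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=PROMPT, return_tensors="pt").to(device)
    with torch.no_grad():
        # Assumption: default decoding; the decoding settings are not given
        # in the text.
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

Under these assumptions, calling load_captioner() once and then caption_image(path, processor, model, device) for each image in the three source datasets would yield one caption per image. Loading the model once and reusing it matters here, since the combined datasets contain several hundred thousand images.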