[1912.10154v1] Measuring Dataset Granularity
In the future, we would like to: 1) explore additional axioms of dataset granularity (if any) and study the completeness of our framework, 2) find other fundamental dataset properties that cannot be captured by granularity, and 3) apply our granularity measure to constructing datasets and benchmarking existing methods.

Abstract. Despite the increasing visibility of fine-grained recognition in our field, "fine-grained" has thus far lacked a precise definition. In this work, building upon clustering theory, we pursue a framework for measuring dataset granularity. We argue that dataset granularity should depend not only on the data samples and their labels, but also on the distance function we choose. We propose an axiomatic framework to capture desired properties for a dataset granularity measure and provide examples of measures that satisfy these properties. We assess each measure via experiments on datasets with hierarchical labels of varying granularity. When we measure the granularity of commonly used datasets, we find that certain datasets widely considered fine-grained in fact contain subsets of considerable size that are substantially more coarse-grained than datasets generally regarded as coarse-grained. We also investigate the interplay between dataset granularity and a variety of factors, and find that fine-grained datasets are more difficult to learn from, more difficult to transfer to, more difficult to perform few-shot learning with, and more vulnerable to adversarial attacks.
Figure 1. Using our measure of dataset granularity, which ranges from 0 for the finest granularity to 1 for the coarsest, we show two 10-class subsets of CUB-200 that are fine-grained (CUB-200-Bitter) and coarse-grained (CUB-200-Sweet). CIFAR-10, which is widely considered coarse-grained, is more fine-grained than CUB-200-Sweet. We represent each class by a randomly selected image from that class. (Introduction)

Figure 2. Illustration of a granularity-consistent transformation and an isomorphic transformation on a dataset with 4 classes, denoted by 4 disks of different sizes and colors. The original dataset S is shown on the left and the transformed dataset S′ on the right. In the granularity-consistent transformation, we reduce the within-class distances and enlarge the between-class distances. In the isomorphic transformation, we permute the class indices. (Properties of Dataset Granularity Measures)

Figure 3. Dataset granularity measures RS, RSM, Rank and RankM on the CIFAR-100 dataset with an increasing number of coarse-grained classes re-labeled as fine-grained. The solid line denotes the mean and the shaded region the standard deviation. (Examples of Dataset Granularity Measures)

Figure 4. Dataset granularity using features extracted from different networks pre-trained on ImageNet. (Distance Function)

Figure 5. Classification accuracy vs. granularity on different tasters. Each dataset is represented by line segments of the same color; markers of different shapes represent different tasters. (Experiments)

Figure 6. Dataset granularity correlates well with classification accuracy on CIFAR. The purple square marker represents the CIFAR-10 dataset. Other markers represent CIFAR-100 with different numbers of classes (obtained by re-labeling coarse-grained classes as fine-grained), where the purple triangle denotes CIFAR-100 with 20 coarse-grained labels and the red diamond CIFAR-100 with 100 fine-grained labels. (Granularity versus Classification)
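The captions above describe a score that ranges from 0 (finest granularity) to 1 (coarsest) and that increases when within-class distances shrink relative to between-class distances. As an illustrative sketch only, and not the paper's RS, RSM, Rank, or RankM measures, one can compute a normalized between-vs-within distance contrast with this behavior:

```python
import numpy as np

def toy_granularity(X, y):
    """Illustrative granularity-style score in [0, 1]: higher when classes
    are well separated (coarse-grained), lower when they overlap
    (fine-grained). A toy contrast, NOT one of the paper's measures.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # All pairwise Euclidean distances between samples.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))
    same = y[:, None] == y[None, :]
    iu = np.triu_indices(len(X), k=1)  # unique pairs, excluding self-pairs
    within = D[iu][same[iu]].mean()    # mean within-class pair distance
    between = D[iu][~same[iu]].mean()  # mean between-class pair distance
    # Shrinking within-class distances or enlarging between-class
    # distances (a granularity-consistent transformation) raises the score.
    return between / (within + between)
```

For two heavily overlapping classes the score sits near 0.5, and it approaches 1 as the classes separate; it does not span the full [0, 1] range the way the paper's axiomatically grounded measures do.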

Figure 7. Transfer learning performance and granularity on 7 datasets. Each dataset is represented by markers of the same color; markers of different shapes represent pre-training on different datasets. (Granularity versus Classification)

Figure 8. t-SNE embedding of the first 20 classes from Stanford-Dogs and NABirds, with ImageNet (IN) pre-training on the left and iNaturalist-2017 (iNat) pre-training on the right. The granularity is shown in brackets. (Granularity versus Classification)

Figure 9. Examples of simulated datasets with 2 classes. Samples from each class are drawn from a 2-dimensional multivariate normal distribution with unit covariance matrix; m denotes the distance between the means of the two classes. (Appendix B: Learning Difficulty)

Figure 10. Dataset granularity measures Fisher, RS, RSM, Rank and RankM on simulated datasets with varying m. The solid line denotes the mean and the shaded region the standard deviation. (Appendix B: Learning Difficulty)

Figure 11. Dataset granularity correlates well with difficulty (training error rate using linear logistic regression). The purple square marker represents the CIFAR-10 dataset. Other markers represent CIFAR-100 with different numbers of classes (obtained by re-labeling coarse-grained classes as fine-grained), where the purple triangle and the red diamond denote CIFAR-100 with 20 coarse-grained and 100 fine-grained labels respectively. (Appendix B: Learning Difficulty)

Figure 12. Training difficulty versus granularity on all pairs of classes from CIFAR. The number in brackets denotes Pearson's ρ correlation coefficient. (Appendix B: Learning Difficulty)

Figure 13. t-SNE embedding of subsets from CIFAR-10 and CIFAR-100. The granularity is shown in brackets. (Appendix B: Learning Difficulty)

Figure 14. Dataset granularity on different datasets with respect to nuisance factors, including Gaussian noise, salt & pepper noise and reduced resolution. The datasets become more fine-grained as we increase the noise level and reduce the image resolution. (Appendix C: Sensitivity to Nuisance Factors)
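The simulated setup described for Figure 9 (two classes drawn from 2-D normals with unit covariance, means m apart) is easy to reproduce. The sketch below generates such data and, as a stand-in difficulty proxy for the paper's linear-logistic-regression training error (Figure 11), uses a nearest-centroid training error, which likewise falls as m grows and the dataset becomes coarser:

```python
import numpy as np

def simulate_two_classes(m, n=500, seed=None):
    """Two classes of n samples each, drawn from 2-D normals with unit
    covariance whose means are m apart (the Figure 9 setup)."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
    X1 = rng.normal(loc=[m, 0.0], scale=1.0, size=(n, 2))
    return np.vstack([X0, X1]), np.repeat([0, 1], n)

def nearest_centroid_error(X, y):
    """Training error of a nearest-centroid classifier -- a simple
    difficulty proxy, not the paper's logistic-regression measure."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1)
            < np.linalg.norm(X - c0, axis=1)).astype(int)
    return float((pred != y).mean())

# Error shrinks as the class means move apart (coarser dataset).
for m in (0.5, 2.0, 4.0):
    X, y = simulate_two_classes(m, seed=0)
    print(f"m = {m}: training error = {nearest_centroid_error(X, y):.3f}")
```

Under this setup the error tracks the Bayes error of the two-Gaussian problem, illustrating the trend in Figures 10 and 11 that finer-grained (smaller m) datasets are harder to learn.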