[1910.10871] Preventing Adversarial Use of Datasets through Fair Core-Set Construction
We hope that this work will pave the way for smaller and more private datasets in the future.
Abstract: We propose improving the privacy properties of a dataset by publishing only a
strategically chosen "core-set" of the data containing a subset of the
instances. The core-set allows strong performance on primary tasks, but forces
poor performance on unwanted tasks. We give methods for both linear models and
neural networks and demonstrate their efficacy on data.
Figure 1: Visualization of the datasets (green/red) and core-sets (blue) for each feature and label from Table 1. (A synthetic case study using linear regression)
Figure 2: CNN CIFAR-100 test accuracy trained on k = 5000 class-balanced core-sets versus the average gradient norm of the examples used. Core-sets made of smaller-normed examples do better. (Core-sets for CIFAR-100)
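The Figure 2 caption mentions class-balanced core-sets of size k and the gradient norm of the selected examples. As a rough illustration only, and not the authors' construction, the sketch below picks a class-balanced subset by keeping the lowest-scoring examples in each class. The function name `class_balanced_core_set`, the assumption that per-example scores (for instance gradient norms) are already computed, and the rule of taking the smallest scores are all hypothetical choices made for this example.

```python
import numpy as np


def class_balanced_core_set(scores, labels, k):
    """Return indices of a class-balanced subset of size k.

    scores: per-example scores, shape (n,). Assumed precomputed,
            e.g. per-example gradient norms (an assumption of this sketch).
    labels: integer class labels, shape (n,).
    Keeps the k // num_classes lowest-scoring examples from each class.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if k % len(classes) != 0:
        raise ValueError("k must be divisible by the number of classes")
    per_class = k // len(classes)

    chosen = []
    for c in classes:
        idx = np.flatnonzero(labels == c)  # positions of class c
        if len(idx) < per_class:
            raise ValueError(f"class {c} has fewer than {per_class} examples")
        order = np.argsort(scores[idx], kind="stable")  # ascending score
        chosen.append(idx[order[:per_class]])
    return np.concatenate(chosen)


# Toy usage: 6 examples, 2 classes, keep 2 per class.
toy_scores = [0.9, 0.1, 0.5, 0.3, 0.2, 0.8]
toy_labels = [0, 0, 0, 1, 1, 1]
core_idx = class_balanced_core_set(toy_scores, toy_labels, k=4)
```

With CIFAR-100 and k = 5000 this rule would keep 50 examples per class; how the scores are obtained, and whether lowest-score selection matches the paper's procedure, is left open here.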