Quantum Adversarial Machine Learning (arXiv:2001.00030v1)
We find that, like classical classifiers based on deep neural networks, quantum classifiers are extremely vulnerable to adversarial attacks: adding a tiny amount of carefully crafted perturbation, imperceptible to human eyes or undetectable by conventional methods, to the original legitimate data (either classical or quantum mechanical) causes a quantum classifier to make incorrect predictions with a notably high confidence level.

\begin{abstract}
Adversarial machine learning is an emerging field that studies the vulnerabilities of machine learning approaches in adversarial settings and develops techniques to make learning robust to adversarial manipulations. It plays a vital role in various machine learning applications and has recently attracted tremendous attention across different communities. In this paper, we explore different adversarial scenarios in the context of quantum machine learning. We find that, similar to traditional classifiers based on classical neural networks, quantum learning systems are likewise vulnerable to crafted adversarial examples, independent of whether the input data is classical or quantum. In particular, we find that a quantum classifier that achieves nearly state-of-the-art accuracy can be conclusively deceived by adversarial examples obtained by adding imperceptible perturbations to the original legitimate samples. This is explicitly demonstrated with quantum adversarial learning in different scenarios, including classifying real-life images (e.g., handwritten digit images in the dataset MNIST), learning phases of matter (such as ferromagnetic/paramagnetic orders and symmetry-protected topological phases), and classifying quantum data. Furthermore, we show that, based on the information of the adversarial examples at hand, practical defense strategies can be designed to fight against a number of different attacks. Our results uncover the notable vulnerability of quantum machine learning systems to adversarial perturbations, which not only offers a novel perspective on bridging machine learning and quantum physics in theory but also provides valuable guidance for practical applications of quantum classifiers based on both near-term and future quantum technologies.
\end{abstract}
FIG. 1. A schematic illustration of quantum adversarial machine learning. (a) A quantum classifier that successfully identifies the image of a panda as "panda" with state-of-the-art accuracy. (b) Adding a small amount of carefully crafted noise causes the same quantum classifier to misclassify the slightly modified image, which is indistinguishable from the original to human eyes, as a "gibbon" with notably high confidence. (Introduction)

FIG. 2. Sketch of a quantum circuit classifier. The classifier consists of $p$ layers, each containing a rotation unit and an entangler unit. The rotation unit performs arbitrary single-qubit Euler rotations implemented as a combination of $Z$ and $X$ gates: $U_{q,i}(\theta) = Z^{\theta^{c}_{q,i}} X^{\theta^{b}_{q,i}} Z^{\theta^{a}_{q,i}}$, with $\theta$ denoting the Euler angles, $q$ indexing the qubit, and $i = 1, 2, \dots, p$ labeling the layer. The entangler unit entangles all qubits and is composed of a series of CNOT gates. The initial $n$-qubit state $|\psi\rangle_{\rm in}$ encodes the complete information of the input data to be classified. Projective measurement on the output qubits gives the predicted probability for each category, and the input data is assigned the label bearing the largest probability. (Vulnerability of quantum classifiers)

FIG. 3. A sketch of adding adversarial perturbations to the input data of quantum classifiers. Throughout this paper, we mainly focus on the evasion attack [25], the most common type of attack in adversarial learning. In this setting, the attacker attempts to deceive the quantum classifier by adjusting malicious samples during the testing phase. Adding a tiny amount of adversarial noise can cause quantum classifiers to make incorrect predictions. (Vulnerability of quantum classifiers)

FIG. 4. The average accuracy and loss as a function of the number of training steps. We use a depth-10 quantum classifier with the structure shown in Fig. ?? to perform binary classification of the digits 1 and 9 in MNIST. To train the classifier, we use the Adam optimizer with a batch size of 256 and a learning rate of 0.005 to minimize the loss function in Eq. (??). The accuracy and loss are averaged over 11633 training samples and 1058 validation samples (not contained in the training dataset). (Vulnerability of quantum classifiers)

FIG. 5. The average accuracy and loss for the four-category quantum classifier as a function of the number of epochs. Here, we use a quantum classifier with the structure shown in Fig. ?? and depth forty ($p = 40$) to perform multi-class classification of the digits 1, 3, 7, and 9. To train the classifier, we use the Adam optimizer with a batch size of 512 and a learning rate of 0.005 to minimize the loss function in Eq. (??). The accuracy and loss are averaged over 20000 training samples and 2000 validation samples. (Quantum adversarial learning images)

FIG. 6. Clean images and the corresponding adversarial images for the quantum classifier, generated by the basic iterative method (see Appendix). Here, we apply the additive attack in the white-box untargeted setting. For the legitimate clean images, the quantum classifier correctly predicts their labels with confidence larger than 78%. After the attack, the classifier misclassifies crafted images of the digit 1 (9) as the digit 9 (1) with notably high confidence, although the differences between the crafted and clean images are almost imperceptible to human eyes. In fact, the average fidelity is 0.916, which is very close to unity. (Quantum adversarial learning images)

FIG. 7. Effect of adversarial untargeted additive attacks on the accuracy of the quantum classifier for the problem of classifying handwritten digits. We use the basic iterative method to obtain adversarial examples. The circuit depth of the model is 20, and the step size is 0.1. (a)-(b) For the classifier distinguishing the digits 1 and 9, the accuracy decreases as the average fidelity between the adversarial and clean samples decreases, and as the number of iterations of the attacking algorithm increases. (c)-(d) Similar plots for the problem of classifying the four digits 1, 3, 7, and 9. (Quantum adversarial learning images)

FIG. 8. Effect of an adversarial untargeted functional attack on the accuracy of the quantum classifier for the problem of classifying the handwritten digits 1 and 9. Here, the adversarial perturbation operators are assumed to be a layer of local unitary transformations. We use both the BIM and FGSM methods to obtain adversarial examples. (a) For the BIM method, we generate adversarial perturbations using different numbers of iterations with a fixed step size of 0.1. (b) For the FGSM method, we generate adversarial perturbations using different step sizes, and the accuracy drops accordingly as the step size increases. (Quantum adversarial learning images)
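The layered architecture described in the FIG. 2 caption, alternating per-qubit ZXZ Euler rotations with a chain of CNOT entanglers acting on an encoded input state, can be sketched in a small state-vector simulation. The following is a minimal illustration only, not the paper's implementation: the qubit count, depth, random parameters, and the single output qubit used for the prediction are all assumptions.

```python
# Minimal sketch of the layered quantum circuit classifier (cf. FIG. 2):
# each layer applies single-qubit ZXZ Euler rotations, then a CNOT chain.
# All parameters here are random placeholders, not trained values.
import numpy as np

n_qubits, n_layers = 4, 2
rng = np.random.default_rng(0)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(angle, pauli):
    """Single-qubit rotation exp(-i * angle/2 * pauli)."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * pauli

def kron_at(gate, q, n):
    """Embed a single-qubit gate on qubit q of an n-qubit register."""
    ops = [I2] * n
    ops[q] = gate
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out

def cnot(control, target, n):
    """CNOT between two qubits as a full 2^n x 2^n matrix."""
    dim = 2 ** n
    U = np.zeros((dim, dim), dtype=complex)
    for basis in range(dim):
        bits = [(basis >> (n - 1 - k)) & 1 for k in range(n)]
        if bits[control] == 1:
            bits[target] ^= 1
        out = sum(b << (n - 1 - k) for k, b in enumerate(bits))
        U[out, basis] = 1
    return U

# |psi_in> encodes the input data; here a random normalized state.
psi = rng.normal(size=2 ** n_qubits) + 1j * rng.normal(size=2 ** n_qubits)
psi /= np.linalg.norm(psi)

for _ in range(n_layers):
    # Rotation unit: U_{q,i} built from Z and X rotations per qubit.
    for q in range(n_qubits):
        a, b, c = rng.uniform(0, 2 * np.pi, size=3)
        U = rot(c, Z) @ rot(b, X) @ rot(a, Z)
        psi = kron_at(U, q, n_qubits) @ psi
    # Entangler unit: a chain of CNOTs coupling neighbouring qubits.
    for q in range(n_qubits - 1):
        psi = cnot(q, q + 1, n_qubits) @ psi

# Measure the output qubit: probability of qubit 0 being |0> vs |1>,
# and assign the label bearing the largest probability.
probs = np.abs(psi) ** 2
p0 = probs[: 2 ** (n_qubits - 1)].sum()
label = 0 if p0 >= 0.5 else 1
```

In the paper the Euler angles are the trainable parameters; here they are drawn at random purely to show how a layer acts on the state.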

FIG. 9. Visual illustration of adversarial examples crafted by different attacks. From top to bottom: the clean and adversarial images generated for the quantum classifier by the BIM algorithm. By applying the additive attack, we can change the quantum classifier's classification result. The top images represent a correctly predicted legitimate example; the bottom images are an incorrectly predicted adversarial example, even though they bear a close resemblance to the clean images. Here, the attacking algorithm employed is BIM(0.1, 3). (White-box attack: targeted)

FIG. 10. White-box targeted attacks on the four-category quantum classifier with depth $p = 40$. (a) The classification probability for each digit as a function of the number of attacking epochs. Here, we use the BIM method to attack the quantum classifier. (b) The loss for classifying the image as 1 or 3 as a function of the number of epochs. (c)-(d) Similar plots for the functional attacks. (e)-(f) The accuracy as a function of the average fidelity during the attacking process. Here, we consider additive attacks with both the BIM (e) and FGSM (f) methods. (White-box attack: targeted)

FIG. 11. Effect of depolarizing noise of varying strength on the accuracy of the quantum classifiers with depth $p = 20$. The mean classification accuracy is computed on the test set as a function of the fidelity between the original input states and the states affected by depolarizing noise of varying strength on each qubit. The accuracy and fidelity are averaged over 1000 random realizations. (a) Results for the two-category quantum classifier. (b) Results for the four-category quantum classifier. (Adversarial perturbations are not random noises)

FIG. 12. (a) The average accuracy and loss for the two-category quantum classifier as a function of the number of epochs. Here, we use a quantum classifier with the structure shown in Fig. ?? and depth ten ($p = 10$) to perform binary classification of topological/nontopological phases. To train the classifier, we use the Adam optimizer with a batch size of 512 and a learning rate of 0.005 to minimize the loss function in Eq. (??). The accuracy and loss are averaged over 19956 training samples and 6652 validation samples. (b) The accuracy of the quantum classifier as a function of the number of iterations of the BIM attack. Here, the BIM step size is 0.01. (Quantum adversarial learning topological phases of matter)

FIG. 13. Clean and corresponding adversarial time-of-flight images for classifying topological phases with the quantum classifier. (Top) A legitimate sample of the density distribution in momentum space for the lower band with lattice size 10 × 10. (Bottom) An adversarial example obtained by the fast gradient sign method, which differs from the original only by a tiny amount of noise that is imperceptible to human eyes. (Quantum adversarial learning topological phases of matter)

FIG. 14. The average accuracy and loss as a function of the number of training steps. We use a depth-10 quantum classifier with the structure shown in Fig. ?? to classify the ferromagnetic/paramagnetic phases of the ground states of $H_{\rm Ising}$. We plot the accuracy over 1182 training samples and 395 validation samples (not in the training dataset) for the first 200 iteration epochs. The learning rate is 0.005. The difference between the training and validation losses is very small, indicating that the quantum classifier does not overfit. The final accuracy on the 395 test samples is roughly 98%. (Quantum adversarial learning topological phases of matter)

FIG. 15. Effect of the additive adversarial attack on the accuracy of the two-category quantum classifier in classifying the ferromagnetic/paramagnetic phases of the ground states of the transverse-field Ising model. We use both the BIM and FGSM methods to generate adversarial examples in the white-box untargeted setting. For the BIM method, we fix the step size to 0.05 and the iteration number to ten. For the FGSM method, we perform the attack in a single step, with step sizes ranging from 0.1 to 1.0. The circuit depth of the attacked quantum classifier is $p = 10$ and the system size of the Ising model is $L = 8$. (a) Results for the BIM attack. (b) The accuracy as a function of the average fidelity between legitimate and adversarial samples for both the BIM and FGSM methods. (Adversarial learning quantum data)

FIG. 16. Strengthening the robustness of the quantum classifier against adversarial perturbations by quantum adversarial training. In each epoch, we first generate adequate adversarial examples with the BIM method for the quantum classifier with the current model parameters; the iteration number is set to three and the BIM step size to 0.05. Then, we train the quantum classifier on both the legitimate and the crafted samples. The circuit depth of the quantum classifier is ten and the learning rate is 0.005. (Defense: quantum adversarial training)

FIG. 17. Quantum-adapted fast gradient sign method. (White-box attacks)

FIG. 18. Quantum-adapted basic iterative method. (White-box attacks)
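The quantum-adapted FGSM and BIM attacks used throughout follow the classical recipe: step the input along the sign of the loss gradient, with BIM iterating many small FGSM steps, and then renormalize so the perturbed input remains a valid quantum state. The following is a minimal sketch under loud assumptions: the loss is a toy stand-in (squared overlap with a fixed target state), not the paper's trained classifier loss, and the dimension, seed, and step sizes are placeholders.

```python
# Minimal sketch of quantum-adapted FGSM/BIM: gradient-sign steps on the
# input state, renormalized after each step. The loss below is a toy
# stand-in (overlap with a fixed state), not the paper's classifier loss.
import numpy as np

rng = np.random.default_rng(1)
dim = 16

psi_clean = rng.normal(size=dim)
psi_clean /= np.linalg.norm(psi_clean)
target = rng.normal(size=dim)
target /= np.linalg.norm(target)

def loss(psi):
    """Toy loss the attacker maximizes: squared overlap with `target`."""
    return float((target @ psi) ** 2)

def grad_loss(psi):
    """Gradient of the toy loss with respect to the input state."""
    return 2 * (target @ psi) * target

def fgsm(psi, eps):
    """One fast-gradient-sign step, followed by renormalization."""
    adv = psi + eps * np.sign(grad_loss(psi))
    return adv / np.linalg.norm(adv)

def bim(psi, eps, n_iter):
    """Basic iterative method: repeated small FGSM steps."""
    adv = psi
    for _ in range(n_iter):
        adv = fgsm(adv, eps)
    return adv

adv = bim(psi_clean, eps=0.05, n_iter=10)
# Fidelity quantifies how close the adversarial state stays to the clean one.
fidelity = abs(psi_clean @ adv) ** 2
```

In the paper the gradient is taken with respect to the classifier's loss (obtained from the parameterized circuit), and the additive versus functional variants differ in how the perturbation is physically applied; this sketch only shows the common sign-step-and-renormalize core.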