[1910.03810v1] Adversarial Learning of Deepfakes in Accounting
Abstract

Nowadays, organizations collect vast quantities of accounting-relevant transactions, referred to as 'journal entries', in 'Enterprise Resource Planning' (ERP) systems. The aggregation of these entries ultimately defines an organization's financial statement. To detect potential misstatements and fraud, international audit standards require auditors to directly assess journal entries using 'Computer Assisted Audit Techniques' (CAATs). At the same time, discoveries in deep learning research revealed that machine learning models are vulnerable to 'adversarial attacks'. It also became evident that such attack techniques can be misused to generate 'deepfakes': convincingly altered media content designed to deceive human perception. Research on such developments and their potential impact on the finance and accounting domain is still in its early stages. We believe it is of vital relevance to investigate how such techniques could be maliciously misused in this sphere. In this work, we show an adversarial attack against CAATs using deep neural networks. First, we introduce a real-world 'threat model' designed to camouflage accounting anomalies such as fraudulent journal entries. Second, we show that adversarial autoencoder neural networks are capable of learning a human-interpretable model of journal entries that disentangles the entries' latent generative factors. Finally, we demonstrate how such a model can be maliciously misused by a perpetrator to generate robust 'adversarial' journal entries that mislead CAATs.
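The adversarial autoencoder (AAE) setup referenced in the abstract combines a reconstruction objective with a discriminator that forces the encoder's aggregated posterior toward a chosen prior. The following is a minimal numpy sketch of the two losses involved, not the authors' implementation: the toy dimensions, random stand-in weights, mean-squared-error reconstruction loss, and standard GAN-style discrimination loss are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the paper's actual network sizes differ)
x_dim, z_dim = 16, 2

# Randomly initialised linear maps standing in for trained networks
W_enc = rng.normal(scale=0.1, size=(x_dim, z_dim))
W_dec = rng.normal(scale=0.1, size=(z_dim, x_dim))
W_dis = rng.normal(scale=0.1, size=(z_dim, 1))

def encode(x):
    """Deterministic encoder mapping entries x to latent codes z."""
    return x @ W_enc

def decode(z):
    """Decoder reconstructing entries from latent codes."""
    return z @ W_dec

def discriminate(z):
    """Discriminator: probability that z was drawn from the prior."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dis)))

# One batch of "journal entries" (random stand-ins for encoded attributes)
x = rng.normal(size=(32, x_dim))
z_post = encode(x)                       # aggregated posterior samples
z_prior = rng.normal(size=(32, z_dim))   # samples from the prior p(z)

# Reconstruction loss L_RE (mean squared error here)
recon_loss = np.mean((decode(z_post) - x) ** 2)

# Discrimination loss L_DI: the discriminator should assign high
# probability to prior samples and low probability to posterior samples
eps = 1e-9
dis_loss = -np.mean(np.log(discriminate(z_prior) + eps)
                    + np.log(1.0 - discriminate(z_post) + eps))
```

In the full adversarial game, the encoder is additionally updated to fool the discriminator, which pushes the aggregated posterior to match the imposed prior.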
Figure 1: Exemplary audit 'threat model' designed to camouflage accounting anomalies or fraudulent activities. Regular journal entries are replaced or enriched by the injection of 'fake' adversarial entries deliberately sampled from a deep generative model. (Introduction)

Figure 2: The adversarial autoencoder architecture as introduced in [41], applied to learn a disentangled and human-interpretable representation of the journal entries' generative latent factors. (Adversarial Accounting Model)

Figure 3: Exemplary prior distribution pn(z) consisting of a 2D grid of τ = 25 equidistant Gaussians (left). Learned latent posterior distribution qd(z) that disentangles the journal entries' high-order factors of variation into the distinct Gaussians (middle). Learned low-order disentanglement of the journal entries' log-normalized local posting amounts of a single Gaussian (right). (Latent Factor Disentanglement)

Figure 4: Robust adversarial journal entry sampling: (a) the combination sampling map in Z of the local posting amount attribute in Data-A, (b) the corresponding robustness sampling map, (c) the obtained adversarial sampling region qs(zk=14) combining (a) and (b) with dφ(z) ≥ 0.568, and (d) generated adversarial journal entries XAdv when sampling along the posting amount trajectory, resulting in generated entries that exhibit an increased posting amount. (Creation of Adversarial Accounting Records)

Figure 5: Exemplary AAE reconstruction losses LRE θ and discrimination losses LDI θ,φ evaluated for Data-A (left) and Data-B (right) as well as varying learning rates η over 5,000 training epochs. (Appendix B: Experimental Details)

Figure 6: Aggregated posterior distributions qd(z) learned when training the AAE architecture for up to 10,000 training epochs. Learned posterior qd(z) of Data-A when sampling from a prior pn(z) consisting of an equidistant 2D grid of τ ∈ {9, 25, 36, 64} isotropic Gaussians (top row). Learned posterior qd(z) of Data-B when sampling from a prior pn(z) consisting of an equidistant 2D grid of τ ∈ {36, 49, 64, 81} isotropic Gaussians (bottom row). (Appendix C: High-Order Latent Factor Disentanglement)

Figure 7: Exemplary robust sample map obtained for Gaussian zk=14 in Data-A (top-left) and corresponding combination sample maps of the distinct journal entry attributes xj (others). The corresponding AAE model is trained for 10,000 training epochs imposing a prior comprised of τ = 25 equidistant isotropic Gaussians. (Appendix D: Low-Order Latent Factor Disentanglement)

Figure 8: Exemplary robust sample map obtained for Gaussian zk=15 in Data-B (top-left) and corresponding combination sample maps of the distinct journal entry attributes xj (others). The corresponding AAE model is trained for 10,000 training epochs imposing a prior comprised of τ = 25 equidistant isotropic Gaussians. (Appendix D: Low-Order Latent Factor Disentanglement)

Figure 9: Exemplary equidistant traversal in z1 ∈ [−0.2; 0.6] with δ = 0.02 while keeping z2 = 0 fixed in the adversarial sampling region qs(zk=15) of Data-B. Sample ids i are denoted by the black arrows of the adversarial sampling maps. The individual journal entries generated per sample zi in qs(zk) are shown in Tab. ??. (Appendix E: Robust Sampling of Adversarial Journal Entries)

Figure 10: Exemplary random sampling outside of the adversarial sampling region qs(zk=15) in z1 ∈ [−0.2; 0.6] and z2 ∈ [0.15; −0.15] of Data-B. Sample ids i are denoted by the black arrows of the adversarial sampling maps. The individual journal entries generated per sample zi in qs(zk) are shown in Tab. ??. (Appendix E: Robust Sampling of Adversarial Journal Entries)
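The grid-of-Gaussians prior and the equidistant latent traversal described in the captions above can be sketched as follows. This is an illustrative numpy sketch under stated assumptions: the captions specify the number of components (τ = 25), the traversal range z1 ∈ [−0.2; 0.6] with step δ = 0.02, and z2 = 0; the grid extent and the per-component standard deviation are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior p(z): 2D grid of tau = 25 equidistant isotropic Gaussians.
# Grid extent [-2, 2] and sigma = 0.1 are illustrative assumptions.
side = 5                                  # 5 x 5 grid -> tau = 25
coords = np.linspace(-2.0, 2.0, side)
means = np.array([(cx, cy) for cx in coords for cy in coords])
sigma = 0.1

def sample_prior(n):
    """Draw n samples: pick a grid component, add isotropic noise."""
    ks = rng.integers(len(means), size=n)
    return means[ks] + sigma * rng.normal(size=(n, 2))

z = sample_prior(1000)

# Equidistant traversal in z1 over [-0.2, 0.6] with delta = 0.02,
# keeping z2 = 0 fixed, mirroring the trajectory of Figure 9.
z1 = np.linspace(-0.2, 0.6, 41)           # 41 points, spacing 0.02
trajectory = np.stack([z1, np.zeros_like(z1)], axis=1)
```

Each point of `trajectory` would then be passed through the trained decoder to generate one adversarial journal entry per latent sample.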