[1910.06663v1] AI Benchmark: All About Deep Learning on Smartphones in 2019
We presented an overview of recently released mobile chipsets that can potentially be used for accelerating the execution of neural networks on smartphones and other portable devices, and reviewed the latest changes in the Android machine learning pipeline.
Abstract: The performance of mobile AI accelerators has been evolving rapidly in the past two years, nearly doubling with each new generation of SoCs. The current 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which, together with the increased capabilities of mobile deep learning frameworks, makes it possible to run complex and deep AI models on mobile devices. In this paper, we evaluate the performance and compare the results of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that provide hardware acceleration for AI inference. We also discuss the recent changes in the Android ML pipeline and provide an overview of the deployment of deep learning models on mobile devices. All numerical results provided in this paper can be found and are regularly updated on the official project website [http://ai-benchmark.com].
Figure 1: Performance evolution of mobile AI accelerators: image throughput for the float Inception-V3 model. Mobile devices were running the FP16 model using TensorFlow Lite and NNAPI. Acceleration on Intel CPUs was achieved using the Intel MKL-DNN library, on Nvidia GPUs with CUDA and cuDNN. The results on Intel and Nvidia hardware were obtained using the standard TensorFlow library running the FP32 model with a batch size of 20 (the FP16 format is currently not supported by these CPUs / GPUs). Note that Inception-V3 is a relatively small network, and for bigger models the advantage of Nvidia GPUs over other silicon might be larger. (Introduction)
Figure 2: The overall architecture of the Exynos 9820 NPU. (Hardware Acceleration)
Figure 3: The general architecture of Huawei's DaVinci Core. (Hardware Acceleration)
Figure 4: SoC components integrated into the Kirin 990 chips. (HiSilicon chipsets / HiAI SDK)
Figure 5: Qualcomm Snapdragon 855 (left) and MediaTek Helio P90 (right) block diagrams. (Qualcomm chipsets / SNPE SDK)
Figure 6: Schematic representation of the MediaTek NeuroPilot SDK. (MediaTek chipsets / NeuroPilot SDK)
Figure 7: Schematic representation of the Unisoc UNIAI SDK. (Unisoc chipsets / UNIAI SDK)
Figure 8: Sample result visualizations displayed to the user in the deep learning tests. (TensorFlow Lite GPU Delegate)
Figure 9: Benchmark results displayed after the end of the tests. (Deep Learning Tests)
Figure 10: Tests, results and options displayed in the PRO Mode. (PRO Mode)
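The Figure 1 measurements rely on converting a float model to an FP16 TensorFlow Lite model and timing repeated inference. As an illustrative sketch only (not the paper's benchmark code), the following shows that pipeline with the public TensorFlow Lite API; the tiny stand-in network, iteration count, and input size are assumptions made so the example is self-contained, whereas the paper uses Inception-V3. On a device, the interpreter would additionally be configured with the NNAPI delegate to dispatch to the NPU; on a desktop this runs on the CPU.

```python
import time

import numpy as np
import tensorflow as tf

# Small stand-in network (Inception-V3 itself would require a large download).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert to TensorFlow Lite with FP16 weight quantization, mirroring the
# float16 setting used for the mobile measurements.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

# Run the TFLite interpreter repeatedly and report image throughput.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

image = np.random.rand(1, 224, 224, 3).astype(np.float32)
n = 10
start = time.time()
for _ in range(n):
    interpreter.set_tensor(inp["index"], image)
    interpreter.invoke()
elapsed = time.time() - start
probs = interpreter.get_tensor(out["index"])
print(f"{n / elapsed:.1f} images/sec, output shape {probs.shape}")
```

The batch size of 1 matches the mobile setup described in Figure 1; the desktop baselines in the paper instead used the full TensorFlow library with FP32 and a batch size of 20.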