Recently hyped ML content linked in one simple page
Sources: reddit/r/{MachineLearning,datasets}, arxiv-sanity, twitter, kaggle/kernels, hackernews, awesome-datasets, SOTA changes
Made by: Deep Phrase HK Limited


[1911.09723v1] Fast Sparse ConvNets
On a Snapdragon 835, the sparse networks we present in this paper outperform their dense equivalents by 1.3–2.4× in terms of wall-clock time for a given top-1 accuracy while needing only ≈66% as many parameters – equivalent to approximately one entire generation of improvement.
Abstract: Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include: the squeeze-and-excitation module [16], depthwise separable convolutions in Xception [4], and the inverted bottleneck in MobileNet v2 [36]. Notably, in all of these cases, the resulting building blocks enabled not only higher efficiency, but also higher accuracy, and found wide adoption in the field. In this work, we further expand the arsenal of efficient building blocks for neural network architectures; but instead of combining standard primitives (such as convolution), we advocate for the replacement of these dense primitives with their sparse counterparts. While the idea of using sparsity to decrease the parameter count is not new [43], the conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly, which we open-source for the benefit of the community as part of the XNNPACK [30] library. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet v1, MobileNet v2 and EfficientNet architectures substantially outperform strong dense baselines on the efficiency-accuracy curve. On a Snapdragon 835 our sparse networks outperform their dense equivalents by 1.3–2.4× – equivalent to approximately one entire generation of MobileNet-family improvement. We hope that our findings will facilitate wider adoption of sparsity as a tool for creating efficient and accurate deep learning architectures.
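The core observation behind the paper's kernels (illustrated by its Figure 2) is that a 1x1 convolution over activations of shape (C_in, H, W) is exactly a matrix multiply of the (C_out, C_in) weight with the activations flattened to (C_in, H×W), so a sparse weight matrix turns the layer into sparse-matrix × dense-matrix (SpMM). A minimal sketch in plain numpy, with a hand-rolled CSR layout and hypothetical sizes (none of these numbers are from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes, for illustration only.
c_out, c_in, hw = 8, 16, 32   # output channels, input channels, H*W
sparsity = 0.9

# Build a ~90%-sparse 1x1-convolution weight matrix.
w = rng.standard_normal((c_out, c_in))
w[rng.random((c_out, c_in)) < sparsity] = 0.0

# CSR storage: nonzero values, their column indices, and row pointers.
indptr, indices, values = [0], [], []
for row in w:
    nz = np.flatnonzero(row)
    indices.extend(nz)
    values.extend(row[nz])
    indptr.append(len(indices))

x = rng.standard_normal((c_in, hw))  # activations flattened to (C_in, H*W)

# SpMM: for each output channel, accumulate weight * x[j, :] over the
# nonzero columns j only; this is the loop the paper's kernels vectorize
# across spatial locations.
y = np.zeros((c_out, hw))
for i in range(c_out):
    for k in range(indptr[i], indptr[i + 1]):
        y[i] += values[k] * x[indices[k]]

assert np.allclose(y, w @ x)  # matches the dense 1x1 convolution
```

Because each nonzero weight multiplies an entire row of `x`, the activation reads are contiguous even though the weight accesses are sparse, which is what makes the scheme fast in practice.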
Figure captions:
Figure 1: MobileNet v1 and v2 and EfficientNet models. Sparse models: blue (solid); dense models: red (dotted). Sparse models include the cost of storing the locations of nonzeros for sparse tensors as a bitmask, converted back into parameter count: every 32 values in the bitmask contribute one "parameter". (Introduction)
Figure 2: Sparse 1x1 convolution as SpMM. Left: unstructured sparsity (block size 1). Right: output-channel block size of 4. (Related Work)
Figure 3: Visualization of the memory reads and writes of our algorithm. In step 1, we load 8 spatial locations simultaneously for each of the nonzero weights in the first row of the weight matrix. We also prefetch the values that will be needed for the next set of columns (shown in the same color but hatched). We multiply each scalar weight by its corresponding row, accumulate the results, and in the end write them out. Step 2 performs the same calculation for the next output channel. After steps 1 and 2, all values for these spatial locations are in the cache, so future loads in steps 3 and 4 will be fast, despite being random access. (Related Work)
Figure 4: FLOPs with increasing layer depth. All measurements taken on a Snapdragon (SD) 835. "Effective" assumes 90%-sparse MBv1 and 85%-sparse MBv2 models. (Methods)
Figure 5: Effect of block size on top-1 accuracy. Only the number of elements in a block matters; the configuration is unimportant. (ARM Kernel Performance)
Figure 6: Effect of sparsity on top-1 accuracy. The sparser a model is, the fewer FLOPs it requires to achieve a given top-1 accuracy. (Model Performance)
Figure 7: Efficiency of models with layer N and onward blocked. The x-axis corresponds to turning that layer and all following layers to block size 4; the prior layers are unstructured. The y-axis is the efficiency of making this change over an unstructured model, given as a ratio: the numerator is the speedup of changing the block(s) from unstructured to block size 4, and the denominator is the decrease in top-1 accuracy incurred by making this change. (Model Performance)
Figure 8: MBv1 layer-wise sparsities found with Variational Dropout. The curve with the highest regularization coefficient shows an interesting phenomenon of collapse: the model is actually less sparse, and more uniform, than models with lower regularization coefficients. The layer just before the final spatial-resolution decrease and channel doubling is preferred to be less sparse than those before and after. Early layers, which have very few parameters, are less sparse than later layers. (Non-Uniform Layer-wise Sparsity with Variational Dropout)
Figure 9: MBv1 (a) and MBv2 (b) achieved GFLOPs with increasing layer depth. Measurements taken on an Intel Xeon W-2135. (Comparison with Intel MKL)
Figure 10: MBv2 layer-wise sparsities found with Variational Dropout. Generally, it is preferred to keep the expansion matrices slightly less sparse than the contraction matrices. There is a general trend of keeping early layers with few parameters less sparse than later layers; within that trend, layers where the spatial resolution decreases and the channel count increases are much less sparse than would otherwise be expected. (Non-Uniform Layer-wise Sparsity with Variational Dropout)
Figure 11: EfficientNet behavior with: (a) different block sizes and (b) different sparsity levels. (EfficientNet Plots)
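The parameter accounting described in the Figure 1 caption (every 32 bitmask values count as one extra "parameter") can be sketched as a small helper. The function name and the example model size below are hypothetical, chosen only to illustrate the arithmetic:

```python
def effective_params(total_weights: int, sparsity: float) -> int:
    """Effective parameter count of a sparse model, per the paper's
    accounting: the nonzero weights, plus the bitmask that records
    nonzero locations, where every 32 bitmask bits cost one 32-bit
    'parameter' of storage."""
    nonzeros = round(total_weights * (1.0 - sparsity))
    bitmask_overhead = -(-total_weights // 32)  # ceil(total_weights / 32)
    return nonzeros + bitmask_overhead

# Illustrative only: a hypothetical 4.2M-weight model at 90% sparsity.
print(effective_params(4_200_000, 0.9))  # 420,000 nonzeros + 131,250 bitmask words
```

The overhead term grows with the dense weight count, not the nonzero count, so at very high sparsity the bitmask becomes a non-trivial fraction of the stored "parameters".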



Related: TF-IDF
[1911.09723] Fast Sparse ConvNets
[1907.09707] RRNet: Repetition-Reduction Network for Energy Efficient Decoder of Depth Estimation
[1605.06489] Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups
[1809.07196] Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks
[1802.10280] Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs
[1903.10258] MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning
[1612.06519] Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale
[1904.02422] Resource Efficient 3D Convolutional Neural Networks
[1907.02157] Slim-CNN: A Light-Weight CNN for Face Attribute Prediction
