Recently hyped ML content linked in one simple page
Sources: reddit/r/{MachineLearning,datasets}, arxivsanity, twitter, kaggle/kernels, hackernews, awesomedatasets, sota changes
Made by: Deep Phrase HK Limited


[1910.11144] A Comparative Study of Neural Network Compression
(in the figures we delimit the area of major interest with two vertical lines, since the high-compression regime suffers from poor accuracy and the low-compression regime is still uninformative as to the efficiency of the method).
Abstract: There has recently been an increasing desire to evaluate neural networks
locally on computationally-limited devices in order to exploit their recent
effectiveness for several applications; this effectiveness has nevertheless
come together with a considerable increase in the size of modern neural
networks, which constitutes a major downside in several of the aforementioned
computationally-limited settings. There has thus been a demand for compression
techniques for neural networks. Several proposals in this direction have been
made, which famously include hashing-based methods and pruning-based ones.
However, the evaluation of the efficacy of these techniques has so far been
heterogeneous, with no clear evidence in favor of any of them over the others.
The goal of this work is to address this latter issue by providing a
comparative study. While most previous studies test the capability of a
technique in reducing the number of parameters of state-of-the-art networks,
we follow [CWT+15] in evaluating their performance on basic architectures on
the MNIST dataset and variants of it, which allows for a clearer analysis of
some aspects of their behavior. To the best of our knowledge, we are the first
to directly compare famous approaches such as HashedNet, Optimal Brain Damage
(OBD), and magnitude-based pruning with L1 and L2 regularization among
themselves and against equivalent-size feedforward neural networks with simple
(fully-connected) and structural (convolutional) layers. Rather
surprisingly, our experiments show that (iterative) pruning-based methods are
substantially better than the HashedNet architecture, whose compression does not
appear advantageous compared to a carefully chosen convolutional network. We also
show that, when the compression level is high, the famous OBD pruning heuristic
deteriorates to the point of being less efficient than simple magnitude-based
techniques.
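The magnitude-based pruning the abstract compares against can be sketched roughly as follows. This is an illustrative NumPy sketch of one pruning step, not the paper's implementation; the function name and the tie-breaking behavior at the threshold are assumptions of mine. In the iterative variant, such a step would alternate with further training epochs.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, compression: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights of a layer.

    `compression` is the fraction of weights removed, e.g. 0.99 keeps
    only the largest 1% of weights by absolute value.
    """
    flat = np.abs(weights).ravel()
    k = int(compression * flat.size)  # number of weights to drop
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude; everything at or below it is pruned
    # (ties at the threshold may drop slightly more than k weights).
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

In the iterative setting, the surviving weights would then be fine-tuned before the next pruning round, which is why the abstract distinguishes "(iterative)" pruning from one-shot compression.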
Figure 1: Accuracies of the HashedNet and convolutional network architectures, based on the number of parameters, on the MNIST dataset. The results for HashedNet are obtained by compressing a network with 100 hidden units in a single hidden layer from 0% compression up to 99% compression. The convolutional networks consistently outperform HashedNet.
Figure 2: Accuracies of pruning methods and simple convolutional networks with almost the same number of parameters on the MNIST dataset. The results for the pruning methods are obtained by compressing fully connected networks with 100 hidden units, starting from 0% compression up to 99% compression.
Figure 3: Accuracies of pruning methods and simple convolutional networks with almost the same number of parameters on the rotated MNIST dataset. The results are obtained by compressing fully connected networks with 100 hidden units, starting from 0% compression up to 99% compression.
Figure 4: Accuracies of pruning methods and a simple convolutional network with almost the same number of parameters on the MNIST dataset with random background. The results are obtained by compressing fully connected networks with 100 hidden units, starting from 0% compression up to 99% compression.
Figure 5: Accuracies of pruning methods and a simple convolutional network with almost the same number of parameters on the MNIST dataset with images in background. The results are obtained by compressing fully connected networks with 100 hidden units, starting from 0% compression up to 99% compression.
Figure 6: Histograms of the weight distribution in two networks with 50 hidden units and a compression ratio of 0.04. The two networks have been pruned with magnitude-based pruning and the only difference is the regularization method. The histograms are obtained after 230 epochs, with accuracy above 95% on the MNIST dataset. The last pruning has been performed in epoch 210, when the network was close to a local minimum. The L1 regularization appears to have caused a stronger concentration of the weights around zero.
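HashedNet's hashing trick, which Figure 1 compares against convolutional networks, shares one small vector of real parameters across a much larger virtual weight matrix: each virtual position is mapped to a shared parameter by a hash function. A minimal sketch under my own assumptions; the real method uses a deterministic hash of the index pair (plus a sign hash) rather than the fixed random index map used here for illustration:

```python
import numpy as np

def hashed_weights(virtual_shape: tuple, real_params: np.ndarray,
                   seed: int = 0) -> np.ndarray:
    """Expand a small real-parameter vector into a larger virtual
    weight matrix via index sharing (HashedNet-style sketch)."""
    rng = np.random.default_rng(seed)
    # Stand-in for a deterministic hash h(i, j) -> parameter index.
    idx = rng.integers(0, real_params.size, size=virtual_shape)
    # Stand-in for the sign hash that reduces collision bias.
    sign = rng.choice([-1.0, 1.0], size=virtual_shape)
    return sign * real_params[idx]
```

The compression ratio is then `real_params.size` divided by the number of virtual weights; only the real parameters are stored and trained, since gradients for colliding virtual positions accumulate into the same shared entry.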



Related: TF-IDF
[1504.04788] Compressing Neural Networks with the Hashing Trick
[1810.02340] SNIP: Single-shot Network Pruning based on Connection Sensitivity
[1507.06149] Data-free parameter pruning for Deep Neural Networks
[1907.02051] Spatially-Coupled Neural Network Architectures
[1611.06211] NoiseOut: A Simple Way to Prune Neural Networks
[1606.04333] Neither Quick Nor Proper -- Evaluation of QuickProp for Learning Deep Neural Networks
[1901.07066] On Compression of Unsupervised Neural Nets by Pruning Weak Connections
[1702.06257] The Power of Sparsity in Convolutional Neural Networks
[1412.1442] Memory Bounded Deep Convolutional Networks
[1705.07565] Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon
