[1910.03151v1] ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
Experimental results demonstrate that our ECA module is an extremely lightweight plug-and-play block that improves the performance of various deep CNN architectures, including the widely used ResNets and the lightweight MobileNetV2.
Abstract. Channel attention has recently been shown to offer great potential for improving the performance of deep convolutional neural networks (CNNs). However, most existing methods are dedicated to developing more sophisticated attention modules for better performance, which inevitably increases the computational burden. To overcome the trade-off between performance and complexity, this paper investigates an extremely lightweight attention module for boosting the performance of deep CNNs. In particular, we propose an Efficient Channel Attention (ECA) module, which involves only k (k ≤ 9) parameters but brings clear performance gains. By revisiting the channel attention module in SENet, we empirically show that avoiding dimensionality reduction and allowing appropriate cross-channel interaction are important for learning effective channel attention. We therefore propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented by a fast 1D convolution. Furthermore, we develop a function of the channel dimension to adaptively determine the kernel size of the 1D convolution, which controls the coverage of local cross-channel interaction. Our ECA module can be flexibly incorporated into existing CNN architectures, and the resulting networks are named ECA-Net. We extensively evaluate the proposed ECA-Net on image classification, object detection, and instance segmentation with ResNet and MobileNetV2 backbones. The experimental results show that ECA-Net is more efficient while performing favorably against its counterparts. The source code and models are available at https://github.com/BangguWu/ECANet.
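The adaptive kernel-size rule described above can be sketched as a small helper. The mapping k = ψ(C), the nearest odd number to log₂(C)/γ + b/γ, follows the paper's description; the constants γ = 2 and b = 1 are assumed defaults, not stated in this excerpt:

```python
import math

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptively choose the 1D-conv kernel size k from channel dimension C.

    Implements k = psi(C) = |log2(C)/gamma + b/gamma|_odd, i.e. the nearest
    odd number to log2(C)/gamma + b/gamma. gamma=2 and b=1 are assumed
    defaults; larger C yields wider local cross-channel coverage.
    """
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 else t + 1  # force k odd so conv padding is symmetric

# Coverage grows logarithmically with the channel dimension:
for c in (64, 128, 256, 512):
    print(c, eca_kernel_size(c))
```

Because k grows only logarithmically with C, the module stays within the k ≤ 9 parameter budget mentioned in the abstract even for wide layers.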
Figure 1: Comparison of various attention modules (i.e., SENet (Hu, Shen, and Sun 2018), CBAM (Woo et al. 2018), A²-Nets (Chen et al. 2018), and ECA-Net) using ResNets (He et al. 2016a) as backbone models, in terms of accuracy, network parameters, and FLOPs. Circle sizes indicate model computation (FLOPs). Clearly, our ECA-Net obtains higher accuracy with lower model complexity. (Introduction)

Figure 2: Comparison of (a) the SE block and (b) our efficient channel attention (ECA) module. Given the feature aggregated by global average pooling (GAP), the SE block computes weights using two FC layers, whereas ECA generates channel weights by performing a fast 1D convolution of size k, where k is adaptively determined via a function of the channel dimension C. (Introduction)

Figure 3: PyTorch code of our ECA module. (Local Cross-Channel Interaction)

Figure 4: Results of our ECA module with various values of k, using ResNet-50 and ResNet-101 as backbone models. We also give the results of the ECA module with adaptive kernel-size selection and compare against SENet as a baseline. (Implementation Details)

Figure 5: Example images from four randomly sampled ImageNet classes: hammerhead shark, ambulance, medicine chest, and butternut squash. (Appendix A2. Visualization of Weights Learned by ECA Modules and SE Blocks)
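As a concrete illustration of the mechanism in Figures 2 and 3 (GAP followed by a fast 1D convolution across channels), a minimal PyTorch sketch of the ECA module might look like the following. This is a sketch following the paper's description, not the authors' released code; the γ = 2, b = 1 constants in the kernel-size rule are assumed defaults:

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: channel weights from a fast 1D conv.

    Global average pooling aggregates each channel to a scalar, then a 1D
    convolution of size k captures local cross-channel interaction without
    dimensionality reduction. k is chosen adaptively from the channel
    dimension C (gamma=2 and b=1 are assumed defaults).
    """

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1  # nearest odd kernel size
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); aggregate spatial dims per channel via GAP
        y = self.avg_pool(x)                                 # (N, C, 1, 1)
        # run the 1D convolution along the channel axis
        y = self.conv(y.squeeze(-1).transpose(-1, -2))       # (N, 1, C)
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))  # (N, C, 1, 1)
        return x * y.expand_as(x)  # rescale each channel of the input
```

Because the only learnable parameters are the k weights of the 1D convolution, each attention block adds at most nine parameters, consistent with the k ≤ 9 claim in the abstract.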
Figure 6: Visualization of the global-average-pooled activations in different convolution layers; different images show similar trends within the same convolution layer. These trends also exhibit a certain local periodicity, some instances of which are indicated by red rectangular boxes. Best viewed zoomed in. (Appendix A2. Visualization of Weights Learned by ECA Modules and SE Blocks)

Figure 7: Visualization of the channel weights of conv_i_j, where i indicates the i-th stage and j the j-th convolution block in that stage. The channel weights learned by ECA modules and SE blocks are shown at the bottom and top of each row, respectively. Best viewed zoomed in. (Appendix A2. Visualization of Weights Learned by ECA Modules and SE Blocks)