[1910.11109] Attention-Guided Lightweight Network for Real-Time Segmentation of Robotic Surgical Instruments
In this paper, we propose an attention-guided lightweight network named LWANet for real-time segmentation of surgical instruments.
Abstract: Real-time segmentation of surgical instruments plays a crucial role in
robot-assisted surgery. However, real-time segmentation of surgical instruments
using current deep learning models is still a challenging task due to the high
computational costs and slow inference speed. In this paper, an
attention-guided lightweight network (LWANet) is proposed to segment surgical
instruments in real time. LWANet adopts an encoder-decoder architecture, where
the encoder is the lightweight network MobileNetV2 and the decoder consists of
depth-wise separable convolution, attention fusion block, and transposed
convolution. Depth-wise separable convolution is used as the basic unit to
construct the decoder, which can reduce the model size and computational costs.
Attention fusion block captures global context and encodes semantic
dependencies between channels to emphasize target regions, contributing to
locating the surgical instrument. Transposed convolution is performed to
upsample the feature map to recover refined edges. LWANet segments
surgical instruments in real time at low computational cost: with
960×544 inputs, its inference speed reaches 39 fps at only 3.39 GFLOPs,
and the model is small, with only 2.06 M parameters.
The proposed network is evaluated on two datasets. It achieves
state-of-the-art performance of 94.10% mean IoU on Cata7 and sets a new record
on EndoVis 2017 with a 4.10% increase in mean IoU.
Fig. 1. Challenges in semantic segmentation for surgical instruments. Different types of surgical instruments are marked by different colors. (Introduction)
Fig. 2. The architecture of the Attention-guided Lightweight Network and its components: (a) Attention-guided Lightweight Network, which adopts the encoder-decoder architecture; (b) Attention Fusion Block; (c) Depth-wise Separable Convolution. (Methodology)
Fig. 3. Visualization results of LWANet on Cata7. Different types of surgical instruments are marked by different colors. (Cata7)
Fig. 4. Visualization results of LWANet on EndoVis 2017. From top to bottom: image, prediction, and ground truth. Different types of surgical instruments are marked by different colors. (EndoVis 2017)
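The abstract attributes LWANet's small size partly to building the decoder from depth-wise separable convolutions. The saving can be seen by counting parameters: a standard k×k convolution learns one k×k filter per (input, output) channel pair, while a depth-wise separable convolution factorizes this into one k×k filter per input channel plus a 1×1 point-wise convolution. A minimal sketch of the arithmetic (the 256→128 decoder stage is a hypothetical example, not a layer size stated in the paper):

```python
def conv_params(c_in, c_out, k):
    # Standard k x k convolution (bias terms omitted):
    # one k x k filter for every (input, output) channel pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depth-wise k x k convolution (one filter per input channel),
    # followed by a 1x1 point-wise convolution that mixes channels.
    return c_in * k * k + c_in * c_out

# Hypothetical decoder stage: 256 -> 128 channels, 3x3 kernel.
std = conv_params(256, 128, 3)                  # 294,912 parameters
sep = depthwise_separable_params(256, 128, 3)   # 35,072 parameters
print(std, sep, round(std / sep, 1))            # roughly an 8x reduction
```

The same factorization also divides the multiply-accumulate count per output pixel by roughly the same ratio, which is consistent with the low GFLOP figure reported for LWANet.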