[2001.02600] Deep Learning for Free-Hand Sketch: A Survey
We sincerely hope this survey will: (i) help researchers see the state of the free-hand sketch community in the deep learning era, and (ii) offer insights to researchers who take on the unsolved problems of sketch.
Abstract: Free-hand sketches are highly hieroglyphic and illustrative, and have been
widely used by humans to depict objects or stories from ancient times to the
present. The recent prevalence of touchscreen devices has made sketch creation
a much easier task than ever and consequently made sketch-oriented applications
increasingly popular. The rise of deep learning has also greatly
advanced research on the free-hand sketch. This paper presents a
comprehensive survey of deep learning techniques for the free-hand sketch.
The main contents of this survey include: (i) The intrinsic traits and
domain-unique challenges of the free-hand sketch are discussed, to clarify the
essential differences between free-hand sketch and other data modalities, e.g.,
natural photo. (ii) The development of the free-hand sketch community in the
deep learning era is reviewed, by surveying the existing datasets, research
topics, and the state-of-the-art methods via a detailed taxonomy. (iii)
The bottlenecks, open problems, and potential research directions of
this community are discussed to promote future work.
Fig. 1. Sketch-specific representations. Representations from left to right: picture (black background with white lines), picture (white background with black lines), graph, stroke sequence. Both the graph and stroke sequence representations are based on the key stroke points. In the stroke sequence, each key point is denoted as a four-bit vector, where the first two bits and the last two bits represent the coordinates and pen state, respectively. See details in text. (Background)
Fig. 2. Illustrations of the domain-unique challenges of free-hand sketches. Each column is a photo-sketch pair. (Background)
Fig. 3. Milestones of deep learning based free-hand sketch research in recent years, from the perspectives of task, dataset, and representation. Various representations for free-hand sketches are based on different deep neural network architectures, i.e., CNN, RNN, hybrid of CNN and RNN, GNN. (Background)
Fig. 4. A tree diagram of the sketch dataset taxonomy. (Development History in Deep Learning Era)
Fig. 5. A tree diagram of the sketch task taxonomy. Generative tasks are framed by dashed lines. (Representative Dataset Taxonomy)
Fig. 6. Evolution of the existing deep learning-based sketch representations. Various modeling spaces are separated by dotted lines. The representative network architectures are provided. (Recognition)
Fig. 7. Architecture of the SketchRNN model. The dotted arrow line denotes the recurrent processing of the LSTM decoder. For simplicity, the recurrent processing of the bi-LSTM encoder is not shown here. (Generation)
Fig. 8. Sketch samples of the SPG dataset (alarm clock, apple, butterfly, flower, airplane, ice cream). Semantically meaningful stroke groups are annotated by colors. Best viewed in color. (Grouping, Segmentation, and Parse)
Fig. 9. Sketches (bus, car, cat) and ground truth annotations selected from the sketch parse paper. The semantic parts and background are annotated by colors. Best viewed in color. (Grouping, Segmentation, and Parse)
Fig. 12. Architecture of the discriminative-generative hybrid model for FG-SBIR proposed in . (Sketch-Photo Retrieval)
Fig. 13. Illustration of a classical deep hashing pipeline, where the extractor backbone and hashing layer are alternately optimized in two separate steps. (Sketch-Photo Retrieval)
Fig. 14. Illustration of view matching across sketch and 3D shape. Images (from left to right: sketch, 3D shape, three random views of the 3D shape) are selected from . (Sketch-3D Retrieval)
Fig. 15. Sketch samples randomly selected from the sketch-based video retrieval dataset TSF. The left sketch depicts two people approaching each other. The right one depicts a person gliding up a hillside. (Sketch-video retrieval)
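The stroke sequence representation described in the Fig. 1 caption can be illustrated with a small sketch. The snippet below flattens a list of strokes into four-element vectors whose first two entries are coordinates and whose last two entries are the pen state. The function name `strokes_to_sequence` and the one-hot pen-state convention ((1, 0) for "pen stays down", (0, 1) for "pen lifts after this point") are assumptions made for illustration; the survey text defines the exact encoding.

```python
from typing import List, Tuple

Point = Tuple[float, float]
KeyPoint = Tuple[float, float, int, int]


def strokes_to_sequence(strokes: List[List[Point]]) -> List[KeyPoint]:
    """Flatten strokes into (x, y, pen_down, pen_up) key points.

    Assumed encoding (not taken from the survey): the pen state is one-hot,
    (1, 0) if the pen stays on the canvas after this point and (0, 1) if the
    pen lifts after this point, i.e. the point ends its stroke.
    """
    sequence: List[KeyPoint] = []
    for stroke in strokes:
        last_index = len(stroke) - 1
        for i, (x, y) in enumerate(stroke):
            if i == last_index:
                sequence.append((x, y, 0, 1))  # stroke ends here
            else:
                sequence.append((x, y, 1, 0))  # stroke continues
    return sequence


# Two hypothetical strokes: an L-shaped stroke and a short horizontal one.
example = [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], [(2.0, 2.0), (3.0, 2.0)]]
sequence = strokes_to_sequence(example)
```

A sequence in this form can be fed to a recurrent model one key point at a time, which is the setting the RNN-based representations in Fig. 3 and Fig. 6 refer to.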
$$R(F_\Theta) = \iint P\big(c^{+} \mid D(F_\Theta(X_i), F_\Theta(X_j))\big)\, P\big(c^{-} \mid D(F_\Theta(X_i), F_\Theta(X_j))\big)\, d^{2}z$$
$$D\big(F(X_n), F(X_n^{+})\big) < D\big(F(X_n), F(X_n^{-})\big)$$
$$L_{triplet} = \sum_{n=1}^{N} \max\Big(0,\ \Delta + \big\|F_\Theta(X_n) - F_\Theta(X_n^{+})\big\|_2^2 - \big\|F_\Theta(X_n) - F_\Theta(X_n^{-})\big\|_2^2\Big)$$
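As a minimal sketch of the triplet ranking loss above, the snippet below sums, over N triplets, the hinge of the margin plus the squared Euclidean distance from anchor to positive minus the squared distance from anchor to negative. It assumes the embeddings (the outputs of F_Theta) are already computed and passed in as plain lists of floats; the function names, the toy embeddings, and the margin value are illustrative assumptions, not the survey's implementation.

```python
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def triplet_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    negatives: Sequence[Sequence[float]],
    margin: float = 0.2,  # Delta in the equation; the default is an arbitrary choice
) -> float:
    """Sum of max(0, margin + d(anchor, positive) - d(anchor, negative))."""
    total = 0.0
    for a, p, n in zip(anchors, positives, negatives):
        total += max(0.0, margin + squared_l2(a, p) - squared_l2(a, n))
    return total


# Toy, hand-made embeddings (hypothetical values).
anchors = [[0.0, 0.0], [1.0, 1.0]]
positives = [[0.0, 1.0], [1.0, 1.0]]
negatives = [[3.0, 0.0], [1.0, 1.5]]
loss = triplet_loss(anchors, positives, negatives, margin=0.5)
```

In this toy input the first triplet already satisfies the ranking constraint by more than the margin and contributes nothing, while the second negative sits too close to its anchor and is penalized.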