[1910.11117] Graph Representation learning for Audio & Music genre Classification
Abstract
Music genre is arguably one of the most important and discriminative attributes of music and audio content. Visual-representation-based approaches operating on spectrograms have been explored for music genre classification. However, the lack of quality data and augmentation techniques makes it difficult to employ deep learning successfully. We discuss the application of graph neural networks to this task, motivated by their strong inductive bias, and show that a combination of CNNs and GNNs achieves state-of-the-art results on the GTZAN and AudioSet (Imbalanced Music) datasets. We also discuss the role of Siamese neural networks as an analogue to GNNs for learning edge similarity weights. Furthermore, we perform a visual analysis to understand our model's field of view into the spectrogram with respect to genre labels.
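To make the Siamese edge-weighting idea concrete, here is a minimal sketch of how a shared-weight encoder can produce a similarity score usable as a graph edge weight. The linear `embed` projection is a stand-in for the paper's CNN encoder, and all names and dimensions here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared projection standing in for the CNN encoder
# (an assumption for illustration, not the model from the paper).
W = rng.standard_normal((128, 16))

def embed(patch):
    """Map a flattened 128-dim spectrogram patch to a 16-dim embedding.
    Both branches of the Siamese pair share these weights."""
    return patch @ W

def edge_weight(a, b, eps=1e-8):
    """Cosine similarity of the two shared-weight embeddings,
    rescaled to [0, 1] so it can serve as a graph edge weight."""
    ea, eb = embed(a), embed(b)
    cos = ea @ eb / (np.linalg.norm(ea) * np.linalg.norm(eb) + eps)
    return 0.5 * (cos + 1.0)

x = rng.standard_normal(128)
w_same = edge_weight(x, x)                      # identical inputs -> weight near 1
w_diff = edge_weight(x, rng.standard_normal(128))
```

In a full pipeline the Siamese pair would be trained (e.g., with a contrastive loss) so that same-genre patches receive high edge weights, and the resulting weighted graph would then be consumed by the GNN.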
Figure 1: (a) Confusion matrices for the GTZAN and AudioSet datasets from GrahAM. (b) t-SNE plots for 50% of the complex AudioSet data from (i) the Siamese network and (ii) GrahAM. (c) Illustration of the Siamese training flow. (Introduction)
Figure 2: (a) Visualization of the heatmaps generated by the Grad-CAM method over example genre predictions from GrahAM. (b) Illustration of the graph network processing pipeline with reference to gradients taken from intermediate layers. (Our Approach)