[1910.10912] Multi-channel Speech Separation Using Deep Embedding Model with Multilayer Bootstrap Networks
The core contribution of this paper is DPCL++, which introduces MBN into speech separation and is easy to implement.
Abstract: Recently, deep clustering (DPCL) based speaker-independent speech separation
has drawn much attention, since it needs little speaker prior information.
However, it still has much room for improvement, particularly in reverberant
environments. If the training and test environments are mismatched, which is a common
case, the embedding vectors produced by DPCL may contain much noise and many
small variations. To deal with the problem, we propose a variant of DPCL, named
DPCL++, by applying a recent unsupervised deep learning method, multilayer
bootstrap networks (MBN), to further reduce the noise and small variations of
the embedding vectors in an unsupervised way in the test stage, which
helps k-means produce a good result. MBN builds a gradually narrowed
network from bottom-up via a stack of k-centroids clustering ensembles, where
the k-centroids clusterings are trained independently by random sampling and
one-nearest-neighbor optimization. To further improve the robustness of DPCL++
in reverberant environments, we take spatial features as part of its input.
Experimental results demonstrate the effectiveness of the proposed method.
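
The MBN step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `mbn_layer` and `mbn_transform`, the layer sizes in `ks`, the number of clusterings per layer, and the use of Euclidean distance at every layer are all assumptions made for illustration. Each layer draws several sets of k centroids by random sampling, assigns every input to its nearest centroid (the one-nearest-neighbor step), and concatenates the resulting one-hot codes; k shrinks from layer to layer to give the gradually narrowed network.

```python
import numpy as np


def mbn_layer(x, k, n_clusterings, rng):
    """One layer: an ensemble of independently built k-centroids clusterings."""
    n = x.shape[0]
    if k > n:
        raise ValueError("k must not exceed the number of input points")
    x_sq = (x ** 2).sum(axis=1)[:, None]
    outputs = []
    for _ in range(n_clusterings):
        # Random sampling: k input points serve directly as centroids.
        centroids = x[rng.choice(n, size=k, replace=False)]
        # Squared Euclidean distance from every point to every centroid.
        c_sq = (centroids ** 2).sum(axis=1)[None, :]
        d2 = x_sq - 2.0 * (x @ centroids.T) + c_sq
        # One-nearest-neighbor: encode each point by its closest centroid.
        nearest = d2.argmin(axis=1)
        one_hot = np.zeros((n, k))
        one_hot[np.arange(n), nearest] = 1.0
        outputs.append(one_hot)
    # The layer output is the concatenation of all one-hot codes.
    return np.concatenate(outputs, axis=1)


def mbn_transform(x, ks=(32, 16, 8), n_clusterings=20, seed=0):
    """Stack layers bottom-up with a decreasing k (hypothetical defaults)."""
    rng = np.random.default_rng(seed)
    h = np.asarray(x, dtype=float)
    for k in ks:
        h = mbn_layer(h, k, n_clusterings, rng)
    return h
```

In a DPCL++-style pipeline, `x` would hold one embedding vector per time-frequency bin, and the rows returned by `mbn_transform` would be handed to k-means in place of the raw embeddings. Because every clustering is built independently, the loop inside `mbn_layer` could also be run in parallel.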
‹Fig. 1. Diagram of the proposed DPCL++ system. (System Description)
Fig. 4. Logarithmic magnitude spectra of a mixed speech signal, its ground-truth components, and the estimated components produced by DPCL++ in the anechoic environment. (a) Mixed speech. (b) Clean speech of the first speaker. (c) Clean speech of the second speaker. (d) Estimated speech of the first speaker. (e) Estimated speech of the second speaker. (Datasets)›
[1707.03634] Speaker-independent Speech Separation with Deep Attractor Network
[1806.09325] Single-channel Speech Dereverberation via Generative Adversarial Training