For machine hearing in complex scenes (e.g., reverberation, noise), sound localization either serves as the front end or is implicitly encoded in speech enhancement models. However, it has been suggested that there may be cross-talk between the identification and localization streams of the auditory system. Based on this idea, a multi-task sound localization method is proposed in this study. The proposed model takes the waveform as input and simultaneously estimates the azimuth of the sound source and the time-frequency (T-F) masks. Localization experiments were performed using binaural simulation in reverberant environments, and the results show that, compared to a single-task sound localization method, the presence of the speech enhancement task improves localization performance.
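The shared-encoder/two-head structure implied by the abstract (one head for azimuth, one for T-F masks) can be sketched as follows. This is a minimal illustrative sketch, not the authors' model: the dimensions, the 5-degree azimuth grid, the single linear encoder layer, and all weight names are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): F frequency bins, T frames
# after a front-end transform of the 2-channel binaural waveform, D shared
# feature size, and 37 azimuth classes (e.g., -90..+90 deg in 5-deg steps).
F, T, D, N_AZ = 64, 50, 128, 37

def shared_encoder(feats, W):
    """Shared representation used by both tasks (single linear layer as a stand-in)."""
    return np.tanh(feats @ W)                    # (T, D)

def azimuth_head(h, W):
    """Localization head: per-frame azimuth posteriors, averaged over time."""
    logits = h @ W                               # (T, N_AZ)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    post = e / e.sum(axis=1, keepdims=True)      # softmax per frame
    return post.mean(axis=0)                     # (N_AZ,)

def mask_head(h, W):
    """Enhancement head: sigmoid T-F mask with values in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-(h @ W)))        # (T, F)

# Toy input features (stacked left/right spectra) and random weights.
feats = rng.standard_normal((T, 2 * F))
W_enc, W_az, W_mask = (rng.standard_normal(s) * 0.1
                       for s in [(2 * F, D), (D, N_AZ), (D, F)])

h = shared_encoder(feats, W_enc)
azimuth_posterior = azimuth_head(h, W_az)        # shape (37,)
tf_mask = mask_head(h, W_mask)                   # shape (50, 64)
est_azimuth_deg = -90 + 5 * int(np.argmax(azimuth_posterior))
```

In a multi-task setup of this kind, both heads would be trained jointly (e.g., a classification loss on the azimuth posterior plus a mask-estimation loss), so the shared encoder is shaped by the enhancement task as well as by localization.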
Authors: Tao Song (Peking University), Tianshu Qu (Peking University) and Jing Chen (Peking University)