Experimenting with 1D CNN Architectures for Generic Audio Classification
In recent years, convolutional neural networks have become the standard in audio semantics, surpassing traditional classification approaches that used hand-crafted feature engineering as the front-end and various classifiers as the back-end. Early studies were based on prominent 2D convolutional topologies for image recognition, adapting them to audio classification tasks. After the surge of deep learning in the past decade, true end-to-end audio learning, which employs algorithms that process waveforms directly, is set to become the standard. This paper attempts a comparison between deep neural setups on typical audio classification tasks, focusing on optimizing 1D convolutional neural networks that can be deployed on various audio information retrieval tasks, such as general audio detection and classification, environmental sound recognition, or speech emotion recognition.
Authors: Lazaros Vrysis (Aristotle University of Thessaloniki), Iordanis Thoidis (Aristotle University of Thessaloniki), Charalampos Dimoulas (Aristotle University of Thessaloniki) and George Papanikolaou (Aristotle University of Thessaloniki)