The adoption of AI-driven algorithms continues to grow in modern audio, speech, and acoustic applications. Datasets are essential to the deep learning models at the heart of these systems: they provide the ground truth from which models learn and the tests that validate their performance. While researchers tend to reuse well-known datasets, engineers building real-world systems must create data that represent every scenario in which their models are expected to operate. This is often an iterative process that requires application-specific resources, tools, and expertise.
In this session, we will explore the development of a well-known application: waking up voice-enabled devices with trigger phrases such as "Hey Siri" or "OK Google". Using practical MATLAB examples, we will cover widely applicable best practices across data labeling and annotation, data ingestion, data synthesis and augmentation, feature extraction, and domain transformations.