PhD Candidate, AudioLab, University of York, York, England, United Kingdom
Immersive productions place additional demands on sound design teams, specifically around the increased complexity of scenes, the increased number of sound-producing objects, and the need to spatialise sound in 360°. This paper presents an initial feasibility study of a methodology that uses visual object detection within a simple 2D scene to detect, track, and match content for sound-generating objects. Results show that, while the approach is successful for a single moving object, the current computer vision system has limitations that cause complications in scenes with multiple objects. Results also show that the recommendation of candidate sound effect files depends heavily on the accuracy of the visual object detection system and on the labelling of the audio repository used.
Authors: Daniel Turner (University of York), Damian Murphy (University of York) and Chris Pike (BBC R&D)
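The matching step described in the abstract, pairing a detected object with candidate sound effect files from a labelled repository, could be sketched as a simple string-similarity ranking. This is an illustrative example only, not the paper's implementation; the repository structure, function names, and scoring rule are all assumptions.

```python
# Hypothetical sketch: rank candidate sound effect files for a detected
# object label by fuzzy string similarity against each file's text label.
# The repository mapping and file paths below are invented for illustration.
from difflib import SequenceMatcher

def recommend_sfx(detected_label, repository, top_n=3):
    """Return up to top_n file paths from `repository` (path -> label),
    ranked by similarity between the detection label and each file label."""
    scored = [
        (SequenceMatcher(None, detected_label.lower(), label.lower()).ratio(), path)
        for path, label in repository.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [path for _, path in scored[:top_n]]

# Example repository: labels attached to each audio file (assumed format).
repository = {
    "sfx/car_engine_loop.wav": "car engine",
    "sfx/dog_bark_close.wav": "dog bark",
    "sfx/footsteps_gravel.wav": "footsteps",
}

print(recommend_sfx("car", repository, top_n=1))
```

A sketch like this also makes the abstract's caveat concrete: if the detector emits the wrong label, or the repository labels are sparse or inconsistent, the ranking degrades regardless of how good the tracking is.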