Whenever the tagger scores one of the problem audio types highly, we warn the monitoring team.

Before using our system live at Autumnwatch, we wanted to put it through its paces. A notable issue that we encountered here was a lack of suitable audio to test it with. To address this, we ran a set of experiments in conjunction with R&D's audio team, in which we generated 4,400 hours of unsafe audio by mixing clean Springwatch audio with a selection of relevant sound effects from Freesound. We used these to test the tagging system, and to determine the sensitivities at which the various unsafe sounds would be picked up. Understanding these sensitivities, and so being able to set detection thresholds, is important because speech, for example, is far more likely to cause compliance issues and so should be picked up with far greater sensitivity than, say, vehicles, which will not.

## Remote Production Cluster integration

A key task for this tool was to integrate it as naturally as possible into the existing monitoring workflow. Much of the remote production work makes use of a collection of cloud-based tools put together by colleagues in BBC News called the Remote Production Cluster. This includes facilities to ingest and route video, and then generate a multiviewer displaying those camera feeds. We initially created a web interface that closely mirrors the layout of camera sources that the monitoring team see in their multiviewer: a box for each camera feed, with a green speaker symbol for clean audio or a red one when there is problem audio. When there is a problem, we also display a larger red icon denoting what type of audio has been detected.
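The per-sound sensitivities described above can be sketched as a simple per-class threshold table. This is a minimal illustration only: the class names, threshold values, and helper function are hypothetical, not the production configuration, and it assumes the tagger returns a score between 0 and 1 for each class.

```python
# Hypothetical per-class detection thresholds (values are illustrative).
# A lower threshold means higher sensitivity: speech is compliance-critical,
# so even weak detections are flagged; vehicles can be treated more leniently.
THRESHOLDS = {
    "Speech": 0.10,
    "Vehicle": 0.50,
    "Music": 0.30,
}

def problem_sounds(scores: dict[str, float]) -> list[str]:
    """Return the monitored classes whose tagger score crosses its threshold."""
    return [label for label, threshold in THRESHOLDS.items()
            if scores.get(label, 0.0) >= threshold]
```

For example, `problem_sounds({"Speech": 0.22, "Vehicle": 0.31})` flags only `"Speech"`: the weak speech detection crosses its sensitive threshold, while the stronger vehicle score does not cross its stricter one.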
The AudioSet ontology contains a large selection of different sounds with a great deal of variety. This allows our audio monitoring system to be used for any sort of audio content, opening up the possibility of a range of applications in all sorts of productions and programmes.

Our system takes in streams from the cameras and puts them into our cloud-based media management system. We then take a live stream of the audio from this recording into our processing tool. This tool chunks the audio up into short clips of around a second before passing them on to the tagging system. When we receive the results from the tagger, we examine the scores for the different sounds and check whether any of the problematic audio types have scored highly.
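The chunking step can be sketched roughly as follows. This is a simplified illustration, assuming mono PCM audio held in a NumPy buffer; the sample rate, chunk length, and helper are illustrative, not the actual processing tool.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed tagger input rate (illustrative)
CHUNK_SECONDS = 1.0    # "short clips of around a second"

def chunk_audio(stream: np.ndarray, sample_rate: int = SAMPLE_RATE):
    """Split a mono PCM buffer into consecutive ~1 s clips for the tagger.

    A trailing partial chunk is dropped here; a live system would instead
    hold it back and prepend it to the next incoming buffer.
    """
    samples_per_chunk = int(sample_rate * CHUNK_SECONDS)
    n_chunks = len(stream) // samples_per_chunk
    for i in range(n_chunks):
        yield stream[i * samples_per_chunk:(i + 1) * samples_per_chunk]
```

A 35,000-sample buffer at 16 kHz would yield two full one-second clips, with the remaining 3,000 samples carried over.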
To work out what we are hearing, we use a machine-learning-based audio classifier. We chose the classifier because of its high accuracy in detecting a wide range of sounds, hierarchically described by the Google AudioSet ontology. The classifier achieves state-of-the-art performance in AudioSet tagging, with a mean average precision (mAP) of 0.439.
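Since the article does not name the classifier or its API, the sketch below uses a hypothetical `model.predict` stand-in just to show the shape of the tagging step: one score per AudioSet class for each short clip, keyed by the ontology's label names.

```python
import numpy as np

# A tiny subset of the 527 AudioSet class labels, for illustration only.
AUDIOSET_CLASSES = [
    "Speech",
    "Vehicle",
    "Music",
    "Bird vocalization, bird call, bird song",
]

def tag_clip(model, clip: np.ndarray) -> dict[str, float]:
    """Run one ~1 s clip through the (hypothetical) classifier.

    `model.predict` is assumed to return an array with one score per class,
    in the same order as AUDIOSET_CLASSES.
    """
    scores = model.predict(clip)
    return dict(zip(AUDIOSET_CLASSES, map(float, scores)))
```

The resulting label-to-score mapping is what the thresholding step downstream inspects for problem audio types.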
Throughout the week of Autumnwatch, the audience could watch live streams of a selection of the wildlife cameras for 12 hours each day. The Live Stream team manage these streams and always have an operator watching and listening to ensure that the video and audio are of acceptable quality and comply with the BBC's editorial guidelines. They try to ensure that the audio remains in keeping with the natural setting of the production and that man-made noises, such as vehicle noise or speech, are avoided. Our system sets out to assist the team with this task.

One particular challenge for an operator is detecting the presence of unsafe audio and then working out which stream it is appearing on. A single member of the production team will often have to monitor the audio from up to eight feeds at once. They are normally listening to a mix of several of the audio sources, so after hearing some unsafe audio on the mix, they may have to go through all the sources one by one to try and locate the problem sound. This can take several minutes if the problem sound is intermittent and so is difficult to track down. Additionally, if the operator is listening to a single source, they can miss problem sounds on all the other sources, and may become reliant on other members of the team discovering the problem and passing that information on.

Our tools can detect unsafe audio and alert the production team to its presence on a particular stream within a few seconds. Fundamentally, it is hard for a person to listen to eight different audio streams at once; however, it is relatively easy to watch eight different videos at once. So our system translates the problem audio into a visual warning on the operator's screen. The warning remains on the screen for several seconds, so it is easy to spot even if the problem sound was only brief.

*Our monitoring system warns of speech detected on two of the cameras*

## Audio tagging

Before we can warn the production team about the audio, we need to determine what we're hearing.
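The "warning stays on screen for several seconds" behaviour can be sketched as a small per-stream latch. The class, hold time, and stream identifiers here are illustrative, not the actual implementation.

```python
import time

HOLD_SECONDS = 5.0  # how long a warning stays visible (illustrative value)

class WarningLatch:
    """Keep a per-stream warning active for a few seconds after the last
    detection, so brief problem sounds remain easy to spot on screen."""

    def __init__(self, hold_seconds: float = HOLD_SECONDS, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock            # injectable clock, handy for testing
        self.last_detection = {}      # stream id -> time of last unsafe audio

    def report(self, stream_id: str) -> None:
        """Record that unsafe audio was just detected on this stream."""
        self.last_detection[stream_id] = self.clock()

    def is_warning(self, stream_id: str) -> bool:
        """True while the stream's warning should still be displayed."""
        last = self.last_detection.get(stream_id)
        return last is not None and self.clock() - last < self.hold_seconds
```

The UI would then render a red icon on any feed for which `is_warning` is true, even after the detection itself has stopped.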