Two-channel noisy recordings of a moving speaker within a limited area
Scenario
The target is a loudspeaker located within a 30 × 30 cm area. The loudspeaker is always directed towards the two microphones, which are placed 2 meters from the center of the area. Details of the scenario are given in the following figure.
Development dataset
Download dev16.zip (14 MB) (Development dataset, 16 kHz, 16 bits)
Download dev44.1.zip (63 MB) (Development dataset, 44.1 kHz, 24 bits)
For training, the dataset contains noise-free recordings of utterances played by the loudspeaker while it was standing still at one of 16 fixed positions within the target area. The file names have the format dev_position_<xx>.wav, where <xx> is the index of the position.
Next, there are four recordings during which the loudspeaker was moved over four positions. A video of the first recording is available here for illustration. The file names have the format dev_<set>_<positions>_{sim,src,noi,mix}.wav, where <set> is the index of the recording (A, B, C, or D), <positions> contains the indices of the four positions passed during the movement, and {sim,src,noi,mix} denote, respectively, the target source images, the source signal of the target, the noise, and the noisy recording (sim + noi).
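Since the noisy recording is defined as the sum of the target source images and the noise, the development components can be checked against each other directly. Below is a minimal sketch, assuming Python with the numpy and soundfile packages; the particular file name used is hypothetical.

    # Verify that mix = sim + noi for one development recording.
    import numpy as np
    import soundfile as sf

    base = "dev_A_1_5_9_13"                # hypothetical <set>_<positions>
    sim, fs = sf.read(base + "_sim.wav")   # target source images (2 channels)
    noi, _ = sf.read(base + "_noi.wav")    # noise component
    mix, _ = sf.read(base + "_mix.wav")    # noisy recording

    # Should be near zero, up to the quantization error of the wav files.
    print(np.max(np.abs(mix - (sim + noi))))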
Test dataset
Download test16.zip (3 MB) (Test dataset, 16 kHz, 16 bits)
Download test44.1.zip (12 MB) (Test dataset, 44.1 kHz, 24 bits)
The dataset contains five noisy recordings of the loudspeaker moving within the area. The file names have the format test_<set>_x_x_x_x_mix.wav, where <set> is the index of the recording (A, B, C, D, or E). Here, the trajectory of the movement is not revealed.
Tasks
The participants are encouraged to submit:
- Enhanced (de-noised) test as well as development recordings
- Estimated trajectories of the loudspeaker in terms of sequences of position indices (mandatory); a naive baseline sketch for this task is given after this list
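To illustrate the trajectory task, the following sketch (not an official baseline) estimates a time difference of arrival (TDOA) for each one-second block of a noisy recording with GCC-PHAT and picks the reference position whose TDOA, measured from the noise-free training recordings, is closest. It assumes Python with numpy and soundfile, two-digit position indices in the file names, and a maximum inter-channel lag of 32 samples; note that TDOA alone may not distinguish all 16 positions.

    import numpy as np
    import soundfile as sf

    def gcc_phat_tdoa(x, max_lag=32):
        # TDOA (in samples) between the two channels of block x via GCC-PHAT.
        n = 2 * x.shape[0]
        X0 = np.fft.rfft(x[:, 0], n)
        X1 = np.fft.rfft(x[:, 1], n)
        cross = X0 * np.conj(X1)
        cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)
        cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))  # lags -max..+max
        return int(np.argmax(np.abs(cc))) - max_lag

    # Reference TDOA for each of the 16 fixed positions, measured from the
    # noise-free training recordings (two-digit indices assumed).
    ref = {}
    for p in range(1, 17):
        x, fs = sf.read(f"dev_position_{p:02d}.wav")
        ref[p] = gcc_phat_tdoa(x)

    # Block-wise nearest-reference decision for one noisy test recording.
    mix, fs = sf.read("test_A_x_x_x_x_mix.wav")
    block = fs                              # one-second blocks
    trajectory = []
    for start in range(0, len(mix) - block + 1, block):
        tau = gcc_phat_tdoa(mix[start:start + block])
        trajectory.append(min(ref, key=lambda p: abs(ref[p] - tau)))
    print(trajectory)                       # estimated sequence of positions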
Submissions
Each participant should make his/her results available online in the form of an archive called <YourName>_<dataset>.zip.
The files containing the enhanced utterances should be named <dataset>_<set>_x_x_x_x_enh.wav,
where <dataset> is either dev or test, <set> is A, B, C, D, or E, and x_x_x_x are the estimated positions of the target during the movement.
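A small sketch of producing a correctly named output file, assuming Python with numpy and soundfile; the enhanced signal and the estimated positions below are placeholders:

    import numpy as np
    import soundfile as sf

    fs = 16000
    enhanced = np.zeros((fs, 2))        # placeholder for the enhanced signal
    dataset, rec = "test", "A"
    positions = [1, 5, 9, 13]           # hypothetical estimated trajectory

    name = f"{dataset}_{rec}_{'_'.join(map(str, positions))}_enh.wav"
    sf.write(name, enhanced, fs)        # writes "test_A_1_5_9_13_enh.wav"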
Each participant should then send an email to "zbynek.koldovsky (at) tul.cz" providing:
- contact information (name, affiliation)
- basic information about his/her algorithm, including its average running time (in seconds per test excerpt and per GHz of CPU) and a bibliographical reference if possible
- the URL of the tarball(s)
Evaluation criteria
The evaluation will be done with the perceptual evaluation toolkit PEASS v2.0.
Licensing issues
All files are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license. The files to be submitted by participants will be made available on a website under the terms of the same license. The recordings are authored by Emmanuel Vincent, Zbynek Koldovsky, and Jiri Malek.