Asynchronous recordings of speech mixtures

Introduction

In recent years, many recording devices, such as voice recorders, smartphones, tablets and laptop PCs, have become readily available in our everyday environment, and exploiting them for array signal processing is an attractive scenario. However, in most cases, signals recorded with different devices are not synchronous: they include unknown time offsets between the recording start times and sampling frequency mismatches. The aim of this task is to evaluate source separation on such asynchronous channels.

Results

Results for development and test dataset (external link)

Description of the datasets

The datasets consist of synthetic asynchronous recordings of speech mixtures made with three stereo recording devices that are not synchronized with each other. Recordings of static sources with synchronous channels were first simulated by convolving the sources with measured impulse responses and adding uncorrelated white noise at an SNR of 60 dB. Random time offsets and slight sampling frequency mismatches were then artificially applied to them.
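The simulation steps described above can be illustrated with a minimal sketch. It is not the organizers' generation code: the toy impulse response, the Gaussian noise model, the offset of 37 samples, the 62.5 ppm mismatch and the linear-interpolation resampler are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random

def convolve(signal, impulse_response):
    """Direct-form convolution of a source with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def add_white_noise(signal, snr_db, rng):
    """Add white Gaussian noise (an assumed noise model) at the given SNR."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / 10 ** (snr_db / 10))
    return [x + rng.gauss(0.0, noise_std) for x in signal]

def apply_offset_and_mismatch(signal, offset_samples, ppm):
    """Delay by offset_samples, then resample at (1 + ppm * 1e-6) times the
    nominal rate using linear interpolation (a crude stand-in resampler)."""
    ratio = 1.0 + ppm * 1e-6
    delayed = [0.0] * offset_samples + list(signal)
    last = len(delayed) - 1
    n_out = int(last * ratio) + 1
    out = []
    for n in range(n_out):
        t = n / ratio                 # position in nominal-rate samples
        k = min(int(t), last)
        frac = t - k
        nxt = delayed[min(k + 1, last)]
        out.append((1.0 - frac) * delayed[k] + frac * nxt)
    return out

rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # stand-in for speech
toy_rir = [1.0, 0.0, 0.5, 0.0, 0.25]                    # not a measured response
image = convolve(source, toy_rir)
noisy = add_white_noise(image, 60.0, rng)
device = apply_offset_and_mismatch(noisy, offset_samples=37, ppm=62.5)
```

A real simulation would use the measured impulse responses and a band-limited resampler; linear interpolation is used here only to keep the sketch dependency-free.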

The data cover two different recording environments:
  • 150ms: all the microphone elements are placed in a linear arrangement. The spacing of each stereo microphone pair is about 2.15 cm. The reverberation time is about 150 ms.
  • 300ms: all the microphone elements are placed in a radial arrangement. The spacing of each stereo microphone pair is about 7.65 cm. The reverberation time is about 300 ms.

Test data

Download test.zip (external link) (18.8 MB)
The data consist of 18 stereo WAV audio files that can be imported in Matlab using the wavread command. These files are named test_<srcset>_<cond>_mix_<ch>.wav, where
  • <srcset>: source sets male2, male3 and male4, which correspond to mixtures of two, three and four male speakers' utterances, respectively.
  • <cond>: the recording conditions 150ms and 300ms.
  • <ch>: the indexes of the stereo channels ch12, ch34 and ch56. The channels are synchronized within each file, but the files are not synchronized to each other.
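As a quick check of the naming scheme above, the following sketch enumerates every combination of the three placeholders. The helper name list_test_mixtures is hypothetical; the placeholder values are the ones listed above.

```python
from itertools import product

SRCSETS = ["male2", "male3", "male4"]
CONDS = ["150ms", "300ms"]
CHANNELS = ["ch12", "ch34", "ch56"]

def list_test_mixtures():
    """Return all names of the form test_<srcset>_<cond>_mix_<ch>.wav."""
    return [
        f"test_{srcset}_{cond}_mix_{ch}.wav"
        for srcset, cond, ch in product(SRCSETS, CONDS, CHANNELS)
    ]

names = list_test_mixtures()
print(len(names))  # 18
print(names[0])    # test_male2_150ms_mix_ch12.wav
```

The 3 x 2 x 3 combinations account for the 18 files in the archive.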

Each combination of <srcset> and <cond> determines one source set. The source sets do not share the same time offsets, sampling frequency mismatches or source directions. The sampling frequency mismatches are smaller than 100 ppm (= 0.01 %).
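To give a feel for what a 100 ppm bound means in samples, the sketch below computes the worst-case drift accumulated between two devices. The 16 kHz nominal rate and the 10 s duration are assumptions for illustration, and the function name is hypothetical.

```python
def max_drift_samples(nominal_rate_hz, mismatch_ppm, duration_s):
    """Upper bound on the misalignment, in samples, accumulated over duration_s."""
    return nominal_rate_hz * duration_s * mismatch_ppm / 1_000_000

# Assumed values: 16 kHz nominal rate, 100 ppm mismatch, 10 s of audio.
print(max_drift_samples(16000, 100, 10))  # 16.0
```

Even a mismatch this small therefore shifts the channels by many samples over a few seconds, so it cannot be treated as a fixed delay.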

Development data

Download dev.zip (external link) (75.5 MB)
The development data consist of 66 stereo WAV audio files and 6 Matlab MAT files, which can be imported in Matlab using the commands wavread and load, respectively. These files are named as follows:
  • dev_src_<src>.wav: single-channel speech signal, shared across the whole development data.
  • dev_<srcset>_<cond>_<src>_<ch>.wav: two-channel spatial image of each source.
  • dev_<srcset>_<cond>_mix_<ch>.wav: two-channel observed signal of each stereo channel pair.
  • dev_<srcset>_<cond>_src_<src>.wav: MAT file containing the variable A, which holds the room impulse responses and has size [number of channels, number of sources, number of samples]. Note that the recording time offset is included in the impulse responses.
Here the placeholders are defined as follows.
  • <srcset>: source sets male2, male3 and male4, which correspond to mixtures of two, three and four male speakers' utterances, respectively.
  • <cond>: recording conditions 150ms and 300ms.
  • <ch>: indexes of the stereo channels ch12, ch34 and ch56. The channels are synchronized within each file, but the files are not synchronized with each other.
  • <src>: index of the source.
In the development data, all source sets share the same time offsets and sampling frequency mismatches. All the channels are originally sampled at 16 kHz, and ch34 and ch56 are resampled at 15999 Hz and 16001 Hz, respectively.
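The resampling rates above can be converted to a mismatch in ppm relative to the 16 kHz nominal rate. The following is a small worked sketch; the function name is hypothetical.

```python
def mismatch_ppm(actual_rate_hz, nominal_rate_hz=16000):
    """Relative sampling frequency mismatch in parts per million."""
    return (actual_rate_hz - nominal_rate_hz) * 1_000_000 / nominal_rate_hz

for channel, rate in [("ch34", 15999), ("ch56", 16001)]:
    print(channel, mismatch_ppm(rate))
# ch34 -62.5
# ch56 62.5
```

Both values are within the 100 ppm bound stated for the test data, and a 1 Hz difference at 16 kHz corresponds to a drift of one sample per second relative to ch12.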

Task

The task is to estimate each source signal at the first channel from the mixture. Because the channels can include unknown recording start offsets and sampling frequency mismatches, each participant is requested to align the separated sources to the first channel.
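One simple way to estimate a constant recording start offset is to search for the lag that maximizes the cross-correlation with the first channel. The sketch below is only an illustration under that assumption, not a method prescribed by the task: it handles integer, non-negative offsets only and does not compensate for sampling frequency mismatch. The function name and the toy signals are hypothetical.

```python
import random

def estimate_offset(reference, delayed, max_lag):
    """Return the lag in [0, max_lag] that maximizes the cross-correlation
    between reference and delayed (integer sample offsets only)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(delayed) - lag)
        score = sum(reference[i] * delayed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

rng = random.Random(1)
reference = [rng.uniform(-1.0, 1.0) for _ in range(500)]  # toy first channel
delayed = [0.0] * 23 + reference                          # same signal, 23 samples late
print(estimate_offset(reference, delayed, max_lag=50))
```

With a sampling frequency mismatch the best lag changes slowly over time, so in practice such an estimate would have to be repeated on short segments or replaced by a dedicated drift compensation method.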

Submission

Each participant is asked to submit the results of his/her algorithm for the task described above over all or part of the mixtures in the development dataset and the test dataset.

Evaluation criteria


We plan to use the criteria defined in the BSS_EVAL toolbox. The submitted results will be evaluated with SDR, SIR, and SAR using original sources at the first channel as "i" in bss_eval_sources.m.

The criteria and benchmarks are respectively implemented in

Licensing issues

All files are distributed under the terms of the Creative Commons Attribution-Noncommercial-ShareAlike 3.0 license. The files to be submitted by participants will be made available on a website under the terms of the same license. The authors are Yuya Sugimoto and Shigeki Miyabe.

Task proposed by Shigeki Miyabe and Nobutaka Ono.

