This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Merge remote-tracking branch 'origin/master'
Fabian-Robert Stöter committed May 6, 2018
2 parents 3bcca37 + 117ca93 commit b18fd5f
Showing 2 changed files with 38 additions and 5 deletions.
40 changes: 38 additions & 2 deletions README.md
@@ -46,9 +46,45 @@ You can now run the command line script and process wav files

`python predict_audio.py examples/5_speakers.wav`

## Reproduce Paper Results
## Reproduce Paper Results using the LibriCount Dataset
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1216072.svg)](https://doi.org/10.5281/zenodo.1216072)

The full test dataset is available for download on Zenodo.

### LibriCount10 0dB Dataset

The dataset contains a simulated cocktail party environment of [0..10] speakers, mixed at 0 dB SNR from random utterances of different speakers from the [LibriSpeech](http://www.openslr.org/12/) `CleanTest` dataset.

For each recording, we provide the ground truth number of speakers in the file name, where `k` in `k_uniquefile.wav` is the maximum number of concurrent speakers within the 5 seconds of recording.

All recordings are 5 s long. For each unique recording, we provide the audio wav file (16 bit, 16 kHz, mono) and an annotation `json` file with the same name as the recording.
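Since the ground truth count is encoded in the file name, it can be recovered with a one-line parse. A minimal sketch (the helper name and the example file name `3_abc123.wav` are illustrative, not part of the dataset tooling):

```python
import re


def speaker_count_from_filename(filename):
    """Parse the maximum concurrent speaker count `k` from a
    LibriCount file name of the form `k_uniquefile.wav`."""
    match = re.match(r"(\d+)_", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(match.group(1))


print(speaker_count_from_filename("3_abc123.wav"))  # -> 3
```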

### Metadata

In the annotation file we provide each speaker's sex, their unique `speaker_id`, and their vocal activity within the mixture recording, given in samples. Note that the activity annotations were automatically generated using [a voice activity detection method](https://github.com/wiseman/py-webrtcvad).

In the following example, the ground truth speaker count is 3.

```json
[
  {
    "sex": "F",
    "activity": [[0, 51076], [51396, 55400], [56681, 80000]],
    "speaker_id": 1221
  },
  {
    "sex": "F",
    "activity": [[0, 51877], [56201, 80000]],
    "speaker_id": 3570
  },
  {
    "sex": "M",
    "activity": [[0, 15681], [16161, 68213], [73498, 80000]],
    "speaker_id": 5105
  }
]
```
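The annotation above can be consumed directly: the speaker count is the number of entries, and each activity span is a `[start, end]` sample range at 16 kHz. A minimal sketch using the example annotation inlined as a Python list (this is not part of the repository's tooling):

```python
SAMPLE_RATE = 16000  # recordings are 16 kHz mono

# the example annotation from above, as a Python structure
annotation = [
    {"sex": "F", "activity": [[0, 51076], [51396, 55400], [56681, 80000]], "speaker_id": 1221},
    {"sex": "F", "activity": [[0, 51877], [56201, 80000]], "speaker_id": 3570},
    {"sex": "M", "activity": [[0, 15681], [16161, 68213], [73498, 80000]], "speaker_id": 5105},
]

# ground truth speaker count is simply the number of entries
print(len(annotation))  # 3

# voiced duration per speaker: sum the sample ranges, divide by the sample rate
for entry in annotation:
    voiced = sum(end - start for start, end in entry["activity"]) / SAMPLE_RATE
    print(entry["speaker_id"], round(voiced, 2))
```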

We will provide the full test dataset soon.

## License

3 changes: 0 additions & 3 deletions predict_audio.py
@@ -39,9 +39,6 @@
 # downmix to mono
 audio = np.mean(audio, axis=1)

-# max normalise output
-audio /= np.max(audio, axis=0)
-
 # compute STFT
 X = np.abs(librosa.stft(audio, n_fft=400, hop_length=160)).T

