This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Merge remote-tracking branch 'origin/master'
Fabian-Robert Stöter committed May 6, 2018
2 parents 3bcca37 + 117ca93 commit b18fd5f
Showing 2 changed files with 38 additions and 5 deletions.
40 changes: 38 additions & 2 deletions README.md
@@ -46,9 +46,45 @@ You can now run the command line script and process wav files

`python predict_audio.py examples/5_speakers.wav`

## Reproduce Paper Results
## Reproduce Paper Results using the LibriCount Dataset
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1216072.svg)](https://doi.org/10.5281/zenodo.1216072)

The full test dataset is available for download on Zenodo.

### LibriCount10 0dB Dataset

The dataset contains a simulated cocktail party environment of [0..10] speakers, mixed at 0 dB SNR from random utterances of different speakers from the [LibriSpeech](http://www.openslr.org/12/) `CleanTest` dataset.

For each recording, we provide the ground truth number of speakers in the file name, where `k` in `k_uniquefile.wav` is the maximum number of concurrent speakers within the 5 seconds of recording.

All recordings are 5 s long. For each unique recording, we provide the audio wav file (16 bit, 16 kHz, mono) and an annotation `json` file with the same name as the recording.
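Since the ground truth count is encoded in the file name, it can be recovered with a one-line parse. A minimal sketch (the helper name and the example file name `3_abc123.wav` are illustrative, not part of the dataset tooling):

```python
import re


def speaker_count_from_filename(filename):
    """Parse the maximum concurrent speaker count `k` from a
    LibriCount file name of the form `k_uniquefile.wav`."""
    match = re.match(r"(\d+)_", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(match.group(1))


print(speaker_count_from_filename("3_abc123.wav"))  # -> 3
```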

### Metadata

In the annotation file we provide each speaker's sex, their unique `speaker_id`, and their vocal activity within the mixture recording, given in samples. Note that the activity annotations were automatically generated using [a voice activity detection method](https://github.com/wiseman/py-webrtcvad).

In the following example, the ground truth speaker count is 3.

```json
[
  {
    "sex": "F",
    "activity": [[0, 51076], [51396, 55400], [56681, 80000]],
    "speaker_id": 1221
  },
  {
    "sex": "F",
    "activity": [[0, 51877], [56201, 80000]],
    "speaker_id": 3570
  },
  {
    "sex": "M",
    "activity": [[0, 15681], [16161, 68213], [73498, 80000]],
    "speaker_id": 5105
  }
]
```
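The annotation above can be consumed directly: the speaker count is the number of entries, and each activity span is a `[start, end]` sample range at 16 kHz. A minimal sketch using the example annotation inlined as a Python list (this is not part of the repository's tooling):

```python
SAMPLE_RATE = 16000  # recordings are 16 kHz mono

# the example annotation from above, as a Python structure
annotation = [
    {"sex": "F", "activity": [[0, 51076], [51396, 55400], [56681, 80000]], "speaker_id": 1221},
    {"sex": "F", "activity": [[0, 51877], [56201, 80000]], "speaker_id": 3570},
    {"sex": "M", "activity": [[0, 15681], [16161, 68213], [73498, 80000]], "speaker_id": 5105},
]

# ground truth speaker count is simply the number of entries
print(len(annotation))  # 3

# voiced duration per speaker: sum the sample ranges, divide by the sample rate
for entry in annotation:
    voiced = sum(end - start for start, end in entry["activity"]) / SAMPLE_RATE
    print(entry["speaker_id"], round(voiced, 2))
```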

We will provide the full test dataset soon.

## License

3 changes: 0 additions & 3 deletions predict_audio.py
@@ -39,9 +39,6 @@
 # downmix to mono
 audio = np.mean(audio, axis=1)

-# max normalise output
-audio /= np.max(audio, axis=0)
-
 # compute STFT
 X = np.abs(librosa.stft(audio, n_fft=400, hop_length=160)).T

