Synthetic Data by Construction

The synthetic dataset consists of two classes with 2000 samples each. Each class is defined by 4 distinct, class-specific sound objects that represent rhythmic and melodic structures. Each generated audio sample is a superposition of up to 4 class-specific sound objects, 5 random sounds, and Gaussian noise with a noise strength of \(\sigma = 0.1\). All sounds are built from periodic sine waves with a duration of 1 second and a sample rate of \(f_s = 16000\,\mathrm{Hz}\). Randomness is introduced by drawing the amplitude, phase, frequency, and modulation frequency of each sound object from ranges predefined for that object. A detailed description of the generation procedure is provided in Chapter 4.1.1 and Appendix D of the thesis report.
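The construction described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the sample rate, duration, noise strength, and object counts come from the text, while the parameter ranges in `CLASS_OBJECTS`, the amplitude-modulated sine form of a sound object, the choice of at least one class object per sample, and all function names are assumptions made for the example. The actual procedure is the one documented in the thesis report.

```python
import numpy as np

FS = 16000       # sample rate in Hz (from the text)
DURATION = 1.0   # sample length in seconds (from the text)
SIGMA = 0.1      # Gaussian noise strength (from the text)

# Hypothetical (frequency range, modulation-frequency range) pairs in Hz,
# four per class. The real ranges are defined in the thesis, not here.
CLASS_OBJECTS = {
    0: [((200, 250), (2, 3)), ((400, 450), (4, 5)),
        ((800, 850), (1, 2)), ((1600, 1650), (6, 7))],
    1: [((300, 350), (3, 4)), ((600, 650), (5, 6)),
        ((1200, 1250), (2, 3)), ((2400, 2450), (7, 8))],
}


def sound_object(rng, t, freq_range, mod_range, amp_range=(0.2, 1.0)):
    """Assumed form of a sound object: an amplitude-modulated sine wave
    with randomized amplitude, phase, frequency, and modulation frequency."""
    amp = rng.uniform(*amp_range)
    freq = rng.uniform(*freq_range)
    mod_freq = rng.uniform(*mod_range)
    phase = rng.uniform(0.0, 2.0 * np.pi)
    envelope = 0.5 * (1.0 + np.sin(2.0 * np.pi * mod_freq * t))
    return amp * envelope * np.sin(2.0 * np.pi * freq * t + phase)


def generate_sample(label, rng):
    """Superpose class objects, random sounds, and Gaussian noise."""
    t = np.arange(int(FS * DURATION)) / FS
    signal = np.zeros_like(t)

    # "Up to 4" class-specific objects; drawing at least one is an assumption.
    n_class = rng.integers(1, 5)
    for i in rng.choice(4, size=n_class, replace=False):
        freq_range, mod_range = CLASS_OBJECTS[label][i]
        signal += sound_object(rng, t, freq_range, mod_range)

    # Five random sounds drawn from a broad, class-independent range (assumed).
    for _ in range(5):
        signal += sound_object(rng, t, (100.0, 4000.0), (0.5, 8.0))

    # Additive Gaussian noise with standard deviation SIGMA.
    signal += rng.normal(0.0, SIGMA, size=t.shape)
    return signal


rng = np.random.default_rng(0)
sample = generate_sample(label=0, rng=rng)
```

Passing a seeded `numpy.random.Generator` keeps the sketch reproducible; calling `generate_sample` 2000 times per label would yield a dataset of the size described above.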

Below, one exemplary audio sample per class is presented, along with its class-specific sound objects, in the form of log-mel spectrograms.


Synthetic Class 1



Synthetic Class 2

