.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gallery_1d/classif_keras.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_gallery_1d_classif_keras.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gallery_1d_classif_keras.py:


Classification of spoken digit recordings
=========================================

In this example we use the 1D scattering transform to represent spoken digits,
which we then classify using a simple classifier. This shows that 1D scattering
representations are useful for this type of problem.

This dataset is automatically downloaded and preprocessed from
https://github.com/Jakobovski/free-spoken-digit-dataset.git

Downloading and precomputing scattering coefficients should take about 5 min.
Running the gradient descent takes about 1 min.

Results:
Training accuracy = 99.7%
Testing accuracy = 98.0%

.. GENERATED FROM PYTHON SOURCE LINES 21-26

Preliminaries
-------------
Since we're using TensorFlow and Keras to train the model, we import the
relevant modules.

.. GENERATED FROM PYTHON SOURCE LINES 26-31

.. code-block:: default

    import tensorflow as tf
    from tensorflow.keras import layers

.. GENERATED FROM PYTHON SOURCE LINES 32-34

To handle audio file I/O, we import `os` and `scipy.io.wavfile`. We also need
`numpy` for some basic array manipulation.

.. GENERATED FROM PYTHON SOURCE LINES 34-39

.. code-block:: default

    from scipy.io import wavfile
    import os
    import numpy as np

.. GENERATED FROM PYTHON SOURCE LINES 40-44

Finally, we import the `Scattering1D` class from the `kymatio.keras` package
and the `fetch_fsdd` function from `kymatio.datasets`. The `Scattering1D`
class is what lets us calculate the scattering transform, while the
`fetch_fsdd` function downloads the FSDD, if needed.

.. GENERATED FROM PYTHON SOURCE LINES 44-48

.. code-block:: default

    from kymatio.keras import Scattering1D
    from kymatio.datasets import fetch_fsdd

.. GENERATED FROM PYTHON SOURCE LINES 49-57

Pipeline setup
--------------
We start by specifying the dimensions of our processing pipeline along with
some other parameters.

First, we have the signal length. Longer signals are truncated and shorter
signals are zero-padded. The sampling rate is 8000 Hz, so this corresponds to
a little over a second.

.. GENERATED FROM PYTHON SOURCE LINES 57-60

.. code-block:: default

    T = 2 ** 13

.. GENERATED FROM PYTHON SOURCE LINES 61-63

Next, we set the maximum scale 2**J of the scattering transform (here, about
30 milliseconds) and the number of wavelets per octave.

.. GENERATED FROM PYTHON SOURCE LINES 63-66

.. code-block:: default

    J = 8
    Q = 12

.. GENERATED FROM PYTHON SOURCE LINES 67-70

We need a small constant to add to the scattering coefficients before
computing the logarithm. This prevents very large values when the scattering
coefficients are very close to zero.

.. GENERATED FROM PYTHON SOURCE LINES 70-72

.. code-block:: default

    log_eps = 1e-6

.. GENERATED FROM PYTHON SOURCE LINES 73-82

Loading the data
----------------
Once the parameters are set, we can start loading the data into a format that
can be fed into the scattering transform and then a logistic regression
classifier.

We first download the dataset. If it's already downloaded, `fetch_fsdd` will
simply return the information corresponding to the dataset that's already on
disk.

.. GENERATED FROM PYTHON SOURCE LINES 82-87

.. code-block:: default

    info_data = fetch_fsdd()
    files = info_data['files']
    path_dataset = info_data['path_dataset']
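If you want to verify the download before building the feature arrays, a
minimal, optional check (not part of the original example; it relies only on
the `files` list and `path_dataset` string returned above) is to print the
number of recordings and where they live on disk:

.. code-block:: default

    # Optional sanity check (not in the original example): confirm how many
    # recordings were fetched and where the dataset was stored.
    print('Number of recordings:', len(files))
    print('Dataset directory:', path_dataset)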
.. GENERATED FROM PYTHON SOURCE LINES 88-90

Set up NumPy arrays to hold the audio signals (`x_all`), the labels (`y_all`),
and whether the signal is in the train or test set (`subset`).

.. GENERATED FROM PYTHON SOURCE LINES 90-95

.. code-block:: default

    x_all = np.zeros((len(files), T))
    y_all = np.zeros(len(files), dtype=np.uint8)
    subset = np.zeros(len(files), dtype=np.uint8)

.. GENERATED FROM PYTHON SOURCE LINES 96-102

For each file in the dataset, we extract its label `y` and its index from the
filename. If the index is between 0 and 4, it is placed in the test set, while
files with larger indices are used for training. The actual signals are
normalized to have maximum amplitude one, and are truncated or zero-padded to
the desired length `T`. They are then stored in the `x_all` array while their
labels are in `y_all`.

.. GENERATED FROM PYTHON SOURCE LINES 102-130

.. code-block:: default

    for k, f in enumerate(files):
        basename = f.split('.')[0]

        # Get label (0-9) of recording.
        y = int(basename.split('_')[0])

        # Recordings with index 5 or higher are assigned to the training set
        # (subset 0); indices 0-4 go to the test set (subset 1).
        if int(basename.split('_')[2]) >= 5:
            subset[k] = 0
        else:
            subset[k] = 1

        # Load the audio signal and normalize it.
        _, x = wavfile.read(os.path.join(path_dataset, f))
        x = np.asarray(x, dtype='float')
        x /= np.max(np.abs(x))

        # If it's too long, truncate it.
        if len(x) > T:
            x = x[:T]

        # If it's too short, zero-pad it (the signal is centered in the
        # zero-initialized row of length T).
        start = (T - len(x)) // 2

        x_all[k, start:start + len(x)] = x
        y_all[k] = y

.. GENERATED FROM PYTHON SOURCE LINES 131-135

Log-scattering layer
--------------------
We now create a classification model using the `Scattering1D` Keras layer.
First, we take the input signals of length `T`.

.. GENERATED FROM PYTHON SOURCE LINES 135-138

.. code-block:: default

    x_in = layers.Input(shape=(T,))

.. GENERATED FROM PYTHON SOURCE LINES 139-140

These are fed into the `Scattering1D` layer.

.. GENERATED FROM PYTHON SOURCE LINES 140-143

.. code-block:: default

    x = Scattering1D(J, Q=Q)(x_in)

.. GENERATED FROM PYTHON SOURCE LINES 144-147

Since they do not carry useful information, we remove the zeroth-order
scattering coefficients, which are always placed in the first channel of the
scattering transform.

.. GENERATED FROM PYTHON SOURCE LINES 147-157

.. code-block:: default

    x = layers.Lambda(lambda x: x[..., 1:, :])(x)

    # To increase discriminability, we take the logarithm of the scattering
    # coefficients (after adding a small constant to make sure nothing blows up
    # when scattering coefficients are close to zero). This is known as the
    # log-scattering transform.
    x = layers.Lambda(lambda x: tf.math.log(tf.abs(x) + log_eps))(x)

.. GENERATED FROM PYTHON SOURCE LINES 158-160

We then average along the last dimension (time) to get a time-shift invariant
representation.

.. GENERATED FROM PYTHON SOURCE LINES 160-163

.. code-block:: default

    x = layers.GlobalAveragePooling1D(data_format='channels_first')(x)

.. GENERATED FROM PYTHON SOURCE LINES 164-166

Finally, we apply batch normalization to ensure that the data is within a
moderate range.

.. GENERATED FROM PYTHON SOURCE LINES 166-169

.. code-block:: default

    x = layers.BatchNormalization(axis=1)(x)

.. GENERATED FROM PYTHON SOURCE LINES 170-172

These features are then used to classify the input signal using a dense layer
followed by a softmax activation.

.. GENERATED FROM PYTHON SOURCE LINES 172-175

.. code-block:: default

    x_out = layers.Dense(10, activation='softmax')(x)
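Before assembling the full model, it can be reassuring to check the shape of
the pooled log-scattering features feeding this dense layer. The snippet below
is an optional sketch, not part of the original pipeline; the `features` model
is a temporary name introduced here, and the exact channel count depends on
`J` and `Q`:

.. code-block:: default

    # Optional (not in the original example): build a throwaway model that
    # stops at the pooled, batch-normalized scattering features and run it on
    # a dummy batch to inspect the feature shape.
    features = tf.keras.models.Model(x_in, x)
    dummy = np.zeros((1, T), dtype=np.float32)
    print(features(dummy).shape)  # (1, number of scattering channels)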
.. GENERATED FROM PYTHON SOURCE LINES 176-177

Finally, we create the model and display it.

.. GENERATED FROM PYTHON SOURCE LINES 177-181

.. code-block:: default

    model = tf.keras.models.Model(x_in, x_out)
    model.summary()

.. GENERATED FROM PYTHON SOURCE LINES 182-186

Training the classifier
-----------------------
Having set up the model, we attach an Adam optimizer and a cross-entropy loss
function.

.. GENERATED FROM PYTHON SOURCE LINES 186-191

.. code-block:: default

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

.. GENERATED FROM PYTHON SOURCE LINES 192-194

We then train the model using `model.fit`. The training data is given by those
indices satisfying `subset == 0`.

.. GENERATED FROM PYTHON SOURCE LINES 194-198

.. code-block:: default

    model.fit(x_all[subset == 0], y_all[subset == 0], epochs=50,
              batch_size=64, validation_split=0.2)

.. GENERATED FROM PYTHON SOURCE LINES 199-201

Finally, we evaluate the model on the held-out test data. These are given by
the indices `subset == 1`.

.. GENERATED FROM PYTHON SOURCE LINES 201-203

.. code-block:: default

    model.evaluate(x_all[subset == 1], y_all[subset == 1], verbose=2)


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.000 seconds)


.. _sphx_glr_download_gallery_1d_classif_keras.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: classif_keras.py <classif_keras.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: classif_keras.ipynb <classif_keras.ipynb>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_