.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gallery_1d/plot_classif_torch.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_gallery_1d_plot_classif_torch.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gallery_1d_plot_classif_torch.py:


Classification of spoken digit recordings
=========================================

In this example we use the 1D scattering transform to represent spoken
digits, which we then classify using a simple classifier. This shows that 1D
scattering representations are useful for this type of problem.

This dataset is automatically downloaded and preprocessed from
https://github.com/Jakobovski/free-spoken-digit-dataset.git

Downloading and precomputing the scattering coefficients should take about
5 minutes; running the gradient descent takes about 1 minute.

Results (from the run recorded below):

* Training accuracy = 99.1%
* Testing accuracy = 96.7%

.. GENERATED FROM PYTHON SOURCE LINES 21-25

Preliminaries
-------------

Since we're using PyTorch to train the model, we import `torch`.

.. GENERATED FROM PYTHON SOURCE LINES 25-28

.. code-block:: default

    import torch

.. GENERATED FROM PYTHON SOURCE LINES 29-32

We will be constructing a logistic regression classifier on top of the
scattering coefficients, so we need some of the neural network tools from
`torch.nn` and the Adam optimizer from `torch.optim`.

.. GENERATED FROM PYTHON SOURCE LINES 32-36

.. code-block:: default

    from torch.nn import Linear, NLLLoss, LogSoftmax, Sequential
    from torch.optim import Adam

.. GENERATED FROM PYTHON SOURCE LINES 37-39

To handle audio file I/O, we import `os` and `scipy.io.wavfile`. We also
need `numpy` for some basic array manipulation.

.. GENERATED FROM PYTHON SOURCE LINES 39-44

.. code-block:: default

    from scipy.io import wavfile
    import os
    import numpy as np

.. GENERATED FROM PYTHON SOURCE LINES 45-47

To evaluate our results, we form a confusion matrix using scikit-learn and
display it using `matplotlib`.

.. GENERATED FROM PYTHON SOURCE LINES 47-51

.. code-block:: default

    from sklearn.metrics import confusion_matrix
    import matplotlib.pyplot as plt

.. GENERATED FROM PYTHON SOURCE LINES 52-56

Finally, we import the `Scattering1D` class from the `kymatio.torch` package
and the `fetch_fsdd` function from `kymatio.datasets`. The `Scattering1D`
class is what lets us calculate the scattering transform, while the
`fetch_fsdd` function downloads the free spoken digit dataset (FSDD), if
needed.

.. GENERATED FROM PYTHON SOURCE LINES 56-60

.. code-block:: default

    from kymatio.torch import Scattering1D
    from kymatio.datasets import fetch_fsdd

.. GENERATED FROM PYTHON SOURCE LINES 61-69

Pipeline setup
--------------

We start by specifying the dimensions of our processing pipeline along with
some other parameters.

First, we have the signal length. Longer signals are truncated and shorter
signals are zero-padded. The sampling rate is 8000 Hz, so this corresponds
to a little over a second.

.. GENERATED FROM PYTHON SOURCE LINES 69-72

.. code-block:: default

    T = 2**13

.. GENERATED FROM PYTHON SOURCE LINES 73-75

Next, we set the maximum scale 2**J of the scattering transform (here, about
30 milliseconds) and the number of wavelets per octave Q.

.. GENERATED FROM PYTHON SOURCE LINES 75-79

.. code-block:: default

    J = 8
    Q = 12
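As a quick sanity check (this snippet is not part of the original pipeline),
we can apply the transform to a dummy signal and look at the shape of the
output: one channel per scattering path, each downsampled in time by a
factor of roughly 2**J. The snippet assumes the same `Scattering1D(J, T, Q)`
constructor signature used later in this example, with kymatio's default
settings.

.. code-block:: default

    # Sketch: inspect the scattering output shape on a random signal.
    sc = Scattering1D(J, T, Q)
    dummy = torch.randn(1, T)
    # Expect a (batch, channels, time) Tensor, with time ~ T // 2**J.
    print(sc(dummy).shape)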
.. GENERATED FROM PYTHON SOURCE LINES 80-83

We need a small constant to add to the scattering coefficients before
computing the logarithm. This prevents the logarithm from diverging to large
negative values when the scattering coefficients are very close to zero.

.. GENERATED FROM PYTHON SOURCE LINES 83-86

.. code-block:: default

    log_eps = 1e-6

.. GENERATED FROM PYTHON SOURCE LINES 87-88

If a GPU is available, let's use it!

.. GENERATED FROM PYTHON SOURCE LINES 88-92

.. code-block:: default

    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

.. GENERATED FROM PYTHON SOURCE LINES 93-94

For reproducibility, we fix the seed of the random number generator.

.. GENERATED FROM PYTHON SOURCE LINES 94-97

.. code-block:: default

    torch.manual_seed(42)

.. GENERATED FROM PYTHON SOURCE LINES 98-107

Loading the data
----------------

Once the parameters are set, we can start loading the data into a format
that can be fed into the scattering transform and then a logistic regression
classifier.

We first download the dataset. If it's already downloaded, `fetch_fsdd` will
simply return the information corresponding to the dataset that's already on
disk.

.. GENERATED FROM PYTHON SOURCE LINES 107-112

.. code-block:: default

    info_data = fetch_fsdd()
    files = info_data['files']
    path_dataset = info_data['path_dataset']

.. GENERATED FROM PYTHON SOURCE LINES 113-115

Set up Tensors to hold the audio signals (`x_all`), the labels (`y_all`),
and whether the signal is in the train or test set (`subset`).

.. GENERATED FROM PYTHON SOURCE LINES 115-120

.. code-block:: default

    x_all = torch.zeros(len(files), T, dtype=torch.float32, device=device)
    y_all = torch.zeros(len(files), dtype=torch.int64, device=device)
    subset = torch.zeros(len(files), dtype=torch.int64, device=device)

.. GENERATED FROM PYTHON SOURCE LINES 121-127

For each file in the dataset, we extract its label `y` and its index from
the filename. If the index is between 0 and 4, the file is placed in the
test set, while files with larger indices are used for training. The actual
signals are normalized to have maximum amplitude one, and are truncated or
zero-padded to the desired length `T`. They are then stored in the `x_all`
Tensor while their labels are in `y_all`.

.. GENERATED FROM PYTHON SOURCE LINES 127-158

.. code-block:: default

    for k, f in enumerate(files):
        basename = f.split('.')[0]

        # Get label (0-9) of recording.
        y = int(basename.split('_')[0])

        # Indices 5 and above are assigned to the training set.
        if int(basename.split('_')[2]) >= 5:
            subset[k] = 0
        else:
            subset[k] = 1

        # Load the audio signal and normalize it.
        _, x = wavfile.read(os.path.join(path_dataset, f))
        x = np.asarray(x, dtype='float')
        x /= np.max(np.abs(x))

        # Convert from NumPy array to PyTorch Tensor.
        x = torch.from_numpy(x).to(device)

        # If it's too long, truncate it.
        if x.numel() > T:
            x = x[:T]

        # If it's too short, center it in a zero-padded window of length T.
        start = (T - x.numel()) // 2

        x_all[k, start:start + x.numel()] = x
        y_all[k] = y

.. GENERATED FROM PYTHON SOURCE LINES 159-163

Log-scattering transform
------------------------

We now create the `Scattering1D` object that will be used to calculate the
scattering coefficients.

.. GENERATED FROM PYTHON SOURCE LINES 163-167

.. code-block:: default

    scattering = Scattering1D(J, T, Q).to(device)

.. GENERATED FROM PYTHON SOURCE LINES 168-169

Compute the scattering transform for all signals in the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 169-172

.. code-block:: default

    Sx_all = scattering(x_all)
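Before discarding any channels, we can (optionally) verify the channel
layout from the transform's metadata. This is a minimal sketch which assumes
the `meta()` method exposed by the kymatio frontends, where `meta['order']`
holds the scattering order of each output channel.

.. code-block:: default

    # Sketch: check which channels correspond to which scattering order.
    meta = scattering.meta()
    print(np.where(meta['order'] == 0))  # zeroth-order channel(s), expected first
    print(np.sum(meta['order'] == 1), np.sum(meta['order'] == 2))  # counts per order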
.. GENERATED FROM PYTHON SOURCE LINES 173-176

Since the zeroth-order scattering coefficients do not carry useful
information for this problem, we remove them. They are always placed in the
first channel of the scattering Tensor.

.. GENERATED FROM PYTHON SOURCE LINES 176-179

.. code-block:: default

    Sx_all = Sx_all[:, 1:, :]

.. GENERATED FROM PYTHON SOURCE LINES 180-184

To increase discriminability, we take the logarithm of the scattering
coefficients (after adding a small constant to make sure nothing blows up
when scattering coefficients are close to zero). This is known as the
log-scattering transform.

.. GENERATED FROM PYTHON SOURCE LINES 184-187

.. code-block:: default

    Sx_all = torch.log(torch.abs(Sx_all) + log_eps)

.. GENERATED FROM PYTHON SOURCE LINES 188-190

Finally, we average along the last dimension (time) to get a time-shift
invariant representation.

.. GENERATED FROM PYTHON SOURCE LINES 190-193

.. code-block:: default

    Sx_all = torch.mean(Sx_all, dim=-1)

.. GENERATED FROM PYTHON SOURCE LINES 194-201

Training the classifier
-----------------------

With the log-scattering coefficients in hand, we are ready to train our
logistic regression classifier.

First, we extract the training data (those for which `subset` equals `0`)
and the associated labels.

.. GENERATED FROM PYTHON SOURCE LINES 201-204

.. code-block:: default

    Sx_tr, y_tr = Sx_all[subset == 0], y_all[subset == 0]

.. GENERATED FROM PYTHON SOURCE LINES 205-208

Standardize the data to have mean zero and unit variance. Note that we need
to apply the same transformation to the test data later, so we save the mean
and standard deviation Tensors.

.. GENERATED FROM PYTHON SOURCE LINES 208-213

.. code-block:: default

    mu_tr = Sx_tr.mean(dim=0)
    std_tr = Sx_tr.std(dim=0)
    Sx_tr = (Sx_tr - mu_tr) / std_tr

.. GENERATED FROM PYTHON SOURCE LINES 214-216

Here we define a logistic regression model using PyTorch. We train it using
Adam with a negative log-likelihood loss.

.. GENERATED FROM PYTHON SOURCE LINES 216-223

.. code-block:: default

    num_input = Sx_tr.shape[-1]
    num_classes = y_tr.cpu().unique().numel()
    model = Sequential(Linear(num_input, num_classes), LogSoftmax(dim=1))
    optimizer = Adam(model.parameters())
    criterion = NLLLoss()
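As an aside, composing `LogSoftmax` with `NLLLoss` is exactly what PyTorch's
`CrossEntropyLoss` computes in one step. An equivalent formulation (not used
in this example) would keep the `Linear` layer outputting raw logits:

.. code-block:: default

    # Sketch: an equivalent model/loss pairing using CrossEntropyLoss,
    # which fuses LogSoftmax and NLLLoss.
    from torch.nn import CrossEntropyLoss

    alt_model = Linear(num_input, num_classes)
    alt_criterion = CrossEntropyLoss()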
.. GENERATED FROM PYTHON SOURCE LINES 224-225

If we're on a GPU, transfer the model and the loss function onto the device.

.. GENERATED FROM PYTHON SOURCE LINES 225-229

.. code-block:: default

    model = model.to(device)
    criterion = criterion.to(device)

.. GENERATED FROM PYTHON SOURCE LINES 230-232

Before training the model, we set some parameters for the optimization
procedure.

.. GENERATED FROM PYTHON SOURCE LINES 232-240

.. code-block:: default

    # Number of signals to use in each gradient descent step (batch).
    batch_size = 32
    # Number of epochs.
    num_epochs = 50
    # Learning rate for Adam.
    lr = 1e-4
    # Make sure the optimizer created above actually uses this learning rate.
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

.. GENERATED FROM PYTHON SOURCE LINES 241-242

Given these parameters, we compute the total number of batches.

.. GENERATED FROM PYTHON SOURCE LINES 242-246

.. code-block:: default

    nsamples = Sx_tr.shape[0]
    nbatches = nsamples // batch_size

.. GENERATED FROM PYTHON SOURCE LINES 247-248

Now we're ready to train the classifier.

.. GENERATED FROM PYTHON SOURCE LINES 248-277

.. code-block:: default

    for e in range(num_epochs):
        # Randomly permute the data. If necessary, transfer the permutation
        # to the GPU.
        perm = torch.randperm(nsamples, device=device)

        # For each batch, calculate the gradient with respect to the loss
        # and take one step.
        for i in range(nbatches):
            idx = perm[i * batch_size:(i + 1) * batch_size]
            model.zero_grad()
            resp = model(Sx_tr[idx])
            loss = criterion(resp, y_tr[idx])
            loss.backward()
            optimizer.step()

        # Calculate the response of the training data at the end of this
        # epoch and the average loss.
        resp = model(Sx_tr)
        avg_loss = criterion(resp, y_tr)

        # Try predicting the classes of the signals in the training set and
        # compute the accuracy.
        y_hat = resp.argmax(dim=1)
        accuracy = (y_tr == y_hat).float().mean()

        print('Epoch {}, average loss = {:1.3f}, accuracy = {:1.3f}'.format(
            e, avg_loss, accuracy))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Epoch 0, average loss = 0.727, accuracy = 0.816
    Epoch 1, average loss = 0.508, accuracy = 0.879
    Epoch 2, average loss = 0.407, accuracy = 0.909
    Epoch 3, average loss = 0.347, accuracy = 0.919
    Epoch 4, average loss = 0.308, accuracy = 0.927
    Epoch 5, average loss = 0.280, accuracy = 0.938
    Epoch 6, average loss = 0.256, accuracy = 0.942
    Epoch 7, average loss = 0.236, accuracy = 0.947
    Epoch 8, average loss = 0.219, accuracy = 0.950
    Epoch 9, average loss = 0.207, accuracy = 0.952
    Epoch 10, average loss = 0.197, accuracy = 0.953
    Epoch 11, average loss = 0.185, accuracy = 0.956
    Epoch 12, average loss = 0.178, accuracy = 0.959
    Epoch 13, average loss = 0.165, accuracy = 0.962
    Epoch 14, average loss = 0.160, accuracy = 0.961
    Epoch 15, average loss = 0.152, accuracy = 0.965
    Epoch 16, average loss = 0.148, accuracy = 0.964
    Epoch 17, average loss = 0.141, accuracy = 0.970
    Epoch 18, average loss = 0.137, accuracy = 0.970
    Epoch 19, average loss = 0.131, accuracy = 0.971
    Epoch 20, average loss = 0.127, accuracy = 0.973
    Epoch 21, average loss = 0.122, accuracy = 0.977
    Epoch 22, average loss = 0.119, accuracy = 0.974
    Epoch 23, average loss = 0.115, accuracy = 0.978
    Epoch 24, average loss = 0.113, accuracy = 0.973
    Epoch 25, average loss = 0.109, accuracy = 0.977
    Epoch 26, average loss = 0.104, accuracy = 0.980
    Epoch 27, average loss = 0.106, accuracy = 0.981
    Epoch 28, average loss = 0.107, accuracy = 0.976
    Epoch 29, average loss = 0.098, accuracy = 0.980
    Epoch 30, average loss = 0.094, accuracy = 0.983
    Epoch 31, average loss = 0.093, accuracy = 0.981
    Epoch 32, average loss = 0.094, accuracy = 0.979
    Epoch 33, average loss = 0.091, accuracy = 0.983
    Epoch 34, average loss = 0.086, accuracy = 0.983
    Epoch 35, average loss = 0.085, accuracy = 0.985
    Epoch 36, average loss = 0.082, accuracy = 0.983
    Epoch 37, average loss = 0.081, accuracy = 0.986
    Epoch 38, average loss = 0.077, accuracy = 0.987
    Epoch 39, average loss = 0.077, accuracy = 0.984
    Epoch 40, average loss = 0.074, accuracy = 0.987
    Epoch 41, average loss = 0.072, accuracy = 0.988
    Epoch 42, average loss = 0.072, accuracy = 0.989
    Epoch 43, average loss = 0.071, accuracy = 0.989
    Epoch 44, average loss = 0.070, accuracy = 0.987
    Epoch 45, average loss = 0.068, accuracy = 0.989
    Epoch 46, average loss = 0.066, accuracy = 0.987
    Epoch 47, average loss = 0.065, accuracy = 0.991
    Epoch 48, average loss = 0.064, accuracy = 0.989
    Epoch 49, average loss = 0.062, accuracy = 0.991

.. GENERATED FROM PYTHON SOURCE LINES 278-282

Now that our network is trained, let's test it!

First, we extract the test data (those for which `subset` equals `1`) and
the associated labels.

.. GENERATED FROM PYTHON SOURCE LINES 282-285

.. code-block:: default

    Sx_te, y_te = Sx_all[subset == 1], y_all[subset == 1]

.. GENERATED FROM PYTHON SOURCE LINES 286-288

Use the mean and standard deviation calculated on the training data to
standardize the testing data as well.

.. GENERATED FROM PYTHON SOURCE LINES 288-291

.. code-block:: default

    Sx_te = (Sx_te - mu_tr) / std_tr
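Before computing the test predictions, a common PyTorch habit (optional
here, since the model contains no dropout or batch normalization layers) is
to switch to evaluation mode and disable gradient tracking:

.. code-block:: default

    # Sketch: evaluation hygiene. model.eval() is a no-op for this purely
    # linear model, and torch.no_grad() avoids building an autograd graph.
    model.eval()
    with torch.no_grad():
        test_resp = model(Sx_te)  # same values as the call below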
.. GENERATED FROM PYTHON SOURCE LINES 292-294

Calculate the response of the classifier on the test data and the resulting
loss.

.. GENERATED FROM PYTHON SOURCE LINES 294-307

.. code-block:: default

    resp = model(Sx_te)
    avg_loss = criterion(resp, y_te)

    # Try predicting the labels of the signals in the test data and compute
    # the accuracy.
    y_hat = resp.argmax(dim=1)
    accu = (y_te == y_hat).float().mean()

    print('TEST, average loss = {:1.3f}, accuracy = {:1.3f}'.format(
        avg_loss, accu))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    TEST, average loss = 0.110, accuracy = 0.967

.. GENERATED FROM PYTHON SOURCE LINES 308-314

Plotting the classification accuracy as a confusion matrix
----------------------------------------------------------

Let's see what the few misclassified sounds get mistaken for. We plot a
confusion matrix: a 2D histogram indicating how often a recording of one
digit was predicted as another (anything on the diagonal is correctly
classified; anything off the diagonal is wrong).

.. GENERATED FROM PYTHON SOURCE LINES 314-328

.. code-block:: default

    predicted_categories = y_hat.cpu().numpy()
    actual_categories = y_te.cpu().numpy()

    confusion = confusion_matrix(actual_categories, predicted_categories)
    plt.figure()
    plt.imshow(confusion)
    tick_locs = np.arange(10)
    # Label the axes with the digit classes 0-9.
    ticks = ['{}'.format(i) for i in range(10)]
    plt.xticks(tick_locs, ticks)
    plt.yticks(tick_locs, ticks)
    plt.ylabel("True number")
    plt.xlabel("Predicted number")
    plt.show()

.. image-sg:: /gallery_1d/images/sphx_glr_plot_classif_torch_001.png
   :alt: plot classif torch
   :srcset: /gallery_1d/images/sphx_glr_plot_classif_torch_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 2 minutes 2.280 seconds)

.. _sphx_glr_download_gallery_1d_plot_classif_torch.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_classif_torch.py <plot_classif_torch.py>`

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_classif_torch.ipynb <plot_classif_torch.ipynb>`

.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_