Yay! My First Neural Network!

This post could also be called "Multiclass Classification with a Neural Network using PyTorch" but then you probably wouldn't have opened it...

I'm glad to say, as time goes on, my blog post titles are going to seem increasingly incomprehensible to anyone outside of the field of machine learning. It's exhilarating to be able to understand and write about things that would have baffled me just a week ago. However, I'll do my best to describe everything in a non-technical way as much as possible.

Having learned the fundamentals of Python and ploughed through precalculus maths in the first six weeks of my intensive studies, I was ready to dive into the heart of machine learning and deep learning: neural networks.

Machine learning seems quite intimidating. The theory videos are full of daunting mathematical concepts like backpropagation and gradient descent. I knew there were code libraries that would help, though, so I decided to start with a course in one of those. I wanted to get hands-on with machine learning as soon as possible.

 

PyTorch

PyTorch is a powerful Python library that does a lot of the heavy lifting for you when creating neural networks. It does most of the maths behind the scenes, without you needing more than a high-level sense of what's going on. I was extremely fortunate to stumble upon a brilliant, in-depth PyTorch course from freecodecamp.org by Daniel Bourke. He does a fantastic job of making these heavyweight technical concepts accessible. The course eased me into progressive complexity one step at a time and had me practise the fundamentals of creating a neural network model and writing a training and testing loop over and over again.

I'm only halfway through the course, but wanted to write a blog post demonstrating what I've learned so far. There is a link to my Colab notebook below if you want to see the code and run it yourself.

So far, as part of the course, I've created neural networks to solve three different 'toy dataset' problems: a linear regression problem, a binary classification problem and a multiclass classification problem. A week ago I wouldn't have known what any of those were, let alone how to create a neural network to solve them!

This write-up describes using deep learning to solve a multiclass classification problem.

 

What is a multiclass classification problem?

This is where you start with a dataset of items that can be sorted into a predefined number of categories. A classic example is the identification of handwritten numbers. A neural network model is trained on a large dataset of images of individual handwritten digits, 0 to 9. It is given the training data (the images) plus the training 'labels' (the correct classification of each, e.g. one is a '4' and another is a '9').

Without ever explicitly being told how to differentiate between different digits, the model iterates through the training data, making predictions and calculating how good they are, until finally it can be given test data that it has never seen before, and hopefully identify which of the ten categories (0 to 9) each image is most likely to be.

I haven't got to the image processing part of the PyTorch course (that's next), so for this section the course had us use a dataset generated with make_blobs, a function from the Scikit-learn library.

Here is a visualisation of the data it created. I'm arbitrarily using four categories as the output choice, represented by the different colours.
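As a rough sketch of what that data creation and visualisation looks like (the exact arguments in my notebook may differ; the numbers here are just illustrative):

```python
import torch
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Create 1000 two-dimensional points grouped into 4 clusters (the 4 colours)
X, y = make_blobs(n_samples=1000, n_features=2, centers=4,
                  cluster_std=1.5, random_state=42)

# Visualise: each point coloured by its class label
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)

# Convert the NumPy arrays into PyTorch tensors for training
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.LongTensor)
print(X.shape, y.shape)  # torch.Size([1000, 2]) torch.Size([1000])
```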

The goal is for the neural network to take this data and figure out rules, such that, when given new data points, it can predict which output choice (colour in the above scatter plot) each is most likely to belong to.

It successfully did so, and here is the result, showing the 'decision boundaries'. (Training data on the left; test data on the right.)

The beauty of deep learning with a neural network is that it figured out these boundaries for itself, based on the training data.


High level steps

Here are the steps in full for achieving this. A week ago I wouldn't have understood any of it, let alone have been able to write or implement it:

  1. Data preparation
    1. Set a manual seed so results are reproducible
    2. Create a dataset
    3. Visualise the data to make sure it's what we want 
    4. Convert the data from NumPy arrays into tensors
    5. Split the data into train and test samples 
  2. Device-agnostic code
    1. Detect whether a GPU is available, falling back to the CPU if not
    2. Put the data on the target device
  3. Create a neural network model
    1. Write a custom class or use an existing one (custom in this case)
    2. Consider what layers the class should consist of, linear and/or non-linear
    3. Instantiate the class to create our model and send it to the target device
  4. Choose a loss function and an optimiser
    1. Choose a loss function. For multiclass classification, we want Cross Entropy Loss
    2. Choose an optimiser. Stochastic Gradient Descent in this case.
    3. Perform a forward pass with the untrained model (its weights start out random)
    4. Check (visualise) the untrained predictions to make sure they are as we would expect
  5. Perform the main training and testing loop.
    1. For each epoch:
      1. TRAINING: Do the forward pass (make predictions based on the training data)
      2. Calculate the loss (how inaccurate the predictions were)
      3. Zero the optimiser gradients
      4. Do backpropagation
      5. Do gradient descent to nudge the parameters (weights and biases) and hopefully improve them
      6. TESTING: Do the forward pass again, this time with test data the model hasn't seen before
      7. Calculate the loss (how inaccurate the new predictions were)
      8. Log/show the results
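The steps above translate into PyTorch roughly like this. It's a simplified sketch, not the exact code from my notebook: the class name BlobModel, the hidden layer size and the hyperparameters are all illustrative.

```python
import torch
from torch import nn
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

torch.manual_seed(42)  # manual seed so results are reproducible

# 1. Data preparation: create blobs, convert to tensors, split train/test
X, y = make_blobs(n_samples=1000, n_features=2, centers=4,
                  cluster_std=1.5, random_state=42)
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.LongTensor)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 2. Device-agnostic code: use a GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
X_train, X_test = X_train.to(device), X_test.to(device)
y_train, y_test = y_train.to(device), y_test.to(device)

# 3. A custom model class: linear layers with a ReLU non-linearity between them
class BlobModel(nn.Module):
    def __init__(self, in_features=2, out_features=4, hidden=8):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_features),
        )

    def forward(self, x):
        return self.layers(x)

model = BlobModel().to(device)

# 4. Loss function (Cross Entropy for multiclass) and optimiser (SGD)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# 5. The main training and testing loop
for epoch in range(100):
    model.train()
    y_logits = model(X_train)            # 1. forward pass on training data
    loss = loss_fn(y_logits, y_train)    # 2. calculate the loss
    optimizer.zero_grad()                # 3. zero the optimiser gradients
    loss.backward()                      # 4. backpropagation
    optimizer.step()                     # 5. gradient descent step

    model.eval()
    with torch.inference_mode():
        test_logits = model(X_test)                 # 6. forward pass on unseen test data
        test_loss = loss_fn(test_logits, y_test)    # 7. calculate the test loss
    if epoch % 10 == 0:
        print(f"Epoch {epoch} | loss: {loss:.4f} | test loss: {test_loss:.4f}")  # 8. log
```

Cross Entropy Loss works directly on the model's raw outputs (the 'logits'), so to turn them into an actual class prediction you take the index of the largest logit, e.g. `test_logits.argmax(dim=1)`.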

When the model is successfully learning, then over many iterations (epochs), the loss should get lower and the accuracy should get higher. We can see this here:
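The accuracy figure comes from a small helper along these lines (my paraphrase of the helper used in the course):

```python
import torch

def accuracy_fn(y_true, y_pred):
    """Percentage of predictions that exactly match the true class labels."""
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100

# Example: 3 of 4 predictions correct
print(accuracy_fn(torch.tensor([0, 1, 2, 3]),
                  torch.tensor([0, 1, 2, 0])))  # 75.0
```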

 

 

Thanks to the practical nature of Daniel's fantastic course, I can pretty much code those steps in my sleep now. A lot of it could be turned into reusable functions, but he's deliberately had us write the code out in full over and over again, to really drill it in and give us the confidence to solve problems on the fly.

As a result, it's a long course. People in the comments are proudly proclaiming that they finished it in a month and a half. I'm halfway through it in just five days!

On to image processing with neural networks next. Thanks for reading.

 

How to View and Run the Code in Colab Notebook

You can view and even run the code yourself using Google Colab if you're interested. Here are some helpful instructions, thanks to ChatGPT:

  1. Open the Notebook Link: Click on the following link to open my Colab notebook: Open Colab Notebook

  2. Sign in to a Google Account: If you’re not already signed in, you will need to sign in with your Google account to view and interact with the notebook.

  3. Save a Copy to Your Drive:

    • Once the notebook is open, click on "File" in the top menu.
    • Select "Save a copy in Drive". This will save the notebook to your own Google Drive, allowing you to edit and run it.
  4. Run the Notebook:

    • To run the notebook, click on the play icon (▶️) next to each code cell, or click "Runtime" in the top menu and select "Run all" to execute all cells from top to bottom.
    • If prompted, allow the notebook to access your Google Drive (this is necessary for saving any changes or outputs to your Drive).
  5. Modify and Experiment (Optional):

    • Feel free to modify the code and experiment with different inputs. Since you have your own copy, you won’t affect the original shared notebook.
