Udacity Deep Learning Nanodegree Notes and Thoughts [Lesson 3]
The primary focus of this lesson was convolutional neural networks (CNNs or ConvNets), which are deep neural networks that specialize in image analyzation and pattern detection. They use convolutional layers that detect patterns to take an image input and get the final output as what the image contains. In this case (look at the notes on the left), the image contains the number 7. The filters in convolutional neural networks learn how to detect patterns by themselves, which is pretty amazing. If you take the filters and visualize them, you can see which patterns each filter identifies.
We also learned how to tune various hyper-parameters in order to better train our network. Some of these include the number of filters; which can make the neural net more or less sensitive to pattern detection, padding; where you pad the outside of the image with 0’s if the filter extends outside the image, as well as various other parameters like strides and kernel size. Tuning the parameters can prove to be tricky, and it was a challenge to decide what numbers to use to get the most out of your network.
Unless you are using a particular activation function, the output of the neural network will always be probabilistic, and the way the ConvNet decides these probabilities is based on how many types of these patterns it detects. For example, if your CNN specializes in identifying if a dog is present in the image, the results will be P(dog) and 1-P(dog) or P(not a dog). It calculates this probability by observing how many of the filters detect the features that an image of a dog would contain, like having ears or a snout.
Then it would give the probability of how well the network’s filter matches the attributes of the given image. Based on this probability it can output if the image contains a dog or if it doesn’t. Now, the weights of the filters are learned by the machine; they aren’t manually set. The original weights of the filter are set to some random numbers based on the type of probability distribution you use. Then you give it the training data with the labels, and it starts detecting the patterns in the images as it starts classifying better. The way the filter is able to learn is truly phenomenal.
The convolutional layers are a replacement for the hidden layers, and the filters act as the weights. However, since there are multiple convolutional layers (feature maps), the filters all come together to produce the final output in the output layer. Like how I explained earlier, depending on how well each filter fits the image, it generates a probability. In the final layer, also known as the output layer, you combine all of those probabilities with the respective weights that each filter carries. Each filter carries it’s own weight because some patterns are more important than others. The filter that detects the legs of the dog has a higher importance than the filter that identifies the eyelids of the dog, so it carries a larger weight.
I thought that this lesson was straightforward and easy to understand because the videos were mostly visual. Drawing out the diagrams helped in my comprehension why RGB images were stacked in matrices, how tuning each hyper-parameter affected different parts of the neural network, and how the filters were able to learn by themselves. These are tricky concepts to understand, and because of all the fantastic diagrams and visualizations, the course wasn’t as challenging.