
In today’s blog post, we are going to implement our first Convolutional Neural Network (CNN), LeNet, using Python and the Keras deep learning package.
The LeNet architecture was first introduced by LeCun et al. in their 1998 paper, Gradient-Based Learning Applied to Document Recognition. As the name of the paper suggests, the authors’ implementation of LeNet was used primarily for OCR and character recognition in documents.
The LeNet architecture is straightforward and small (in terms of memory footprint), making it perfect for teaching the basics of CNNs. It can even run on the CPU (if your system does not have a suitable GPU), making it a great “first CNN”.
However, if you do have GPU support and can access your GPU via Keras, you will enjoy extremely fast training times (on the order of 3-10 seconds per epoch, depending on your GPU).
In the remainder of this post, I’ll be demonstrating how to implement the LeNet Convolutional Neural Network architecture using Python and Keras.
From there, I’ll show you how to train LeNet on the MNIST dataset for digit recognition.
To learn how to train your first Convolutional Neural Network, keep reading.
LeNet Convolutional Neural Network in Python
This tutorial will be primarily code oriented and meant to help you get your feet wet with Deep Learning and Convolutional Neural Networks. Because of this intention, I am not going to spend a lot of time discussing activation functions, pooling layers, or dense/fully-connected layers; there will be plenty of tutorials on the PyImageSearch blog in the future that will cover each of these layer types/concepts in lots of detail.
Again, this tutorial is meant to be your first end-to-end example where you get to train a real-life CNN (and see it in action). We’ll get to the gory details of activation functions, pooling layers, and fully-connected layers later in this series of posts (although you should already know the basics of how convolution operations work); but in the meantime, simply follow along, enjoy the lesson, and learn how to implement your first Convolutional Neural Network with Python and Keras.
The MNIST dataset
Figure 1: MNIST digit recognition dataset.
You’ve likely already seen the MNIST dataset before, either here on the PyImageSearch blog, or elsewhere in your studies. In either case, I’ll go ahead and quickly review the dataset to ensure you know exactly what data we’re working with.
The MNIST dataset is arguably the most well-studied, most understood dataset in the computer vision and machine learning literature, making it an excellent “first dataset” to use on your deep learning journey.
Note: As we’ll find out, it’s also quite easy to get > 98% classification accuracy on this dataset with minimal training time, even on the CPU.
The goal of this dataset is to classify the handwritten digits 0-9. We’re given a total of 70,000 images, with (normally) 60,000 images used for training and 10,000 used for evaluation; however, we’re free to split this data as we see fit. Common splits include the standard 60,000/10,000, 75%/25%, and 66.6%/33.3%. I’ll be using 2/3 of the data for training and 1/3 of the data for testing later in the blog post.
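The 2/3 training and 1/3 testing split described above can be sketched with NumPy alone. This is only an illustration under stated assumptions: the helper name `split_train_test`, the fixed seed, and the all-zero stand-in array (shaped like MNIST, but not the real data) are hypothetical and are not taken from the code used later in this post.

```python
import numpy as np

def split_train_test(data, labels, train_fraction=2 / 3, seed=42):
    """Shuffle the samples, then split them into training and testing sets.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))  # random ordering of the sample indices
    n_train = int(round(len(data) * train_fraction))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return data[train_idx], data[test_idx], labels[train_idx], labels[test_idx]

# Stand-in arrays with MNIST's dimensions: 70,000 samples of 784 values each.
data = np.zeros((70000, 784), dtype=np.uint8)
labels = np.arange(70000) % 10

train_data, test_data, train_labels, test_labels = split_train_test(data, labels)
print(train_data.shape, test_data.shape)
```

Shuffling before slicing matters because a dataset may be stored in a sorted or grouped order; taking the first 2/3 of the rows without shuffling could leave the two sets with different mixes of digits.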
Each digit is represented as a 28 x 28 grayscale image (examples from the MNIST dataset can be seen in the figure above). These grayscale pixel intensities are unsigned integers, with the values of the pixels falling in the range [0, 255]. All digits are placed on a black background, with the light foreground (i.e., the digit itself) being white and various shades of gray.
It’s worth noting that many libraries (such as scikit-learn) have built-in helper methods to download the MNIST dataset, cache it locally to disk, and then load it. These helper methods normally represent each image as a 784-d vector.
Where does the number 784 come from?
Simple. It’s just the flattened 28 x 28 = 784 image.
To recover our original image from the 784-d vector, we simply reshape the array into a 28 x 28 image.
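As a minimal sketch of that reshape, the snippet below uses a made-up 784-d vector in place of a real MNIST digit (the values are arbitrary and only there to make the round trip checkable). It relies on NumPy's default row-major ordering, so each consecutive run of 28 values becomes one row of the image.

```python
import numpy as np

# Hypothetical stand-in for one flattened digit: 784 unsigned 8-bit
# intensities in the range [0, 255]. A real digit would come from the dataset.
flat = (np.arange(784) % 256).astype(np.uint8)

# Recover the 28 x 28 image; values 0-27 form row 0, 28-55 form row 1, etc.
image = flat.reshape(28, 28)

# Flattening the image again gives back the original 784-d vector.
restored = image.flatten()

print(flat.shape, image.shape, restored.shape)
```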
In the context of this blog post, our goal is to train LeNet such that we maximize accuracy on our testing set.
The LeNet architecture
Figure 2: The LeNet architecture consists of two sets of convolutional, activation, and pooling layers, followed by a fully-connected layer, activation, another fully-connected layer, and finally a softmax classifier (image source).
The LeNet architecture is an excellent “first architecture” for Convolutional Neural Networks (especially when trained on the MNIST dataset, an image dataset for handwritten digit recognition).
LeNet is small and easy to understand, yet large enough to provide interesting results. Furthermore, the combination of LeNet + MNIST is able to run on the CPU, making it easy for beginners to take their first step in Deep Learning and Convolutional Neural Networks.
In many ways, LeNet + MNIST is the “Hello, World” equivalent of Deep Learning for image classification.
The LeNet architecture consists of the following layers:
INPUT => CONV => RELU => POOL => CONV => RELU => POOL => FC => RELU => FC
Instead of explaining the number of convolution filters per layer, the size of the filters themselves, and the number of fully-connected nodes right now, I’m going to save this discussion until our “Implementing LeNet with Python and Keras” section of the blog post, where the source code will serve as an aid in the explanation.
In the meantime, let’s take a look at our project structure, a structure that we are going to reuse many times in future PyImageSearch blog posts.
Note: The original LeNet architecture used TANH activation functions rather than RELU. We use RELU here because it tends to give much better classification accuracy due to a number of nice, desirable properties (which I’ll discuss in a future blog post). If you run into any other discussions on LeNet, you might see that they use TANH instead; again, just something to keep in mind.
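For readers who have not met the two activation functions named in the note, here is a small plain-Python sketch of their definitions. It only shows what each function computes on a few sample inputs; it does not demonstrate the accuracy difference mentioned above, and the function names are chosen for this illustration rather than taken from Keras.

```python
import math

def relu(x):
    """Rectified linear unit: passes positive values through, zeroes the rest."""
    return max(0.0, x)

def tanh(x):
    """Hyperbolic tangent: squashes any input into the open interval (-1, 1)."""
    return math.tanh(x)

# Compare the two activations on a handful of sample inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  tanh={tanh(x):.3f}")
```

RELU is unbounded for positive inputs and exactly zero for negative ones, while TANH saturates toward -1 and 1 at either extreme.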
Our CNN project structure
Before we dive into any code, let’s first review our project structure:
|--- output
|--- pyimagesearch
|    |--- __init__.py
|    |--- cnn
|    |    |--- __init__.py
|    |    |--- networks
|    |    |    |--- __init__.py
|    |    |    |--- lenet.py
|--- lenet_mnist.py
To keep our code o