A popular demonstration of the capability of deep learning techniques is object recognition in image data.
The “hello world” of object recognition for machine learning and deep learning is the MNIST dataset for handwritten digit recognition.
In this post you will discover how to develop a deep learning model to achieve near state of the art performance on the MNIST handwritten digit recognition task in python using the Keras deep learning library.
After completing this tutorial, you will know:
How to load the MNIST dataset in Keras. How to develop and evaluate a baseline neural network model for the MNIST problem. How to implement and evaluate a simple Convolutional Neural Network for MNIST. How to implement a close to state-of-the-art deep learning model for MNIST.Let’s get started.

Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras
Photo by Jamie , some rights reserved.
Description of the MNIST Handwritten Digit Recognition ProblemThe MNIST problem is a dataset developed by Yann LeCun, Corinna Cortes and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem.
The dataset was constructed from a number of scanned document dataset available from the National Institute of Standards and Technology (NIST). This is where the name for the dataset comes from, as the Modified NIST or MNIST dataset.
Images of digits were taken from a variety of scanned documents, normalized in size and centered. This makes it an excellent dataset for evaluating models, allowing the developer to focus on the machine learning with very little data cleaning or preparation required.
Each image is a 28 by 28 pixel square (784 pixels total). A standard spit of the dataset is used to evaluate and compare models, where 60,000 images are used to train a model and a separate set of 10,000 images are used to test it.
It is a digit recognition task. As such there are 10 digits (0 to 9) or 10 classes to predict. Results are reported using prediction error, which is nothing more than the inverted classification accuracy.
Excellent results achieve a prediction error of less than 1%. State-of-the-art prediction error of approximately 0.2% can be achieved with large Convolutional Neural Networks. There is a listing of the state-of-the-art results and links to the relevant papers on the MNIST and other datasets on Rodrigo Benenson’s webpage .
Get Started in Deep Learning With Python
Deep Learning getsstate-of-the-art resultsand Python hosts the most powerful tools.
Get started now!
PDF Download and Email Course.
FREE 14-Day Mini-Course onDeep Learning WithPython Download Your FREE Mini-CourseDownload your PDF containing all 14 lessons.
Get your daily lesson via email with tips and tricks.
Loading the MNIST dataset in KerasThe Keras deep learning library provides a convenience method for loading the MNIST dataset.
The dataset is downloaded automatically the first time this function is called and is stored in your home directory in ~/.keras/datasets/mnist.pkl.gz as a 15MB file.
This is very handy for developing and testing deep learning models.
To demonstrate how easy it is to load the MNIST dataset, we will first write a little script to download and visualize the first 4 images in the training dataset.
# Plot ad hoc mnist instances fromkeras.datasetsimportmnist importmatplotlib.pyplotas plt # load (downloaded if needed) the MNIST dataset (X_train, y_train), (X_test, y_test) = mnist.load_data() # plot 4 images as gray scale plt.subplot(221) plt.imshow(X_train[0], cmap=plt.get_cmap('gray')) plt.subplot(222) plt.imshow(X_train[1], cmap=plt.get_cmap('gray')) plt.subplot(223) plt.imshow(X_train[2], cmap=plt.get_cmap('gray')) plt.subplot(224) plt.imshow(X_train[3], cmap=plt.get_cmap('gray')) # show the plot plt.show()You can see that downloading and loading the MNIST dataset is as easy as calling the mnist.load_data() function. Running the above example, you should see the image below.

Examples from the MNIST dataset
Baseline Model with Multi-Layer PerceptronsDo we really need a complex model like a convolutional neural network to get the best results with MNIST?
You can get very good results using a very simple neural network model with a single hidden layer. In this section we will create a simple multi-layer perceptron model that achieves an error rate of 1.74%. We will use this as a baseline for comparing more complex convolutional neural network models.
Let’s start off by importing the classes and functions we will need.
importnumpy fromkeras.datasetsimportmnist fromkeras.modelsimportSequential fromkeras.layersimportDense fromkeras.layersimportDropout fromkeras.utilsimportnp_utils
It is always a good idea to initialize the random number generator to a constant to ensure that the results of your script are reproducible.
# fix random seed for reproducibility seed = 7 numpy.random.seed(seed)
Now we can load the MNIST dataset using the Keras helper function.
# load data (X_train, y_train), (X_test, y_test) = mnist.load_data()
The training dataset is structured as a 3-dimensional array of instance, image width and image height. For a multi-layer perceptron model we must reduce the images down into a vector of pixels. In this case the 28×28 sized images will be 784 pixel input values.
We can do this transform easily using the reshape() function on the NumPyarray. We can also reduce our memory requirements by forcing the precision of the pixel values to be 32 bit, the default precision used by Keras anyway.
# flatten 28*28 images to a 784 vector for each image num_pixels = X_train.shape[1] * X_train.shape[2] X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32') X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')The pixel values are gray scale between 0 and 255. It is almost always a good idea to perform some scaling of input values when using neural network models. Because the scale is well known and well behaved, we can very quickly normalize the pixel values to the range 0 and 1 by dividing each value by the maximum of 255.
# normalize inputs from 0-255 to 0-1 X_train = X_train / 255 X_test = X_test / 255
Finally, the output variable is an integer from 0 to 9. This is a multi-class classification problem. As such, it is good practice to use a one hot encoding of the class values, transforming the ve