
Python Hyperopt: Finding the optimal hyperparameters


This comes up as a very common question in the ##machinelearning chat room: how do you choose the right parameters for your neural network model? How do you choose how many hidden layers to use, what your dropout parameter should be, what the learning rate should be, and so on?

Or in other fields: say you’re doing an engineering or science experiment, how do you choose the right parameters to test?

This problem is called hyperparameter optimization (or hyperoptimization).

One possible way is to do a “grid search”: you keep all the parameters constant except one, vary that one and see what effect it has, then repeat for each of the other parameters until you have tested every possible combination. The trouble is that the number of combinations blows up dramatically, growing exponentially with the number of parameters.
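To make that blow-up concrete, here is a minimal sketch of a grid search; train_and_evaluate and its made-up error formula are hypothetical stand-ins for a real training run:

# A minimal grid-search sketch. train_and_evaluate is a hypothetical placeholder
# for a full training run that returns an error value to minimize.
import itertools

def train_and_evaluate(learning_rate, hidden_size, dropout):
    # Made-up error surface so the sketch runs on its own.
    return (learning_rate - 0.01)**2 + (hidden_size - 64)**2 * 1e-4 + (dropout - 0.5)**2

learning_rates = [0.001, 0.01, 0.1]
hidden_sizes = [32, 64, 128]
dropouts = [0.2, 0.5]

# Every combination gets its own training run: 3 * 3 * 2 = 18 runs already,
# and each additional parameter multiplies that count again.
best = min(itertools.product(learning_rates, hidden_sizes, dropouts),
           key=lambda combo: train_and_evaluate(*combo))
print("Best combination:", best)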

But never fear: there are smarter ways to do this, including random search, Bayesian optimization, and the Tree of Parzen Estimators (TPE). The good thing is that you don’t actually need to know how any of these work in order to use them.
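To illustrate the simplest of these, here is a hedged sketch of a random search, which samples combinations instead of enumerating them; the error function and sampling ranges are again made-up placeholders:

# A minimal random-search sketch with a made-up placeholder error surface.
import random

def error(learning_rate, dropout):
    # Hypothetical stand-in for a real training run.
    return (learning_rate - 0.01)**2 + (dropout - 0.5)**2

trials = [(10 ** random.uniform(-4, -1), random.uniform(0.1, 0.6)) for _ in range(50)]
best_lr, best_dropout = min(trials, key=lambda t: error(*t))
print("Best found:", best_lr, best_dropout)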

Let’s take a toy example, and since it’s Valentine’s Day at the time of writing, let’s imagine our error space looks like this:


[Figure: contour plot of the toy hyperparameter space function, a heart-shaped curve. I spent way too long making this pretty :wink:]

Here the axes stand in for our hyperparameters (e.g. learning rate, size of the hidden layer), and the z value is our error (e.g. the percentage of cat photos incorrectly identified as dogs; something we want to minimize).

This diagram was generated with the following notebook Python code:

%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np

def heart(x, y):
    return -((x**2 + y**2 - 1)**3 - x**2 * y**3)

x = y = np.arange(-2, 2, 0.01)
X, Y = np.meshgrid(x, y)
Z = heart(X, Y)

plt.figure()
CS = plt.contour(X, Y, Z, levels=[-20, -1, 0, 0.1, 0.2, 0.5],
                 colors=["0.9", "0.9", "red", "0.9", "0.9", "0.9"])
plt.clabel(CS, inline=1, fontsize=10)

So, say you’ve written a TensorFlow, Keras, SciPy, etc. program in Python, for example tensorflow/mnist.py. Have a quick look at that to see how it has various hyperparameters that need tuning. There’s some complicated error function, in some high-dimensional hyperparameter space, that we can’t directly visualize, and we’re trying to find the parameters that minimize this error.

Here’s my dummy program that will stand in for your complex mnist or cat recognition program:

# Our hyperparameters
class HyperParameters:
    def __init__(self):
        self.learning_rate = 1
        self.size_of_hidden_layer = 1

    def runAndGetError(self):
        """This function should call your tensorflow etc code which does the
        whole training procedure and returns the TRAINING error. Lower is better!"""
        return heart(self.learning_rate, self.size_of_hidden_layer)

hp = HyperParameters()

def main():
    pass  # This is where you do all the normal training code etc

if __name__ == "__main__":
    main()

Note that we’ve got a class for all our hyperparameters. I recommend having this! Note also that we don’t actually run the training unless __name__ == "__main__". The point here is to make sure that our program runs as normal if we run it directly, but we can also create a second program that uses hyperopt to tune it without having to touch our main program at all. This sort of separation is really nice.

Don’t put any hyperopt code in your main program!

Now, without touching your main program at all, we create our hyperopt tuner:
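Here is a minimal sketch of what such a tuner could look like, assuming the dummy program above (including the heart function it calls) is saved as my_program.py; the module name, search bounds, and max_evals are illustrative assumptions, not part of the original program:

# A minimal sketch of a separate hyperopt tuner script.
# "my_program" is a hypothetical module name for the dummy program above.
from hyperopt import fmin, tpe, hp

import my_program  # importing does not run the training, thanks to the __main__ guard

def objective(args):
    # Copy the suggested hyperparameters into our HyperParameters instance,
    # run the training, and return the error that hyperopt should minimize.
    my_program.hp.learning_rate = args["learning_rate"]
    my_program.hp.size_of_hidden_layer = args["size_of_hidden_layer"]
    return my_program.hp.runAndGetError()

# Illustrative search space matching the plotted range above.
space = {
    "learning_rate": hp.uniform("learning_rate", -2, 2),
    "size_of_hidden_layer": hp.uniform("size_of_hidden_layer", -2, 2),
}

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100)
print("Best hyperparameters found:", best)

Running this searches the space with the Tree of Parzen Estimators and prints the best hyperparameter values it found, all without modifying the main program.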

