Hands-On Neural Networks
Leonardo De Marchi, Laura Mitchell
Implementing a perceptron in Keras
Let's look at how we can implement a perceptron in Keras, introducing a few basic concepts along the way.
The main objective of Keras is to make model creation more Pythonic and model-centric.
There are two ways to create a model: using either the Sequential or the Model class. The easiest way to create a Keras model is with the Sequential API. That class comes with some limitations; for example, it is not straightforward to define models with multiple inputs or outputs, but it fits our purpose here.
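For comparison, here is a minimal sketch of the same single-neuron model written with the functional Model API; the variable names are only illustrative, and the imports assume the standalone keras package (use tensorflow.keras if that is how Keras is installed on your system):
from keras.layers import Input, Dense
from keras.models import Model

# The same single-neuron model, expressed with the functional API:
# define the input tensor, apply layers to it, then wrap the graph in a Model.
inputs = Input(shape=(2,))                        # two input features
outputs = Dense(1, activation="sigmoid")(inputs)  # one output neuron
functional_perceptron = Model(inputs=inputs, outputs=outputs)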
We can start by importing and initializing the Sequential class:
from keras.models import Sequential

my_perceptron = Sequential()
Then, we have to add our input layer and specify its dimensions, along with a few other parameters. In our case, we will add a Dense layer.
A Dense layer is fully connected, meaning that each of its neurons is connected to every neuron of the next layer. It computes the dot product between the input and its set of weights (also called the kernel), adds the bias if specified, and then passes the result through the activation function.
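To make this computation concrete, here is a small, purely illustrative NumPy sketch of what a Dense layer does with one sample; the numbers are arbitrary:
import numpy as np

def dense_forward(x, kernel, bias, activation):
    # A Dense layer computes activation(x . kernel + bias)
    return activation(np.dot(x, kernel) + bias)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x = np.array([0.5, -1.0])          # one sample with two features
kernel = np.array([[0.2], [0.4]])  # weights: shape (2 inputs, 1 neuron)
bias = np.array([0.1])
print(dense_forward(x, kernel, bias, sigmoid))  # ~[0.45]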
To initialize it, we need to specify the number of neurons (1), the input dimension (2, as we have two variables), the activation function (sigmoid), and the initial weight values (zero). To add the layer to the model, we can use the add() method, as in the following example:
from keras.layers import Dense

input_layer = Dense(1, input_dim=2, activation="sigmoid", kernel_initializer="zeros")
my_perceptron.add(input_layer)
Now, we need to compile our model. In this phase, we simply define the loss function and the optimizer, that is, the way we want to follow the gradient. Keras does not supply the step function that we used before, as it is not differentiable and therefore will not work with backpropagation. If we wanted to use it, we could define a custom function with keras.backend, in which case we would also have to define the derivative ourselves; we will leave this as an exercise for the reader.
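As a hint for that exercise, one possible workaround is to replace the hard step with a very steep sigmoid, which is differentiable. The sketch below is only one way to do it, and the steepness factor of 10.0 is an arbitrary choice:
from keras import backend as K

def smooth_step(x):
    # A differentiable stand-in for the step function: a very steep sigmoid.
    # A true step has zero gradient almost everywhere, so backpropagation
    # could not update the weights.
    return K.sigmoid(10.0 * x)

# It could then be passed as the activation, for example:
# Dense(1, input_dim=2, activation=smooth_step, kernel_initializer="zeros")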
We will use the MSE instead, for simplicity. As a gradient descent strategy, we are going to use Stochastic Gradient Descent (SGD), an iterative method for optimizing a differentiable function. When defining the SGD optimizer, we can also specify a learning rate, which we will set to 0.01:
from keras.optimizers import SGD

my_perceptron.compile(loss="mse", optimizer=SGD(learning_rate=0.01))
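Under the hood, each SGD update simply moves every weight against its gradient, scaled by the learning rate. The following framework-free sketch, with made-up numbers, shows a single update step:
import numpy as np

def sgd_step(weights, grads, learning_rate=0.01):
    # Vanilla SGD: w_new = w_old - learning_rate * gradient
    return weights - learning_rate * grads

w = np.array([0.5, -0.3])   # illustrative weights
g = np.array([0.1, -0.2])   # illustrative gradients of the loss w.r.t. w
print(sgd_step(w, g))       # [ 0.499 -0.298]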
After this, we only need to train our network with the fit method. For this phase, we need to provide the training data and its labels.
We can also provide the number of epochs that we want. An epoch is one full pass, forward and backward, of the entire dataset through the network. In this simple case, a couple of epochs are enough, but for more complex neural networks, we will need many more.
We also specify the batch size, which is the portion of the training set used for one gradient update. To make the gradient estimate less noisy, it is common to batch the data before updating the weights. The batch size depends mainly on the memory available but, in general, it will be between 32 and 512 data points. Many variables play into this choice, but very large batch sizes tend to converge to sharp minimizers and lose the ability to generalize outside the training set. To help avoid getting stuck in a local minimum, we also want to shuffle the data; this way, the batches change at every epoch, which makes getting stuck in a local minimum more difficult:
my_perceptron.fit(train_x.values, train_y, epochs=2, batch_size=32, shuffle=True)
Now, we can easily compute the AUC score, as follows:
from sklearn.metrics import roc_auc_score
pred_y = my_perceptron.predict(test_x)
print(roc_auc_score(test_y, pred_y))
While the model is training, we will see some information printed on the screen. Keras shows the progress of the model, giving us an ETA for each epoch while running. It also shows the loss, which we can use to check whether the model is actually improving its performance:
Epoch 1/30
1618/1618 [==============================] - 1s 751us/step - loss: 0.1194
Epoch 2/30
1618/1618 [==============================] - 1s 640us/step - loss: 0.0444
Epoch 3/30
1618/1618