- Hands-On Neural Networks
- Leonardo De Marchi, Laura Mitchell
Implementing a perceptron
We will now look at how to build a perceptron from scratch, in order to make sure we understand the concepts, as we will use them to build complex networks.
Single-layer perceptrons are only capable of learning patterns that are linearly separable. The learning part is the process of finding the weights that minimize the error of the output.
First of all, let's create a dataset. We will do so by sampling from two distinct normal distributions that we created, labeling the data according to the distribution. After that, we will train our perceptron to distinguish them:
import numpy as np
import pandas as pd
import seaborn as sns; sns.set()
from sklearn.metrics import confusion_matrix

# initializing the random number generator
np.random.seed(11)

#### Creating the dataset
# mean and standard deviation for the x belonging to the first class
mu_x1, sigma_x1 = 0, 0.1
# constant to make the second distribution different from the first
x2_mu_diff = 0.35

# creating the first distribution
d1 = pd.DataFrame({'x1': np.random.normal(mu_x1, sigma_x1, 1000),
                   'x2': np.random.normal(mu_x1, sigma_x1, 1000),
                   'type': 0})

# creating the second distribution
d2 = pd.DataFrame({'x1': np.random.normal(mu_x1, sigma_x1, 1000) + x2_mu_diff,
                   'x2': np.random.normal(mu_x1, sigma_x1, 1000) + x2_mu_diff,
                   'type': 1})

data = pd.concat([d1, d2], ignore_index=True)

ax = sns.scatterplot(x="x1", y="x2", hue="type",
                     data=data)
To visualize the output of the preceding code, we can run it in a Jupyter Notebook with the %matplotlib inline option set, obtaining the following plot:

As we can observe, the two distributions are linearly separable, so it's an appropriate task for our model.
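Before training, it can also help to confirm the size and class balance of the dataset we just built (a quick check that is not part of the original listing):

# Quick sanity check on the generated dataset:
# 2,000 rows in total, 1,000 points per class, plus the label column
print(data.shape)
print(data['type'].value_counts())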
Now, let's create a simple class to implement the perceptron. We know that our input data has two input features (the coordinates in our graph) and a binary output (the type of the data point), distinguished by different colors:
class Perceptron(object):
    """
    Simple implementation of the perceptron algorithm
    """

    def __init__(self, w0=1, w1=0.1, w2=0.1):
        # weights
        self.w0 = w0  # bias
        self.w1 = w1
        self.w2 = w2
We need two weights, one for each input, plus an extra one to represent the bias term of our equation. We will represent the bias as a weight that always receives an input equal to 1. This will make optimization easier.
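To make the formula explicit before we implement it, the neuron computes the following quantity (written in the same notation as the code that follows):

z = w0 * 1 + w1 * x1 + w2 * x2

The prediction is then the step function applied to z: the output is 1 if z >= 0, and 0 otherwise.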
We now need to add the methods that calculate the prediction to our class; this is the part that implements the mathematical formula. Of course, at the beginning, we don't know what the weights are (that's actually why we train the model), but we need some values to start with, so we initialize them to arbitrary values.
We will use the step function as our activation function for the artificial neuron, which will be the filter that decides whether the signal should pass:
    def step_function(self, z):
        if z >= 0:
            return 1
        else:
            return 0
The inputs will then be multiplied by the weights and summed, so we need to implement a method that takes the two inputs and returns their weighted sum. The bias term is indicated by self.w0, which is always multiplied by 1:
    def weighted_sum_inputs(self, x1, x2):
        return sum([1 * self.w0, x1 * self.w1, x2 * self.w2])
Now, we need to implement the predict method, which uses the methods we defined in the preceding code blocks to calculate the output of the neuron:
    def predict(self, x1, x2):
        """
        Uses the step function to determine the output
        """
        z = self.weighted_sum_inputs(x1, x2)
        return self.step_function(z)
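As a quick sanity check (assuming only the class defined so far, with its default weights w0=1, w1=0.1, and w2=0.1), we can instantiate a perceptron and compute predictions for a couple of points:

p = Perceptron()
# weighted sum: 1*1 + 0*0.1 + 0*0.1 = 1 >= 0, so the output is 1
print(p.predict(0.0, 0.0))
# weighted sum: 1*1 + (-20)*0.1 + 0*0.1 = -1 < 0, so the output is 0
print(p.predict(-20.0, 0.0))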
Later on in this book, you will see that it is better to choose activation functions that are easily differentiable, as gradient descent is the most convenient way to train a network.
The training phase, where we calculate the weights, is a simple process that is implemented with the following fit method. We need to provide this method with the input, the output, and two more parameters: the number of epochs and the step size.
An epoch is a single step in training our model, and it ends when all training samples are used to update the weights. For DNNs, it's often required to train with hundreds of epochs, if not more, but in our example, one will be fine.
The step size (or learning rate) is a parameter that helps to control the effect of new updates on the current weights. The perceptron convergence theorem states that a perceptron will converge if the classes are linearly separable, regardless of the learning rate. For NNs, on the other hand, the learning rate is quite important: when using gradient descent, it controls the speed of convergence and can affect how close to the minimum of the error function you are able to get. A large step size might make the training jump around the local minima, while a step size that is too small will make training too slow.
In the following code block, it's possible to find the code for the method that we need to add to the perceptron's class to do the training:
    def fit(self, X, y, epochs=1, step=0.1, verbose=True):
        """
        Train the model given the dataset
        """
        errors = []
        for epoch in range(epochs):
            error = 0
            for i in range(0, len(X.index)):
                x1, x2, target = X.values[i][0], X.values[i][1], y.values[i]
                # The update is proportional to the step size and the error
                update = step * (target - self.predict(x1, x2))
                self.w1 += update * x1
                self.w2 += update * x2
                self.w0 += update
                error += int(update != 0.0)
            errors.append(error)
            if verbose:
                print('Epochs: {} - Error: {} - Errors from all epochs: {}'
                      .format(epoch, error, errors))
The training process calculates the weight update by multiplying the step size (or learning rate) by the difference between the real output and the prediction. This scaled error is then multiplied by each input and added to the corresponding weight. It's a simple update strategy that will allow us to divide the input space in two and classify our data. This learning strategy is known as the Perceptron Learning Rule, and it can be demonstrated that if the problem is linearly separable, then the Perceptron Learning Rule will find a set of weights that solves the problem in a finite number of iterations.
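To make the update concrete, here is a small worked example with hypothetical numbers, following the same rule used inside the fit method. Suppose step=0.1 and a point with x1=0.4 and x2=0.3, whose target is 1, is currently predicted as 0:

# Hypothetical single update, mirroring the rule used inside fit
update = 0.1 * (1 - 0)      # 0.1
w1_change = update * 0.4    # w1 increases by 0.04
w2_change = update * 0.3    # w2 increases by 0.03
w0_change = update          # w0 increases by 0.1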
We also added some error-logging functionality, so it's possible to train with more epochs and see how the error is affected.
The perceptron class is now complete; we need to create a training set and a test set to train the network and validate its results. It's best practice to also use a validation set, but we will skip it in this example, as we want to focus on the training process. It's also good practice to use cross-validation, but we will skip that as well and only use one training set and one test set, for simplicity:
# Splitting the dataset into training and test sets
msk = np.random.rand(len(data)) < 0.8
# Roughly 80% of the data will go into the training set
train_x, train_y = data[['x1', 'x2']][msk], data.type[msk]
# Everything else will go into the test set
test_x, test_y = data[['x1', 'x2']][~msk], data.type[~msk]
Now that we have everything we need for the training, we will initialize the weights to a number close to zero and perform the training:
my_perceptron = Perceptron(0.1, 0.1)
my_perceptron.fit(train_x, train_y, epochs=1, step=0.005)
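If you want to see the error-logging functionality in action, you can also train a fresh perceptron for a few more epochs and watch the per-epoch error counts (an optional experiment, not needed for the rest of the example):

# Optional: train for several epochs to observe how the error evolves
longer_run = Perceptron(0.1, 0.1)
longer_run.fit(train_x, train_y, epochs=5, step=0.005, verbose=True)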
To check the algorithm's performance, we can use the confusion matrix, which shows all of the correct predictions and the misclassifications. As it's a binary task, each result will be either correct, a false positive, or a false negative:
pred_y = test_x.apply(lambda x: my_perceptron.predict(x.x1, x.x2), axis=1)
cm = confusion_matrix(test_y, pred_y, labels=[0, 1])
print(pd.DataFrame(cm, index=['True 0', 'True 1'],
                   columns=['Predicted 0', 'Predicted 1']))
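From the same predictions, we can also derive a single accuracy figure if a one-number summary is preferred (a small addition that is not part of the original listing):

# Fraction of test points that were classified correctly
accuracy = (pred_y == test_y).mean()
print('Accuracy: {:.2%}'.format(accuracy))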
The preceding code block will produce the following output:

We can also visualize these results on the input space by drawing the linear decision boundary. To accomplish that, we need to add the following method in our perceptron class:
    def predict_boundary(self, x):
        """
        Used to predict the boundaries of our classifier
        """
        return -(self.w1 * x + self.w0) / self.w2
To find the boundary, we need to find the points that satisfy the equation x2*w2 + x1*w1 + w0 = 0; solving it for x2 gives x2 = -(w1*x1 + w0) / w2, which is exactly what predict_boundary computes.
Now, we can plot the decision line together with the test data using the following code:
# Adds decision boundary line to the scatterplot
ax = sns.scatterplot(x="x1", y="x2", hue="type", data=data[~msk])
ax.autoscale(False)
x_vals = np.array(ax.get_xlim())
y_vals = my_perceptron.predict_boundary(x_vals)
ax.plot(x_vals, y_vals, '--', c="red")
Now, you should see the following diagram:

It's also possible to compute a continuous output, rather than just a binary one; it's sufficient to use a continuous activation function, such as the logistic function. With this choice, our perceptron becomes a logistic regression model.
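As a minimal sketch of this idea (a hypothetical variant, not the class built in this section), the step function could be swapped for the logistic function while reusing the same weighted sum:

import numpy as np

class LogisticNeuron(Perceptron):
    """
    Hypothetical continuous-output variant of the perceptron
    """
    def sigmoid(self, z):
        # Logistic function: maps any real number into the (0, 1) interval
        return 1.0 / (1.0 + np.exp(-z))

    def predict(self, x1, x2):
        # Same weighted sum as before, but the output is now a value in (0, 1)
        z = self.weighted_sum_inputs(x1, x2)
        return self.sigmoid(z)

Note that training such a model properly would also require a different update rule (for example, gradient descent on a suitable loss function), which is covered later in the book.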