- Hands-On Neural Networks
- Leonardo De Marchi, Laura Mitchell
Feedforward neural networks
One of the main drawbacks of the perceptron algorithm is that it's only able to capture linear relationships. An example of a simple task that it's not able to solve is the logical XOR. The logical XOR is a very simple function in which the output is true only when its two binary inputs are different from each other. It can be described with the following table:
Input 1 | Input 2 | Output
   0    |    0    |   0
   0    |    1    |   1
   1    |    0    |   1
   1    |    1    |   0
The preceding table can also be represented with the following plot:

In the XOR problem, it's not possible to find a single line that correctly divides the prediction space in two, so a linear decision boundary, and therefore our previous perceptron, cannot solve it. The decision boundary in the previous example was a single line; here, it's easy to see that two lines would be sufficient to classify our input correctly.
But now, we have a problem: if we feed the output of our previous perceptron into another one, we still obtain only a linear combination of the input, so stacking perceptrons in this way will not add any non-linearity.
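To see why, here is a minimal NumPy sketch (the weights and biases are arbitrary illustrative values) showing that two stacked layers with purely linear outputs collapse into a single linear layer:

```python
import numpy as np

# Two "perceptron layers" with purely linear outputs (no activation).
# Weights and biases are arbitrary illustrative values.
W1, b1 = np.array([[1.0, -2.0], [0.5, 3.0]]), np.array([0.1, -0.1])
W2, b2 = np.array([[2.0], [-1.0]]), np.array([0.5])

x = np.array([1.0, 2.0])

# Feeding one linear layer into another...
hidden = x @ W1 + b1
out_stacked = hidden @ W2 + b2

# ...is equivalent to a single linear layer with combined weights.
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
out_single = x @ W_combined + b_combined

print(out_stacked, out_single)  # identical: the result is still a linear function of x
```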
You can easily see that if you add more and more lines, you will be able to separate the space in increasingly complex ways. That's what we want to achieve with Multilayer Neural Networks:

Another way to introduce non-linearity is by changing the activation function. As we mentioned before, the step function is just one of our options; there are others, such as the Rectified Linear Unit (ReLU) and the sigmoid. With these, it's possible to compute a continuous output and combine more neurons into something that divides the solution space.
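As a quick illustration, here is a minimal NumPy sketch of the step, ReLU, and sigmoid activations (the sample input values are arbitrary):

```python
import numpy as np

def step(x):
    # Step function: the original perceptron activation, 0 or 1 only.
    return np.where(x >= 0, 1.0, 0.0)

def relu(x):
    # Rectified Linear Unit: zero for negative input, identity otherwise.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real value into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(step(z), relu(z), sigmoid(z))
```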
This intuitive concept is mathematically formulated in the universal approximation theorem, which states that an arbitrary continuous function can be approximated using a multilayer perceptron with only one hidden layer. A hidden layer is a layer of neurons between the input and the output. This result holds for a variety of activation functions, for example, ReLU and sigmoid.
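As an illustration, here is a minimal NumPy sketch of a multilayer perceptron with one hidden ReLU layer whose weights are picked by hand (not learned) so that it computes XOR exactly, something a single perceptron cannot do:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Hidden layer: two neurons, weights chosen by hand (not learned).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])

# Output layer: one neuron combining the two hidden activations.
W2 = np.array([[1.0], [-2.0]])
b2 = np.array([0.0])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

hidden = relu(X @ W1 + b1)   # non-linear hidden layer
output = hidden @ W2 + b2    # linear read-out

print(output.ravel())  # [0. 1. 1. 0.] -> the XOR of the two inputs
```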
A Multilayer Neural Network is a particular case of a feedforward neural network (FFNN): a network in which information flows in only one direction, from the input to the output.
One of the main differences from the simple perceptron is how you train an FFNN; the most common way is through backpropagation.
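As a rough sketch, assuming TensorFlow/Keras is installed, training a small FFNN with backpropagation on the XOR data could look like this (the layer size, optimizer, and number of epochs are illustrative, not tuned):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# The four XOR examples and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer with a non-linear activation, one sigmoid output.
model = Sequential([
    Input(shape=(2,)),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid'),
])

# Backpropagation computes the gradients; the optimizer applies the updates.
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=500, verbose=0)

print(model.predict(X).round().ravel())  # ideally [0. 1. 1. 0.]
```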