Examples

nn.zip

Reading

  • The Computational Beauty of Nature, Gary William Flake, Chapter 22 — Neural Networks and Learning
  • A Simple Perceptron

    We begin our discussion of Neural Networks with a simple Perceptron. A Perceptron is a computational model of a neuron. It consists of one or more inputs, a processor, and a single output. Each input is weighted and the processor creates the output via the following steps:

  • For every input, multiply that input by its weight.
  • Sum all of the weighted inputs.
  • Compute the output of the Perceptron based on that sum passed through an activation function.
  • A simple activation function is the sign of the sum. In other words, if the sum is a positive number, the output is 1, if it is negative, the output is -1. Here is some code that assumes an array of inputs, and an array of input weights.

    // Sum all values
    float sum = 0;
    int result = 0;
    for (int i = 0; i < inputs.length; i++) {
      sum += inputs[i]*weights[i];
    }
    if (sum > 0) result = 1;
    else result = -1;
    

    11162006165
    A Perceptron is typically used for classification. For example, let’s say a Perceptron has 2 inputs. Using a sign activation function, the output will either be a -1 or 1, i.e. the input data is classified according to the sign of the output. It belongs to either roup A (-1) or group B (1).

    Consider a line in 2 dimensional space. Points in that space can be classified as living on either one side of the line or another. While this is a somewhat trivial example (since we can determine which side a point lies on with some simple algebra), a Perceptron can be trained to learn how to recognize points on one side vs. another.

    In this example, there are two inputs, the x-coordinate of the point and the y-coordinate. However, in order for the Perceptron to function properly, we must also incorporate a “bias” input. A bias input always has the value of 1. Why is a bias necessary? Without a bias, if all inputs are 0, the only output ever possible will be a zero (because the sum of a whole bunch of zeros is just zero!). Therefore, the structure of our Perceptron is as follows:

    11162006166

    Based on the above structure, we can create a class to describe the Perceptron. The Perceptron will include an array of input weights as well as a learning constant (the learning constant will be used in the training algorithm described below.)

    class Perceptron {
      float[] weights;  // Array of weights for inputs
      float c;          // learning constant
    

    The constructor will create the weights array based on the number of inputs and set the learning constant.

      // Perceptron is created with n weights and learning constant
      Perceptron(int n, float c_) {
        weights = new float[n];
        // Start with random weights
        for (int i = 0; i < weights.length; i++) {
          weights[i] = random(-1,1);
        }
        c = c_;
      }
    

    The guessing algorithm is exactly what we developed above, the sum of all inputs multiplied by the weights. In the example below, the inputs arrive as the “vals” argument (an array of floating points) to the function. The last element of the array is therefore the bias and should have the value 1.

      // Guess -1 or 1 based on input values
      int guess(float[] vals) {
        // Sum all values
        float sum = 0;
        for (int i = 0; i < weights.length; i++) {
          sum += vals[i]*weights[i];
        }
        // Result is sign of the sum, -1 or 1
        int result = 1;
        if (sum < 0) result = -1;
        return result;
      }
    

    The point of doing all this, of course, is to be able to train the network to be able to make the right guess. This is accomplished by feeding the network inputs with a known “correct” output, and adjusting the weights according to the output’s error. The error is calculated as follows:

    ERROR = DESIRED OUTPUT - GUESS OUTPUT

    In this simple example, the error can only be one of three values: 0 (if the desired and guess are equal), 2 (if desired = 1 and guess = -1) and -2 (if desired = -1 and guess = 1). The following formula is then used to determine how much to change the weights. Note that if the error equals zero, the weights will not be changed since deltaweight will equal zero. This is how things should work since if the error equals zero, we got the right answer — don’t fix it if it ain’t broke!

    DELTAWEIGHT = LEARNINGCONSTANT * ERROR * INPUT
    NEWWEIGHT = OLDWEIGHT + DELTAWEIGHT

    In code, this translates to a training function:

      // Function to train the Perceptron
      // Weights are adjusted based on "desired" answer
      void train(float[] vals, int desired) {
        // Guess the result
        int result = guess(vals);
        // Compute the factor for changing the weight based on the error
        // Error = desired output - guessed output
        // Note this can only be 0, -2, or 2
        // Multiply by learning constant
        float weightChange = c*(desired - result);
        // Adjust weights based on weightChange * input
        for (int i = 0; i < weights.length; i++) {
          weights[i] += weightChange * vals[i];
        }
      }
    

    To get a sense of how the learning constant works, download the Perceptron example and change the value in the constructor:

    ptron = new Perceptron(3,0.004);
    

    A higher constant will cause the Perceptron to learn faster, but it may overshoot the optimal weights. A lower constant will cause it to learn more slowly, but it will likely find the optimal solution more accurately.

    Now that the Perceptron class is built, we can train it to classify points in space as described above. Let’s pick an arbitrary forumla for a line:

    y = 0.9x - 0.2

    We can make it into a function:

    // The function to describe a line
    float f(float x) {
      return x*0.9-0.2;
    }
    

    And then use it to compute a known result. The inputs are x, y, and 1 (bias) and the known result is the variable answer.

    float x = random(-2,2);
    float y = random(-2,2);
    int answer = 1;
    if (y < f(x)) answer = -1;
    

    Multi-Layered Neural Network

    The perceptron is good at solving simple problems that have a linearly separable solution. In other words, all of the points for output 1 fall on one side a line in 2D space and all of the points for output -1 fall on the other side. Perceptrons, however, cannot solve linearly inseparable problems.

    The most classic example of a linearly inseparable problem is XOR. XOR is the “exclusive” or, meaning if A or B, but not both A and B.

    AND
    ----
    FALSE FALSE --> FALSE
    TRUE  FALSE --> FALSE
    FALSE TRUE  --> FALSE
    TRUE  TRUE  --> TRUE
    
    OR
    ----
    FALSE FALSE --> FALSE
    TRUE  FALSE --> TRUE
    FALSE TRUE  --> TRUE
    TRUE  TRUE  --> TRUE
    
    XOR
    ---
    FALSE FALSE --> FALSE
    TRUE  FALSE --> TRUE
    FALSE TRUE  --> TRUE
    TRUE  TRUE  --> FALSE
    

    11192006175Consider the points (0,0), (1,0), (0,1), and (1,1) in space. For XOR, there is no way to draw a single line that separates the TRUE points: (1,0) and (0,1) from the FALSE points: (0,0) and (1,1). XOR is linearly inseparable! Therefore, a Perceptron cannot be trained to develop weights that will produce the right output (true or false) given two inputs.

    In order to solve XOR, we must create a multi-layered Perceptron with two inputs, a “hidden” layer, and an output. This hidden layer amounts to having multiple perceptrons attack the same problem, i.e. if OR is true (perceptron #1) and AND is not true (perceptron #2), then fire the XOR output.

    Multi-Layered NN

    The question now becomes one of training the multi-layered network to have the right weights. Unfortunately, this involves a bit of calculus. This page will offer only a brief overview of the solution (hopefully in friendly, non-threatening intuitive terms.) For a more involved explanation, read Chapter 22 in The Computational Beauty of Nature. You might also simply google multi-layered neural network backpropgation.

    Back Propogation

    One solution for training a multi-layered neural network is known as backpropogation. Certainly, there are other options for training a neural network. For example, a genetic algorithm could be used to search for optimal weights!

    In order to generate the output of the network, the inputs multiplied by the weights are summed and fed forward through the network. This is what we did with the perceptron, only there is an additional layer between the input and the output. Training the network involves taking the error (desired - guess) and feeding it backwards through the network. So that the final error informs both the weights between the hidden layer and the output as well as the weights between the inputs and the hidden layer.

    Let’s first look at how we feed forward through the network. Our Neural Network will be composed of the following classes:

  • Neuron — each Neuron object consists of a list of Connection objects (i.e. indicating what other Neurons it is connected to.) It has a function that calculates its output based on its inputs multiplied by relevant connection weights. It also knows whether it is a “bias” Neuron or just a regular ol’ Neuron. The example also has the subclasses InputNeuron, HiddenNeuron, and OutputNeuron for any functionality specific to those neurons
  • Connection — each Connection object contain two Neurons, the “from” Neuron and the “to” Neuron. It also has a variable of type double for its weight.
  • Network — the Network object consits of all the Neurons. An array of input neurons, hidden neurons, and one output neuron (although this could, and should, conceivably be an array.) It will also have functions that feed the network forward and train the network backwards.
  • So, the first step in feeding forward the network is to calculate the output of all of the neurons in the hidden layer. In the simple perceptron, the output was the sign of the sum of the inputs * connection weights. If that sum was positive, the output was +1, if it was negative, the output was -1. The sign of the sum was our activation function.

    In order for backpropogation to work well, we will need a fancier activation functio. We’ll use a sigmoid function. Here is an applet that graphs the sigmoid function. Looking at the graph, we can see that the sigmoid activation function tells us a lot more about our output than the sign activation function does. When the neuron fires close to the 1/2 point (the middle of the graph), the slope of the sigmoid curve is very steep. When it fires close to the the outer edges (left or right sides of the image on the right), the slope is level.
    The code for sigmoid is as follows:

    // Sigmoid function
    float f(float x) {
      return 1.0 / (1.0 + exp(-x));
    }
    

    The slope of the sigmoid curve is the value we want to know about since it tells us where we are on the activation continuum — close to the center or close to the edge? The slope is computed by the derivative (the rate of change of a function) of sigmoid. This should be a familiar concept to us — the rate of change (i.e. derivative) of location is velocity, of velocity is acceleration. The derivative of the sigmoid function is f(x) * (1-f(x)) and this value will be used in the backpropogation algorithm to change the weights.

    Ok, back to feeding forward! Now that we know the activation function is the sigmoid function, we can calculate the output for any given neuron as the sum of all the input values * the connection weights passed through sigmoid. Note our neurons may have connections coming in and going out, but to get our output, we only want the connections coming in! Also, if the neuron is a bias neuron, this calculation won’t be necessary since the output is always 1.

    for (int i = 0; i < connections.size(); i++) {
      Connection c = (Connection) connections.get(i);
      Neuron from = c.getFrom();
      Neuron to = c.getTo();
      // Is this connection moving forward to us
      if (to == this) {
          sum += from.getOutput()*c.getWeight();
        }
      }
    }
    output = f(sum);
    

    Now that the neurons know how to calculate their output, we can take input values and feed forward. First, we feed in the inputs, then we loop through all the hidden neurons and calculate their output, and then we calculate the output of the output neuron itself! Here is a snippet from the example (in the Network) class:

    public double feedForward(double[] inputVals) {
      // First put the input values into the network
      //System.out.println("\n\nFeeding input layer");
      for (int i = 0; i < inputVals.length; i++) {
        input[i].input(inputVals[i]);
      }
      // Then calculate the output for hidden layer
      //System.out.println("\n\nFeeding hidden layer");
      for (int i = 0; i < hidden.length-1; i++) {
        //System.out.println("\nNeuron " + i);
        hidden[i].calcOutput();
      }
    
      //System.out.println("\n\nFeeding output layer");
      output.calcOutput();
      return output.getOutput();
    }
    

    Since we have the output of the neural network, we can compare that output to a known answer (from XOR) and adjust the weights accordingly.

    With the simple perceptron, we calculated the error as:

    ERROR = DESIRED - GUESS

    We will now use the fact that the derivative of the sigmoid function tells us more information about the error and calculate the it as follows:

    ERROR = GUESS*(1-GUESS) * (DESIRED-GUESS)

    This error is applied to the weights between the output and the hidden layer.

    DELTAWEIGHT = LEARNINGCONSTANT * ERROR * HIDDEN OUTPUT
    NEWWEIGHT = OLDWEIGHT + DELTAWEIGHT

    The next step is to compute the error for the hidden neuron’s output. This is the hard part since we don’t have a known answer! We only know the what the final output of the entire network should be! The formula for the hidden layer’s error is therefore a function of the derivative of its own output and the overall error of the whole network.

    HIDDENERROR = HIDDEN OUTPUT*(1-HIDDEN OUTPUT) * (ERROR* WEIGHT TO FINALOUTPUT)

    Note that if there are multiple outputs we sum the errors multiplied by the weights to the outputs, but with XOR we don’t have to worry about this. Finally, we adjust the weights coming into the hidden layer with that HIDDENERROR.

    DELTAWEIGHT = LEARNINGCONSTANT * HIDDENERROR * INPUT NEURON
    NEWWEIGHT = OLDWEIGHT + DELTAWEIGHT

    And this is it! The key here is realizing that we must first compute the error for the entire network’s output. And then use that error to determine how much each connection along the way back to the beginning contributed to that error, and adjust the weights accordingly.

    Here’s the function to perform all of these calculations:

    public double train(double[] inputs, double answer) {
      double result = feedForward(inputs);
    
      double deltaOutput = result*(1-result) * (answer-result);
    
      // This is easier b/c we just have one output
      ArrayList connections = output.getConnections();
      for (int i = 0; i < connections.size(); i++) {
        Connection c = (Connection) connections.get(i);
        Neuron neuron = c.getFrom();
        double output = neuron.getOutput();
        double deltaWeight = output*deltaOutput;
        c.adjustWeight(LEARNING_CONSTANT*deltaWeight);
      }
    
      // ADJUST HIDDEN WEIGHTS
      for (int i = 0; i < hidden.length; i++) {
        connections = hidden[i].getConnections();
        double sum  = 0;
        for (int j = 0; j < connections.size(); j++) {
          Connection c = (Connection) connections.get(j);
          // FIRST SUM ALL THE CONNECTIONS TO OUTPUTS*deltaOutput (just one output)
          if (c.getFrom() == hidden[i]) {
            sum += c.getWeight()*deltaOutput;
          }
        }
        // Then adjust the weights coming in
        for (int j = 0; j < connections.size(); j++) {
          Connection c = (Connection) connections.get(j);
          if (c.getTo() == hidden[i]) {
            double output = hidden[i].getOutput();
            double deltaHidden = output * (1 - output);
            deltaHidden *= sum;   // Would sum for all outputs if more than one output
            Neuron neuron = c.getFrom();
            double deltaWeight = neuron.getOutput()*deltaHidden;
            c.adjustWeight(LEARNING_CONSTANT*deltaWeight);
          }
        }
      }
    
      return result;
    }
    

    One Response to “Neural Networks”  

    1. 1 Walter Kempf

      Thank you very much for this page!

      I am a final year undergraduate Computer Science student at the University of Pretoria (South Africa) and, although we have a very capable and knowledgeable lecturer, it is not always possible to arrange a meeting with him in a timely fashion to discuss the subtleties of neural networks. That is where this page filled in the gaps beautifully and unambiguously.

      I am very glad to have found this page and would, once again, like to thank you heartily for, not only creating it, but making it available to the public. It has certainly saved me countless extra hours searching for the “missing links” in my understanding of neural networks.

    Leave a Reply