Codility Ferrum 2018 Challenge

This post isn’t really on topic, but anyone with a general interest in algorithms might find it useful.

The following is my entry for Codility’s LongestNonNegativeSumSlice challenge.  The temporal and spatial complexity are both O(N). The solution uses a combination of dynamic programming and prefix summation.

The problem definition: given an input array (a sequence of integers whose elements may only hold the values -1, 0 and 1), return the size of the longest slice of the array with a sum greater than or equal to zero. For example, the longest such slice in [-1, -1, 0, 0, 1] has size four, because the sum of the second to fifth elements is zero.

This problem really highlights the importance of considering the specific details of a problem carefully before implementing a solution. It's very easy to implement a brute-force solution with temporal complexity on the order of N^2, or to chase red herrings by looking at the solutions to similar-looking problems, such as finding the slice with the maximum sum, or the same problem with input values drawn from a wider range.

Ultimately, the key to the problem lies in the restricted range of possible array element values. Because the smallest and largest values that can be held in the array are -1 and 1 respectively, the prefix sum can only change by at most 1 in either direction at each iteration. Therefore, if you encounter a particular negative prefix-sum value more than once, the slice starting at the element after the index recorded on the first encounter and ending at any subsequent encounter necessarily has a sum of 0.

If the input values are [-1, -1, 0, 0, 0, 1], the prefix-sum sequence runs -1, -2, -2, -2, -2, -1. From this sequence, you can see that the prefix sum first takes the value -1 at the first element, so the element following it is the second, and that the prefix sum next returns to -1 at the sixth element. The size of the sub-array (slice) with a sum of zero is therefore 5 (the number of elements from the second to the sixth inclusive). Using the same technique, we can also see that the sums of the values from the third to the fourth and from the third to the fifth elements are zero.

There is, of course, an edge case in which the prefix-sum value does not fall below zero before the largest sub-array with a sum >= 0 is found. For example: [0, 0, 0, 0, 0, -1], [0, 0, 0, 0, 0, 0] or [1, -1, 1, -1, 1, -1, -1]. For this reason, we also need to handle the case where the prefix sum itself is >= 0.

The solution, written in Java here, is relatively simple. A hash map records the index at which each negative prefix-sum value is first encountered. Each time we re-encounter a previously recorded negative prefix-sum value, we know we have found a slice with a sum of zero, so we check whether the size of that slice is greater than the incumbent maximum slice, and if it is, it becomes the new maximum. If the prefix sum is >= 0, the slice running from the start of the array to the current index qualifies, so the current index plus one (Java arrays are zero-indexed) becomes the maximum slice.

import java.util.HashMap;
import java.util.Map;

class Solution {
    public int solution(int[] A) {
        int sum = 0;      // Running prefix sum
        int maxslice = 0; // Size of the largest qualifying slice found so far
        Map<Integer,Integer> sumindex = new HashMap<Integer,Integer>();

        for(int i = 0; i < A.length; i++) {
            sum += A[i];
            if(sum >= 0) // The slice from the start of the array qualifies
                maxslice = i + 1;
            else if(sumindex.containsKey(sum)) // Repeated prefix sum: the slice between sums to zero
                maxslice = Math.max(maxslice, i - sumindex.get(sum));
            else // First occurrence of this negative prefix sum: record its index
                sumindex.put(sum, i);
        }
        return maxslice;
    }
}

This implementation iterates over the elements of A once (N iterations). The containsKey and get methods of the HashMap have average-case O(1) temporal complexity, so the temporal complexity of this implementation is O(N), despite the challenge statement predicting worst-case complexity of O(N log N). Spatial complexity is O(N), which is in line with Codility’s expectation for worst-case space complexity. The entry received a score of 100% for correctness and scalability.
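
For anyone who wants to try the solution locally, the following minimal harness (my own addition, not part of the Codility submission) runs the examples discussed above:

class SolutionDemo {
    public static void main(String[] args) {
        Solution s = new Solution();
        // Expected output: 4, 5, 5 and 6 respectively
        System.out.println(s.solution(new int[]{-1, -1, 0, 0, 1}));
        System.out.println(s.solution(new int[]{-1, -1, 0, 0, 0, 1}));
        System.out.println(s.solution(new int[]{0, 0, 0, 0, 0, -1}));
        System.out.println(s.solution(new int[]{1, -1, 1, -1, 1, -1, -1}));
    }
}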

Post Edit – March 28, 2018:
Beck has pointed out that, since the number of distinct negative prefix-sum values can’t exceed the size of the input array, we could replace the HashMap with a regular array. In doing so, we just have to index by the absolute value of the sum minus one (to account for the zero indexation of the array) rather than the sum itself; the actual sum will be negative any time we want to use the array. One subtlety: because 0 is a valid element index, the array is filled with -1 to mark sums that have not yet been seen.

The following is an updated implementation:

import java.util.Arrays;

class Solution {
    public int solution(int[] A) {
        int sum = 0;
        int maxslice = 0;
        int[] sumindex = new int[A.length];
        Arrays.fill(sumindex, -1); // -1 marks sums not yet seen (0 is a valid index)

        for(int i = 0; i < A.length; i++) {
            sum += A[i];
            int idx = Math.abs(sum) - 1;
            if(sum >= 0)
                maxslice = i + 1;
            else if(sumindex[idx] != -1)
                maxslice = Math.max(maxslice, i - sumindex[idx]);
            else
                sumindex[idx] = i;
        }
        return maxslice;
    }
}

Post Edit – April 16, 2018:
Alternatively, to remove the need to calculate the idx variable, negate the sum and use that for indexation:

import java.util.Arrays;

class Solution {
    public int solution(int[] A) {
        int sum = 0;
        int maxslice = 0;
        int[] sumindex = new int[A.length];
        Arrays.fill(sumindex, -1); // -1 marks sums not yet seen

        for(int i = 0; i < A.length; i++) {
            sum -= A[i]; // Negate the running sum so it can index the array directly
            if(sum <= 0)
                maxslice = i + 1;
            else if(sumindex[sum-1] != -1)
                maxslice = Math.max(maxslice, i - sumindex[sum-1]);
            else
                sumindex[sum-1] = i;
        }
        return maxslice;
    }
}

Artificial Neural Networks – Part 2: MLP Implementation for XOr

Introduction

As promised in part one, this second part details a Java implementation of a multilayer perceptron (MLP) for the XOr problem. In fact, as you will see, the core classes are designed to support any MLP with a single hidden layer.

First, it will help to give a quick overview of how MLP networks can be used to make predictions for the XOr problem. For a more detailed explanation, please review part one of this post.

The image at the top of this article depicts the architecture of a multilayer perceptron network designed specifically to solve the XOr problem. It contains two inputs that can each hold the value 1 or 0, a hidden layer with two units (not counting the dashed bias units, which we will ignore for now for the sake of simplicity) and a single output unit. Once the network is trained, the output unit should predict the output you would expect if you were to pass the input values through an XOr logic gate. That is, if the two input values are not equal (1, 0 or 0, 1), the output should be 1; otherwise the output should be 0.

Forward propagation is completed first, passing the input values through the network to the output unit and making a prediction of what the output should be, given the input. This is then compared to the expected output and the prediction error is calculated. Part of the process of passing values through the network involves multiplying them by small weights. If the weights are correctly configured, the output prediction will be correct.

This configuration is not a manual process, but rather something that occurs automatically through a process known as backward propagation. Backward propagation takes the error of the prediction made by forward propagation and passes it backwards through the network, adjusting the weights to values that slightly improve prediction accuracy.

Forward and backward propagation are repeated for each training example in the dataset many times until the weights of the network are tuned to values that result in forward propagation producing accurate output.

The following sections will provide more detail on how the process works by detailing a fully functional Java implementation.

Class Structure and Dependencies

The implementation consists of four classes: ExecuteXOr, the highest-level class containing all of the XOr-specific code; MultilayerPerceptron, which contains all of the code used to implement a generic multilayer perceptron with a single hidden layer; iActivation, the activation interface; and SigmoidActivation, a concrete activation class implementing the sigmoid activation function.

MLP Implementation UML Class Diagram Showing Dependencies and Methods

Class: ExecuteXOr

The first class we will review is ExecuteXOr. This is the highest-level class, containing the main method used to kick off the experiment. Here the user defines the dimensions of the input and output values and instantiates the activation class and the MultilayerPerceptron class. Note that, for the XOr problem, we have two input units, two hidden units and one output unit. As such, X is a two-dimensional array, y is a one-dimensional array and the first three input parameters for the MultilayerPerceptron class denote the dimensions of the neural network. These are set to 2 (input layer), 2 (hidden layer) and 1 (output layer).

package Experiments;

import NeuralNets.MultilayerPerceptron;
import Activation.SigmoidActivation;

public class ExecuteXOr {
	public static double[][] X; // Matrix to contain input values
	public static double[] y; // Vector to contain expected values
	public static MultilayerPerceptron NN; // Neural Network

The main method sets the dimensions of the X and y arrays and instantiates the MultilayerPerceptron class, before kicking off the methods used to initialise the dataset, then train and test the network. We have chosen sigmoid as the activation function in this case. However, as implied by the existence of the iActivation interface, there are many other possible activation functions.

The first three parameters of the MultilayerPerceptron constructor define the dimensions of the network. In this case we have defined two input units, two hidden units and one output unit, as is required for this architecture.

The input parameters for trainNetwork define the maximum number of iterations to complete during training and the target error. In this case, no more than 200 thousand iterations (epochs) will be completed, and training will also stop early if the sum of squared error drops below 0.01.

	public static void main(String[] args) {
		X=new double[4][2]; // Input values
		y=new double[4]; // Target values
		
		// Instantiate the neural network class
		NN = new MultilayerPerceptron(2,2,1,new SigmoidActivation());
			
		initialiseDataSet();
		trainNetwork(200000,0.01);
		testNetwork();
	}

The initialiseDataSet method provides the values for the X and y arrays, where X contains all possible input values and y contains the expected output for each set of inputs. See part one of this post for more details on the structure of the XOr problem and why these values are as they are. It is important to understand that this is an exhaustive list of the possible inputs and outputs, rather than a sample of possible values. Because we’re not using a sample, we need not be concerned about overfitting and thus do not require separate training and testing datasets.

	private static void initialiseDataSet(){
		X[0][0] = 0;
		X[0][1] = 0;
		y[0] = 0;
		
		X[1][0] = 0;
		X[1][1] = 1;
		y[1] = 1;
		
		X[2][0] = 1;
		X[2][1] = 0;
		y[2] = 1;

		X[3][0] = 1;
		X[3][1] = 1;
		y[3] = 0;
	}

For each iteration (epoch), the trainNetwork method loops over every training example, completing a forward propagation and a backward propagation step for each before updating the weights. Weights are updated with a learning rate of 0.1, which slows the rate at which they change and helps to prevent the system from overshooting the ideal weight configuration during updates.

Another potential issue to rectify is the tendency, depending on how the weights are initialised, for the algorithm to occasionally enter a region of the search space that is difficult to exit, resulting in very slow progress in training. To avoid these scenarios, I have borrowed a method from combinatorial optimisation known as random restart. Every 500th epoch, a check is performed to see if the error score is improving too slowly. If progress is extremely slow, the weights are reinitialised, effectively kicking the incumbent solution into a new neighbourhood from which we will hopefully achieve better training performance.

At each epoch, a check is also performed to see whether the current error has fallen below the target. If it has, the weights are close enough to the optimal configuration to produce highly accurate predictions, so the method is exited.

	private static void trainNetwork(int epochs, double targetError){
		double error=0;
		double baseError=Double.POSITIVE_INFINITY;

		// Iterate over the number of epochs to be completed
		for(int epoch = 0; epoch<epochs; epoch++){
			error=0;
			
			// Iterate over all training examples
			for(int i = 0; i<X.length; i++){
				// Feed forward
				NN.forwardProp(X[i]); 
				
				// Run backpropagation and add the squared error to the sum of error for this epoch
				error+=NN.backProp(y[i]); 
				
				// Update the weights
				NN.updateWeights(0.1); 
			}
			
			// Every 500th epoch, check whether progress is too slow and if so, reset the weights
			if(epoch % 500==0) {
				// If the error has barely improved since the last check
				if(baseError-error<0.00001) {
					NN.kick(); // Kick the candidate solution into a new neighbourhood with random restart
					baseError=Double.POSITIVE_INFINITY;
				}
				else
					baseError=error; // Record the base error for the next check
			}			
			
			// Print the sum of squared error for the current epoch to the console
			System.out.println("Epoch: " + epoch + " - Sum of Squared Error: " + error);
			
			// If the error has dropped below the target, stop training
			if(error<targetError)
				break;
		}
	}

Once the network has been trained, we can test its accuracy. The testNetwork method iterates over each of the examples in X and runs forward propagation, which returns the neural network’s prediction of what the output should be. The prediction is a value falling between something very close to zero and something very close to one. Any value of 0.5 or higher is deemed to predict 1, while anything lower than 0.5 is deemed to predict 0. This is compared to the expected output, and the proportion of correct predictions (prediction accuracy) is printed to the console. If the network is working correctly, the returned value should always be 1.0.

	private static void testNetwork(){
		double correct=0;
		double incorrect=0;
		double output=0;
		
		// Iterate over the testing examples (which happen to double as training examples)
		for(int i = 0; i<X.length; i++){
			// Feed forward to get the output for the current example
			output=NN.forwardProp(X[i])[0];
			
			// If the output is >= 0.5, we deem the output to be 1.0
			if(output>=0.5)
				output=1.0;
			else // Otherwise it is 0.0
				output=0.0;
			
			// If the output value matches the target
			if(output==y[i])
				correct++; // Increment the number of successful classifications
			else
				incorrect++; // Increment the number of unsuccessful classifications
		}
		
		// Print the test accuracy to the console
		System.out.println("Test Accuracy: " + String.valueOf((correct/(incorrect+correct))));
	}
}

Interface: iActivation

The iActivation interface defines the methods that must be implemented by any activation class.  The only two methods required are getActivation and getDerivative.

An implementation of the getActivation method takes as input the sum of the values entering a single unit and outputs the activation value for that unit. This is used in forward propagation to compress the sum of values received by a unit, usually into a value between 0 and 1, typically very close to one or the other. There are exceptions, however, such as the tanh activation function, which compresses its input into a value between -1 and 1.

The getDerivative method computes the derivative of the activation function. This is used by backward propagation to determine how the weights should be updated in order to nudge them toward values that slightly improve the accuracy of the network; in effect, it determines the direction in which the weights should be changed. How the partial derivative is derived is beyond the scope of this post, but anyone interested may wish to review this report.

package Activation;

public interface iActivation {
	double getDerivative(double x);
	double getActivation(double x);
}

Class: SigmoidActivation

The SigmoidActivation class is an implementation of iActivation, providing the sigmoid activation function and its derivative.

The Sigmoid activation function is g(x)=\frac{1}{1+e^{-x}}, while its derivative is g'(x)=g(x)(1-g(x)). Note that getDerivative below receives a unit’s already-activated output as its argument, which is why it is implemented as x(1-x).

Plotting the Sigmoid function clearly shows how it converts almost all presented values to something very close to 0 or 1.  The value 0 is converted to 0.5, but any values larger or smaller quickly approach the plateaus toward the left and the right of the plot.

Sigmoid Plot

The derivative calculates the direction of the slope of this line (gradient), making it possible to force the weight updates into a direction that improves the accuracy of the classification process.  Because different activation functions compress their input values in different ways, they require different derivative functions to calculate their respective gradients.  The use of polymorphism here allows for the potential implementation of any activation function to be encapsulated alongside its own derivative function.

package Activation;

public class SigmoidActivation implements iActivation {

	@Override
	public double getDerivative(double x) {
		
		return (x * (1.0-x));
	}

	@Override
	public double getActivation(double x) {
		double expVal=Math.exp(-x);
	
		return 1.0/(1.0+expVal);
	}
	
}
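
To illustrate the polymorphism just described, the following is a sketch of a hypothetical TanhActivation class (my own illustration, not part of the implementation presented here). As with SigmoidActivation, getDerivative is written to receive the already-activated output of a unit, so for tanh it returns 1 - x^2:

package Activation;

public class TanhActivation implements iActivation {

	@Override
	public double getDerivative(double x) {
		// x is the unit's already-activated output, so the tanh derivative 1 - tanh^2 becomes 1 - x^2
		return 1.0 - (x * x);
	}

	@Override
	public double getActivation(double x) {
		// Compresses the input into a value between -1 and 1
		return Math.tanh(x);
	}
}

Passing new TanhActivation() to the MultilayerPerceptron constructor would then swap the squashing function without any other code change, although the training targets would need to be reconsidered, given that tanh outputs range between -1 and 1.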

Class: MultilayerPerceptron

The MultilayerPerceptron class encapsulates all of the data for the neural network and the methods required to initialise and propagate the network.

Data held by the class includes the dimensions of the network (inputCount, hiddenCount and outputCount), the first and second layers of weights (W1 and W2), the first and second layers of delta weights (DW1 and DW2), the pre-activation values for each of the units (Z1 and Z2), the input values (InputValues) and the activation values output by each subsequent layer of the network (HiddenValues and OutputValues).

package NeuralNets;

import java.util.Random;
import Activation.iActivation;

public class MultilayerPerceptron {
	int inputCount; // Number of input units
	int hiddenCount; // Number of hidden units
	int outputCount; // Number of output units
	
	// Weight matrices
	double[][] W1; // First Layer of Weights
	double[][] W2; // Second Layer of Weights
	
	// Delta weight matrices
	double[][] DW1; // First Layer of Delta Weights
	double[][] DW2; // Second Layer of Delta Weights
	
	// The values at the time of activation (the values input to the hidden and output units)
	double[] Z1; // First Layer pre-activation values
	double[] Z2; // Second Layer pre-activation values
	
	iActivation activation=null; // Activation Class

	// The values actually output by the units in each layer after being squashed by the activation function
	double[] InputValues; // Inputs layer Values
	double[] HiddenValues; // Hidden layer Values
	double[] OutputValues; // Output layer Values

The constructor method for the class receives the dimensions of the network, along with the activation function, as input and proceeds to use that information to initialise the variables that are global to the class.

	public MultilayerPerceptron(int inputCount, int hiddenCount ,int outputCount, iActivation activation){
		this.inputCount=inputCount;
		this.hiddenCount=hiddenCount;
		this.outputCount=outputCount;
		this.activation=activation;
		
		// Initialise the first layer weight and delta weight matrices (accounting for bias unit)
		W1 = initialiseWeights(new double[hiddenCount][inputCount+1]);
		DW1 =initialiseDeltaWeights(new double[hiddenCount][inputCount+1]);
		
		// Initialise the second layer weight and delta weight matrices (accounting for bias unit)
		W2 = initialiseWeights(new double[outputCount][hiddenCount+1]);
		DW2 = initialiseDeltaWeights(new double[outputCount][hiddenCount+1]);
		
		// Initialise the activation vectors
		Z1 = initialiseActivations(new double[hiddenCount]);
		Z2 = initialiseActivations(new double[outputCount]);
		
		// Initialise the hidden and output value vectors (same dimensions as activation vectors)
		OutputValues=Z2.clone();
		HiddenValues=Z1.clone();
	}

The initialisation methods randomly initialise the weight arrays (W1 and W2) within a reasonable range of small values, scaled by the number of input units. All other arrays are initialised with zero values. The kick method exists to support the random restart process, essentially reinitialising the weight arrays with random values; it is called when the network is stuck in a search neighbourhood where training occurs at an extremely slow rate.

	private double[][] initialiseWeights(double w[][]){
		Random rn = new Random();

		double offset = 1/(Math.sqrt(inputCount));
		
		for(int i=0; i<w.length; i++){
			for(int j=0; j<w[i].length; j++){ // Includes the bias column
				w[i][j]=offset-rn.nextDouble();
			}
		}
		
		return w;
	}

	private double[][] initialiseDeltaWeights(double w[][]){
		for(int i=0; i<w.length; i++){
			for(int j=0; j<w[i].length; j++){
				w[i][j]=0;
			}
		}
		
		return w;
	}
	
	private double[] initialiseDelta(double d[]){
		for(int i=0; i<d.length; i++){
			d[i]=0;
		}
		
		return d;
	}

	private double[] initialiseActivations(double z[]){
		for(int i=0; i<z.length; i++){
			z[i]=0;
		}
		return z;
	}

	public void kick(){
		// Kick the candidate solution into a new neighbourhood (random restart)
		W2 = initialiseWeights(new double[outputCount][hiddenCount+1]); // Account for bias unit
		W1 = initialiseWeights(new double[hiddenCount][inputCount+1]); // Account for bias unit
	}

The forwardProp method completes forward propagation. First, a bias unit is added to the input values and to the hidden unit values. The non-bias pre-activation values (Z values) are calculated by summing the input values after multiplying them by their weights. The hidden unit activations are then calculated by passing the resulting Z values through the activation function.

The hidden layer activations then serve as input to the output layer and the process is repeated for that layer with the activation for the output unit serving as the prediction.

	public double[] forwardProp(double[] inputs){
		this.InputValues = new double[(inputs.length+1)];
		
		// Add bias unit to inputs
		InputValues[0]=1;
		for(int i = 1; i<InputValues.length; i++){
			this.InputValues[i]=inputs[i-1];
		}
	
		HiddenValues = new double[hiddenCount+1];
		HiddenValues[0]=1; // Add bias unit
		
		// Get hidden layer activations
		for(int i = 0; i<Z1.length; i++){
			Z1[i]=0;
			for(int j = 0; j<InputValues.length; j++){
				// Hidden layer pre-activation value (weighted sum of inputs)
				Z1[i]+=W1[i][j] * InputValues[j]; 
			}

			// Hidden layer output value
			HiddenValues[i+1]= activation.getActivation(Z1[i]);
		}
		
		// Get output layer activations
		for(int i = 0; i<Z2.length; i++){
			Z2[i]=0;
			for(int j=0; j<HiddenValues.length; j++){
				// Output layer pre-activation value (weighted sum of hidden values)
				Z2[i] += W2[i][j] * HiddenValues[j]; 
			}
			// Output layer output value
			OutputValues[i]=activation.getActivation(Z2[i]); 
		}

		return OutputValues;
	}

There are two backProp methods for completing backward propagation: the first accommodates a single target value, while the second accommodates an array of target values for architectures containing more than one output unit. Delta weights are the values by which the weights can be updated to improve the performance of the network. The outputs of the backward propagation process are the delta weights for the first and second layers of weights, along with the error, i.e. the output of the cost function.

As the name suggests, backward propagation begins with the output layer and propagates backwards through the network. The error is calculated using the classic squared-error cost function, in which half of the sum of the squared differences between the target values and the predictions is taken. Squaring ensures that all error values are positive and penalises larger errors more heavily.

The deltas for the second layer of weights are calculated by taking the difference between the target values and predictions, multiplying that by the activation derivative of the predictions.

Once the second layer delta weights have been calculated, the first layer delta weights are also updated. These are calculated by multiplying the second layer deltas by the second layer of weights and multiplying that by the activation derivative of the hidden layer activation values. These values are then multiplied by the input layer values.

Worth noting is that the backProp method does not actually update the weights; it merely accumulates the delta weight values that can later be used to perform the update.

	// Support a single double value as the target
	public double backProp(double targets){
		double[] t = new double[1];
		t[0]=targets;
		return backProp(t);
	}
	
	public double backProp(double[] targets){
		double error=0;
		double errorSum=0;
		double[] D1; // First Layer Deltas
		double[] D2; // Second Layer Deltas
		D1 = initialiseDelta(new double[hiddenCount]);
		D2 = initialiseDelta(new double[outputCount]);
		
		// Calculate Deltas for the second layer and the error
		for(int i = 0; i<D2.length; i++){
			D2[i]=(targets[i]-OutputValues[i]) * activation.getDerivative(OutputValues[i]);
			
			errorSum+= Math.pow((targets[i]-OutputValues[i]),2); // Squared Error
		}
		error = errorSum / 2;
		
		// Update Delta Weights for second layer
		for(int i = 0; i<outputCount; i++){
			for(int j = 0; j<hiddenCount+1; j++){ 
				DW2[i][j] += D2[i] * HiddenValues[j];
			}
		}
		
		// Calculate Deltas for first layer of weights
		for(int j = 0; j<hiddenCount; j++){
			for(int k = 0; k<outputCount; k++){
				D1[j] += (D2[k] * W2[k][j+1]) * activation.getDerivative(HiddenValues[j+1]);
			}
		}
		
		// Update first layer delta weights
		for(int i=0; i<hiddenCount; i++){
			for(int j=0; j<inputCount+1; j++){ // Account for bias unit
				DW1[i][j] += D1[i] * InputValues[j];
			}
		}
		return error;
	}

In this particular implementation of ExecuteXOr, the weights are updated in every epoch. However, in some circumstances it can be advantageous to update the weights less regularly. The updateWeights method allows for updating of the weights to occur at any interval as directed by the calling method. Each time the weights are updated, the DW1 and DW2 arrays are reset to zero values. For as long as the weights are not updated, the delta weights accumulate.

The only input parameter for the method is learningRate. When learningRate is set to 1, the entire delta weight is used in the update. When it is set to a fractional value, the rate at which the weights are updated is reduced. ExecuteXOr uses a learning rate of 0.1, ensuring that only 10% of each accumulated delta weight value is used to update the weights at each epoch.

	public void updateWeights(double learningRate){
		// Update the second layer of weights
		for(int i = 0; i<W2.length; i++){
			for(int j = 0; j<W2[i].length; j++){
				W2[i][j] += learningRate * DW2[i][j];
			}
		}
		
		// Update the first layer of weights
		for(int i = 0; i<W1.length; i++){
			for(int j = 0; j<W1[i].length; j++){
				W1[i][j] += learningRate * DW1[i][j];
			}
		}
		
		// Reset the accumulated delta weights
		DW1 = initialiseDeltaWeights(new double[hiddenCount][inputCount+1]);
		DW2 = initialiseDeltaWeights(new double[outputCount][hiddenCount+1]);
	}

}

Conclusion

This post is the second part of an article on multilayer perceptron (MLP) artificial neural networks for the XOr problem.  It has demonstrated a Java implementation of an MLP.  The article began with a brief recap of the XOr problem and a summary of the processes used to train an MLP, including a high-level discussion of forward propagation and backward propagation.

The class structure and dependencies for the implementation were then detailed before stepping through each class in the implementation, complete with Java code and detailed explanations of each of the methods.

Approaches to Big Combinatorial Optimisation Problems

Combinatorial optimisation is a problem category in which the goal is to find an optimal combination of entities.  A classic example is the travelling salesman problem, in which the goal is to plot the most efficient route a salesman can take to visit all of the towns in scope, given the locations of towns and distances between them.

Problems such as this can be solved to global optimality using deterministic methods only when the search space involved is very small. However, it is almost always the case that a real-world problem instance will have a search space much too large to expect to find the global optimum (Talbi et al, 2006). So before a viable approach can be postulated for a combinatorial optimisation problem, it is important to understand the nature of the problem’s complexity through the lens of computational complexity theory.

For any given problem under consideration, computational complexity theory asks whether there is an algorithm that can effectively solve it (Ogihara, 1999).  Attempts are then made to categorise the problem on the basis of its computational difficulty relative to other problems (Hromkovic, 2010).

Temporal complexity (or time complexity) is a measure of complexity based on the relationship between an algorithm’s running time and the size of its input. In other words, an algorithm’s temporal complexity is a function of its input size (Hromkovic, 2010). ‘Time’ in this instance refers to the number of steps that the algorithm must take in order to run to completion (Mayordomo, 2004). In order for an algorithmic solution to be considered viable, its temporal complexity must be polynomial (Papadimitriou et al, 1998). In such cases, the algorithm is said to run in “polynomial time” and the problem is said to be in the class P (Demaine, 2011).

For many problems, there is no known algorithm in P, but individual candidate solutions can nonetheless be verified in polynomial time (Mayordomo, 2004). That is, although a non-deterministic process might be used to come up with potential sub-optimal solutions, the process of scoring each individual candidate for comparison can be completed deterministically. This is known as deterministic verification (Hromkovic, 2010). When formulated as decision problems (by ensuring the output is binary), such problems are regarded as nondeterministic polynomial-time problems, or NP (Mayordomo, 2004).

All P problems also fall into NP because they too have nondeterministic solutions (Demaine, 2011). At present, it is not known whether the reverse is the case and all NP problems have solutions in P (Mayordomo, 2004). The question of whether P=NP is one of the most important open problems in mathematics and computer science today (Sipser, 1992). Rather than attempt to answer this Millennium Prize question, in most contexts it makes sense to assume that P≠NP, i.e. that not all problems in NP have polynomial-time solutions waiting to be discovered.

Two additional temporal complexity classes to consider are NP-Hard and NP-Complete. NP-Hard is the set of all problems that are at least as difficult as the hardest problems in NP (Demaine, 2011). This includes problems that are not themselves in NP, such as combinatorial optimisation problems. The NP-Complete class, on the other hand, is the set of all NP-Hard problems that are also in NP and are thus formulated as decision problems (Demaine, 2011).

Relationships between the relevant classes of temporal-complexity (assuming P≠NP)

All NP-Complete problems can be solved by exhaustively checking every candidate solution in the search space. However, the search spaces tend to be far too large for this approach to be considered practical (Woeginger, 2003). In the travelling salesman problem, for instance, the number of distinct tours of n towns is (n-1)!/2, so even 20 towns yield roughly 6 × 10^16 candidate solutions. Heuristics for NP-Hard and NP-Complete problems must therefore use more efficient methods to explore the solution space.

The first thing to go is the idea of finding the globally optimal solution; instead, we search for any high-quality sub-optimal solution. Furthermore, with no known polynomial-time algorithm that can solve these problems, approaches to them tend to rely on deterministic verification, most commonly by applying a standard metaheuristic.

Metaheuristics are generalised (high-level) heuristics. They fall into the major categories of local search and evolutionary algorithms (Talbi et al, 2006). Examples of local search-based metaheuristics include local search itself, tabu search and simulated annealing. Genetic algorithms, on the other hand, are examples of evolutionary algorithms (Talbi et al, 2006).

Local search algorithms explore the search space one solution at a time, improving on the incumbent solution by making a single change at each iteration until a locally optimal solution is found. This creates the problem of the heuristic getting stuck once it converges on a local optimum. For this reason, some local search metaheuristics include means of breaking away from local optima, allowing them to explore more than one locally optimal solution; these tend to produce better results. A minimal sketch of the basic idea follows.
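
The sketch below shows a minimal random-mutation hill climber over bit-string solutions, written in the same Java style as the MLP post above. The HillClimber class and its bit-counting fitness function are illustrative placeholders of my own; a real application would substitute its own solution representation and cost function:

import java.util.Random;

public class HillClimber {
	static final Random rn = new Random();

	// Placeholder objective: count the true bits. A real problem would
	// substitute its own scoring function (tour length, roster penalty, etc.).
	static double fitness(boolean[] s) {
		double f = 0;
		for (boolean b : s) if (b) f++;
		return f;
	}

	public static boolean[] solve(int size, int maxIterations) {
		// Start from a random incumbent solution
		boolean[] incumbent = new boolean[size];
		for (int i = 0; i < size; i++) incumbent[i] = rn.nextBoolean();
		double incumbentFitness = fitness(incumbent);

		for (int iter = 0; iter < maxIterations; iter++) {
			// Make a single change: flip one randomly chosen bit
			int flip = rn.nextInt(size);
			incumbent[flip] = !incumbent[flip];
			double candidateFitness = fitness(incumbent);

			if (candidateFitness >= incumbentFitness)
				incumbentFitness = candidateFitness; // Improved (or equal): keep the change
			else
				incumbent[flip] = !incumbent[flip];  // Worse: revert the change
		}
		return incumbent;
	}
}

An escape mechanism such as the random restart used by the kick method in the MLP post above could be layered on top of this loop to break away from local optima.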

The evolutionary approach involves evaluating many different candidate solutions at once, and includes some means of excluding poor solutions and of splitting and recombining good solutions in the hope of improving on them in the next generation. Genetic algorithms are the most common example, as sketched below.
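
Again as a rough sketch only, and reusing the placeholder fitness function from the hill climber above, a bare-bones genetic algorithm might keep the fitter half of a population and rebuild the weaker half by splitting (single-point crossover) and mutating the survivors:

import java.util.Arrays;
import java.util.Random;

public class SimpleGeneticAlgorithm {
	static final Random rn = new Random();

	// Placeholder objective, as in the hill-climber sketch above
	static double fitness(boolean[] s) {
		double f = 0;
		for (boolean b : s) if (b) f++;
		return f;
	}

	public static boolean[] solve(int size, int populationSize, int generations) {
		// Random initial population (populationSize is assumed to be at least 2)
		boolean[][] population = new boolean[populationSize][size];
		for (boolean[] p : population)
			for (int i = 0; i < size; i++) p[i] = rn.nextBoolean();

		for (int g = 0; g < generations; g++) {
			// Exclude poor solutions: sort the population by fitness, best first
			Arrays.sort(population, (a, b) -> Double.compare(fitness(b), fitness(a)));

			// Rebuild the weaker half by recombining parents drawn from the fitter half
			for (int i = populationSize / 2; i < populationSize; i++) {
				boolean[] mum = population[rn.nextInt(populationSize / 2)];
				boolean[] dad = population[rn.nextInt(populationSize / 2)];
				int cross = rn.nextInt(size); // Single-point crossover
				for (int j = 0; j < size; j++)
					population[i][j] = (j < cross) ? mum[j] : dad[j];
				population[i][rn.nextInt(size)] ^= true; // Mutate one bit
			}
		}
		Arrays.sort(population, (a, b) -> Double.compare(fitness(b), fitness(a)));
		return population[0];
	}
}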

Metaheuristics have the advantage of applicability to a variety of combinatorial optimisation problems.  For this reason, the metaheuristic approach is viewed as a generic approach to problem solving.  Furthermore, metaheuristic development is relatively simple and considerably cheaper than bespoke development (Hromkovic, 2010), though the cost saving comes at the expense of performance (Dowsland, 1998; Talbi, 2006).

In recent years, a new category of metaheuristic has emerged in an attempt to create heuristics that are both generalisable and do not suffer from the diminished performance of classic metaheuristics. These are known as hyper-heuristics.

The term ‘hyper-heuristic’ was coined to describe metaheuristics that invoke other heuristics (Cowling et al, 2001), a concept that has existed since the 1960s (Ross, 2005). The definition was later expanded to include algorithms that generate heuristics (Burke et al, 2010).

A hyper-heuristic algorithm approaches a problem by calling on a series of low-level heuristics to generate solutions (Burke et al, 2009). While a metaheuristic search space is made up of problem solutions, the hyper-heuristic search space is composed of heuristic algorithms (Burke et al, 2009). Hyper-heuristics have been found to exhibit generality well beyond providing state-of-the-art results for multiple problem instances within a given problem category. In fact, there are examples of individual hyper-heuristic implementations that can solve many distinct problem types: the same implementation deployed to schedule employees to rosters can also solve the travelling salesman, knapsack, vehicle routing and many other problem types.

Future posts in this series will explore each of the most commonly used metaheuristics in detail.

Sources

Burke, E. K. Hyde, M. Kendall, G. Ochoa, G. Ozcan, E. Qu, R. (2009). A survey of hyper-heuristics. Computer Science Technical Report No. NOTTCS-TR-SUB-0906241418-2747, School of Computer Science and Information Technology, University of Nottingham.

Burke, E. K. Hyde, M. Kendall G., Ochoa, G. Özcan, E. Woodward J. R. (2010). A Classification of Hyper-heuristic Approaches. In Handbook of Metaheuristics, International Series in Operations Research & Management Science, 146, 449-468.

Cowling, P. Kendall, G. Soubeiga, E. (2001). A hyper-heuristic approach to scheduling a sales summit. In Practice and Theory of Automated Timetabling III, 176-190.

Demaine, E. (2011). Introduction to Algorithms: Lecture 23 – Computational Complexity. Accessible from <http://www.youtube.com/watch?v=moPtwq_cVH8> [Accessed on 22 June 2013]. Massachusetts Institute of Technology.

Dowsland, K. A. (1998). Off-the-Peg or Made-to-Measure? Timetabling and Scheduling with SA and TS. In Practice and Theory of Automated Timetabling II, 37-52.

Hromkovic, J. (2010). Algorithmics for Hard Problems: Introduction to Combinatorial Optimization, Randomization, Approximation and Heuristics. Zurich Switzerland, Springer.

Mayordomo, E. (2004). P versus NP. Monografías de la Real Academia de Ciencias Exactas, Físicas, Químicas y Naturales de Zaragoza, (26), 57-68.

Ogihara, M. (1999). Computational Complexity Theory. Encyclopaedia of Electrical & Electronics Engineering, 3, 618-628.

Papadimitriou, C. H. Steiglitz, K. (1998). Combinatorial Optimization: Algorithms and Complexity, New York, Dover Publications.

Ross, P. (Ed). (2005). Hyper-heuristics. In Search methodologies, 529-556.

Sipser, M. (1992). The history and status of the P versus NP question. In Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, 603-618.

Talbi, E. (Ed). (2006). Parallel Combinatorial Optimization, Villeneuve d’Ascq, France, Wiley.

Woeginger, G. J. (2003). Exact algorithms for NP-hard problems: A survey. In Combinatorial Optimization—Eureka, You Shrink! (pp. 185-207).