Thomas Ip

Strongly typed uniform crossover

Adventures in Genetic Programming #1
Abstract art of dna double helix

In genetic algorithms, crossover is the primary method of combining genetic materials from two parents to create a new offspring. The uniform crossover operator in particular is frequently used in genetic programming (GP) due to its ability to perform global search1 while having minimal bias with regard to node selection between parents. Despite these advantages much of the development of strongly typed GP (STGP) had been done using the older one-point crossover, which is a biased operator. There are also no information on the internet regarding the implementation of a typed uniform crossover operator, so here is one of my design.

Original algorithm

The uniform crossover operator works as follows. Two parent trees are aligned at the root and jointly traversed to identify a common region - if the pair of nodes have the same arity then their children are visited, otherwise we have reached the boundary of the common region. For each pair of nodes in said region, we perform an inner swap (ie only swap the node label) with uniform probability. If we are at the boundary then we instead swap the entire subtree.

Example of uniform crossover in genetic programming

In other words, for each node we visit a coin flip determines whether a swap should be performed. At the boundary a swap consists of an exchange of subtrees whereas only node labels are touched for the other parts of the common region.

Strongly-typed variant (STGP uniform crossover)

Adapting the original operator to work under type constrained environments require some rethinking, but the algorithm remains very simple to understand. Lets first examine why the original one fails to operate on a typed expression tree.

Type errors caused by using untyped uniform crossover in a typed environment

As seen above each children contains a type error. This illustrates the closure property that traditional GP relies on – that each non-terminal can accept arguments of any data type, which often just means using the same type for all terminals and non-terminals. STGP on the other hand is all about constraining the types of arguments and return value. To satisfy this limit, (inner) swapping can only be done when two nodes have the exact same signature. This drastically reduce the number of possible swaps, hence less genetic materials can be exchanged. We mitigate this problem by doing away with the concept of a common region and inner swaps. Instead we recursively perform a subtree swap at every pair of nodes with uniform probability, on the condition that their return types are compatible. An optional improvement here is to visit the children of a pair of nodes even if their arities are different. For example, if operator a and operator b have arities 2 and 3 respectively, then we perform uniform crossover on the first two argument subtrees.

Here is the operator in pseudo-code:

uniform_crossover(parent_a, parent_b):
    Q <- new Queue()
    push (parent_a, parent_b) to Q

    while Q not empty:
        pop (node_a, node_b) from Q

        if node_a and node_b have the same return type
          and a coin flip of probability p=0.5 is true:
            swap node_a and node_b

        if both node_a and node_b are non-terminals:
            for each (child_a, child_b) of (node_a, node_b):
                push (child_a, child_b) to Q

    return (parent_a, parent_b)

This variant can be thought of as an one-point crossover operator on steroids, with the unbiased property of uniform crossover. As with both of the original operator, STGP uniform crossover is a global search operator at the start of a GP run, then over time restricts the search space to more local regions.

  1. At least initially.