
The black box functions found in computer experiments often result in multimodal optimization programs. Optimization that focuses on a single best optimum may not achieve the goal of getting the best answer for the purposes of the experiment. This paper builds upon an algorithm introduced in [1] that is successful at finding multiple optima within the input space of the objective function. Here we introduce an alternative algorithm for finding these optima that makes use of clustering: the cluster search algorithm. The cluster search algorithm has several advantages over the earlier algorithm. It gives a forward view of the optima present in the input space, so the user has a preview of what to expect as the optimization process continues. In many instances it starts pattern search closer to the minimum’s location in input space, saving simulator point computations. At termination, it does not need additional verification that a minimum is a duplicate of a previously found one, which also saves simulator point computations. Finally, it finds minima that can be “hidden” by nearby larger minima.

The global minimum, which is the traditional focus of optimization, may not be as robust as another minimum, since a small change in the input may lead to a large change in the output, resulting in an inferior output. An example of this is shown in

while the other is rather flat. The region of interest is between the two vertical lines. Although the function with higher curvature has the lower minimum, it rises above the other curve in the region of interest. Choosing the flatter curve gives the user a more stable output within the region of interest; in statistical terms, it is more robust. We want to avoid a knife’s-edge solution that will rapidly become suboptimal with small changes in the inputs. [

Our framework is that of statistical emulation [

Section 2 reviews the topological search algorithm. Section 3 explains our new cluster search algorithm. Sections 4, 5, and 6 present illustrative examples comparing the two algorithms.

We start with a review of the search algorithm introduced in [

The topological algorithm uses a hybrid approach similar to [

The algorithm is discussed in detail in [

Next, an LHS (Latin hypercube sample) of input points is evaluated, and the emulator uses these points to model the simulator function and predict a large number of points. The location of the lowest valued predicted point is sent to pattern search to find the first minimum. This becomes the approximation to the global minimum,
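The initial stage described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the treed Gaussian process emulator is replaced by a simple inverse-distance-weighted predictor, and the toy simulator `f`, the design size, and the grid resolution are all invented for the example.

```python
import random

def latin_hypercube(n, dim, rng):
    """Basic LHS: one point per stratum in each dimension, randomly paired."""
    cols = []
    for _ in range(dim):
        col = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(col)
        cols.append(col)
    return [tuple(c[i] for c in cols) for i in range(n)]

def idw_predict(x, train):
    """Stand-in emulator: inverse-distance-squared weighted mean of training values."""
    num = den = 0.0
    for xt, yt in train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xt))
        if d2 == 0.0:
            return yt
        num += yt / d2
        den += 1.0 / d2
    return num / den

def f(x):
    """Toy 'simulator': a bowl with its minimum at (0.3, 0.7)."""
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2

rng = random.Random(1)
design = latin_hypercube(20, 2, rng)          # evaluate an LHS of input points
train = [(x, f(x)) for x in design]
grid = [(i / 19, j / 19) for i in range(20) for j in range(20)]
start = min(grid, key=lambda g: idw_predict(g, train))  # seed for pattern search
```

The location `start` of the lowest predicted point is what would be handed to pattern search as the first search point.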

At each of the next steps of the algorithm, adaptively sampled points are added to the evaluated points, the emulator models the simulator function by predicting a large number of points, and the distances of the predicted points from the minima already found are computed. At some point the minimum point in the input space region explored for the next minimum is greater than

The cluster search algorithm makes use of the fact that if you make a slice at a given height through a surface, and include only those points that are below the given height, the remaining points compose the part of the surface that has values lower than the given height. These points form clusters which indicate approximate locations of some minima for that surface. While a single slice at a given value gives some information about the minima of the surface, it is not sufficient to reveal all the minima. A cluster formed by this slice may include more than one minimum and may not include other minima with higher values (

As more points are added, they can fill in the spaces between clusters, but new clusters can also appear that represent other minima. Provided the surface is accurately represented by the emulator, all important minima can be found. The determination of which minima are important for the user to examine further is based on user inputs. This section explains the details of the cluster search algorithm.
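The slicing idea can be illustrated with a short sketch: keep only the grid points whose (predicted) value falls below the slice height, and group them into 4-connected components. The two-well grid surface and the slice heights below are invented for illustration.

```python
from collections import deque

def sublevel_clusters(values, height):
    """Group grid points whose value is below `height` into 4-connected clusters."""
    below = {p for p, v in values.items() if v < height}
    clusters, seen = [], set()
    for p in below:
        if p in seen:
            continue
        comp, queue = [], deque([p])
        seen.add(p)
        while queue:                  # breadth-first flood fill of one cluster
            i, j = queue.popleft()
            comp.append((i, j))
            for q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if q in below and q not in seen:
                    seen.add(q)
                    queue.append(q)
        clusters.append(comp)
    return clusters

# Toy two-well surface on a 21x21 grid, with minima at (5, 5) and (15, 15)
vals = {(i, j): min((i - 5) ** 2 + (j - 5) ** 2, (i - 15) ** 2 + (j - 15) ** 2)
        for i in range(21) for j in range(21)}
low_slice = sublevel_clusters(vals, 9)     # two separate clusters, one per well
high_slice = sublevel_clusters(vals, 500)  # one cluster covering the whole grid
```

A low slice separates the two wells into distinct clusters; a high slice merges everything into one, which is why a single slice is not sufficient to reveal all the minima.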

In common with the topological search algorithm above, our new cluster search algorithm requires that the user decide to what level, relative to the global minimum and the mean value, the search is to be extended. Let the global minimum be

For each stage of the algorithm, the treed Gaussian process emulator uses the current training point set to predict a large gridded sample of points. In the same way as the topological search algorithm, the first minimum is found by searching from the predicted point having the minimum value. This search is done by a local direct optimization routine (pattern search). Also in the same way as the topological search algorithm, as the cluster search algorithm proceeds, new function evaluations are added to the training point set via a sequential experimental design to improve the statistical model. At the first stage, we only have one identified minimum, so we use its value as the initial
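The local direct optimization routine can be sketched with a basic compass/pattern search. The step sizes, shrink factor, and quadratic objective below are illustrative assumptions, not the paper's settings.

```python
def pattern_search(f, x0, step=0.25, tol=1e-6, shrink=0.5):
    """Compass pattern search: poll +/- step along each axis, move to the first
    improving point; halve the step when no poll point improves."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for s in (step, -step):
                trial = list(x)
                trial[i] += s
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break
            if improved:
                break
        if not improved:
            step *= shrink            # no improving poll point: refine the mesh
    return x, fx

# Search from a predicted low point toward a quadratic bowl's minimum
xmin, fmin = pattern_search(lambda p: (p[0] - 1.2) ** 2 + (p[1] - 0.68) ** 2,
                            [1.0, 0.5])
```

The closer the starting point is to the true minimum, the fewer polls are needed before the step size collapses, which is the source of the evaluation savings discussed later.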

After the search for the first minimum, the two algorithms differ greatly in how each determines new starting points for finding the other minima in the input space. The cluster search algorithm starts each search step by partitioning the surface from bottom to top into sets of predicted points with approximately equal estimated function values within a given increment,

As more partitions are added, their points might fill in between cluster “N” and cluster “ABC”, and “N” might become assimilated into a larger cluster. But the new partition might contain a point unreachable from any of the current clusters, which thus represents an approximate location of another minimum. The idea is that, as one adds the points for each partition successively, minima show up as points unreachable from the established clusters. These minimum point locations, along with their estimated function values, are saved by the algorithm. At each search step, the lowest approximate minimum point that has not yet been searched by pattern search becomes the next search point for pattern search.
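The bottom-to-top sweep can be sketched as follows: process predicted grid points in slabs of increasing value; a point with no already-labelled neighbour seeds a new cluster and is recorded as a candidate minimum, while a point touching several clusters merges them. This is a hedged reconstruction of the idea (essentially watershed labelling), not the paper's code; the two-well surface and increment are invented for the example.

```python
def sweep_minima(values, increment):
    """Sweep predicted grid points bottom-to-top in value slabs.  A point with
    no already-labelled 4-neighbour seeds a new cluster and is recorded as a
    candidate minimum; a point touching several clusters merges them."""
    label, parent, minima = {}, {}, []

    def find(a):                      # union-find root with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    pts = sorted(values, key=values.get)
    lo, next_id, idx = values[pts[0]], 0, 0
    while idx < len(pts):
        hi = lo + increment
        slab = []
        while idx < len(pts) and values[pts[idx]] < hi:
            slab.append(pts[idx])
            idx += 1
        for (i, j) in slab:           # slab points arrive in ascending value order
            nbrs = {find(label[q])
                    for q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if q in label}
            if not nbrs:              # unreachable from every existing cluster
                label[(i, j)] = next_id
                parent[next_id] = next_id
                minima.append(((i, j), values[(i, j)]))
                next_id += 1
            else:
                root = nbrs.pop()
                label[(i, j)] = root
                for other in nbrs:    # this point bridges clusters: merge them
                    parent[other] = root
        lo = hi
    return minima

# Two-well surface: the sweep should report exactly the two minima
vals = {(i, j): min((i - 5) ** 2 + (j - 5) ** 2, (i - 15) ** 2 + (j - 15) ** 2)
        for i in range(21) for j in range(21)}
found = sweep_minima(vals, 5)
```

Each entry of `found` is a candidate minimum location with its estimated value, the list from which the next pattern search point is drawn.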

How is this “minimum distance”

minimum distance is

A specific illustration of this algorithm is given for a test function with two minima, one being lower than the other. The negative of the test function is shown, with minima represented as peaks for better visualization. Its surface is shown in

The surface is divided into 25 partitions. The first four nonempty partitions are plotted in

In this illustration the actual function value is used to evaluate the points. In a computer experiment, the function value is estimated by the emulator given a set of sampled points, usually starting with an LHS. The predicted points have the estimated function values that are used to create the partitions. Just as with the topological search algorithm, a relatively close match to the simulator surface is required to obtain all the minima within the desired range. The treed Gaussian process usually provides such a match when enough function points have been evaluated in regions where the function is variable.

As this algorithm proceeds, the user is informed, both by a graphical display and by a printout, of the list of predicted minima that have been found and prospective minima yet to be found. The lowest valued minimum yet to be found in this list becomes the next search point for pattern search. When there are no more minima in the list to be found, the search is ended and the utility of the minima is estimated as in [

The step size is determined differently than it is for the topological search algorithm. Although pattern search begins with the same step size as in the topological search algorithm, the step size is smaller for subsequent steps because the cluster search algorithm, on average, locates the search point closer to the actual minimum. This results in fewer pattern search point evaluations.

An overview of the algorithm is given by the following pseudo code. The algorithm starts by evaluating an LHS of training points, the size of which varies depending on the variability of the simulator function; it then initializes the base B, the level parameter r, the loop count (
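The following is a hedged reconstruction of that overview, assembled from the descriptions in the preceding sections (B and r are the names used in the text; the exact form of the stopping threshold is an assumption):

```
evaluate an initial LHS of training points on the simulator
initialize the base B, the level parameter r, and the loop count
repeat:
    fit the treed Gaussian process emulator to the current training points
    predict a large gridded sample of points over the input space
    partition the predictions bottom-to-top into value slabs of a given increment
    cluster each slab's points; record points unreachable from existing
        clusters as candidate minima
    compute the search threshold from the current global minimum, the mean
        predicted value, and r
    if an unsearched candidate minimum lies below the threshold:
        run pattern search from the lowest such candidate
    add sequential-design points (large predictive standard deviation,
        far from existing training points) to the training set
until no unsearched candidate minima remain below the threshold
estimate the utility of the minima found
```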

In this example, the cluster search algorithm and the topological search algorithm are compared on the modified Schubert test function. They are compared both on their ability to find the minima specified by the user and on the number of point evaluations needed to find those minima. It will also be shown that the user can anticipate the results far more readily when using the cluster search algorithm. The function, taken from [

where

The differences from the original Schubert function [
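For reference, the standard (unmodified) Schubert function can be written as below; the paper's modifications are not reproduced here, so this sketch shows only the base form being modified.

```python
import math

def schubert(x, y):
    """Standard two-dimensional Schubert test function:
    f(x, y) = [sum_{j=1..5} j cos((j+1)x + j)] * [sum_{j=1..5} j cos((j+1)y + j)]."""
    g = lambda t: sum(j * math.cos((j + 1) * t + j) for j in range(1, 6))
    return g(x) * g(y)
```

Because the function is a product of one identical factor in each coordinate, it is symmetric in its arguments, which is reflected in the paired minima of the tables below.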

For both the cluster search algorithm run and the topological search algorithm run, the run parameters were as follows:

・ The value of

x₁ | x₂ | Minimum value
---|---|---
1.20 | 0.68 | −9.684
0.68 | 1.20 | −9.584
0.17 | 0.68 | −6.225
0.68 | 0.17 | −6.225
1.20 | 1.72 | −4.446
1.72 | 1.20 | −4.446
0.17 | 1.72 | −2.934
1.72 | 0.17 | −2.934

・ Initially, 100 points were sampled using the R function “improvedLHS” from the R package “lhs” [

・ For each successive step, four training points are selected by the sequential experimental design based on their standard deviations (large standard deviations given preference) and distance from previous training points. This is done to improve the statistical model as processing progresses.

・ The predicted point size for each search step is 2000. The cluster search algorithm makes use of gridded points, whereas the topological search algorithm typically uses an LHS sample.

・ A minimum search was conducted at each step. For other examples that are compared, minima searches occur at different step intervals so that additional training points can be added to improve the statistical model.
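The sequential-design selection in the bullets above can be sketched as a greedy rule: rank candidates by predictive standard deviation and skip any candidate too close to an existing point. The scoring and the `min_dist` threshold are assumptions for illustration, not the paper's criterion.

```python
def select_batch(candidates, sd, training, k, min_dist=0.05):
    """Greedy batch selection: prefer large predictive sd, but skip candidates
    closer than min_dist to any training point or previously chosen point."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    chosen = []
    for x in sorted(candidates, key=lambda c: -sd[c]):
        if all(dist(x, t) >= min_dist for t in training + chosen):
            chosen.append(x)
            if len(chosen) == k:
                break
    return chosen

# Highest-sd candidate is kept; a near-duplicate of it is skipped
batch = select_batch(
    candidates=[(0.1, 0.1), (0.9, 0.9), (0.11, 0.1), (0.5, 0.5)],
    sd={(0.1, 0.1): 1.0, (0.9, 0.9): 0.9, (0.11, 0.1): 0.95, (0.5, 0.5): 0.5},
    training=[(0.0, 0.0)],
    k=2)
```

Favouring large predictive standard deviation sends new simulator evaluations to the regions where the statistical model is least certain.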

Each illustrative example (this being the first) is compared in these ways: 1) the minima found; 2) the number of training points computed to find the minima; 3) the look-ahead data provided by the cluster search algorithm (not available for the topological search algorithm).

The minima found by both algorithms are in

The number of training points computed by the cluster search algorithm was 313 as compared with the number computed by the topological search algorithm which was 405. The percent difference is 29%. Insight into this difference can be deduced from

The cluster search algorithm has the capability to “look ahead” and show the user possible minima that will be included in the output. The user may then decide if he/she wishes to continue the search for minima or be satisfied with those currently found. Two outputs support this capability: a graphical output and a printed output, which are explained below. The topological search algorithm has no such capability. It is “blind” with regard to what minima might exist and what their potential locations and values are.

The graph in

x₁ | x₂ | Minimum value
---|---|---
1.202 | 0.682 | −9.687
0.684 | 1.205 | −9.590
0.164 | 0.683 | −6.229
0.683 | 0.165 | −6.229

that meet the requirements of the user. The black square dots are the gridded prediction point locations, the values of which are used by the cluster search algorithm to estimate the minima locations.

Regarding the printed output, the cluster search algorithm outputs the estimated number of minima that are compatible with the user’s input requirements and lists their estimated locations and values. It identifies those that have been already found by a distance computation (a small distance (
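The duplicate test described above can be sketched as a distance computation against the list of minima already found; the threshold `eps` and the rounding to two decimals are illustrative assumptions. The coordinates below are taken from the first look-ahead table and the first entry of the minima table.

```python
def classify(candidates, found, eps=0.05):
    """Label each candidate minimum Yes/No according to whether it lies within
    eps of an already-found minimum (assumed threshold for illustration)."""
    rows = []
    for c in candidates:
        # distance from the candidate to the nearest found minimum
        d = min(sum((u - v) ** 2 for u, v in zip(c, m)) ** 0.5 for m in found)
        rows.append((c, round(d, 2), "Yes" if d <= eps else "No"))
    return rows

report = classify(candidates=[(1.1818, 0.6818), (0.6818, 1.1818)],
                  found=[(1.2, 0.68)])
```

A small distance marks the candidate as a duplicate of a found minimum, so no pattern search (and no extra simulator evaluations) is spent re-locating it.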

In this example, the cluster search algorithm and the topological search algorithm are compared on a function with six minima, shaped like Gaussian curves, which are relatively close together. In this case both algorithms find the six minima. The function is given in the equation below:

where

x₁ | x₂ | Estimated value | Distance | Found (Yes/No)
---|---|---|---|---
1.1818 | 0.6818 | −9.5315 | 0.02 | Yes
0.6818 | 1.1818 | −9.4249 | 0.72 | No
0.1818 | 0.6818 | −6.1589 | 1.02 | No
0.6818 | 0.1818 | −6.1505 | 0.72 | No

x₁ | x₂ | Minimum value
---|---|---
0.25 | 0.25 | −1.004
0.50 | 0.25 | −1.006
0.75 | 0.25 | −1.004
0.25 | 0.50 | −1.004
0.50 | 0.50 | −1.006
0.75 | 0.50 | −1.004

For both the cluster search algorithm run and the topological search algorithm run, the run parameters were as follows:

・ The value of

Because the minima are close together, allowing for variation in the minimum values makes it more likely that minima less well defined by the statistical model will be resolved.

・ Initially, 150 points were sampled using the R function “improvedLHS” from the R package “lhs”.

・ For each successive step, 10 training points are selected by a sequential experimental design based on their standard deviations (large standard deviations given preference) and distance from previous training points. The closeness of the minima was expected to test the statistical model’s accuracy, so more points were needed to make the statistical model match the function.

・ The predicted point size for each search step is 2000. The cluster search algorithm makes use of gridded points, whereas the topological search algorithm typically uses an LHS sample.

・ A minimum search was conducted every other step in order to allow more improvement in the statistical model between search steps.

The minima found by both algorithms are in

The number of training points computed by the cluster search algorithm was 566 as compared with the number computed by the topological search algorithm which was 716. The percent difference is 27%. Insight into this difference can be deduced from

x₁ | x₂ | Minimum value
---|---|---
0.749 | 0.499 | −1.004
0.500 | 0.500 | −1.006
0.500 | 0.250 | −1.006
0.250 | 0.500 | −1.004
0.750 | 0.249 | −1.004
0.251 | 0.251 | −1.004

points) are the reason for the increase in the number of training points computed by the topological search algorithm. This increase is significant for a computer experiment when the black box function takes significant time to return a computed point.

This example also shows that the cluster search algorithm is dependent on how well the statistical model matches the function surface. In the third search step, only three minima were detected. In the fourth, fifth, and final search steps, the cluster search algorithm detected all six minima. This is a result of the fact that, as training points are added via the sequential experimental design, the statistical model is a better match to the function surface. This is more evident as the function surface is more variable. The program has the capability to suspend search steps until the errors in added training points show, on average, values below a user specified limit. However, in the cases presented herein, the accuracy of the statistical model is managed by the number of training points added between search steps. The two graphs in

The printed output for the second illustrative example for search Step 4 is shown in

x₁ | x₂ | Estimated value | Distance | Found (Yes/No)
---|---|---|---|---
0.5 | 0.5 | −1.0058 | 0.0 | Yes
0.5 | 0.2727 | −0.9591 | 0.02 | Yes
0.7273 | 0.5 | −0.9572 | 0.02 | Yes
0.2273 | 0.5 | −0.9521 | 0.27 | No
0.7273 | 0.2273 | −0.9078 | 0.23 | No
0.2727 | 0.2273 | −0.9078 | 0.23 | No

The information conveyed here is that six minima have been detected by the cluster search algorithm. Their locations and minimum values are estimated. The first one is at distance 0 from a found minimum and is therefore associated with a minimum already found by the program. Minima 2 and 3 are within 0.02 units of found minima and are associated with minima already found. Minima 4, 5, and 6 are a significant distance away from any found minimum and therefore still need to be located by pattern search. Search Step 4 starts at the location

In this example, the cluster search algorithm and the topological search algorithm are compared on a function with one global skewed minimum covering a large area of the input space and a smaller, normally distributed minimum on the edge of the input space. This example shows that the cluster search algorithm can detect the smaller minimum, whereas the topological search algorithm does not find it. This is another advantage of the cluster search algorithm. The function is given in the equation below:

where

For both the cluster search algorithm run and the topological search algorithm run, the run parameters were as follows:

・ The value of

As in the example for the six close minima, allowing for variation in the minimum values makes it more likely that minima less well defined by the statistical model will be resolved.

x₁ | x₂ | Minimum value
---|---|---
0.36 | 1.00 | −19.483
0.10 | 1.00 | −13.263

・ Initially, 150 points were sampled using the R function “improvedLHS” from the R package “lhs”.

・ For each successive processing step, 20 training points are selected based on their standard deviations (large standard deviations given preference) and distance from previous training points. The closeness of the minima and the variability of the function are expected to test the statistical model accuracy, so more points are needed to make the surface match the function.

・ The predicted point size for each search step is 2000. The cluster search algorithm makes use of gridded points, whereas the topological search algorithm typically uses an LHS sample.

・ A minimum search is conducted every third processing step in order to allow more improvement in the statistical model between search steps. This function has a very steep decrease near the large skewed minimum, as can be seen in the perspective plot, which shows this as a sharp increase. It is nearly discontinuous in the vicinity of the line of

The minima found by both algorithms are in

The number of training points computed by the cluster search algorithm was 443, as compared with 416 for the topological search algorithm. This comparison is not meaningful since the topological search algorithm did not find all the qualified minima. Insight as to why the topological search algorithm failed to find the smaller minimum can be obtained from the final training points for both search algorithms shown in

x₁ | x₂ | Minimum value | Cluster search algorithm | Topological search algorithm
---|---|---|---|---
0.357 | 0.999 | −15.851 | yes | yes
0.100 | 1.000 | −13.263 | yes | no

This example shows that the cluster search algorithm detected the two minima in search Step 2. This indicates that the training points yield a statistical model that is reasonably accurate. The sequential experimental design added training points near the most variable part of the input space of the test function. The graph in

The printed output for the third illustrative example for search Step 2 is shown in

x₁ | x₂ | Estimated value | Distance | Found (Yes/No)
---|---|---|---|---
0.4091 | 1.0 | −15.08 | 0.05 | Yes
0.0909 | 1.0 | −11.8545 | 0.27 | No

The information conveyed here is that two minima have been detected by the cluster search algorithm. Their locations and minimum values are estimated. The first one is within 0.05 units of a found minimum and is therefore associated with a minimum already found by the program. Minimum 2 is 0.27 units from the nearest found minimum and therefore still needs to be located by pattern search. Search Step 2 starts at the location

We propose a new cluster search algorithm for efficiently exploring the whole input space, which is a significant improvement over earlier multi-optimum search algorithms such as that in [

John Guenther, Herbert K. H. Lee (2016) Cluster Search Algorithm for Finding Multiple Optima. Applied Mathematics, 7, 736-752. doi: 10.4236/am.2016.77067