Paper Evolving Neural Networks with Interval Weights by Means of Genetic Algorithm

In this paper, the author proposes an extension of the genetic algorithm (GA) for evolving neural networks with interval-valued weights and biases. In the proposed extension, genotype values are not real numbers but intervals. The evolutionary processes of the GA are extended so that they can handle interval-valued genotypes. Experimental results show that interval neural networks evolved by the proposed method can model target interval functions well, even though no training data is explicitly provided.


Introduction
A multi-layered feed-forward neural network (NN) with interval-valued weights and biases was proposed in the literature [1]. A supervised learning method for the interval NN (INN) was also proposed in [1] as an extension of traditional back propagation (BP). The INN approximately models an interval function $Y = F(x)$, where $Y$ is an interval number and $x$ is a real vector, by learning data $(x_q, Y_q)$, $q = 1, 2, \ldots$ The INN can learn data in which $\{Y_q\}$ includes both real numbers and interval numbers, because a real number can be specified as an interval number with zero width (i.e., with equal lower and upper limits). The extended BP was proposed as the learning method for the INN, but no learning method that does not require training data has been proposed.
Meanwhile, evolutionary algorithms (EAs) have recently been applied to the reinforcement learning of NNs, an approach known as neuroevolution (NE) [2]–[5]. In NE, weights and biases are tuned by evolutionary operations, not by the BP algorithm. Because NE does not use BP, it does not require errors between NN output values and their target signals. Thus, NE is applicable to problems in which the error function is difficult or impossible to determine. EAs have been applied to NE of traditional NNs with real-valued weights and biases, where the genotypes (chromosomes) consist of real numbers or bit strings that encode real numbers. Ordinary EAs have not employed interval numbers as genotype values because their evolutionary operators are designed to handle genotypes with crisp values.
This paper proposes an extension of the genetic algorithm (GA) for handling interval-valued genotypes. The extended GA can be applied directly to interval-valued optimization problems by employing the interval variables of the problem as genotype values. The author applies the extended GA to the evolution of INNs and experimentally evaluates the accuracy of the evolved INNs in modeling hidden target interval functions. In addition, an interval value can be specified either by its lower and upper limits or by its center and width. The extended GA can adopt either of the two models, the lower and upper model or the center and width model, to specify interval genotype values. The two models are compared to investigate which one better helps the extended GA evolve INNs.

Neural Networks with Interval Weights and Biases
The INN employed in this research is the same as that in the literature [1]: a three-layered feed-forward NN with interval weights and biases. Figure 1 shows its structure. An INN receives a real input vector $x$ and calculates its output interval value $O$ (for the sake of simplicity, the output layer includes a single unit) by passing the input through the input layer, the hidden layer and the output layer in turn, where $f$ is the unit activation function, typically the sigmoid $f(x) = 1/(1 + e^{-x})$. The feed-forward calculation of the INN is based on interval arithmetic [6] (for more detail, see the literature [1]). The sigmoidal function maps an input interval number to an output interval number as shown in Fig. 2.
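The feed-forward calculation described above can be illustrated with a minimal Python sketch. It is not the formulation of [1]: the function names, the tuple representation (lower, upper), and the choice to apply the sigmoid at both the hidden and the output units are assumptions made for illustration, and all numbers in the example are made up.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def interval_add(a, b):
    """Sum of two intervals given as (lower, upper) tuples."""
    return (a[0] + b[0], a[1] + b[1])

def interval_mul(a, b):
    """Product of two intervals: min and max over the four endpoint products."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def interval_sigmoid(a):
    """The sigmoid is increasing, so it maps endpoints to endpoints."""
    return (sigmoid(a[0]), sigmoid(a[1]))

def inn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Three-layer interval NN (assumed layout, hypothetical names).

    x        : list of n real inputs
    w_hidden : m lists of n interval weights
    b_hidden : m interval biases
    w_out    : m interval weights to the single output unit
    b_out    : interval bias of the output unit
    """
    hidden = []
    for weights, bias in zip(w_hidden, b_hidden):
        net = bias
        for w, x_i in zip(weights, x):
            # A real input is treated as a zero-width interval.
            net = interval_add(net, interval_mul(w, (x_i, x_i)))
        hidden.append(interval_sigmoid(net))
    net = b_out
    for w, h in zip(w_out, hidden):
        net = interval_add(net, interval_mul(w, h))
    return interval_sigmoid(net)

# Example: 1 input, 2 hidden units, 1 output unit.
output = inn_forward(
    x=[0.5],
    w_hidden=[[(-1.0, 1.0)], [(0.2, 0.4)]],
    b_hidden=[(0.0, 0.0), (-0.1, 0.1)],
    w_out=[(0.5, 1.0), (-1.0, -0.5)],
    b_out=(0.0, 0.2),
)
```

When every weight and bias has zero width, the sketch reduces to an ordinary real-valued feed-forward pass, which matches the remark that a real number is an interval with equal lower and upper limits.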
The INN includes $mn + m$ weights (i.e., $mn$ weights between the $n$ input units and the $m$ hidden units, and $m$ weights between the $m$ hidden units and the output unit) and $m + 1$ biases (one per unit in the hidden and output layers). Thus, the INN includes $mn + 2m + 1$ interval variables in total. The interval GA (IGA) handles these interval variables as a genotype $X = (X_1, X_2, \ldots, X_D)$, where $X_i$ is an interval number and $D = mn + 2m + 1$. Each interval $X_i$ can be specified by either of two parameter pairs, $(x^L_i, x^U_i)$ or $(x^c_i, x^w_i)$, where $x^L_i$, $x^U_i$, $x^c_i$ and $x^w_i$ denote the lower limit, upper limit, center and width of $X_i$, respectively. In this paper, the former is called the LU model and the latter the CW model.
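As a small illustration of the genotype size and the two interval models, the following Python sketch computes $D$ and represents one interval in each model. The class and function names are hypothetical. The conversion from the CW model to the LU model assumes that the width is the full width (upper limit minus lower limit); the text does not state whether the width is the full width or the half width, so this is an assumption.

```python
from dataclasses import dataclass

def genotype_length(n_inputs, n_hidden):
    """D = mn + 2m + 1 for n inputs, m hidden units and one output unit."""
    weights = n_hidden * n_inputs + n_hidden
    biases = n_hidden + 1
    return weights + biases

@dataclass
class IntervalLU:
    lower: float
    upper: float

    def is_valid(self):
        # LU constraint: the upper limit must not be below the lower limit.
        return self.lower <= self.upper

@dataclass
class IntervalCW:
    center: float
    width: float

    def is_valid(self):
        # CW constraint: the width must not be negative.
        return self.width >= 0.0

    def to_lu(self):
        # ASSUMPTION: width = upper - lower (full width, not half width).
        half = self.width / 2.0
        return IntervalLU(self.center - half, self.center + half)
```

For the network used later in the experiments (1 input unit and 10 hidden units), `genotype_length(1, 10)` gives 31 interval variables.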

Genetic Algorithm with Interval-valued Genotypes
The IGA includes the same processes as those in the ordinary GA (Fig. 3), where initialization of population, fitness evaluation, crossover and mutation are extended so that these processes can handle interval-valued genotypes.

Initialization of Population
In the initialization process, $X_1, X_2, \ldots, X_P$ are randomly initialized, where $P$ is the population size. Because the elements of $X_a$ (i.e., $X_{a,1}, X_{a,2}, \ldots, X_{a,D}$) are the weights and biases of an INN in this research, small absolute values are preferable as initial values for $X_{a,i}$. Thus, the initial values are randomly sampled from the normal distribution $N(0, \varepsilon)$ or uniformly from the interval $[-\varepsilon, \varepsilon]$, where $\varepsilon$ is a small positive number. With the LU model, two values are sampled per $X_{a,i}$: the smaller one is assigned to $x^L_{a,i}$ and the larger one to $x^U_{a,i}$. With the CW model, two values are also sampled per $X_{a,i}$: one is assigned to $x^c_{a,i}$ and the absolute value of the other is assigned to $x^w_{a,i}$.
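The initialization described above can be sketched as follows. This is a minimal illustration, not the author's code: the function names are hypothetical, only the uniform sampling variant is shown, and each interval is stored as a plain tuple, (lower, upper) for the LU model and (center, width) for the CW model.

```python
import random

def init_lu(dim, eps, rng):
    """LU model: sample two values, the smaller becomes the lower limit."""
    genotype = []
    for _ in range(dim):
        a = rng.uniform(-eps, eps)
        b = rng.uniform(-eps, eps)
        genotype.append((min(a, b), max(a, b)))
    return genotype

def init_cw(dim, eps, rng):
    """CW model: one value is the center, |other value| is the width."""
    genotype = []
    for _ in range(dim):
        center = rng.uniform(-eps, eps)
        width = abs(rng.uniform(-eps, eps))
        genotype.append((center, width))
    return genotype

def init_population(pop_size, dim, eps, model, rng):
    """Create pop_size genotypes of dim intervals each ("LU" or "CW")."""
    make = init_lu if model == "LU" else init_cw
    return [make(dim, eps, rng) for _ in range(pop_size)]
```

Both initializers produce valid intervals by construction, so no repair is needed at this stage.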

Fitness Evaluation
To evaluate the fitness of an INN as the phenotype instance of a genotype instance $X_a = (X_{a,1}, X_{a,2}, \ldots, X_{a,D})$, where $X_a \in \{X_1, X_2, \ldots, X_P\}$, the INN is supplied with several sample real input vectors and calculates the corresponding output values. The input values are sampled within the variable domain of the application problem. The fitness of $X_a$ is evaluated from these output values. How the fitness is scored from the output values depends on the problem to which the INN is applied. For example, when the INN is applied to controlling an automated system, some performance measure of the system can be used as the fitness score of the genotype instance corresponding to the INN.

Crossover
Let us denote the genotypes of two parents as $X_a$ and $X_b$ and the offspring genotype as $X_z$. $X_a$ and $X_b$ can be sampled from the population in the same manner as in the ordinary GA. With the LU model, the values of $x^L_{z,i}$ and $x^U_{z,i}$ in the offspring $X_z$ can be determined by applying a crossover operator of the ordinary real-valued GA. Suppose the operator is the blend crossover [7]. In this case, $x^L_{z,i}$ is sampled uniformly at random from an interval determined by $\min(x^L_{a,i}, x^L_{b,i})$ and $\max(x^L_{a,i}, x^L_{b,i})$, where $\min(x, y)$ and $\max(x, y)$ are the smaller and the larger of $x$ and $y$, respectively. Similarly, $x^U_{z,i}$ is sampled uniformly at random from the interval determined by $x^U_{a,i}$ and $x^U_{b,i}$. Note that $x^U_{z,i}$ must not be smaller than $x^L_{z,i}$ because $x^L_{z,i}$ and $x^U_{z,i}$ are the lower and upper limits of the interval $X_{z,i}$. If $x^U_{z,i}$ becomes smaller than $x^L_{z,i}$ as a result of the blend crossover, then $x^L_{z,i}$ and $x^U_{z,i}$ must be repaired so that $X_{z,i}$ is valid. The repair method can be either of the following:
• the mean of $x^L_{z,i}$ and $x^U_{z,i}$ is calculated and assigned to both $x^L_{z,i}$ and $x^U_{z,i}$, or
• the values of $x^L_{z,i}$ and $x^U_{z,i}$ are swapped.
With the CW model, the values of $x^c_{z,i}$ and $x^w_{z,i}$ in the offspring $X_z$ can be determined in the same manner as for the LU model. Note again that $x^w_{z,i}$ must not be negative because $x^w_{z,i}$ is the width of the interval $X_{z,i}$. If $x^w_{z,i}$ becomes negative as a result of the blend crossover, then $x^w_{z,i}$ must be repaired so that $X_{z,i}$ is valid. The repair method can be either of the following:
• $x^w_{z,i}$ is set to 0, or
• $x^w_{z,i}$ is replaced with its absolute value.
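The crossover and the repair rules for both models can be sketched as below. The sampling range follows the commonly used BLX-α form of the blend crossover, in which the range spanned by the two parent values is extended on both sides by α times its length. The parameter `alpha`, its value, and all function names are assumptions for illustration and are not taken from the text.

```python
import random

def blx(xa, xb, alpha, rng):
    """BLX-alpha on two reals: uniform sample from the extended parent range."""
    lo, hi = min(xa, xb), max(xa, xb)
    d = hi - lo
    return rng.uniform(lo - alpha * d, hi + alpha * d)

def repair_lu(lower, upper, method="swap"):
    """Make (lower, upper) valid: replace both by the mean, or swap them."""
    if upper >= lower:
        return lower, upper
    if method == "mean":
        mid = (lower + upper) / 2.0
        return mid, mid
    return upper, lower

def repair_cw(center, width, method="abs"):
    """Make the width non-negative: set it to zero, or take its absolute value."""
    if width >= 0.0:
        return center, width
    return (center, 0.0) if method == "zero" else (center, -width)

def crossover_lu(parent_a, parent_b, alpha, rng, method="swap"):
    """Blend each limit independently, then repair invalid intervals."""
    child = []
    for (a_lo, a_up), (b_lo, b_up) in zip(parent_a, parent_b):
        lower = blx(a_lo, b_lo, alpha, rng)
        upper = blx(a_up, b_up, alpha, rng)
        child.append(repair_lu(lower, upper, method))
    return child

def crossover_cw(parent_a, parent_b, alpha, rng, method="abs"):
    """Blend center and width independently, then repair negative widths."""
    child = []
    for (a_c, a_w), (b_c, b_w) in zip(parent_a, parent_b):
        center = blx(a_c, b_c, alpha, rng)
        width = blx(a_w, b_w, alpha, rng)
        child.append(repair_cw(center, width, method))
    return child
```

In the LU model both parameters can trigger a repair, because either sampled limit may cross the other. In the CW model only the width is constrained, so the center is never altered by a repair.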

Mutation
Values in the offspring genotypes are mutated with a predetermined mutation probability. In the IGA, each offspring $X_z$ is a vector $(X_{z,1}, X_{z,2}, \ldots, X_{z,D})$, where $X_{z,i}$ is an interval number specified by two real parameters: $(x^L_{z,i}, x^U_{z,i})$ with the LU model or $(x^c_{z,i}, x^w_{z,i})$ with the CW model. When $X_{z,i}$ is selected for mutation with this probability, its two parameter values are mutated by adding random real numbers to them (or by replacing them with random real numbers), where the random numbers are sampled from the normal distribution $N(0, \delta)$ or uniformly from the interval $[-\delta, \delta]$, and $\delta$, like $\varepsilon$, is a small positive number. After the mutation of $X_{z,i}$, $x^U_{z,i}$ may become smaller than $x^L_{z,i}$ with the LU model, or $x^w_{z,i}$ may become negative with the CW model. Such invalid interval numbers are repaired by the same method as in the crossover process.
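A minimal sketch of the mutation step is given below. It shows only the additive variant with normally distributed noise, treats δ as the standard deviation (the text does not say whether the second argument of $N(0, \delta)$ is a standard deviation or a variance), and repairs by swapping (LU) or by taking the absolute value (CW). The function names are hypothetical.

```python
import random

def mutate_lu(genotype, p_mut, delta, rng):
    """Mutate (lower, upper) pairs; each interval is selected with probability p_mut."""
    mutated = []
    for lower, upper in genotype:
        if rng.random() < p_mut:
            # ASSUMPTION: delta is used as the standard deviation.
            lower += rng.gauss(0.0, delta)
            upper += rng.gauss(0.0, delta)
            if upper < lower:
                lower, upper = upper, lower  # repair by swapping
        mutated.append((lower, upper))
    return mutated

def mutate_cw(genotype, p_mut, delta, rng):
    """Mutate (center, width) pairs; each interval is selected with probability p_mut."""
    mutated = []
    for center, width in genotype:
        if rng.random() < p_mut:
            center += rng.gauss(0.0, delta)
            width += rng.gauss(0.0, delta)
            if width < 0.0:
                width = -width  # repair by absolute value
        mutated.append((center, width))
    return mutated
```

Both parameters of a selected interval are perturbed together, following the description that the two parameter values of the selected $X_{z,i}$ are mutated.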

Experimental Evaluation
In this section, the author experimentally evaluates the ability of the IGA by applying it to the evolution of INNs.

Accuracy in Modeling Interval Functions
In the experiment, the author designs three functions and employs them as the modeling targets for INNs. For simplicity, the input $x$ of the functions is not a real vector but a real scalar (so that the INN includes only a single input unit) with $0 \le x \le 1$, as in the literature [1]. The output values of the functions are interval numbers. The three functions are denoted $F_1(x)$, $F_2(x)$ and $F_3(x)$. The experimental settings are as follows:
• INN units: 1 input, 10 hidden, 1 output.
• Initial values for $x^L_{a,i}$, $x^U_{a,i}$, $x^c_{a,i}$: uniformly random within [-0.01, 0.01].
• Mutation probability: 0.01 for each of the elements $X_{z,1}, X_{z,2}, \ldots, X_{z,D}$ in a genotype instance $X_z$.
• Random values for mutation: $N(0, 1)$ for $x^L_{z,i}$, $x^U_{z,i}$, $x^c_{z,i}$ and $|N(0, 1)|$ for $x^w_{z,i}$.
• Elitism: the best 10 elite genotype instances are copied to the next generation.
• Tournament size for sampling two parent genotype instances: 5% of the population.
The number of generations is 10,000 (or 2,000) for IGA with 100 (or 500) solutions so that the total number of INNs evolved is consistently 1,000,000.
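The elitism and tournament settings listed above can be combined into one generation step, sketched below in Python. This is an illustration under assumptions: the population is assumed to be already sorted from best to worst, elites are assumed to count toward the population size, and `make_child` is a hypothetical callback standing for crossover followed by mutation.

```python
import random

def tournament_select(ranked, tour_size, rng):
    """Pick tour_size distinct candidates and return the best-ranked one.

    ranked is sorted best first, so the smallest index wins.
    """
    contenders = rng.sample(range(len(ranked)), tour_size)
    return ranked[min(contenders)]

def next_generation(ranked, n_elite, tour_frac, make_child, rng):
    """Build the next population from a population sorted best first."""
    tour_size = max(2, round(tour_frac * len(ranked)))
    new_pop = list(ranked[:n_elite])  # elites are copied unchanged
    while len(new_pop) < len(ranked):
        parent_a = tournament_select(ranked, tour_size, rng)
        parent_b = tournament_select(ranked, tour_size, rng)
        new_pop.append(make_child(parent_a, parent_b))
    return new_pop
```

With a tournament fraction of 5%, the tournament size is 5 for a population of 100 and 25 for a population of 500. The evaluation budget is the same in both settings: 100 × 10,000 = 500 × 2,000 = 1,000,000.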
Genotype instances $X_1, X_2, \ldots, X_P$ are ranked as follows. The INN that corresponds to a genotype instance $X_a$ is supplied with a real input value $x_r$ and calculates its output interval number $O_r$. $x_r$ is sampled within the input domain $[0, 1]$ as $x_r = 0.0, 0.01, 0.02, \ldots, 1.0$. Each value of $x_r$ is also supplied to the target function $F(x)$ to obtain the output interval number $F(x_r)$. Then, the error $e_r$ for the input $x_r$ is calculated from $O_r$ and $F(x_r)$. For each genotype instance $X_a$, $e_r$ is calculated 101 times ($e_0, e_1, \ldots, e_{100}$) for the 101 input values $x_r = 0.0, 0.01, 0.02, \ldots, 1.0$, and the sum of $e_r$ is used for ranking $X_a$. An instance with a smaller sum of $e_r$ is ranked better. Note that the $e_r$ scores are not used to calculate update values for the weights and biases but only to rank the genotype instances: the output values of the target functions are completely hidden from the IGA reproduction process.
Figure 7: Output interval function of the best INN evolved by the IGA for modeling $F_1(x)$. The error score was 5.7E-3.
Figures 7–9 show the results of this experiment. Figure 7 shows the output interval function of the best INN among the total of 20,000,000 INNs (= [1,000,000 INNs in each run] * [five runs] * [two variations for population sizes] * [two variations for the interval model]) evolved by the IGA for modeling $F_1(x)$. Figures 8 and 9 show those for modeling $F_2(x)$ and $F_3(x)$, respectively, in the same manner as Fig. 7. The results shown in Figs. 7–9 reveal that the best INNs evolved by the IGA approximate their target functions well, even though no training data is explicitly provided.
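The ranking procedure can be sketched as follows. The per-sample error used here, the sum of the absolute differences of the lower and upper limits, is an assumption chosen for illustration and should not be read as the paper's error definition. The function names are hypothetical, and a model is any callable that maps a real input to a (lower, upper) tuple.

```python
def sample_inputs():
    """The 101 inputs x_r = 0.0, 0.01, ..., 1.0."""
    return [r / 100.0 for r in range(101)]

def interval_error(output, target):
    # ASSUMPTION: error = |lower difference| + |upper difference|.
    return abs(output[0] - target[0]) + abs(output[1] - target[1])

def error_score(model, target_fn):
    """Sum of the per-sample errors e_0, ..., e_100 for one model."""
    return sum(interval_error(model(x), target_fn(x)) for x in sample_inputs())

def rank(models, target_fn):
    """Sort models from best (smallest summed error) to worst."""
    return sorted(models, key=lambda m: error_score(m, target_fn))
```

The score is used only as a sort key. No error value is fed back into the weights and biases, which is what distinguishes this use of the target function from supervised learning by BP.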

Comparison of Two Models for Interval Genotype Values
As described in Subsection 3.3, the constraints on the two real parameters of an interval number (i.e., the lower and upper limits, or the center and width) differ, and thus the methods for repairing constraint-violating values also differ between the LU and CW models. This difference may affect the performance of the IGA in searching for solutions because the repairs restrict changes of genotype values. In this section, the author compares the two models to investigate which one better helps the IGA find good solutions, based on the results of the numerical experiment in the previous section. Figure 10 shows the error value of the best INN for $F_1(x)$ against the number of INNs evolved (e.g., 500,000 INNs have been evolved in total at the 5,000th generation with a population size of 100). In this figure, LU(100) denotes the result with the LU model and a population size of 100; LU(500), CW(100) and CW(500) are defined in the same manner. The error values are averaged over the five runs. Figures 11 and 12 show the error values for $F_2(x)$ and $F_3(x)$, respectively, in the same manner as Fig. 10. Figures 10–12 reveal that, for all three target functions, the CW model performed better than the LU model with population sizes of both 100 and 500; i.e., after 1,000,000 INNs had been evolved, the dotted curves for the CW model lay below the solid curves for the LU model. This finding suggests that the CW model is the better choice for specifying interval genotype values in the IGA.
To investigate why the IGA evolved better INNs with the CW model, the author counts the number of repairs of invalid genotype values. As described in Subsection 3.3, $x^U_{z,i}$ must not be smaller than $x^L_{z,i}$ with the LU model, and $x^w_{z,i}$ must not be negative with the CW model. In the crossover and mutation processes, new values of $x^L_{z,i}$, $x^U_{z,i}$ or $x^w_{z,i}$ that violate these constraints are repaired. Such repairs may interfere with the evolution of INNs because they restrict the modification of weights and biases. Thus, fewer repairs should be better for evolving INNs. Table 1 shows the numbers of repairs, where the values in the table are averaged over the five runs under each condition. For example, the IGA with the LU model and a population size of 100 required 1.64E+6 ($1.64 \times 10^6$) repairs for $F_1(x)$, while the IGA with the CW model and the same population size required 3.65E+5 repairs for $F_1(x)$. Table 1 clearly shows that the CW model required fewer repairs than the LU model, which is a likely reason why the CW model helped evolve better INNs.
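One way to count repairs is to route every repair through a small counter object, as in the following sketch. The class name and the choice of repair methods (swap for LU, absolute value for CW) are assumptions for illustration.

```python
class RepairCounter:
    """Applies a repair rule and counts how often a repair was needed."""

    def __init__(self):
        self.count = 0

    def repair_lu(self, lower, upper):
        # LU model: a repair is needed when the upper limit is below the lower limit.
        if upper < lower:
            self.count += 1
            return upper, lower
        return lower, upper

    def repair_cw(self, center, width):
        # CW model: a repair is needed when the width is negative.
        if width < 0.0:
            self.count += 1
            return center, -width
        return center, width

# Ratio of the two repair counts quoted in the text for F_1(x), population size 100.
ratio = 1.64e6 / 3.65e5
```

For the two values quoted above, the LU model required roughly 4.5 times as many repairs as the CW model.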

Conclusion
In this paper, the author proposed the interval-valued GA (IGA), an extension of the GA, and applied it to the neuroevolution of interval-valued neural networks. In the proposed extension, values in the genotypes are not real numbers but interval numbers. To handle the interval-valued genotypes, the IGA extends its processes of initialization of populations, crossover and mutation.
The IGA was challenged to evolve INNs that model hidden interval functions. The experimental results showed that the neural networks evolved by the IGA approximated the target functions well, even though no training data was explicitly provided. In addition, the results revealed that the CW model served the IGA slightly better than the LU model did. A likely reason is that the IGA with the CW model required fewer repairs of invalid genotype values than the IGA with the LU model did.
In future work, the author will further evaluate the ability of the IGA by applying it to problems other than neuroevolution.