A Modified Recurrent Neural Network with Parametric Bias and its Application to Action Learning of a Humanoid Robot

Tani et al.'s recurrent neural network with parametric bias (RNNPB) is able to learn different time series patterns, and it has been successfully applied to action learning of robots. In this paper, we propose a novel type of RNNPB that uses an Elman-type model instead of the Jordan-type model of the conventional network. The proposed structure makes it easy to use the error back-propagation (BP) learning algorithm, which has a lower computational cost than the back-propagation through time (BPTT) method used in Tani et al.'s model. The effectiveness of the modified RNNPB was confirmed by applying it to a gesture learning experiment using a humanoid robot.


Introduction
Artificial neural networks (ANNs) have been studied since the 1950s, and many ANN models have been successfully applied to adaptive control, time series forecasting, pattern recognition, and many other fields. Among these models, recurrent neural networks (RNNs) are suitable for simulating dynamic systems or for controlling unknown systems as "inverse models". The Jordan-type RNN (1) (2) and the Elman-type RNN (3) (4) are the best-known feed-forward multi-layer models of RNNs. However, as supervised learning models, these RNNs are usually trained to identify one particular system. In other words, different dynamics need to be described by different connectionist models. To deal with a complex system containing multiple attractor dynamics, Tani et al. proposed an RNN with parametric bias (RNNPB) (5)-(8), and this novel RNN has recently been widely applied to behavior learning of robots (9) (10) (11).
In the RNNPB, units named "parametric bias" (PB) are added to a Jordan-type RNN (1) (2), and they play the role of the bifurcation vectors of nonlinear dynamical systems. After the RNNPB is trained on different teacher signals using different PB values, the network is able to generate different time series patterns according to the PB values. The original RNNPB uses a Jordan-type RNN, which has a "context layer" linked by connection weights, and it is trained by "back-propagation through time" (BPTT) (12) (13), a learning algorithm commonly used for training RNNs.
In this paper, we propose to modify the RNNPB by using an Elman-type RNN (3) (4) instead of the Jordan-type RNN used in the original RNNPB. The main merits of this modification are the simplicity of the Elman model and the fact that the training method can be the well-known "error back-propagation" (BP) algorithm (12), which is simpler than BPTT.
The modified RNNPB was applied to the learning of multiple actions by the humanoid robot PALRO (a product of Fujisoft, Inc., 2010). The experimental results showed that the learning performance of the proposed model was higher than that of the conventional RNNPB.

Structure
The structures of Tani et al.'s RNNPB and of the modified RNNPB proposed here are shown in Fig. 1. The original RNNPB shown in Fig. 1 (a) is a Jordan-type recurrent feed-forward neural network with three kinds of internal layers: a Middle layer, a Context layer, and a parametric bias (PB) layer. Units of the Middle layer and the Context layer output values through the sigmoid function:
$$f(z) = \frac{1}{1 + e^{-\alpha z}} \qquad (1)$$
where $\alpha$ is a positive constant deciding the gradient of the function, and $z$ is the input vector.
Specifically, the input vector $z_h$ for the Middle layer is:
$$z_h = \sum v_i u_i + \sum v_{pb} u_{pb} + \sum v_c u_c$$
where $u_i = x(t)$, $u_{pb}$, and $u_c$ are the inputs to the Middle layer from the input vector, the PB units, and the units of the Context layer, respectively, and $v_i$, $v_{pb}$, $v_c$ are the corresponding connection weights.
The input vectors $z_o$ and $z_c$ for the Output layer and the Context layer are given by:
$$z_o = \sum w_o u_h, \qquad z_c = \sum w_c u_h$$
where $u_h = f(z_h)$ is the output of the Middle layer given by Eq. (1), and $w_o$, $w_c$ are the connection weights between the Middle layer and the Output layer and the Context layer, respectively.
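The forward computation described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the random weight initialisation, and the constant input values are assumptions, the layer sizes are borrowed from Table 1, and the output-layer activation is left unapplied because the text does not specify it.

```python
import math
import random

def sigmoid(z, alpha=1.0):
    """Sigmoid of Eq. (1); alpha sets the gradient of the function."""
    return 1.0 / (1.0 + math.exp(-alpha * z))

def weighted_sum(weights, inputs):
    """For each target unit (one row of weights), sum weight * input."""
    return [sum(w * u for w, u in zip(row, inputs)) for row in weights]

def random_matrix(rows, cols):
    """Hypothetical initialisation: small uniform random weights."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def forward_original(x, u_pb, u_c, v_i, v_pb, v_c, w_o, w_c, alpha=1.0):
    """One forward step of the Jordan-type RNNPB as described in the text."""
    # Middle layer input: contributions from input, PB, and Context units.
    z_h = [a + b + c for a, b, c in zip(weighted_sum(v_i, x),
                                        weighted_sum(v_pb, u_pb),
                                        weighted_sum(v_c, u_c))]
    u_h = [sigmoid(z, alpha) for z in z_h]
    z_o = weighted_sum(w_o, u_h)   # input to the Output layer
    z_c = weighted_sum(w_c, u_h)   # input to the Context layer
    next_context = [sigmoid(z, alpha) for z in z_c]
    return z_o, next_context

random.seed(0)
N, H, P, M = 8, 30, 1, 30          # layer sizes taken from Table 1
x = [0.5] * N                      # placeholder joint-angle input
z_o, next_context = forward_original(
    x, [0.5] * P, [0.0] * M,
    random_matrix(H, N), random_matrix(H, P), random_matrix(H, M),
    random_matrix(N, H), random_matrix(M, H))
```

The returned `next_context` would be fed back as `u_c` at the following time step, which is where the weighted Middle-to-Context connections of the original model come into play.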
The learning rule, i.e., the modification of the connection weights, is given by back-propagation through time (BPTT) (12) (13). For the units of the PB layer, the internal state $u_{pb}$ changes according to the delta errors over a period (a time series window) $l$ (5)-(8). In the BPTT learning algorithm, the units of the RNNPB at time $t$ can be imagined as connected to the units of the RNNPB at time $t-1$, those at $t-1$ to those at $t-2$, and so on, back through the period $l$. The error back-propagation (BP) learning rule (13) is therefore applied repeatedly in BPTT.
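As a rough illustration of how a PB unit's internal state might follow the delta errors over a window of length l, consider the sketch below. It is not the update rule of the paper: the trailing window, the sign convention of the delta errors, the sigmoid squashing of the internal state, and the parameter names `rate` and `zeta` are all assumptions made for the example.

```python
import math

def update_pb(rho, delta_errors, window, rate=0.1, zeta=1.0):
    """Update a PB unit's internal state from recent delta errors.

    rho          -- current internal state of the PB unit
    delta_errors -- delta errors back-propagated to the PB unit, one per step
    window       -- length l of the time series window (must be positive)
    rate, zeta   -- hypothetical learning rate and output scaling
    """
    recent = delta_errors[-window:]      # last l steps (assumed trailing window)
    rho_new = rho + rate * sum(recent)   # move the state along the summed errors
    pb_value = 1.0 / (1.0 + math.exp(-rho_new / zeta))  # squash into (0, 1)
    return rho_new, pb_value

# Example: only the last two delta errors fall inside the window.
rho, pb = update_pb(0.0, [0.1, 0.2, 0.3], window=2, rate=0.5)
```

Because the PB value changes only through errors summed over the whole window, it varies slowly compared with the connection weights' per-step inputs, which is what lets a single PB value label an entire time series pattern.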
In the proposed RNNPB, the outputs of the Middle units become the inputs of the Context units without connection weights. The learning rule for the connections from the Input, PB, and Context layers to the Middle layer, and for the connections between the Middle layer and the Output layer, can therefore be given by the standard back-propagation (BP) algorithm (13), in the same way as it is used in a multi-layer perceptron (MLP).

Fig. 1. (a) The RNNPB proposed by Tani et al. (b) The modified RNNPB proposed here.
When the BP learning algorithm is adopted in the proposed Elman-type RNNPB, apart from Eq. (8) and Eq. (10) for the PB units, the modification of the connections between the Output layer and the Middle layer, and between the Middle layer and the Input layer with the PB and Context layers, can be executed offline, i.e., by batch learning. Additionally, no modification of the connections between the Middle layer and the Context layer is needed. The computational cost of the modified RNNPB is therefore lower than that of the original model.
To verify the learning performance of the modified RNNPB, an experiment on the learning of multiple actions by a humanoid robot was carried out, and the results are reported in the next section.
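A minimal sketch of the modified network may make the simplification concrete. The Context layer is a plain copy of the previous Middle-layer output, so a single standard BP step treats it as an ordinary input. The code is an assumption-laden illustration rather than the authors' implementation: linear output units, a squared-error loss, the learning rate, the weight initialisation, and all function names are hypothetical, and the PB value is held fixed here.

```python
import math
import random

def sigmoid(z, alpha=1.0):
    return 1.0 / (1.0 + math.exp(-alpha * z))

def elman_step(x, u_pb, context, v, w_o, alpha=1.0):
    """Forward step; v maps [input, PB, context] units to the Middle layer."""
    inputs = list(x) + list(u_pb) + list(context)
    u_h = [sigmoid(sum(w * u for w, u in zip(row, inputs)), alpha) for row in v]
    y = [sum(w * u for w, u in zip(row, u_h)) for row in w_o]  # linear output (assumed)
    return inputs, u_h, y

def bp_update(inputs, u_h, y, d, v, w_o, lr=0.01, alpha=1.0):
    """One standard BP step towards teacher signal d; returns the squared error."""
    delta_o = [dk - yk for dk, yk in zip(d, y)]
    # Middle-layer deltas use the output weights before they are changed.
    delta_h = [alpha * u_h[j] * (1.0 - u_h[j]) *
               sum(w_o[k][j] * delta_o[k] for k in range(len(y)))
               for j in range(len(u_h))]
    for k, row in enumerate(w_o):
        for j in range(len(row)):
            row[j] += lr * delta_o[k] * u_h[j]
    for j, row in enumerate(v):
        for i in range(len(row)):
            row[i] += lr * delta_h[j] * inputs[i]
    return 0.5 * sum(e * e for e in delta_o)

random.seed(0)
N, H, P = 8, 30, 1                     # layer sizes taken from Table 1
v = [[random.uniform(-0.5, 0.5) for _ in range(N + P + H)] for _ in range(H)]
w_o = [[random.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(N)]
x, d = [0.5] * N, [0.6] * N            # placeholder input and teacher signal
u_pb, context = [0.5] * P, [0.0] * H

errors = []
for _ in range(50):                    # repeat BP on one fixed sample
    inputs, u_h, y = elman_step(x, u_pb, context, v, w_o)
    errors.append(bp_update(inputs, u_h, y, d, v, w_o))

context = list(u_h)                    # Context layer: unweighted copy of Middle output
```

Note that there is no weight matrix between the Middle and Context layers to update, and no unrolling through time; both points correspond to the cost reduction claimed for the modified model.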

Experiments
PALRO, a humanoid robot (Fujisoft Inc., 2010 product), was used in the experiment (Fig. 2). Three kinds of actions, i.e., (a) waving a hand, (b) raising two hands, and (c) clapping hands, were learned as different dynamic patterns (Fig. 3). The actions were realized by changing the angles of 8 joints of the robot: 2 neck joints and 6 joints of the 2 arms. Table 1 shows the values of the parameters used in the experiment.
In Fig. 4, the angle of the right shoulder pitch joint (Y direction) is shown as an example to compare the learning results and the generation results of the different RNNPBs. It can be confirmed that the proposed method produced more precise time series patterns than the conventional model, especially in the generation results.
The generation errors of the different RNNPBs are compared in Table 2. The average errors over the 3 actions showed that the proposed method (3.72 radian) gave a better result than the conventional RNNPB (6.15 radian).
Table 1. Parameters used in the experiment.

Description                                  Symbol   Value
The number of units in Input layer           N        8
The number of units in Output layer          N        8
The number of units in Middle layer          H        30
The number of units in PB layer              P        1
The number of units in Context layer         M        30
Learning rates of output, context

Conclusions
A modified recurrent neural network with parametric bias (RNNPB) was proposed. Instead of the Jordan-type RNN used by Tani et al., an Elman-type RNN, which has a simpler structure, was used in the proposed model. The training algorithm of the modified RNNPB adopted the error back-propagation (BP) method, which has a lower computational cost than the back-propagation through time (BPTT) used in the conventional network. Actions realized by different time series patterns of the joint angles of a humanoid robot were learned using both RNNPBs, and the experimental results showed the superiority of the proposed model.
Future work is to apply the model to the learning of more complex actions by the robot. The humanoid robot PALRO has 20 joints, so it is possible to generate or learn more time series patterns for more difficult tasks, such as dancing or guiding. The learning time also needs to be reduced to realize these tasks.
learning coefficient, learning rate, and internal coefficient of PB units.

Fig. 2. A humanoid robot PALRO was used in the experiment to verify the effectiveness of the proposed method.

Fig. 5. PB values of different actions after learning in different RNNPBs. "Original" means the case of the conventional RNNPB, and "Modified" indicates the result of the RNNPB proposed here.

Table 2. Comparison of generation error after learning for different RNNPBs.

Additionally, the PB values of different actions learned by the different RNNPBs are shown in Fig. 5. Similar values of the PB unit were observed in the different models.