Eye Shape Detection Based on Texture Modeling in Log-Polar Domain Using Principal Component Analysis and Recurrent Neural Network

This paper presents a method to detect the eye contour using a recurrent neural network. The Log-Polar transform is used to convert changes of eye rotation and scale into translations. Principal component analysis is applied to the eye RGB color channels in order to obtain a texture representation that is stable regardless of lighting conditions. A recurrent neural network is trained to discern the patterns of texture values around the eye contour as well as to model the eye contour appearance at different eye states. The experimental results verify that the recurrent network is able to detect the eye contour robustly. The proposed method outperforms conventional methods in terms of accuracy and detection time.


Introduction
Human-Computer Interaction (HCI) systems have become one of the most attractive fields in image processing [1] [2]. The eye is a very promising interactive component for implementing such a system; it has been proven that using the eyes in interface systems is faster than using the hands [3]. Eye features such as eye corners, gaze point, pupil location, eye contour and blinking frequency can be employed in many applications. Nowadays, car manufacturers try to increase driver safety by tracking blinking frequency in order to detect driver fatigue [4]. People who have difficulty controlling a wheelchair by hand can be assisted using eye-gaze estimation techniques [5].
The eye contour is considered the key to extracting eye features. Many methods have been proposed for feature extraction in the literature [6]–[9].
The Active Shape Model (ASM) is one of the most popular techniques that can be used for eye contour detection [6]. The Active Appearance Model (AAM) has been presented as an advanced version of ASM that models both the object's shape and texture [7]. Yuille et al. proposed the Deformable Template (DT) to detect the eye contour using an adaptive template [8].
Most of the above methods suffer from difficulties in detecting the eye contour under different rotation angles, scales, eye states, lighting conditions and translations. Moreover, the essence of these algorithms is to search for the eye contour iteratively. The number of iterations is not fixed and depends on the complexity of the eye structure. Therefore, the detection time is not fixed, and these methods should be improved before being used in real-time applications such as interactive systems.
* Corresponding: maddib@yahoo.com
† Department of Mechanical Engineering, Toyohashi University of Technology, Toyohashi 441-8580, Japan
In this paper, the proposed method is inspired by our previous work [10]. The eye contour is represented by a set of points that form the eye shape. The distribution of these points represents the lower and upper eyelids as well as the two eye corners. The eye image is transformed into the Log-Polar domain in order to make the proposed method scale- and rotation-invariant [11]. A stable representation of the eye contour against a wide range of lighting conditions is obtained using Principal Component Analysis (PCA). PCA is applied to highlight the variance between the eye RGB color channels using the first eigenvector [12]. Each row in the first eigenvector image is called a pixel profile and can be indicated by its corresponding θ coordinate. Each pixel profile represents the eye contour by a few pixels. A recurrent neural network is trained to model and analyze the patterns of these pixels. The trained network predicts the location of the eye contour in pixel profiles using previous input-output states.
The experimental results verify a better performance of the proposed method compared to ASM and DT in terms of accuracy, stability and detection time.
The remainder of this paper is organized as follows. Section 2 gives a brief explanation of the proposed solution for treating the eye contour under different scales, rotation angles and lighting conditions. Section 3 describes the structure of the recurrent neural network used to model the pixel values in the Log-Polar eye images. The setup and experimental results are presented in section 4. The obtained results are discussed in section 5, and section 6 concludes with the essence of the proposed method and the advantages it yields.

Log-Polar transform
Log-Polar Transform (LPT) has been employed in many image processing applications such as optical flow [13], active vision systems [14] and pattern recognition [15]. An image in the Cartesian coordinate system is exponentially sampled around the image center; at the same time, the eye image is radially divided into sectors. Based on a sector's radius and angle, a pixel I = (x, y)^T is transformed into a pixel Î = (log r, θ)^T in the Log-Polar image using Eq. (1). The width of the Log-Polar image determines the ring sampling rate, whereas the height indicates the number of radial sectors. One can observe that a Log-Polar eye image can be divided into two areas: eyeball and skin. The eye contour is the boundary between the two areas and fully spans the θ axis. Consequently, detecting the eye contour is simplified to a search along the log r coordinate instead of a search over the (x, y) coordinates of the Cartesian coordinate system.
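The resampling described above can be sketched as follows. This is an illustrative implementation, not the paper's code: the function name, the starting radius of 1 pixel and the nearest-neighbor lookup are our assumptions.

```python
import numpy as np

def log_polar(image, center, out_shape=(360, 256)):
    """Resample a Cartesian image onto a (theta, log r) grid.

    Rotation about `center` becomes a vertical shift and uniform
    scaling becomes a horizontal shift in the output image.
    """
    n_theta, n_r = out_shape
    h, w = image.shape[:2]
    cx, cy = center
    r_max = np.hypot(max(cx, w - cx), max(cy, h - cy))
    # Exponential radial sampling: equal steps in log r, so r runs
    # from 1 pixel out to r_max.
    log_r = np.linspace(0.0, np.log(r_max), n_r)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr = np.exp(log_r)[None, :]                 # shape (1, n_r)
    tt = theta[:, None]                         # shape (n_theta, 1)
    # Nearest-neighbor lookup back into the Cartesian image.
    x = np.clip(cx + rr * np.cos(tt), 0, w - 1).astype(int)
    y = np.clip(cy + rr * np.sin(tt), 0, h - 1).astype(int)
    return image[y, x]
```

Each output row is then one radial sector (one pixel profile) at a fixed θ.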

Eye color interpretation using PCA
Detecting the log r coordinates of the eye contour must be achieved under different lighting conditions. The eyeball area, which consists of the iris, pupil and sclera, can be represented approximately by two fixed colors: black and white. These two colors correspond in the RGB color space to (0,0,0) and (255,255,255), respectively. Therefore, the variance between the color channels in the eyeball area is very low, whereas the skin cannot be represented by a fixed color and its variance is considerably large.
In order to discern the variance between the eye color channels, PCA is applied and the first eigenvector, which indicates the direction of most variation, is used. Figure 1(c) shows the first eigenvector images of the images in Fig. 1(b). The pixel profiles (rows) in a first eigenvector image can be uniformly classified into two subsectors. Figure 1(d) shows an enlarged pixel profile that is covered by a yellow line in Fig. 1(c). The first subsector may contain parts of the sclera, iris and pupil; the corresponding pixel values are approximately small, indicating low variance. The second subsector represents skin and may contain the eyebrow; the corresponding pixel values indicate high variance and change in ascending grayscale order. The lengths of the two subsectors change from one pixel profile to another according to the eye's topological structure and θ, while preserving the same order.
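The projection onto the first eigenvector can be sketched as below (an illustrative implementation; the function name is ours, and the rescaling to [0, 0.9] follows the setup section):

```python
import numpy as np

def first_eigenvector_image(rgb):
    """Project each pixel's RGB triple onto the first principal
    component of the image's color distribution. Near-gray eyeball
    pixels yield low values; varied skin pixels yield high values."""
    h, w, _ = rgb.shape
    X = rgb.reshape(-1, 3).astype(float)
    X -= X.mean(axis=0)                       # center the color cloud
    cov = X.T @ X / max(len(X) - 1, 1)        # 3x3 channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    pc1 = eigvecs[:, np.argmax(eigvals)]      # direction of most variation
    proj = X @ pc1
    # Rescale to [0, 0.9], matching the setup section.
    lo, hi = proj.min(), proj.max()
    proj = (proj - lo) / (hi - lo + 1e-12) * 0.9
    return proj.reshape(h, w)
```

Note that the sign of an eigenvector is ambiguous; in practice it can be fixed so that skin consistently maps to the high end of the range.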

Eye contour detection

3.1 Detection strategy requirements
Based on the uniform classification of pixel profiles in first eigenvector images, the eye contour can be extracted at the beginning of the second subsector, which has an ascending order of pixel values. On the other hand, some noise may occur in the first subsector because of the iris color or red nerves in the sclera. This noise exhibits the same characteristics as the beginning of the second subsector, as illustrated by the second left image in Fig. 1(c). In order to discriminate the second subsector and model the different occurrences of its beginning along the log r axis, a pixel profile should be scanned fully while taking into account some previously scanned pixel values. Therefore, the detection strategy should have both memorization and generalization properties. In order to meet these requirements, a recurrent neural network is used, as detailed in the next sections.

3.2 Recurrent neural network
Neural networks can be classified into recurrent and non-recurrent networks [16]. In a non-recurrent (feedforward) network, the output is calculated by propagating the input values through the feedforward connections. Recurrent networks, on the other hand, use the previous input-output states as part of the current inputs. This makes recurrent networks more powerful for modeling time-series signals [17] and more robust in terms of generalization and prediction [18].
A recurrent neural network consists of a set of neurons that are fully or partially connected to each other. The strength of the connections between neurons is expressed by a set of weights. Each neuron receives inputs from the other neurons and transmits its activation to them as well. Figure 2 shows the general structure of the recurrent network. It consists of three layers: input, hidden and output. The input layer receives the current external inputs as well as the fed-back previous states. The neurons in the hidden and output layers are fully connected. The output layer has one neuron without any direct connection to the input layer.
In order to train a recurrent network to model a function, a training database and a learning technique are used [19]. The error E between the desired output D and the actual network output O is given by Eq. (2) and is minimized by updating the network weights with the gradient descent algorithm and the chain rule [19]; Eq. (3).
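For concreteness, the quadratic error and gradient-descent update referenced as Eqs. (2) and (3) can be written in the standard form (a reconstruction; the factor 1/2 is our conventional choice):

```latex
E(t) = \frac{1}{2}\bigl(D(t) - O(t)\bigr)^{2},
\qquad
w \leftarrow w - \eta \, \frac{\partial E(t)}{\partial w}
```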
where η determines the descent step on the error surface. The term ∂E(t)/∂w represents the sensitivity of the error to a small change in w at time step t. In this paper, a time step indicates that the signal is discretized into samples. In order to update the neurons' weights, the neuron terminals are referred to uniformly by y_k: y_k denotes either a current/external input I or a neuron output O, as follows. Accordingly, the output of a neuron is calculated as follows, where f(.) is the output activation function, here the standard sigmoid function. The activation function has a significant role in modeling the nonlinearities of the pixel values in pixel profiles. The gradient step of a neuron output is given by Eq. (7).
The neural network output is one step ahead, O(t + 1), which means the network is trained to predict the next output. In other words, the output of any neuron is not affected by the external inputs until time step t + 1.
The sensitivity of a neuron output to a small change of the weights is calculated using Eqs. (8) and (9). δ_ik is the Kronecker delta function, which equals one if and only if i = k and is zero otherwise. The term w_kl · ∂O_l(t)/∂w_ij represents the indirect effect that changing w_ij has through the other neurons. Finally, the weights of the hidden neurons are updated using Eq. (10), whereas those of the output neuron are updated using Eq. (11).
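The sensitivity-based update described here is real-time recurrent learning (RTRL); a minimal sketch follows. The class layout, the choice of the last neuron as the output, and all names are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RTRLNet:
    """Fully recurrent network trained by real-time recurrent learning
    (RTRL) to predict a signal one step ahead."""

    def __init__(self, n_in, n_neurons, eta=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.m, self.n, self.eta = n_in, n_neurons, eta
        # Every neuron sees the external inputs and all previous outputs.
        self.W = rng.normal(0.0, 0.1, (n_neurons, n_in + n_neurons))
        self.y = np.zeros(n_neurons)
        # p[k, i, j] = dy_k / dw_ij, the sensitivities of Eqs. (8)-(9).
        self.p = np.zeros((n_neurons, n_neurons, n_in + n_neurons))

    def step(self, x, target=None):
        z = np.concatenate([x, self.y])      # [current inputs, past outputs]
        y_new = sigmoid(self.W @ z)
        fp = y_new * (1.0 - y_new)           # sigmoid derivative f'(s)
        # Sensitivity update: Kronecker-delta term plus the term that
        # propagates the change of w_ij through the other neurons.
        rec = np.einsum('kl,lij->kij', self.W[:, self.m:], self.p)
        delta = np.zeros_like(self.p)
        for k in range(self.n):
            delta[k, k, :] = z
        self.p = fp[:, None, None] * (delta + rec)
        if target is not None:
            err = target - y_new[-1]         # last neuron acts as the output
            self.W += self.eta * err * self.p[-1]   # gradient-descent step
        self.y = y_new
        return float(y_new[-1])
```

Calling step(x, target) once per time step performs both the forward pass and the online weight update, which is what makes this scheme attractive for real-time use.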
3.3 Modeling strategy of pixel profiles

Figure 3 demonstrates some pixel profiles, emphasizing their uniform characteristics at different θ coordinates. One can observe that the pixel value varies smoothly and continuously along a pixel profile without any sudden or rapid change. A pixel profile can therefore be considered a time-series signal. The structure in Fig. 2 enables the network to discern the temporal pattern of pixel values in a time series according to the form in Eq. (12).
where m is the number of external inputs and n is the number of network neurons. The outputs of the neurons appear in Eq. (12) because of the recurrent connections. The recurrent weights determine how strongly the neuron activation outputs influence the next time step; this is called the memory resolution. On the other hand, the number of hidden neurons determines the memory depth, which indicates the number of previous inputs influencing the current output. Consequently, the network has memorization and generalization properties and can be employed for the task of predicting and modeling the patterns of pixel values in pixel profiles.
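For concreteness, the recurrent one-step-ahead form referenced as Eq. (12) can be written as (a reconstruction consistent with the definitions of m and n above; the weight indexing is our assumption):

```latex
O(t+1) = f\!\left( \sum_{i=1}^{m} w_{i}\, I_{i}(t)
                 + \sum_{j=1}^{n} w_{m+j}\, O_{j}(t) \right)
```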
Each pixel profile contains a pixel (a horizontal segment) of the eye contour, as highlighted in Fig. 1(d). Since the pixel value changes smoothly along a pixel profile, the location of the eye contour is highlighted by creating a sudden change of the pixel value there. This modified pixel profile is considered a training profile and is used to train the recurrent network to predict one pixel value ahead based on Eq. (12).
Accordingly, if a pixel profile is presented without any modification, the trained recurrent network is expected to predict the location of the eye contour by generating a sharp change of the pixel value at the actual location. Hence, an eye shape point is extracted. Repeating this operation for many pixel profiles in a first eigenvector image according to a fixed sampling rate of θ yields a set of points. These points form the eye shape and are considered the detected eye contour.
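The marking and extraction steps can be sketched as two small helpers (illustrative only; the pulse value 0.8 follows the results section, while the detection threshold and function names are our assumptions):

```python
import numpy as np

def make_training_profile(profile, contour_idx, pulse=0.8):
    """Copy a pixel profile and stamp a sharp pulse at the manually
    labeled eye-contour location, producing the 'modified profile'."""
    out = profile.copy()
    out[contour_idx] = pulse
    return out

def extract_contour_point(predicted, threshold=0.5):
    """Locate the eye-contour point as the position of the sharpest
    pulse in the network's predicted signal, or None if no clear
    pulse was generated."""
    idx = int(np.argmax(predicted))
    return idx if predicted[idx] >= threshold else None
```

Running extract_contour_point over each of the H sampled profiles of an image yields the set of eye-shape points.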

Setup
The proposed method, ASM and DT have been implemented in Visual Studio C++ 2008. Matlab has also been used to create some graphs.
Two databases have been used to evaluate the performance of the implemented methods. The first database includes 250 eye images from the UBIRIS database [20]; Fig. 4(a). The second database was collected locally to permit considerable changes of eye rotation with different scales and lighting conditions; it contains 100 right-eye images, as shown in Fig. 4(b). The eye images are resized to 150x100 in the Cartesian coordinate system and 256x360 in the Log-Polar domain. The eye contour is described by H=16 manually labeled points that form the eye shape.
In ASM, 150 manually labeled eye shapes have been used to model the dynamical change of the eye contour using Principal Component Analysis (PCA) [7].
In DT, the eye contour is described by two parabolas representing the upper and lower eyelids. The two parabolas are sampled into 16 points to measure the accuracy. Four potential-field images have been used to highlight the eye features: the valley image highlights the iris, the edge image highlights the boundary of the iris and the eye contour, the peak image highlights the sclera areas, and the gray image highlights the brightness (light reflectance) inside the iris.
In the proposed method, the recurrent network has 15-40-1 neurons in the input, hidden and output layers, respectively. 50 Log-Polar eye images have been used to train the recurrent neural network for 3000 epochs. PCA is applied to the RGB color channels of each training image and the corresponding first eigenvector image is obtained. The eigenvector image is rescaled to pixel values between 0 and 0.9 and then transformed into a series of pixel profiles according to the fixed θ sampling rate S=22 rows. A pixel profile is 1x256 pixels and each image is represented by a series of H=16 pixel profiles. Consequently, a series consists of 16 * 256 pixel values.
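The row-sampling arithmetic above (H=16 profiles, one every S=22 rows of the 360-row Log-Polar image, each 256 pixels wide) can be sketched as follows (the function name is ours):

```python
import numpy as np

def profiles_to_series(eig_image, n_points=16, step=22):
    """Concatenate H=16 pixel profiles, sampled every S=22 theta rows
    of a 360x256 first-eigenvector image, into one training series
    of 16 * 256 pixel values."""
    rows = [eig_image[i * step] for i in range(n_points)]
    return np.concatenate(rows)
```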

Results
In order to evaluate the performance of the proposed method, two training approaches have been considered. Figure 5(a) demonstrates the first approach by showing a pixel profile at θ=359 (green signal), taken from the left eye image in Fig. 1(c) and called the raw signal. The signal is modified at the eye contour location to have a pulse with a value of 0.8 (blue signal). The modified signal is used to train the recurrent network to predict one future step. Once learning is done, the raw signal (without any modification) is presented to the trained network. The network predicts one value ahead and generates a pulse at the actual location of the eye contour (red signal). Figure 5(b) shows the pixel profile at θ=355 of the same eye image (green signal). This profile is presented to the trained network and a predicted signal is generated. The predicted signal possesses a sharp pulse that is expected to indicate the location of the eye contour (red signal).
Figure 6(a) illustrates the second approach of training the network by showing a series of H=16 pixel profiles (green signal). These profiles represent an eye image sampled according to the sampling rate S=22 rows. The series is modified to have sharp pulses at the actual locations of the eye contour (blue signal). The modified series is considered a training signal and used to train the network. After training, the raw series, which has no sharp pulses, is presented and a predicted signal is obtained (red signal).
Figure 7 shows some eye images taken from the databases. The eye contours are manually labeled by H=16 points according to the sampling rate S, illustrated as red points. The eye shapes predicted by the trained recurrent neural network, based on the second training approach, are shown as green points. Table 1 summarizes the results of eye shape detection using the proposed method, ASM and DT; 300 eye images have been randomly taken from the two databases for this experiment.
The table highlights three comparative components; detection time, accuracy and number of iterations.

Discussion
Besides the advantages of using the Log-Polar transform to deal with different eye rotation angles, scales and translations, as detailed in the previous work [10], PCA has provided a very stable eye texture representation against lighting conditions by highlighting the variance between the RGB color channels. The representation provides a uniform description and reduces the nonlinearity of the change of pixel values along pixel profiles. As long as the pixel values change smoothly, the trained recurrent neural network is guaranteed to generate a sharp pulse only once along a pixel profile.
The performance of the recurrent neural network has been investigated with respect to the change of pixel profiles (or θ coordinates). The recurrent network has been trained with a modified pixel profile as shown in Fig. 5(a). The trained network was able to memorize the location of the eye contour when the raw pixel profile was presented. One can observe that there is a difference in magnitude between the training and predicted signals, especially at the actual location of the eye contour. This difference does not affect the results because the most important information is the location of the rapid and sudden pulse in the predicted signal.
On the other hand, the difference between two pixel profiles in the same first eigenvector image is small, and they share the same classification features. Based on learning the raw profile in Fig. 5(a), the network was able to correctly predict the location of the eye contour in a pixel profile 6 degrees distant, shown in Fig. 5(b).
However, a large difference in θ between the training and presented profiles influences the network performance by reducing the magnitude of the generated pulse. The generated pulse then becomes very difficult to recognize in the predicted signals. Hence, the training strategy has been changed to the second approach.
In order to make the network more flexible and robust in dealing with different pixel profiles of various eye images, the network is trained using a series of modified pixel profiles. Each series represents an eye image according to the fixed sampling rate of θ, as shown in Fig. 6. This strategy enables the network to discern various patterns of pixel profiles under different lighting conditions. Furthermore, it enables the network to learn the frequency of the sharp pulses in each series that represent the eye shape in an image. Consequently, the changes of the eye contour under different eye states have been modeled by training the recurrent network using different series of pixel profiles, and the network performance has improved markedly, as shown in Fig. 6 and Fig. 7. Figure 7 illustrates the performance of the proposed method in detecting the eye contour of the testing images. The accuracy is high and the detected points almost match the manually labeled points.
The proposed method outperforms ASM and DT, as highlighted in Table 1. ASM and DT use an iterative strategy to detect the eye shape, whereas the proposed method runs in a fixed number of steps. This reduces the detection time significantly.
The yielded accuracy is high because of the prediction ability of the recurrent neural network as well as its generalization capability in dealing with new pixel profiles.
One may ask about using a feedforward network instead of a recurrent network. In a feedforward network, the temporal patterns of the data are not captured, whereas discerning the recursive relationship of the output with past states in the form of Eq. (12) is the very objective of employing a recurrent network. This dynamic form enables the proposed method to deal with noise effectively and to discriminate the eye contour reliably. On the other hand, the authors do not claim that the current structure of the network is the best structure for modeling and predicting the eye contour location. We tried to use the same structure as a static network without any recurrent connections; the prediction ability was very low. Moreover, recurrent neural networks have widely been employed in neuro-control applications, where they are learned in real time. Therefore, the weights of the learned recurrent network can be updated online to track the eye contour in real time. This is left for future work.

Conclusion
We proposed a method to detect the eye contour in eye images. The eye image is decomposed using the Log-Polar transform into a series of pixel profiles according to a fixed sampling rate. The pixel profiles are given a uniform classification by interpreting the variance between the eye color channels. The paper demonstrates a new approach to using recurrent neural networks in the object shape detection domain. A recurrent neural network is employed to achieve two tasks. The first task is in the image domain: modeling the patterns of pixel values around the eye contour. The second task is in the shape domain: modeling the change of the eye contour at different eye states. Furthermore, the proposed method differs from other strategies by running a fixed number of steps rather than an iterative search. The results verified that the proposed method shows very stable performance against changes of eye rotation angle, scale and eye state under different lighting conditions. High accuracy has been achieved while reducing the detection time compared to conventional methods. Therefore, the proposed method is very promising for use in real-time applications.