A 3D-Eyeball/Skin Decorrelated Active Appearance Model

We propose a 3D Multi-Texture Active Appearance Model (3D MT-AAM) in which the skin and the eyeball are treated as two decorrelated objects, in contrast to the classical AAM, where the eye region is a continuous part of the face mesh. The iris is modeled on a sphere that rotates under the eye hole, permitting the synthesis of new gaze directions. We compare the model with previous work that models the iris as a simple 2D texture and with a classical method, and we obtain better results. In addition, we propose a method for back-projecting the search result of our model from the model frame to the real image frame using barycentric coordinates. We also explore three optimization methods for our model, namely Gradient Descent (GD), Simplex and the Genetic Algorithm (GA), and compare their performance. GA gives the best detection performance; Simplex reduces the computation time at the cost of slightly lower performance.


Introduction
Nonverbal communication can be defined as communicating through sending and receiving wordless (mostly visual) cues between people. The eyes, being the window to the brain and soul, play an important role in understanding human intentions and emotions. Eye movements in particular are primary cues of nonverbal communication. Through our eyes' gaze and blinking, we communicate messages to the people around us. Through these motions we equally interact, act and react. Detecting and interpreting gaze in Human-Human Interaction (HHI) is straightforward. In Human-Computer Interaction (HCI), however, this task is difficult. An automatic gaze detection system should work with low-resolution images coming from ordinary webcams.
It should cope with changes in eye state during blinking, fast movements of the iris and large head movements, occlusions by eyeglasses and hair, and variability in lighting conditions and in iris location, color and scale. Active Appearance Models (AAM) (1) are well-known statistical techniques widely used for face analysis. Building an AAM for gaze detection requires training on a set of hand-labeled faces with different gaze directions. The appearance of the eye is learned simultaneously with the face mesh. After the iris location is found, the gaze angle must still be calculated from this information.
In Salam et al. (2), we proposed to separate the appearance of the face mesh from that of the iris texture. To do this, we merge an iris AAM with an eye model in which holes replace the iris/sclera part. The iris model slides under the face mesh, permitting the synthesis of different iris locations. That method models the iris texture in 2D and therefore does not take into account the fact that the iris looks elliptical in extreme gaze directions. To account for this, in this paper we extend that work by modeling the iris as part of a sphere. The iris is projected on the sphere, which is rotated to a certain gaze angle. It is then re-projected on the 2D plane, giving the iris its real elliptical shape together with the gaze angle. This makes it possible to model the iris appearance realistically in extreme gaze directions. In addition, since we obtain the gaze angle directly instead of just the iris location, we avoid the need for further calculations. Furthermore, the optimization method used in Salam et al. (2) was the Genetic Algorithm (GA) (3). Although it gives accurate results, its computation time is not negligible. In this paper we explore two other heuristics, Gradient Descent and Simplex (4), and compare their performance with that of the GA. The paper is organized as follows. Section 2 presents past work in gaze and eye detection. Section 3 presents our model for gaze detection. Finally, section 4 presents experimental results.

State of the art
Research in the area of eye and gaze detection is very active and many methods exist in the literature. These methods use either infrared (IR) light or traditional image-based passive approaches (9). After finding the iris shape and location in the eye, the gaze angle and vector are determined from this information. A detailed survey of eye and gaze tracking techniques can be found in Ishikawa et al. (10).
In IR-based methods, the glint, which is a corneal reflection of the emitted light source and appears very close to the pupil, is detected. The pupil center, its shape and that of the iris are then extracted with the help of this information. Hansen et al. (6) propose an intrusive IR gaze tracking system to aid bed-bound people. Their method is expensive in its hardware requirements. Other authors integrate more than one IR light source in their systems. For instance, Perez et al. (7) use four IR sources. The goal of using more than one IR source is to ensure that a glint is always present in the image. Although simple and efficient, IR methods depend strongly on the brightness of the pupils, which is influenced by many factors such as eye closure and occlusion, external illumination and the distance of the user from the camera. Our method does not need special hardware: it works with a simple, cheap webcam.
Image-based passive approaches use the eye's appearance, its shape, or a combination of both to detect the gaze.
Shape-based methods extract the shape of the eye. Yuille et al. (8) use a deformable template of the eye composed of two parabolas for the eye shape and a circle for the iris. For the matching process, they use four energy fields composed of intensity edges, valleys, peaks and gray levels. Chen et al. (9) use the elliptical separability filter to find iris candidates in the image, searching the facial image for an eye template with elliptical eyelids and a circular iris. This method was tested only on subjects in frontal view looking straight ahead, so its effectiveness in detecting other iris positions is unknown. Ishikawa et al. (10) compute an initial estimate of the iris position by template matching and refine this location by edge-based ellipse fitting. These methods are generally effective, but they are computationally expensive. They also require the definition of an adequate set of initial parameters for the template, and they usually fail under large head poses.
Appearance-only methods include image template based approaches (11), which rely on the construction of an eye template that is compared to image patches in the face. Such methods have problems with scale and head pose variation. Hillman et al. (12) construct the image patch model from a learning database through eigenfaces. Other methods train classifiers, such as Huang and Wechsler (13), who use neural networks to detect the eye. These need large training databases to work well.
Shape-appearance methods use both the shape and the appearance of the eye. For instance, Active Appearance Models (AAM) (1) are very popular statistical methods that are able to model the appearance and the shape of the object in question. Hansen et al. (14) combine a mean-shift color tracker with a hierarchical AAM to track the eye corners and the iris positions. Bacivarov (15) uses a component-based AAM in order to model the different iris positions. Methods based on AAM need to include faces with different gaze directions in the training set in order to detect gaze. The more images are included in the learning base, the more parameters are necessary to describe the appearance of one face. Our method has the advantage of keeping the AAM training database small: it can detect different gaze directions without including this variation in the learning database.
Methods that use 3D eyeball models also exist. For instance, Reale et al. (18) analyze the iris center movements by employing a 3D eyeball/iris model. Their model contains the position of the eyeball, the iris radius and size. The eye image data is projected onto the model, and the pixel error is minimized between it and the rendered rotated eyeball that gives different iris positions. Iris contour extraction is done using a Starburst algorithm. Yamazoe et al. (19) apply a similar head-eye model. It differs from that of Reale et al. (18) in that the 3D model is projected on the image plane rather than the reverse. In our method, the iris shape is projected on the eyeball and then re-projected on the 2D plane.

Modeling
Most computer graphics face synthesis methods model the eyeballs as objects separate from the facial skin. In these methods, the eyeball is located behind a 3D mesh that represents the facial skin and that has openings between the eyelids. Figure 1 shows the eyeball with its different rotations modeled as a separate object in the computer animation software Blender. In the facial analysis framework, particularly in AAMs (1), the eyeball is not modeled as a rotating 3D sphere located behind the skin surface. Instead, the visible region of the eyeball is part of a continuous face mesh.
The gaze pose vector $T^{gaze}$ contains the scale of the iris, the horizontal rotation of the eyeball and its vertical rotation, respectively. This vector of parameters replaces the vector of parameters proposed in Salam et al. (2). Rotational parameters are used instead of translational ones in order to find the gaze direction.
Of course, the variation due to eyeball movements should then no longer be encoded in the face appearance parameters. To achieve this, a skin model is trained separately. It is the same as in Salam et al. (2), where a hole is placed inside the eye to exclude its interior from the AAM training process. Figure 3(a) illustrates the mean texture and annotations of this model. The basic idea remains the same, namely to decorrelate the appearance of the skin from that of the iris. However, instead of modeling the iris as a simple 2D texture, we project it onto a sphere, rotate the sphere and then re-project it onto the plane. This is illustrated in figure 2. The red circle represents the iris shape and the blue sphere represents the eyeball. At each iteration of the AAM search, the appearance parameters of the iris and its scale are tuned, which gives a certain appearance and shape of the iris. This shape is then projected on the eyeball, which is rotated by the gaze parameters, and then projected back on the model frame.
This takes into account both the shape and the appearance of the iris at extreme gaze directions.
(e) Tune the pose $T^{gaze}$ and appearance $C^{iris}$ of the iris model.
Figure 3(b) shows the iris texture concatenated with the iris shape. As the figure shows, after mapping the iris texture on the rotated 2D shape, the iris appears elliptical, realistically modeling the person's gaze. The horizontal rotation of the sphere is limited to the range -40° to +40° and the vertical rotation to the range -10° to +10°, which is found to give plausible projections of the 3D iris shape.
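
The projection, rotation and re-projection of the iris shape can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an orthographic projection along the z axis, an eyeball centred at the origin, and a horizontal rotation about the y axis followed by a vertical rotation about the x axis. The function name `project_iris` and its parameters are hypothetical; only the rotation limits are taken from the text.

```python
import math


def project_iris(iris_radius, eye_radius, yaw_deg, pitch_deg, n_points=32):
    """Return the 2D contour of an iris circle after rotating the eyeball.

    Assumptions (not stated in the paper): orthographic projection along z,
    eyeball centred at the origin, yaw about y applied before pitch about x.
    """
    if not 0.0 < iris_radius < eye_radius:
        raise ValueError("iris radius must be positive and smaller than the eyeball radius")
    # Clamp to the rotation limits reported in the text (+/-40 and +/-10 degrees).
    yaw = math.radians(max(-40.0, min(40.0, yaw_deg)))
    pitch = math.radians(max(-10.0, min(10.0, pitch_deg)))
    contour = []
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        # 1) Point of the flat iris circle in the model plane.
        x = iris_radius * math.cos(t)
        y = iris_radius * math.sin(t)
        # 2) Lift the point onto the front hemisphere of the eyeball.
        z = math.sqrt(eye_radius ** 2 - x * x - y * y)
        # 3) Horizontal rotation (about the y axis).
        x1 = x * math.cos(yaw) + z * math.sin(yaw)
        z1 = -x * math.sin(yaw) + z * math.cos(yaw)
        # 4) Vertical rotation (about the x axis).
        y2 = y * math.cos(pitch) - z1 * math.sin(pitch)
        # 5) Orthographic re-projection: drop the depth coordinate.
        contour.append((x1, y2))
    return contour
```

Under these assumptions, a purely horizontal rotation by an angle θ shrinks the horizontal extent of the contour by a factor cos θ while leaving its vertical extent unchanged, which is the elliptical appearance described in the text.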
In the following section, we show how the iris shape is back-projected from the model frame to the image frame. The above describes how the model works at each iteration until convergence. Once the optimal iris shape is found in the model frame, it must be back-projected to the real image. This task is not straightforward. The optimization of the iris parameters takes place in the model frame, where the eye texture is warped from the webcam image frame to the mean model of the eye. The iris position is then optimized with respect to the midpoint of the mean eye shape. In Salam et al. (2), to retrieve the iris position in the original image from the iris parameters found in the model frame, we assumed that the midpoint of the mean eye shape in the model frame is approximated by the midpoint of the optimal eye shape in the webcam image. The steps are as follows:

Back-Projection of the iris shape
- Apply the eye scale to the iris so that it takes its size.
- Apply the gaze parameters $T^{gaze}$ to the iris, taking the midpoint of the eye as the reference.
This approach gives the approximate position of the optimal iris in the webcam image, but it is not fully precise. Consequently, we propose another approach to perform this back-projection, based on barycentric coordinates. This is the same approach used in warping from one image to another. The algorithm works as follows. Let $s_i^{model}$ be a point of the iris shape in the model frame. If $\alpha$, $\beta$, $\gamma$ are the barycentric coordinates of $s_i^{model}$ in the triangle of the eye mesh that contains it, then the corresponding point in the image frame is
$$s_i^{image} = \alpha V_1 + \beta V_2 + \gamma V_3,$$
where $V_1$, $V_2$, $V_3$ are the vertices of the triangle in the image frame.
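
A minimal sketch of barycentric back-projection for a single point is given below. It assumes 2D points stored as `(x, y)` tuples and that the triangle containing the point is already known; the function names `barycentric` and `back_project` are hypothetical and not taken from the paper.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    alpha = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    beta = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    gamma = 1.0 - alpha - beta
    return alpha, beta, gamma


def back_project(p_model, tri_model, tri_image):
    """Map a model-frame point to the image frame through one triangle.

    tri_model and tri_image list the three vertices of the same mesh
    triangle, in the same order, in the model and image frames.
    """
    alpha, beta, gamma = barycentric(p_model, *tri_model)
    # Weighted sum of the image-frame vertices, one coordinate at a time.
    return tuple(alpha * va + beta * vb + gamma * vc
                 for va, vb, vc in zip(*tri_image))
```

For example, with the model triangle `[(0, 0), (1, 0), (0, 1)]` and the image triangle `[(10, 10), (30, 10), (10, 50)]`, the model point `(0.25, 0.25)` has coordinates (0.5, 0.25, 0.25) and maps to `(15.0, 20.0)`.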

Optimization
To optimize the gaze parameters, three heuristics are implemented and compared: Gradient Descent (GD), Simplex and the Genetic Algorithm (GA).

Genetic Algorithm (GA)
The Genetic Algorithm (3) is a population-based iterative search heuristic that aims at finding the set of parameters that optimizes a certain cost function. It is inspired by the process of natural evolution: a set of candidate solutions evolves towards better solutions until the best solution is reached. The iris pose and appearance form the genes of the GA. They are combined into a single vector to form a chromosome (cf. Figure 5). The group of chromosomes forms a population. Each iteration is a cycle in which a new generation of chromosomes is formed and the fitness of each chromosome is computed. According to this fitness, some chromosomes are selected as parents of the population of the next iteration.
Fig. 5. A chromosome of the GA. The genes are the gaze parameters.
Initialization: The initial population is generated randomly between the upper and lower limits of the parameters with a uniform distribution. This allows the whole search space to be spanned.
Selection: This is the act of choosing the parents for the production of the new generation. We use tournament selection, in which a number of chromosomes are randomly drawn from the population, their fitness is computed, and those with the least error are selected for inclusion in the next generation's population.
Reproduction: This is done by two-point crossover and Gaussian mutation, with a proportion of 0.8 for crossover and 0.2 for mutation.
In Salam et al. (2), by contrast, the GA was used with stochastic uniform selection and an initial population of 300 chromosomes, which made the optimization process slow.
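
The GA loop with uniform initialization, tournament selection, two-point crossover and Gaussian mutation can be sketched as follows. This is an illustrative sketch, not the authors' code. The cost function is supplied by the caller as a stand-in for the AAM reconstruction error. The tournament size, the mutation spread, the number of generations and the elitism step (keeping the best chromosome unchanged) are assumptions not given in the text. The 0.8/0.2 split and the population size of 20 follow the values reported in the paper.

```python
import random


def tournament(pop, errors, k, rng):
    """Return the lowest-error chromosome among k drawn at random."""
    idx = rng.sample(range(len(pop)), k)
    return pop[min(idx, key=lambda i: errors[i])]


def two_point_crossover(p1, p2, rng):
    """Copy p1 and replace the genes between two cut points with those of p2."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    return p1[:i] + p2[i:j] + p1[j:]


def gaussian_mutation(p, bounds, sigma_frac, rng):
    """Add Gaussian noise to every gene and clamp it to its bounds."""
    child = []
    for gene, (lo, hi) in zip(p, bounds):
        gene += rng.gauss(0.0, sigma_frac * (hi - lo))
        child.append(min(hi, max(lo, gene)))
    return child


def run_ga(cost, bounds, pop_size=20, generations=60, k=3, p_cross=0.8, seed=0):
    """Minimize cost over the box given by bounds; returns the best chromosome."""
    rng = random.Random(seed)
    # Initialization: uniform sampling between the lower and upper limits.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        errors = [cost(c) for c in pop]
        # Elitism (assumption): the current best chromosome survives unchanged.
        children = [pop[errors.index(min(errors))]]
        while len(children) < pop_size:
            parent = tournament(pop, errors, k, rng)
            if rng.random() < p_cross:
                mate = tournament(pop, errors, k, rng)
                children.append(two_point_crossover(parent, mate, rng))
            else:
                children.append(gaussian_mutation(parent, bounds, 0.1, rng))
        pop = children
    errors = [cost(c) for c in pop]
    return pop[errors.index(min(errors))]
```

A call such as `run_ga(cost, [(0.5, 1.5), (-40.0, 40.0), (-10.0, 10.0)])` would search over a scale and two rotation angles. The scale bounds here are purely illustrative.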
Simplex (4) is also a population-based iterative algorithm. Initially, a random initialization takes place and the best N+1 solutions are kept, where N is the number of gaze parameters (appearance and pose). New solutions are then generated using simplex operators (mainly reflection, expansion and shrinking). These are inserted in the simplex by extrapolating the behavior of the fitness function at each solution.
Gradient Descent (GD) finds the solution by taking small steps in the direction of the negative gradient of the error function with respect to the gaze parameters.
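
A minimal sketch of gradient descent on a generic error function follows. The paper does not say how the gradient is obtained; central finite differences are used here as an assumption, and the step size, iteration count and function names are hypothetical.

```python
def numerical_gradient(cost, params, eps=1e-4):
    """Central finite-difference estimate of the gradient of cost at params."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((cost(up) - cost(down)) / (2.0 * eps))
    return grad


def gradient_descent(cost, start, step=0.1, iterations=200):
    """Repeatedly take a small step along the negative gradient."""
    params = list(start)
    for _ in range(iterations):
        grad = numerical_gradient(cost, params)
        params = [p - step * g for p, g in zip(params, grad)]
    return params
```

Unlike the two population-based heuristics, this scheme follows a single trajectory from its starting point, so on a non-convex error surface it can stop in a local minimum.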

Results
In this section, we first compare our results to those of Salam et al. (2). Tests are conducted on the Ulm Head Pose and Gaze (UlmHPG) database (5), from which we randomly choose 185 images. We then present results concerning the optimization of our model.
Our first experiment compares the 3D MT-AAM to the 2D MT-AAM. For this experiment, we use 10 persons of the PG database (20) looking straight ahead to train the eye skin model. We also include a comparison with a classical AAM method, called the Double Eyes AAM (DE-AAM). It is built on the upper part of the face by training on 10 subjects of the PG database. Each subject has 3 facial images, looking to the extreme left, to the extreme right and straight ahead.
In figure 6, we plot the Ground Truth Error (GTE) of the iris versus the percentage of aligned images in the Ulm database (the 185 randomly chosen images mentioned earlier). The GTE is defined as the mean Euclidean distance between the manually marked ground truth (the real location of the iris center) and the iris center given by the gaze detection method, normalized by the inter-ocular distance. As the figure shows, for an error less than or equal to 10% of the inter-ocular distance, the iris is correctly detected in 80% of the images for the 3D MT-AAM, versus 65.95% for the 2D MT-AAM and 11.35% for the 2.5D DE-AAM method. This shows the advantage of our method over the classical AAM: detection improves substantially while the AAM training database is kept small. Moreover, the 3D model of the eyeball improves the detection rate by 14.05 percentage points over the 2D model. This indicates that modeling the iris in this way is more realistic and thus gives better results. Since the iris is realistically represented, the active appearance model converges better, because the difference between the model instance and the real image is smaller.
Figure 7 shows some qualitative results comparing the two models, the 3D MT-AAM and the 2D MT-AAM. The figure illustrates the advantage of the 3D MT-AAM over the 2D MT-AAM. For extreme gaze directions, the iris in the 3D MT-AAM takes the shape of an ellipse, realistically representing its appearance. In the 2D MT-AAM, by contrast, the iris shape is a circle that extends beyond the eye most of the time, because it cannot take the real shape of the iris.
In conclusion, the 3D representation of the iris is more realistic than the 2D one and thus gives better results.
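
The GTE and the detection rate at a given threshold can be computed as in the sketch below. Two points are assumptions, since the text does not spell them out: the mean is taken over the left and right iris centers of one image, and the inter-ocular distance is measured between the two ground-truth iris centers. The function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def gte(pred_left, pred_right, gt_left, gt_right):
    """Mean iris-center error of both eyes, normalized by inter-ocular distance."""
    inter_ocular = math.dist(gt_left, gt_right)
    mean_error = (math.dist(pred_left, gt_left)
                  + math.dist(pred_right, gt_right)) / 2.0
    return mean_error / inter_ocular


def detection_rate(errors, threshold=0.10):
    """Percentage of images whose GTE is at or below the threshold."""
    return 100.0 * sum(e <= threshold for e in errors) / len(errors)
```

With this definition, a detection rate of 80% on a test set of 185 images corresponds to 148 images with a GTE of at most 0.10.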

Comparison between different optimizations
We compare three optimizations of the gaze parameter vector. This comparison allows us to choose the optimization best suited to these parameters. To perform this test, we train the eye skin model on 104 neutral images of the Bosphorus database (17). The iris AAM is trained using a group of 23 iris textures taken from the images of Dobes et al. (21).
GA is compared with Simplex and Gradient Descent (GD). For this comparison, we plot the Ground Truth Error (GTE) of the iris versus the percentage of aligned images. We take into account two criteria: the percentage of good detections at 10% of the inter-ocular distance and the computation time of the optimization algorithm.

GA and GD
Figure 8 shows the comparison between the three optimizations. As the figure indicates, GA gives the highest GTE curve of the three. For an error less than or equal to 10% of the inter-ocular distance, the iris center is correctly detected in 91.28% of the test images with GA, versus 86.63% with the Simplex algorithm and 75.58% with GD. Table 1 compares the computation time of these optimizations. As the table shows, the Simplex algorithm has the lowest computation time (10 sec per image); GD (20 sec) takes more time than Simplex but less than GA (35 sec). With the lowest computation time and the second-best detection rate, Simplex can serve as a compromise between computation time and accuracy. However, we choose to use the GA because our goal is to achieve the most accurate results.
Our previous work (2) also used GA, but with different options. There, we used stochastic uniform selection with an initial population of 300 chromosomes, and 20 parent chromosomes were passed to the next generation at each iteration. Comparing different GA options, we have found that tournament selection with an initial population of only 20 chromosomes reaches the optimum faster and with less computation time.

Conclusions
In this paper, we have presented a model for gaze detection. The model extends previous work, which models the iris as a simple 2D texture, by modeling the iris on a sphere, and it obtains a higher detection rate. This modeling has the advantage of capturing the elliptical shape of the iris in extreme gaze directions. In addition, we compare three optimizations of our model: Genetic Algorithm, Gradient Descent and Simplex. We find that the Genetic Algorithm gives the best detection results but has the longest computation time. Simplex gives slightly lower accuracy but the shortest computation time.

Fig. 1. Modeling the eyeball as a separate object from the facial skin in computer graphics

Fig. 2. Modeling the iris on a sphere; the figure shows how the iris shape becomes elliptical for extreme gaze directions

3. Using the optimal parameters found by the eye skin model, synthesize the eye skin $g_m^{eye}$ (Fig. 2(a)).

4.
Fig. 3. Error calculation at one iteration of the algorithm
(c) Apply a low-pass filter to get the final eye region model $g_m$ (Fig. 2(e)). This low-pass filter serves to smooth the boundary between the eye skin and the iris model.
eye region (Fig. 1(g) and Fig. 1(h)).

Fig. 4. Barycentric coordinates; the star represents a point of the iris shape

Table 1. Comparison of the computation time of the different