Face Detection and Analysis of Relationship between Degree of Emotion Arousal and Facial Temperature

Various studies on facial-expression recognition and emotion detection have been conducted from the perspective of emotional communication. In our previous studies, we analyzed the relationship between facial-skin temperature change and emotions and clarified that temperature changes occur on the nose and both cheeks (the regions of interest) when emotions of joy are evoked. However, owing to the scarcity of emotional arousal data, the relationship between the intensity of emotion in intentional facial expressions and the facial temperature could not be analyzed in detail. Moreover, we could not analyze a large amount of data because a separate face detector was required for each individual to extract the regions of interest. Therefore, we herein propose a face detection method that combines thermal and visible moving images. Using this method, we conduct a detailed analysis of the relationship between the intensity of emotion in intentional facial expressions and the facial surface temperature.


Introduction
In recent years, owing to the development of information equipment, the importance of human interfaces for realizing seamless man-machine communication has increased. Consequently, various studies on facial-expression and emotion detection have been conducted from the perspective of emotional communication (1−5). In facial-expression recognition, many studies use not only visible images but also other sensors (3). Among these, the use of thermal moving images is a reliable and non-invasive technique for assessing emotional arousal (4). Detailed facial-expression recognition requires examining the degree of expression intensity; however, although many studies discriminate among multiple facial expressions (3−5), they have not focused on a single expression and discriminated its degree of intensity. In addition, it is known that changes occur in the cheeks due to emotional arousal (5), yet few studies focus on the cheeks. Therefore, it is necessary to examine the cheeks as well.
Previously, we analyzed the relationship between facial temperature change and emotions and clarified that temperature changes occurred on the nose and both cheeks (the regions of interest, ROIs) when emotions of joy were evoked (6). However, owing to the scarcity of emotional arousal data, the relationship between the temperature change in the target ROIs and intentional facial expressions could not be analyzed in detail. Moreover, because a face detector had to be prepared for each individual to extract the ROIs, a general-purpose face detector was needed to analyze a large amount of data. Therefore, we herein propose a general-purpose face detection method that uses both thermal and visible moving images. Using the proposed method, we analyzed the amount of temperature change in the acquired ROIs during the emotional arousal of intentional facial expressions.

Acquisition Data
Fig. 1 shows the data acquisition environment. Eleven participants (A-K: in their 20s; six men and five women) were evaluated over three days (days 1-3). Thermal and visible moving images were acquired simultaneously. The acquisition environment was as follows: room temperature 24.9-25.6 °C and room humidity 49.5%-60.9% (participants A-E); room temperature 18.7-22.5 °C and room humidity 42.5%-57.6% (participants F-K), under fluorescent lighting (700-1,000 lx). In this study, we used the intentional-facial-expression data (five trials each of the expressionless state, weak laughter, and strong laughter on each of the three days).
Because the thermal and visible cameras had fields-of-view of different sizes, the positions in the real world corresponding to the corners of the thermal moving images were indicated using four markers on the partition behind the participant.
The data used in this investigation were acquired in accordance with the ethical regulations concerning studies involving humans at Akita University, Japan.

Preprocessing for Thermal and Visible Moving Images
Fig. 2 shows the flow and overview of the face detection method. First, grayscale images (grayscale thermal images) were created from each frame of the thermal moving images to detect the face region. Normalization was performed to set the pixel intensity from 0 to 255, corresponding to the temperature ranges of 29.0-37.0 °C (participants A-E) and 26.0-37.0 °C (participants F-K). Subsequently, the visible moving images, sampled at 30 fps, were homography-transformed using the coordinates of the four markers so that they matched the size of the grayscale thermal images (640×480 pixels). Finally, because the total numbers of frames of the visible moving and grayscale thermal images differed, the visible frames were matched to the frames of the grayscale thermal images; the corresponding frame numbers were calculated using the start and end frames of the two image sequences.
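The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the marker pixel coordinates are hypothetical, the grayscale mapping assumes a simple linear scaling over the stated temperature range, and the homography is estimated with a plain direct linear transform rather than a library call.

```python
import numpy as np

def thermal_to_grayscale(temp_frame, t_min=29.0, t_max=37.0):
    """Map a 2-D array of temperatures (deg C) linearly onto 0-255,
    clipping values outside the normalization range."""
    scaled = (np.asarray(temp_frame, float) - t_min) / (t_max - t_min) * 255.0
    return np.clip(scaled, 0, 255).astype(np.uint8)

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping four src points to four dst
    points (direct linear transform with h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pt):
    """Map a single (x, y) point through homography H."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Hypothetical positions of the four markers in the visible image,
# mapped onto the corners of the 640x480 grayscale thermal image.
markers = [(80, 60), (1200, 55), (1210, 890), (75, 900)]
corners = [(0, 0), (639, 0), (639, 479), (0, 479)]
H = homography_from_points(markers, corners)
```

In practice the estimated `H` would be passed to a perspective-warp routine (e.g. OpenCV's `cv2.warpPerspective`) to resample each visible frame onto the thermal image plane.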

Face Detector for Visible Moving Images
The coordinates of the face region in the visible moving images were acquired using the face detection function (7) included in the open-source library Dlib (8). Dlib can obtain the coordinates of 68 facial landmark points in a visible image.

Face Detection for Grayscale Thermal Moving Images
First, the coordinates of the face region of the first frame of the visible moving image were acquired using Dlib. Subsequently, the coordinates acquired by Dlib were drawn on the grayscale thermal image that was acquired simultaneously.
However, the coordinates of the face area in the visible moving and grayscale thermal images did not match exactly; therefore, the coordinates identified in the visible image for the first frame were manually adjusted, and this adjustment was recorded. The coordinates of the face area were then acquired for all frames of the visible moving images, and the recorded adjustment was applied automatically to determine the corresponding coordinates of the face area on the grayscale thermal images for all frames.
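The adjustment step can be sketched as below. The Dlib detection itself is omitted; this sketch only shows the bookkeeping, and it assumes (the paper does not specify the form of the correction) that the manual adjustment can be modeled as a single translation estimated once on the first frame and reused for every later frame.

```python
import numpy as np

def first_frame_offset(dlib_pts, adjusted_pts):
    """Offset recorded from the manual adjustment on the first frame:
    the mean displacement between the Dlib landmarks drawn on the
    grayscale thermal image and their manually corrected positions."""
    return np.mean(np.asarray(adjusted_pts, float)
                   - np.asarray(dlib_pts, float), axis=0)

def adjust_landmarks(frame_pts, offset):
    """Apply the recorded first-frame offset to the landmarks of any
    later frame, yielding coordinates on the grayscale thermal image."""
    return np.asarray(frame_pts, float) + offset

# Hypothetical example: two landmarks, manually shifted by (+3, +5) px.
offset = first_frame_offset([(100, 80), (140, 80)],
                            [(103, 85), (143, 85)])
```

Modeling the correction as a translation keeps the per-frame cost negligible; a small residual misalignment between the two modalities would remain if the true discrepancy also included rotation or scale.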

Temperature Analysis of ROIs
Using the face detection method proposed in Section 3, the ROIs were extracted, and the temperature acquired from each ROI was analyzed. In this paper, we used the intentional-facial-expression data.

Setting the ROIs
For the grayscale thermal image depicting the front view of the face in the first frame, the ROIs were set as indicated by the blue frames in Fig. 3. By anchoring the ROIs to the point at the center between the nostrils, they were made to follow the tilt of the face.
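One way to realize ROIs that follow the face tilt is to define each ROI center as a fixed offset from the nostril midpoint and rotate those offsets by the estimated tilt angle. This is an illustrative sketch only: the offset distances and ROI names are hypothetical, not values from the paper.

```python
import math

def rotate_about(pt, center, angle_rad):
    """Rotate pt about center by angle_rad (counter-clockwise)."""
    dx, dy = pt[0] - center[0], pt[1] - center[1]
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy)

def roi_centers(nostril_center, tilt_rad, cheek_off=60, cheek_up=10):
    """Nose and cheek ROI centers placed relative to the point between
    the nostrils and rotated to follow the face tilt.  The pixel
    offsets are illustrative assumptions."""
    cx, cy = nostril_center
    base = {
        "nose":        (cx, cy),
        "left_cheek":  (cx - cheek_off, cy - cheek_up),
        "right_cheek": (cx + cheek_off, cy - cheek_up),
    }
    return {name: rotate_about(p, nostril_center, tilt_rad)
            for name, p in base.items()}
```

With a tilt of zero the ROIs sit at their nominal offsets; as the head rotates in the image plane, the cheek ROIs swing around the nostril midpoint accordingly.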

Calculation of Temperature Change
To reduce the effect of temperature fluctuations in the room, the skin-temperature difference (STD) was calculated as the difference between the time-series average temperature of each ROI and the average temperature of 20 background points (shown in Fig. 3). To remove noise, a moving-average filter was used to smooth the STD time-series data. Subsequently, the amount of temperature change (ATC) was calculated as the difference between the temperature of the frame of interest and that of the frame 30 frames earlier.
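The STD and ATC computations described above can be sketched as follows. The moving-average window size is an assumption (the paper does not state it); the 30-frame lag corresponds to one second at 30 fps.

```python
import numpy as np

def skin_temperature_difference(roi_mean, background_means):
    """STD: per-frame ROI mean temperature minus the average of the
    20 background points (background_means has shape [frames, 20])."""
    return np.asarray(roi_mean, float) - np.mean(background_means, axis=1)

def moving_average(x, window=5):
    """Smooth a 1-D series with a simple moving-average filter
    (window size is an illustrative assumption)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

def temperature_change(std_series, lag=30):
    """ATC: smoothed STD of the current frame minus that of the frame
    `lag` frames earlier; the first `lag` frames are undefined (NaN)."""
    std_series = np.asarray(std_series, float)
    atc = np.full_like(std_series, np.nan)
    atc[lag:] = std_series[lag:] - std_series[:-lag]
    return atc
```

Subtracting the background average before differencing means slow room-temperature drift cancels in the STD, so the ATC reflects skin-temperature dynamics rather than ambient changes.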

Figs. 4 and 5 show examples of the ATC calculation results using the intentional-facial-expression (weak laughter, strong laughter) data of Participant G. For both the weak and strong laughter data, the ATC on the left and right cheeks tended to change in the vicinity of the start and end frames of the facial-expression section: it increased in the positive direction near the start frame and in the negative direction near the end frame. As shown in Fig. 4 (a), the ATC changed by approximately +0.12 °C and −0.06 °C near the start and end frames, respectively. As shown in Fig. 4 (b), the ATC changed by approximately +0.11 °C and −0.10 °C near the start and end frames, respectively. As shown in Fig. 5 (a), the ATC changed by approximately +0.40 °C and −0.21 °C near the start and end frames, respectively. As shown in Fig. 5 (b), the ATC changed by approximately +0.30 °C and −0.20 °C near the start and end frames, respectively. A similar tendency was observed in Participants A-F and H-K. However, a different waveform was obtained in the data of Participant A on day 1: the ATC changed by approximately −0.07 °C and +0.06 °C near the start and end frames, respectively, in the weak-laughter data, and by approximately −0.17 °C and +0.17 °C near the start and end frames, respectively, in the strong-laughter data. Because the data of Participant A on day 1 were obtained after the participant had worked in a hot environment, we surmise that this is why results differing from those of the other participants were obtained. In addition, we compared the weak- and strong-laughter data: the ATC near the start frame of the facial-expression section was larger in the strong-laughter data than in the weak-laughter data.
The results above suggest that, for intentional facial expressions, the ATC values of the facial-expression and expressionless sections differ. Furthermore, they suggest that the degree of an intentional facial expression can be determined by comparing the ATC magnitudes near the start frame of the facial-expression section.

Significant Difference in Temperature Change in ROIs
Welch's t-test (9) was performed on the steady-state and intentional-facial-expression (weak laughter, strong laughter) STDs of each participant. The significance level was set to 5% (two-tailed test). The data used for the test were the first 100 frames of the first expressionless section, i.e., the steady state, and all frames (five trials each) of each intentional facial expression. Table 1 summarizes the test results for the STDs. Comparing each ROI and each degree of expression against the steady state, significant differences were found in 90.0% or more of the data. In other words, the temperature ranges of the expressionless and expression states differed. This suggests the usefulness of the STD in the ROIs for detecting laughter.
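For reference, Welch's t-test compares two samples without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom in plain NumPy; in practice one would obtain the two-tailed p-value from a t distribution (e.g. `scipy.stats.ttest_ind(a, b, equal_var=False)`) and compare it against the 5% significance level.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with (possibly) unequal variances."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Here one sample would be the 100 steady-state STD frames and the other the STD frames of a facial-expression trial; unequal variances are expected because the expression sections show larger temperature swings than the steady state.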

Conclusion
We proposed a face detection method that combines thermal and visible moving images and used it to analyze the facial temperature in the ROIs when intentional facial expressions were generated. The following results were obtained: (a) The ATC values for the intentional-facial-expression data differed from those of the expressionless data.
(b) The degree of an intentional facial expression can be determined by comparing the ATC magnitudes near the start frame of the facial-expression section.
(c) Analyzing the STD in the ROIs is useful for distinguishing an expressionless face from an intentional facial expression.
In future studies, we will analyze the temperature of the ROIs during natural facial expressions.