A Proposal for a New Subjective Evaluation Analysis Method Using the Visual Analog Scale

We propose a new experimental and analysis method for subjective evaluation. Adopting the Visual Analog Scale (VAS) makes it possible to apply statistical analysis and grasp the overall tendency even in a subjective evaluation experiment with a small sample. A second feature is that the experimental results are visualized by overlaying a box-and-whisker plot and a univariate scatter plot. When the number of samples is small, we recommend showing all the data so that the overall trend can be grasped. We conducted an experiment with 20 subjects on conversations with a communication toy robot, investigating the emotions that the conversation evokes in the user and whether they relate to the smoothness of the conversation. Notably, the questions on negative emotions showed the opposite trend from what we expected.


Introduction
For several years we have been proposing a new analysis method for subjective evaluation experiments (1)(2)(3)(4). We have described how to analyze experimental results on how conversation with a communication robot affects the user's emotions or feelings (5). We proposed the concept of a new experimental method for the subjective evaluation of human sensitivity, which has three features. The first feature is to include redundant questions in the questionnaire given to subjects. In our previous studies we set two similar questions: for example, one asks for a value on the interval [0, 1], and the other asks for a value between two polar attributes. Adopting a continuous quantity allows us to determine whether the object has nonlinear features. The second feature is to adopt the Visual Analog Scale (VAS). The Likert scale is an ordinal scale and, especially for small samples, is not well suited to statistical processing. In addition, we solved the problem of the laborious manual tallying of VAS responses by using a VAS measurement application called VASpad (4,5).
The third feature is to visualize the experimental results by overlaying two graphs, a box-and-whisker plot and a univariate scatter plot (5). Weissgerber et al. (6) recommend using univariate scatter plots instead of bar graphs or line charts to visualize data when the number of samples is small, and propose showing the full distribution of the data for small-sample studies. In our experience, subjective evaluation experiments contain many outliers, and most of them use small samples. There is therefore no guarantee that subjective data will follow a normal distribution free of outliers. With this point in mind, we propose presenting the experimental results by superimposing the box-and-whisker plot and the univariate scatter plot. In this paper we examine the effectiveness of the proposed method by conducting the same experiment as before with different subjects.
We also present histograms that show the trend of the data as a whole.

Subjective Evaluation Experiment
We first briefly outline VAS. VAS is a method in which subjects themselves indicate the degree of a subjective feeling on a horizontal straight line. We use VAS as a psychometric response scale in questionnaires. It is well suited to measuring subjective information that cannot be measured directly. First, we draw a horizontal line of 10 cm and label its two ends with the minimum and maximum conditions. Subjects put a mark on the line. We then measure the distance of the mark from the zero end with a ruler and calculate the value. The Likert scale (LS) is often used in general questionnaires; however, we adopted VAS because it expresses the user's subjectivity directly and lets us confirm the trend from the whole distribution even when the number of samples is small. Figure 1 shows samples of VAS and LS: the upper part shows VAS, and the lower part shows LS. It is intuitively clear that, when LS is used, it is hard to find trends from the whole distribution in small samples.
We analyze the results from the viewpoint of differences between the questions. Table 1 shows the measured values obtained from the smooth conversation, and table 2 shows the measured values obtained from the non-smooth conversation.
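The conversion from a ruler measurement to a VAS value can be sketched as follows. This is only an illustration of the paper-based procedure described above, not the VASpad implementation; the function name and the millimeter unit are assumptions, and the rounding to two decimal places follows the way the measured values are reported.

```python
LINE_LENGTH_MM = 100.0  # the 10 cm line described in the text


def vas_score(mark_mm: float, line_length_mm: float = LINE_LENGTH_MM) -> float:
    """Convert the distance of a mark from the zero end into a value in [0, 1].

    Hypothetical helper: the name and the millimeter unit are assumptions.
    """
    if not 0.0 <= mark_mm <= line_length_mm:
        raise ValueError("the mark must lie on the line")
    # Normalize by the line length and round to two decimal places.
    return round(mark_mm / line_length_mm, 2)
```

For example, a mark measured 37 mm from the zero end of the 10 cm line gives the value 0.37.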
All values lie in the interval from 0 to 1 and are rounded to two decimal places. Table 3 and table 4 show the fundamental statistics of table 1 and table 2, respectively. The tables show that the average value follows a different trend for each question. In both cases, the standard deviations of Q6, Q8, and Q10 are slightly large. Next, to grasp the tendency of the distribution as a whole, figure 3 overlays the box-and-whisker plot and the univariate scatter plot. The data of case 1 are drawn in light blue, and the data of case 2 in bright red. To make the features easier to see, figure 4 shows only the data of case 1, and figure 5 shows only the data of case 2. The thick line inside the box is the median, the upper edge of the box is the upper quartile, and the lower edge is the lower quartile. Dotted lines outside the box connect the upper and lower quartiles to the maximum and minimum, respectively. Small circles represent outliers.
The histogram is another graph for grasping the tendency of the entire distribution. Figure 6 shows the histograms for case 1, and figure 7 shows the histograms for case 2.
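The quantities drawn in such a box-and-whisker plot can be computed as in the minimal sketch below. Several points are assumptions, because the text does not state them: the quartiles use the "inclusive" interpolation of Python's statistics module, outliers are taken to be points more than 1.5 times the interquartile range beyond the quartiles (a common plotting convention), the whiskers stop at the most extreme values that are not outliers, and the standard deviation is the sample standard deviation. The function name is hypothetical.

```python
import statistics


def box_summary(values, whisker=1.5):
    """Return box-plot statistics for one question (hypothetical helper)."""
    data = sorted(values)
    # Lower quartile, median, and upper quartile.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }
```

Plotting every individual value next to the box, as the univariate scatter plot does, keeps the outliers and the shape of a small sample visible rather than hiding them behind the summary.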
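The counting behind such a histogram can be sketched as follows. The choice of ten equal-width bins is an assumption, since the bin width used in the figures is not stated, and the function name is hypothetical. Because the values are rounded to two decimal places, the sketch works on integer hundredths to avoid floating-point edge effects at bin boundaries.

```python
def histogram_counts(values, bins=10):
    """Count VAS values in [0, 1] per equal-width bin (hypothetical helper)."""
    counts = [0] * bins
    for value in values:
        hundredths = round(value * 100)  # values carry two decimal places
        if not 0 <= hundredths <= 100:
            raise ValueError("VAS values must lie in [0, 1]")
        # The value 1.0 is placed in the last bin.
        index = min(hundredths * bins // 100, bins - 1)
        counts[index] += 1
    return counts
```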
We predicted the tendency of the answers to each question before the experiment. Figure 8 shows how the ten questions shown in figure 2 are classified. We anticipated that the measurements would show opposite trends between the smooth and the non-smooth conversation.
The trends of Q1, Q2, Q3, Q4, Q5, Q7, and Q9 supported the prediction. However, Q8 and Q10 did not match our expectations. Q8 is a question about sadness, and Q10 is a question about anger. Notably, the questions on these negative emotions showed the opposite trend from what we expected. Furthermore, these results indicate that, even when the conversation goes wrong, fewer subjects feel sadness than feel anger.
Q6 is also interesting. Regardless of the smoothness of the conversation, many subjects feel that this robot is better at speaking than at listening. This can be seen from the difference in tendency between Q4 and Q5.
The histogram is also effective for grasping the overall trend. It shows at a glance that Q8 and Q10 have the opposite tendency to the other questions.
For further reference, we also show the results of a cluster analysis of the questions, carried out in the same way as in our previous research (5). Figures 9 and 10 show the cluster dendrograms for case 1 and case 2 obtained by Ward's method. In case 1, Q6, Q8, and Q10 are classified as predicted. It is also very interesting that the remaining questions are divided into those related to conversation performance and those related to the feelings received.
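The comparison between predicted and observed trends can be illustrated with the sketch below. The predicted groups follow the classification of the ten questions given with the figure captions. Comparing medians is an assumption made for illustration, since the text compares the trends visually, and all names are hypothetical.

```python
import statistics

# Predicted direction per question; Q6 has no directional prediction.
PREDICTED = {
    "Q1": "smooth_higher", "Q2": "smooth_higher", "Q3": "smooth_higher",
    "Q4": "smooth_higher", "Q5": "smooth_higher", "Q7": "smooth_higher",
    "Q9": "smooth_higher",
    "Q8": "non_smooth_higher", "Q10": "non_smooth_higher",
    "Q6": None,
}


def observed_direction(smooth_values, non_smooth_values):
    """Label which condition has the higher median (assumed criterion)."""
    diff = statistics.median(smooth_values) - statistics.median(non_smooth_values)
    if diff > 0:
        return "smooth_higher"
    if diff < 0:
        return "non_smooth_higher"
    return "no_difference"


def matches_prediction(question, smooth_values, non_smooth_values):
    """Return True or False, or None when no direction was predicted."""
    expected = PREDICTED[question]
    if expected is None:
        return None
    return observed_direction(smooth_values, non_smooth_values) == expected
```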
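The merging rule of Ward's method can be sketched as follows. This is an illustration under assumptions, not the software used for the dendrograms: each question is treated as a vector of the subjects' responses, distances are Euclidean, and the merge cost is the increase in the within-cluster sum of squares. How the cost maps to dendrogram heights depends on the software, so no heights are computed here. All names are hypothetical.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    size_a, size_b = len(a["members"]), len(b["members"])
    squared_distance = sum(
        (x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"])
    )
    return size_a * size_b / (size_a + size_b) * squared_distance


def ward_merges(profiles):
    """Agglomerate questions; profiles maps a label to its response vector.

    Returns a list of (members_a, members_b, cost) in merge order.
    """
    clusters = [
        {"members": [label], "centroid": list(vector)}
        for label, vector in profiles.items()
    ]
    merges = []
    while len(clusters) > 1:
        # Pick the pair whose merge increases the variance the least.
        a, b = min(combinations(clusters, 2), key=lambda pair: ward_cost(*pair))
        cost = ward_cost(a, b)
        size_a, size_b = len(a["members"]), len(b["members"])
        centroid = [
            (size_a * x + size_b * y) / (size_a + size_b)
            for x, y in zip(a["centroid"], b["centroid"])
        ]
        merged = {"members": a["members"] + b["members"], "centroid": centroid}
        clusters = [c for c in clusters if c is not a and c is not b] + [merged]
        merges.append((tuple(a["members"]), tuple(b["members"]), cost))
    return merges
```

Questions whose responses are similar across subjects merge early at low cost, while groups with different response patterns merge only at the last, most costly steps.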

Conclusions
To verify the effectiveness of the proposed method, we conducted a subjective evaluation experiment on conversation with the communication robot, using the same method as in the previous research but with different subjects. The difference from last year's experimental result is Q8 in case 2: when the conversation was not smooth, many subjects felt sadness the previous year, whereas this time fewer did. Regarding the clustering results, we confirmed that case 1 is almost the same as the previous result, but case 2 differs. As a general trend, the analysis results are similar to the previous ones, although we could confirm these differences. We will continue our research to establish a subjective evaluation experiment method for small samples.

Fig. 6. Histogram for each question (case 1)

Fig. 8. Trends of measurement values assumed before experiment
1. Q1, Q2, Q3, Q4, Q5, Q7 and Q9 show high values when the conversation is smooth and low values when the conversation is non-smooth.
2. Q8 and Q10 show high values when the conversation is non-smooth and low values when the conversation is smooth.
3. Q6 shows various values.