Study of improvement of intelligibility for elderly speech based on formant frequency shift

This study aims to improve speech intelligibility based on the acoustic features of the speech of elderly Japanese people. In general, it is difficult for elderly people to control their aging articulators accurately. Previous research found that the spectral transition distance decreases in elderly speech with non-briskness, and that there is a relationship between the formants and the spectral transition distance. In this paper, we investigated the relationship between the difference of spectral transition distance and the position of articulation in terms of the level of briskness. In addition, we improved the speech of elderly people by signal processing, using a method that shifts the first formant frequency (F1) and the second formant frequency (F2). By analyzing speech with non-briskness and speech with briskness, we found a relationship between the difference of spectral transition distance and the position of articulation. It is difficult for elderly speakers to utter /a/ and /i/, which require large movements of the mouth and tongue. The F1 and F2 shift method improved the intelligibility of speech with non-briskness in some words that need large movements of the mouth and tongue. In future work, we will analyze the acoustic features of consonants in elderly speech and investigate how to improve them.


Introduction
Speech is very important for communicating with others in our daily lives. However, some people's way of speaking changes with aging, and as a result they cannot converse smoothly. In an aging society worldwide, this problem is very serious.
For smooth communication, it is necessary to improve elderly speech.
Previous research analyzed Japanese elderly speech and found, by a listening test, that elderly speech is less intelligible than non-elderly speech [1]. There is also a correlation between intelligibility and the difference of transition distance [2]. However, the relationship to utterance behavior has not been investigated.
In this study, we improved the speech of elderly people based on such an analysis. For the analysis, we investigated the relationship between the difference of transition distance and the position of articulation in terms of the level of briskness. For the improvement, we used a shift method based on formant frequencies. Specifically, we shifted the first formant frequency (F1) and the second formant frequency (F2) so that speech with non-briskness becomes close to speech with briskness.

Database of Elderly Speech
This chapter describes the database of elderly speech and the method for selecting the subjects.

Recording
We recorded the speech of 36 elderly male speakers over the age of 60 in order to improve the intelligibility of elderly speech. The recorded material consists of 543 phonetically balanced isolated words.
The speech was recorded at 16 bits with a sampling rate of 24 kHz.

Subjective characteristics of elderly speech
To analyze the influence of aging on intelligibility in more detail, we selected speakers whose subjective characteristics were conspicuous, based on listeners' impressions of the elderly speech, and analyzed the physical features of those speakers' voices.
We conducted a listening test to determine the degree of the subjective characteristics of elderly speech, which include "rough", "slow speaking", and "non-briskness" [3]. The subjects were 10 male and 10 female students, who judged the degree of each characteristic on a five-point scale from 1 to 5, where a higher score indicated a less characteristic voice. The subjects listened to 50 connected words, prepared from the phonetically balanced 543-word database, spoken individually by the 36 elderly male speakers. For the analysis of elderly speech, we selected six speakers with non-briskness and six speakers with briskness.

Relationship Between Difference of Spectral Transition Distance and Position of Articulation
We noted that the degree of reduction of the transition distance differs for each vowel. We therefore studied the relationship between the position of articulation and the difference of transition distance, using speech with non-briskness and speech with briskness.

Spectral transition distance
The spectral transition distance is calculated by Eq. (1).
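The exact form of Eq. (1) is not reproduced in this text, so the following is only a rough sketch of the kind of computation involved. It assumes that the distance is a Euclidean distance between the first n cepstral coefficients taken at two feature points, with n = 30 as stated in this chapter. The function names, the FFT-based spectrum estimate, and the Hamming window are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def real_cepstrum(frame, n_coeffs=30):
    """First n_coeffs cepstral coefficients (c1..cn) of one speech frame.

    The cepstrum is the inverse Fourier transform of the log magnitude
    spectrum. The spectrum here is a plain windowed FFT (an assumption).
    """
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.fft(windowed))
    log_spectrum = np.log(magnitude + 1e-12)  # small offset avoids log(0)
    cepstrum = np.fft.ifft(log_spectrum).real
    return cepstrum[1:n_coeffs + 1]           # skip c0 (overall level)

def cepstral_distance(frame_a, frame_b, n_coeffs=30):
    """Euclidean distance between two cepstra (assumed form, not Eq. (1) itself)."""
    diff = real_cepstrum(frame_a, n_coeffs) - real_cepstrum(frame_b, n_coeffs)
    return float(np.sqrt(np.sum(diff ** 2)))
```

In this sketch, `frame_a` and `frame_b` would be short frames cut around the feature points of the phoneme being compared.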

Difference of spectral transition distance
To reveal the acoustic features, we calculated the difference of spectral transition distance, that is, the difference between the spectral transition distances of speech with briskness and speech with non-briskness. This value reveals which vowels of elderly speech are affected by non-briskness.
We calculated the difference of spectral transition distance using Eq. (1). Fig. 3 shows the difference of spectral transition distance for elderly speech by degree of briskness. Fig. 3(a) shows the results for initial phonemes. We found that vowels that need large movements of the mouth and tongue, such as /a/ and /i/, show a large difference of transition distance. Probably, elderly speakers with non-briskness are not able to move their tongues and mouths well.

The Vowels in the F1-F2 Size
We analyzed speech with non-briskness by linear predictive coding (LPC), focusing on F1 and

Shift of F1 and F2 in the Non-briskness Speech
We shifted F1 and F2 of the speech with non-briskness to bring them close to those of the speech with briskness.

Method of shift
Fig. 6 shows the process of shifting F1 and F2. First, the speech is divided into a spectral envelope and a harmonic structure by LPC.
Second, F1 and F2 are calculated from the spectral envelope. Third, F1 and F2 are shifted by the shift value.
The shift value is explained in Section 5.2. Finally, the shifted speech is created by synthesis from the shifted formant frequencies and the harmonic structure. Note that the synthesizer introduces some distortion into the shifted speech because of errors.

Calculation of the shift value
The shift value was derived by subtracting the mean F1 or F2 of the speech with non-briskness from the F1 or F2 of the six speakers with briskness. Table 2 shows the shift values of one non-brisk speaker. Fig. 7 shows the speech before and after the shift. The points in Fig. 7 indicate the formant frequencies. The joints between vowels were corrected manually to make them smooth.
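As a small worked illustration of this subtraction, the sketch below assumes that the F1 and F2 values of the brisk speakers are averaged into one target per formant before the non-brisk mean is subtracted. How the six brisk speakers are combined is not spelled out in the text, so the averaging is an assumption, and the function name and array layout are hypothetical.

```python
import numpy as np

def shift_values(brisk_formants, non_brisk_formants):
    """Return (delta_F1, delta_F2) in Hz for one vowel.

    brisk_formants:     shape (n_brisk_speakers, 2), columns F1 and F2
    non_brisk_formants: shape (n_measurements, 2), columns F1 and F2
    Averaging over the brisk speakers is an assumption of this sketch.
    """
    brisk_target = np.mean(np.asarray(brisk_formants, dtype=float), axis=0)
    non_brisk_mean = np.mean(np.asarray(non_brisk_formants, dtype=float), axis=0)
    return brisk_target - non_brisk_mean
```

A positive value means that the formant of the non-brisk speech has to be raised to reach the brisk target.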

Verification of Shift Speech by Listening Test
We verified the shifted speech by listening tests. The comparison is based on Scheffé's method of paired comparison.

Listening test
In order to verify the effect of this shift method on intelligibility, we prepared three types of speech: "before shift speech", "shift speech", and "non-shift speech". The "non-shift speech" was synthesized without shifting F1 and F2, and was used to confirm the influence of the distortion due to synthesis. We performed listening experiments on these three types of speech. Ten evaluation words, each containing a few vowels, were taken from the 543-word database (Table 3).
The subjects were 17 adult males.

Evaluation method
First, the subjects listen to two utterances, which are a randomly chosen pair from the three processing types. Second, the subjects rate the two utterances on a five-grade scale of the degree of briskness. The evaluation criterion is intelligibility. Fig. 8 shows the five grades.
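The pairing procedure can be sketched as a short trial-list generator. The sketch assumes that every pair of the three processing types is presented once per word and that the order within each pair is randomized. The text only states that two of the three types are chosen at random, so this full pairing is an assumption, and `make_trials` and `CONDITIONS` are hypothetical names.

```python
import itertools
import random

# The three processing types named in the listening test.
CONDITIONS = ("before shift", "shift", "non-shift")

def make_trials(words, seed=None):
    """Build a shuffled list of (word, first_stimulus, second_stimulus)."""
    rng = random.Random(seed)
    trials = []
    for word in words:
        for pair in itertools.combinations(CONDITIONS, 2):
            ordered = list(pair)
            rng.shuffle(ordered)        # randomize presentation order in the pair
            trials.append((word, ordered[0], ordered[1]))
    rng.shuffle(trials)                 # randomize trial order across words
    return trials
```

With ten evaluation words, this layout yields 30 trials per subject.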

Results of listening test
We calculated the mean preference degrees and their 99% confidence intervals using Scheffé's method of paired comparison. Fig. 9 shows the results of the listening experiments. Fig. 9(a) shows the comparison between "shift speech" and "non-shift speech". Fig. 9(b) shows the comparison between "before shift" and "after shift". Values toward the top favor the former, and values toward the bottom favor the latter.
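To make the idea of a mean preference degree with a 99% confidence interval concrete, the sketch below computes a plain sample mean with a normal-approximation interval. This is not Scheffé's analysis of variance itself, only a simplified stand-in. Coding the five grades as -2 to +2 is an assumption, and the function name is hypothetical. With only 17 subjects, a normal approximation gives a somewhat narrower interval than a small-sample method would.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_with_ci(scores, confidence=0.99):
    """Return (mean, lower, upper) using a normal-approximation interval.

    scores: preference degrees for one pair, e.g. coded -2..+2 (assumed coding).
    """
    m = mean(scores)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided quantile
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return m, m - half_width, m + half_width
```

If the interval for a pair lies entirely above or below zero, the corresponding preference would be regarded as significant at that level under this simplified reading.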

Conclusion
In this study, we analyzed elderly speech based on its acoustic features and examined the effect of improving speech with non-briskness by shifting the formant frequencies.
By analyzing speech with non-briskness and speech with briskness, we found a relationship between the difference of transition distance and the position of articulation. It is difficult for elderly people to utter /a/ and /i/, because these vowels need large movements of the mouth and tongue. In addition, the vowels converge toward /u/ in terms of F1 and F2. The F1 and F2 shift method improved the intelligibility of speech with non-briskness in words that need large movements of the mouth and tongue. In future work, we will analyze the acoustic features of consonants in elderly speech and investigate how to improve them.
Each speaker was labeled with the degree of each characteristic based on the five-point scale. The scores of each speaker for "non-briskness" are shown in Fig. 1. The vertical axis gives the evaluation score for each impression of the characteristics of elderly speech, and the horizontal axis lists the speakers sorted by their evaluation scores.
The cepstrum is the result of taking the inverse Fourier transform (IFT) of the logarithm of the estimated spectrum of a signal. t1 and t2 are the times of the feature points of each phoneme. A feature point represents the most characteristic part of each phoneme. t1 is the time in the speech with briskness, and t2 is the time in the speech with non-briskness. In this chapter, n is set to 30 to calculate the difference between the feature quantities of the speech. Previous research shows that there is a relationship between the degree of briskness and the spectral transition distance [3]. Moreover, phonemes have different features in initial and medial

Figure 1: Preference scores of characteristics of elderly speech

Fig. 2 shows the spectral transition distance by degree of briskness. Fig. 2(a) shows the results for initial phonemes, and Fig. 2(b) shows the results for medial phonemes. Fig. 2 shows that the values for vowels in speech with non-briskness are lower.

Figure 2: Spectral transition distance by degree of briskness

Fig. 4 shows the relationship between the absolute difference of spectral transition distance and the position of articulation. The radius of each circle is the absolute value of the difference of transition distance. Fig. 4(a) shows the results for initial phonemes, and Fig. 4(b) shows the results for medial phonemes. For the positions of articulation, we referred to previous research [4].

Figure 4: Position of articulation and absolute difference of spectral transition distance

Comparing Fig. 7(a) and Fig. 7(b), F1 and F2 are shifted upward by our method.
Figure 8: Five grades of evaluation of intelligibility
Table 1 shows the number of recorded elderly speakers by age.