Low Distortion Target Speech Extraction Method Using Two-Channel Microphone System

Department of Information, Communication and Electronic Engineering, Advanced Course of Electronics and Information Systems Engineering, Department of Human-Oriented Information Systems Engineering, National Institute of Technology, Kumamoto College, 2659-2 Suya, Koshi, Kumamoto 861-1102, Japan
Department of Information Electronics, Fukuoka Institute of Technology, 3-30-1 Wajirohigashi, Higashi-ku, Fukuoka 811-0295, Japan


Introduction
Many noise reduction methods for real-time processing have been proposed, such as SS (Spectral Subtraction) (1), SAFIA (sound source Segregation based on estimating incident Angle of each Frequency component of Input signals Acquired by multiple microphones) (2), and MUSIC (MUltiple SIgnal Classification) (3). These methods can estimate the original source signals. However, they require many data points, since they are based on stochastic theory and use frequency-domain information obtained by the Fourier transform.
The authors have already proposed several systems that reduce noise from mixture signals observed with a two-channel microphone system (4). These systems are implemented on a microcontroller and can suppress noise in real time. However, they extract only one target sound; all other sounds are deleted as noise. In actual applications, when multiple speakers utter in a noisy environment, the speech of all the speakers becomes the target sound, so a method that extracts multiple speakers' speech under a noisy environment is needed. In addition, a simple two-channel microphone system with a variable arbitrary directional pattern has been proposed by the authors (5). However, musical noise occurs in the estimated signal when the directivity becomes sharp. This paper proposes a low-distortion target speech signal extraction method. The proposed method forms a null pattern toward the noise. The distortion of the estimated signal is low, since the influence on the target sound is small. In addition, the method can form several null patterns for a multi-source condition. A typical microphone forms a unidirectional directivity pattern; therefore, when multiple speakers speak at the same time, as many microphones as speakers would be necessary. Our method works well even when the number of sound sources is larger than the number of microphones. The target speech of one or more persons is emphasized by our method, while other speech and noise are reduced. Since the proposed method is a very simple algorithm with a small amount of calculation, industrial applications can be expected.

Principle of variable directional pattern
The observed mixture signals are modeled as

x_m(t) = Σ_{n=1}^{N} a_mn s_n(t), m = 1, ..., M, (1)

where a_mn denotes an unknown mixing parameter, and N and M denote the numbers of sources and microphones, respectively.
Using the two mixture signals observed by two microphones while two speakers utter at the same time, the joint distribution is plotted as in Fig. 1(a). The horizontal and vertical axes denote the amplitudes of x_1(t) and x_2(t), respectively.
In the following, we regard the sound signals as stochastic variables and omit the time index t.
From Fig. 1(a), it is found that the joint distribution has two linear components. To clarify the linear components, the direction ϕ of each sample of the distribution is calculated as ϕ = tan⁻¹(x_2 / x_1).
Using all values of ϕ, the histogram hist(ϕ) is computed; each peak of the histogram corresponds to one of the linear components. It means that the target source signal can be estimated by extracting the corresponding straight line.
To extract the target speech using the joint distribution means to reduce the straight-line components generated by the noise and to extract the straight-line components created by the target signals. Based on this fact, the authors have proposed a noise reduction method based on the ratio of the signals observed by the two microphones (4). We define the error ε as the deviation of the direction ϕ from Φ, where Φ denotes the direction of the target source. From the above discussion, Φ is estimated as the mode value (the most frequent value) of ϕ, i.e., Φ = arg max hist(ϕ).
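The direction histogram and mode estimation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sources are synthetic Laplacian signals (sparse, speech-like stand-ins), and the mixing matrix A is an assumed example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic sparse sources (Laplacian stand-ins for speech)
s = rng.laplace(size=(2, 8000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])           # assumed mixing parameters a_mn
x = A @ s                            # instantaneous mixture model, cf. Eq. (1)

# Direction phi of each sample of the joint distribution (x1, x2)
phi = np.degrees(np.arctan2(x[1], x[0]))

# Histogram of phi; the mode Phi is the most frequent direction
hist, edges = np.histogram(phi, bins=180, range=(-180.0, 180.0))
k = int(np.argmax(hist))
Phi = 0.5 * (edges[k] + edges[k + 1])   # mode value of phi (bin center)
```

Because sparse sources rarely overlap, most samples line up with one mixing column, which is why the histogram of ϕ develops sharp peaks at the source directions.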
When the target consists of multiple speakers' speech, we estimate another peak Φ_k by searching the histogram again with the mode value Φ and its neighboring values excluded.
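One possible way to pick additional peaks Φ_k while excluding the mode and its neighborhood is sketched below; the helper name `histogram_peaks` and the `guard_bins` parameter are assumptions for illustration, not from the paper.

```python
import numpy as np

def histogram_peaks(hist, edges, n_peaks=2, guard_bins=5):
    """Return up to n_peaks bin-center directions, zeroing out
    guard_bins neighbours around each peak already taken."""
    work = hist.astype(float).copy()
    centers = 0.5 * (edges[:-1] + edges[1:])
    peaks = []
    for _ in range(n_peaks):
        k = int(np.argmax(work))
        if work[k] <= 0.0:
            break                      # no further peak remains
        peaks.append(float(centers[k]))
        lo, hi = max(0, k - guard_bins), min(len(work), k + guard_bins + 1)
        work[lo:hi] = 0.0              # exclude the peak and its neighbourhood
    return peaks
```

The first returned element recovers the mode Φ; subsequent elements give secondary peaks Φ_k.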
In order to reduce sounds arriving from directions away from the target source, we use functions f_i(ε) whose gain decreases as ε increases, where α, β, γ, and δ denote learning parameters that determine the amount of noise reduction; the functions are plotted in orthogonal coordinates and polar coordinates, respectively. As a function with a multiple directivity pattern, we adopt the Rose curve g_l, where θ denotes the angle from 0 to 360 degrees and φ denotes the phase.
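The paper's exact forms of f_i(ε) and g_l are not reproduced here, so the following sketch uses stand-in functions that merely share the required properties: the gain decays as ε grows, and the Rose curve produces a multi-petal pattern. All function forms and parameter values below are assumptions.

```python
import numpy as np

# Stand-in gain functions with the property that gain decays with eps
def f_gauss(eps, beta=10.0):
    return np.exp(-(eps / beta) ** 2)

def f_rational(eps, alpha=1.0, gamma=2.0):
    return 1.0 / (1.0 + alpha * np.abs(eps) ** gamma)

# One common Rose-curve definition with l petals and phase phi_deg
def rose(theta_deg, l=4, phi_deg=0.0):
    theta = np.radians(theta_deg + phi_deg)
    return np.abs(np.cos(l * theta))

theta = np.arange(0.0, 360.0)                   # angle in degrees
pattern = rose(theta) * f_gauss(theta - 90.0)   # directivity around 90 deg
```

Plotting `pattern` against `theta` in polar coordinates would show a petal-shaped lobe narrowed around the chosen direction, in the spirit of the g_l f_i(ε) combination.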
Using the noise reduction functions f_i(ε) and the Rose curve g_l, a variable arbitrary directivity pattern was proposed as follows (5), and some directivity patterns are shown in Fig. 4. As these figures show, the noise-reducing functions g_l f_i(ε) act as spatial filters on the ratio of the transfer functions. By the noise-reducing function, only the signal whose direction matches the specified transfer-function ratio is extracted.
Signals with other transfer-function directions are suppressed. Even when the number of sound sources is larger than the number of observed mixture signals, the proposed method can reduce the noise signals. In addition, since the algorithm of our noise reduction method is very simple, it works well in real time. However, when the directivity becomes sharp, distortion occurs in the estimated speech signal. Therefore, we propose a low-distortion target speech signal extraction method in the next section.

Variable null pattern
In the previous method, the directivity was directed toward the target signal; that is, the spatial filtering was applied around the target signal. Therefore, in order to reduce the influence on the target signal, a null pattern is instead formed in the noise direction.
The null pattern shown in Fig. 5 is proposed using Eq. (10) as follows.
The source signal is estimated using the null pattern as y = h x_m. (11)
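The null-pattern idea of Eq. (11) can be sketched as a per-sample gain with narrow notches at known noise directions. The Gaussian notch shape and the `width` parameter below are assumptions for illustration; the paper's h is defined by its Eq. (10).

```python
import numpy as np

def null_gain(phi, noise_dirs, width=8.0):
    """Hypothetical null pattern h: near-unit gain everywhere except
    narrow notches around each known noise direction (degrees)."""
    h = np.ones_like(phi, dtype=float)
    for d in noise_dirs:
        h *= 1.0 - np.exp(-((phi - d) / width) ** 2)
    return h

def extract(x_m, phi, noise_dirs):
    """Per-sample estimate y = h * x_m, cf. Eq. (11)."""
    return null_gain(phi, noise_dirs) * x_m
```

Because the gain stays close to 1 away from the notches, samples aligned with the target direction pass almost untouched, which is why this form keeps the distortion of the target low; multiple entries in `noise_dirs` yield several nulls for a multi-source condition.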

Simulation
In order to verify our proposal, several simulations were carried out. The sources were speaker speech data (6) and car noise (7). From the results in Fig. 6, it is confirmed that the proposed method works well.

Conclusion
This paper proposed a two-channel microphone system with a variable null pattern. The proposed method can reduce the noise signal by deleting the distribution except for the linear component corresponding to the target speech. The distortion of the estimated signal was lower than that of the previous method. The method can extract the target speech even when the number of source signals is larger than the number of observed mixture signals. Furthermore, by using a multi-directional pattern, the method can estimate multiple speakers' speech without noise when multiple speakers utter under a noisy environment.
In this paper, we simulated under the condition that the direction of the noise is known. The direction of the noise is expected to be estimable from the histogram; this direction estimation will be examined in future work.

Fig. 6: Simulation results. (a) shows the source signals, which were sampled at 8 [kHz] with 16 [bit] resolution. (b) shows the mixture signals generated from these sources by Eq. (1). (c) shows the separated signal y(t) obtained with the proposed method; it is found that the noise is removed in this waveform. Thirty mixing patterns (combinations of 6 utterance signals and 5 noises) were used.