Distinction Method between Expiratory and Inspiratory Sounds Using Biological Sound Sensor

Deaths from COPD accounted for about 6% of all deaths worldwide in 2019, and the prevalence of COPD is expected to increase worldwide. Pulmonary function testing, so-called spirometry, is used in the diagnosis and severity assessment of COPD. There is also a simple test method based on expiratory and inspiratory time; however, systems to measure expiratory and inspiratory time have not been proposed. In this study, we propose a novel method for distinguishing between expiratory and inspiratory sounds using a biological sound sensor. The sensor consists of two units: a holding unit and a sensor unit. The former fixes the sensor unit; the latter obtains biological sounds and adopts a polyurethane elastomer to match the acoustic impedance. Respiratory sounds are extracted by applying a bandpass filter to the biological sounds. Furthermore, Harmonic/Percussive Sound Separation is applied to the respiratory sounds to reduce residual vascular sounds. A classifier between expiratory and inspiratory sounds is built with a soft-margin Support Vector Machine, using the power spectrum extracted from the spectrogram of the respiratory sound as the feature. A classifier was built for each subject from the two respiration patterns. The proposed method was evaluated by accuracy, precision, recall, and F-score. The obtained distinction accuracy was up to 86.8%, showing that expiratory and inspiratory sounds can be distinguished with high accuracy.


Introduction
Approximately 6% of the 55.4 million deaths worldwide in 2019 were due to chronic obstructive pulmonary disease (COPD) (1). COPD is caused by inflammation of the lungs due to prolonged inhalation of toxins such as cigarette smoke, resulting in symptoms such as dyspnea on exertion (2). The prevalence of COPD is expected to increase worldwide (3,4). Currently, pulmonary function testing, so-called spirometry, is used in the diagnosis and severity assessment of COPD. Spirometry requires an expensive device called a spirometer and is difficult to perform at non-medical institutions because it requires lung volume measurement. Thus, there is a simple test method that uses expiratory and inspiratory time. In COPD patients, expiratory time is extended beyond inspiratory time due to bronchial narrowing caused by airflow obstruction (5). Based on this symptom, respiratory function can be evaluated from the difference between expiratory and inspiratory time. This test method does not require expensive equipment such as a spirometer and makes diagnosis at non-medical institutions easy. In our laboratory, a sensor system for obtaining pulse and respiratory sound signals (RSS) using an electret condenser microphone was proposed (6,7). The system can obtain respiratory sounds at non-medical institutions. In this study, we propose a system that uses machine learning to distinguish between exhalation and inhalation sounds in order to measure the time lengths of the two sounds. The respiratory sounds are obtained by the biological sound sensor. Because the frequency components of expiratory and inspiratory sounds differ within the human auditory range, the power spectrum of expiratory and inspiratory sounds is used as the feature for machine learning.
This paper is organized as follows: Chapter 1 introduces the background and motivation of the study. Chapter 2 describes the biological sound sensor. Chapter 3 explains the distinction method between expiratory and inspiratory sounds using the biological sound sensor. Chapter 4 describes the experimental methodology and evaluation functions and discusses the results. Chapter 5 concludes this study.

Biological Sound Sensor
Biological sounds, such as vascular sounds, respiratory sounds, and swallowing sounds, are generated within the body and transmitted to the body surface. In this chapter, respiratory sounds and the biological sound sensor are described.

Respiratory Sound
Respiratory sounds are produced by the turbulence and eddies of gas flowing in the airways during external respiration (8). External respiration refers to the exchange of gas between air and blood through expansion and contraction of the lungs with the vertical movement of the diaphragm (9). External respiration consists of two phases: inhalation and exhalation. Inhalation draws air into the lungs and supplies oxygen to the blood vessels, while exhalation expels carbon dioxide-rich air from the body. The frequency components of respiratory sounds, including inhalation and exhalation, are in the range of 200 ~ 2000 Hz. However, the high-frequency components attenuate as the sound passes through the body; for example, the frequency components of respiratory sounds are about 200 ~ 700 Hz in the chest region (10).

Biological Sound Sensor
In this study, we use a biological sound sensor to obtain the biological sounds. The sensor consists of two units: a sensor unit and a holding unit. The sensor unit is composed of an electret condenser microphone (EM-289, Primo Corporation; hereinafter "ECM") and a polyurethane elastomer (HITOHADA GEL, Asker C hardness 15, Exseal Corporation) in a light-curing resin case. The appearance and structure of the sensor unit are shown in Figs. 1a and 1b. The sensor unit adopts an ECM with an exposed diaphragm. The polyurethane elastomer matches the acoustic impedance between the sensor and the skin (11) and also protects the ECM from oily stains on the skin surface. The light-curing resin case protects and holds the ECM, and also reduces fricative and contact noise. The holding unit, made of elastic plastic, attaches to the back of the user's head and fixes the sensor unit in place. Fig. 1c shows the appearance of the holding unit with the sensor unit attached.

Outline
In this study, a distinction method between expiratory and inspiratory sounds is proposed based on the variation of the power spectrum. First, a bandpass filter is applied to the biological sound obtained by the sensor to extract the respiratory sound. Nevertheless, noise from vascular sounds remains in the respiratory sound after filtering. We use Harmonic/Percussive Sound Separation (HPSS) to remove this remaining noise. Machine learning is then used to build a classifier from the denoised respiratory sounds. The power spectrum of the respiratory sound is used as the feature. Additionally, to prevent underfitting caused by individual differences in the feature, a classifier is built for each subject.

Bandpass Filter
The vascular sound signal (VSS) is concentrated in the low-frequency band below 100 Hz, whereas the respiratory sound signal (RSS) occupies higher frequencies (12,13). The RSS is extracted by applying an IIR bandpass filter based on this frequency difference. The filter is designed to provide a sharp frequency response with a low order. Fig. 3 shows the VSS extracted by the filter. The RSS is 60 dB smaller than the VSS, so VSS noise remains in the RSS after filtering. Thus, the filter alone is not sufficient to extract the RSS.
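The bandpass filtering step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sampling rate, the Butterworth design, and the 200 ~ 2000 Hz passband edges are assumptions based on the frequency ranges stated in this paper (the paper specifies only a 12th-order IIR filter attenuating the VSS below 100 Hz).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def extract_rss(biological_sound, fs=8000, low_hz=200.0, high_hz=2000.0, order=6):
    """Suppress the low-frequency vascular sound signal (VSS) with an IIR
    (Butterworth) bandpass filter. A 6th-order design applied forward and
    backward yields a 12th-order effective response with zero phase shift."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, biological_sound)

# Example: attenuate a strong 50 Hz "vascular" tone while keeping a weak
# 400 Hz "respiratory" tone (the paper reports the RSS is ~60 dB smaller).
fs = 8000
t = np.arange(fs) / fs
vss = np.sin(2 * np.pi * 50 * t)           # vascular component, below 100 Hz
rss = 0.01 * np.sin(2 * np.pi * 400 * t)   # weak respiratory component
filtered = extract_rss(vss + rss, fs=fs)
```

Second-order sections (`sos`) are used instead of transfer-function coefficients because a 12th-order IIR filter is numerically fragile in direct form.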

Harmonic/Percussive Sound Separation (HPSS)
HPSS is a simple and effective algorithm for separating the harmonic and percussive components of a monaural audio signal (14). Harmonic components form horizontal ridges on the spectrogram, whereas percussive components form vertical ridges with a broadband frequency response. In respiratory sounds, the harmonic component corresponds to the RSS and the percussive component corresponds to the VSS. The method applies median filtering to the spectrogram of the audio signal. To extract the RSS, a mask is created from the filtered spectrogram and applied to the original spectrogram. Fig. 4 shows the RSS obtained by HPSS. The Short-Time Fourier Transform (STFT) is applied to the RSS to compute the spectrogram, shown in Fig. 5. A spectrogram expresses an audio signal in terms of time, frequency, and power spectrum.
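The median-filtering HPSS described above can be sketched as below. This is a generic implementation of the technique, assuming the common soft (Wiener-style) masking variant; the paper does not specify its STFT or kernel parameters, so `nperseg`, `kernel`, and `power` here are illustrative choices.

```python
import numpy as np
from scipy.signal import stft, istft, medfilt2d

def hpss(x, fs, nperseg=256, kernel=17, power=2.0):
    """Median-filtering HPSS: harmonic components form horizontal ridges in
    the spectrogram, percussive components vertical ridges. Median-filter the
    magnitude spectrogram along time (enhancing harmonics) and along frequency
    (enhancing percussives), then build soft masks from the two estimates."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    S = np.abs(Z)                               # rows: frequency, cols: time
    H = medfilt2d(S, kernel_size=(1, kernel))   # smooth along the time axis
    P = medfilt2d(S, kernel_size=(kernel, 1))   # smooth along the frequency axis
    mask_h = H**power / (H**power + P**power + 1e-12)
    _, harmonic = istft(Z * mask_h, fs=fs, nperseg=nperseg)
    _, percussive = istft(Z * (1.0 - mask_h), fs=fs, nperseg=nperseg)
    return harmonic, percussive

# Example: a steady 300 Hz tone (harmonic, stand-in for RSS) mixed with
# periodic clicks (percussive, stand-in for residual VSS pulses).
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 300 * t)
clicks = np.zeros(fs)
clicks[::1000] = 5.0
harmonic, percussive = hpss(tone + clicks, fs)
```

The tone, being a horizontal ridge, survives the time-axis median filter and ends up mostly in `harmonic`, while the broadband clicks are routed to `percussive`.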

Support Vector Machine (SVM)
SVM is a supervised learning algorithm designed to solve binary classification problems (15). The optimal boundary separating the two classes is called the hyperplane, and the goal of the binary classification problem is to find it. The training data closest to the hyperplane are defined as the support vectors, and the distance to the support vectors is defined as the margin. SVM achieves good generalization performance by determining the hyperplane with the largest margin. In this paper, expiratory and inspiratory sounds are regarded as the two classes, and the training data are the power spectra of the respiratory sounds. The soft-margin SVM is formulated as

minimize (1/2)||w||^2 + C Σ_i ξ_i
subject to y_i (w^T x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0

Here, w is the normal vector of the hyperplane, x_i is the input vector, y_i is the class label, b is the linear coefficient (bias) of the hyperplane, ξ_i is a slack variable, and C is the regularization parameter controlling the penalty for misclassification.
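A minimal sketch of the classifier setup, using the settings reported later in this paper (linear kernel, C = 0.1, 46-dimensional features). The synthetic data below is purely illustrative: real features would be power-spectrum frames of expiratory and inspiratory sounds, and the low-band energy offset is an invented separation, not a claim about the paper's data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for 46-dimensional power-spectrum frames:
# "exhalation" frames (label 1) get slightly more energy in the first
# 10 bins than "inhalation" frames (label 0).
n, dim = 200, 46
exhale = rng.normal(0.0, 1.0, (n, dim))
exhale[:, :10] += 1.5
inhale = rng.normal(0.0, 1.0, (n, dim))
X = np.vstack([exhale, inhale])
y = np.array([1] * n + [0] * n)

# Soft-margin SVM with the hyperparameters reported in the paper.
clf = SVC(kernel="linear", C=0.1).fit(X, y)
train_acc = clf.score(X, y)
```

With a linear kernel, `clf.coef_` is the normal vector w of the learned hyperplane, so the weight on each spectral bin can be inspected directly.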

Methodology
Biological sound was obtained by attaching the sensor unit to the mastoid process as shown in Fig. 6. The subjects were five males in their 20s, who were instructed to remain seated and not to speak. In the experiment, two one-minute biological sound measurements were conducted for each of two respiratory patterns. Pattern I is normal respiration, and pattern II is an assumed irregular breathing pattern consisting of two inhalations and two exhalations. The first measurement of each pattern was used as training data to build the classifiers. The second measurements were used as test data for classifier validation.
Respiratory sounds were extracted from the biological sounds by the signal processing described above. Table 1 shows the experimental conditions. The IIR filter order was set to 12 to reduce the VSS components below 100 Hz. The obtained respiratory sounds were manually separated into expiratory and inspiratory sounds, then normalized and labeled. The SVM was trained using a linear kernel with C = 0.1, and the feature dimension was 46. To evaluate the classification accuracy, the classifier was applied to the test data. If more than 50% of the frames in a respiratory sound segment were classified as exhalation, the classification result was defined as exhalation; likewise, if more than 50% were classified as inhalation, the result was defined as inhalation. The classification results were compared with the correct labels.
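The frame-majority decision rule above can be written compactly. The helper name and the handling of an exact 50/50 tie (defaulting to inhalation) are assumptions; the paper only states the more-than-50% criterion.

```python
import numpy as np

def classify_breath(frame_labels):
    """Aggregate per-frame SVM outputs (1 = exhalation, 0 = inhalation) into
    one label per breath segment: exhalation if more than 50% of the frames
    were classified as exhalation, otherwise inhalation (ties fall to 0,
    an assumption not specified in the paper)."""
    frame_labels = np.asarray(frame_labels)
    return 1 if frame_labels.mean() > 0.5 else 0

# Example: 3 of 4 frames voted exhalation, so the segment is an exhalation.
label = classify_breath([1, 1, 0, 1])
```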
The evaluation functions were accuracy, precision, recall, and F-score. The classification results of the learned model for a binary classification problem are represented as a confusion matrix, which consists of four values: true positive (TP), true negative (TN), false positive (FP), and false negative (FN) (16). In this experiment, we define exhalation as positive and inhalation as negative. FP is an inspiratory sound misclassified as an expiratory sound, and FN is an expiratory sound misclassified as an inspiratory sound. The evaluation functions are defined in Equations (3) to (6):

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (3)
Precision = TP / (TP + FP)   (4)
Recall = TP / (TP + FN)   (5)
F-score = 2 × Precision × Recall / (Precision + Recall)   (6)

Precision and Recall are trade-offs, and F-score is the harmonic mean of Precision and Recall. Precision is calculated from TP and FP; it increases as the number of inspiratory sounds misclassified as expiratory sounds decreases.
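These four evaluation functions follow directly from the confusion-matrix counts. A small sketch (the example counts are invented for illustration, not taken from the paper's tables):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F-score from confusion-matrix counts,
    with exhalation as the positive class and inhalation as the negative."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)            # falls as FP (misread inhalations) rises
    recall = tp / (tp + fn)               # falls as FN (missed exhalations) rises
    f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f_score

# Hypothetical counts: 15 exhalations and 14 inhalations correct,
# 3 inhalations misread as exhalations, 2 exhalations missed.
acc, prec, rec, f = metrics(tp=15, tn=14, fp=3, fn=2)
```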

Results and Discussion
The results for each subject in patterns I and II, as well as their weighted averages (WA), are shown in Tables 2 and 3. Table 4 shows the WA of the two respiratory patterns for each subject. Subject A achieved about 80% Accuracy in both patterns I and II. The WA of pattern I was 62.0%, almost the same as that of pattern II. In pattern I for subject B, Precision was 83.3% and Recall was 88.2%. These results show that the classifier correctly distinguishes between expiratory and inspiratory sounds in the normal respiration of subject B. On the other hand, the Accuracy for subject B was 14.5% in pattern II, and Precision and Recall were similarly low. In contrast, for subjects C and E, respiratory pattern II was distinguished more accurately than pattern I. These results indicate that a classifier built from two respiratory patterns risks specializing in one pattern at the expense of the other. Two main causes of this biased distinction are considered. The first is that the power spectrum distribution of the respiratory sound differs between the two respiratory patterns. The second is the high feature dimension used in the SVM. The hyperplane is determined using the features of the respiratory sound, and overfitting to one respiratory pattern caused the other pattern to become outliers. As described in Chapter 3, expiratory and inspiratory sounds are partially similar in the power spectrum; with the high feature dimension, the hyperplane was incorrectly determined by these similar frequency bands. Therefore, higher distinction accuracy can be expected by training on each respiratory pattern separately and by using new features that better represent respiratory sounds.

Conclusions
This paper proposed a method for distinguishing between expiratory and inspiratory sounds using a biological sound sensor. A classifier was built with a soft-margin SVM using the power spectrum of respiratory sounds as the feature. The proposed method distinguished the two sounds with a maximum accuracy of 86.8%. It is expected to be applied to a system that evaluates respiratory function based on the time lengths of expiratory and inspiratory sounds obtained from biological sounds.