Development of an AI Predictive Maintenance System Using IoT Sensing

This paper describes the development of a predictive maintenance system for cutting machines. In recent years, IoT and AI systems have been actively developed, and as a result, sensors and embedded systems have become cheaper. Small and medium-sized companies are attempting to use these inexpensive embedded systems for predictive maintenance, and we are therefore developing an AI predictive maintenance system for such companies. In the system, the cutting sound emitted by a cutting machine is acquired by a sensor and an embedded system. The differences in the sounds are analyzed by AI using MATLAB and TensorFlow to predict wear of the blade tip. The system was able to predict the degree of tip wear with 90.5% accuracy.


Introduction
In recent years, the IoT has become more widespread, as the price of sensors has dropped and cloud computing has made it possible to connect to the Internet from any location. It has therefore become possible to maintain equipment remotely (1)-(4). Companies want to avoid the unexpected equipment shutdowns that result from repairing equipment only after it has broken down. In addition, there is a growing need to increase productivity by performing maintenance in a planned manner.
Two types of losses occur in factories that use cutting machines. The first is material loss: defective products caused by chipping of the blade tip during machining. The other is the time loss caused by shutting down the production line. However, the frequency and interval at which the blade tip chips depend on the condition of the metal, the amount of cutting oil, and other factors, and are not constant. To deal with this problem, approximate intervals between tip breakages were conventionally estimated from machining experience, and the tips were replaced every few days. However, this method does not prevent defects when the actual intervals are shorter than expected. In addition, tips are disposed of before being used to their full life, which increases cost.
In this paper, we embed a system that collects data such as sound into the cutting machine, and the system learns from the collected big data to predict breakage in advance.

Goal System
Fig. 1 shows the organization of the goal system. The system provides predictive maintenance on a machining center used at Nakano Manufacturing Corporation. Two control devices, a Spresense and a sequencer, are installed in the machining center: the Spresense is mainly responsible for high-resolution recording with a microphone and for generating log data based on the recording. The information obtained from these two control devices is transferred by an Armadillo IoT gateway and passed to a cloud server, which uses AWS to optimize the management and transfer of information. Based on the collected big data, a predictive maintenance system is constructed by analyzing the cutting sounds using MATLAB. This paper covers only the collection of cutting sounds and the AI analysis. Table 1 lists the development environment. The cutting sound of the machine was analyzed using MATLAB (5), a numerical analysis package that excels at analyzing large data sets and can be deployed in web and operational systems. In particular, we used the Signal Processing Toolbox for analyzing signal data such as sound data.

Development Environment
The CNN models were developed in Python, a programming language with a proven track record in a variety of specialized fields including AI development, because its extensive libraries facilitate development. The AI system employs Keras, which makes it relatively easy to build deep learning models, and whose functional API makes it possible to build complex models. In addition, the system employs the deep learning library TensorFlow, which makes it possible to perform computations on a GPU, enabling high-speed deep learning.
The system uses Spresense as the embedded board that processes the recording from the microphone, because it can operate continuously with low power consumption and can perform high-resolution recording. The purpose of the high-resolution recording is to capture signs of malfunction over a wide frequency range, including the inaudible range, and thereby achieve accurate predictive maintenance. The Spresense was programmed using the Arduino IDE.

A Recording System Using Spresense
An embedded microcontroller, Spresense, was used to record cutting sounds in high resolution. The Spresense receives cutting sound data from an analog MEMS microphone and generates a WAV file from the sound data. Since it takes about 30 seconds to machine one part, we record about 40 seconds, covering the time before and after the machining. The WAV file is saved to an SD card, and each time a machining cycle is completed, the file is saved with a sequential number so that it can be linked to the subsequent analysis.
When developing and verifying the control program, the Spresense was connected to a PC via a USB cable, and the Arduino IDE was used to carry out the work while verifying operation on the PC screen.

Logging Data Using a Sequencer
In order to provide predictive maintenance on cutting machines, a large amount of sound data and machine condition information is required. However, preparing all of this data manually would take too much time and effort to be practical. Our system therefore records the start signal of the cutting machine and generates a label that links the machine state to the signal. The possibility of malfunctions due to unexpected problems must also be considered in the operation of the system; the system can reset itself after a malfunction by communicating with the sequencer inside the cutting machine.
The sound data logging system is shown in Figure 2. After the cutting sounds are saved, log data is generated from the recorded cutting sounds. To generate the log data, Spresense receives four signals from the sequencer: a start signal, a tracing signal, a chip replacement signal, and a measurement completion signal. Based on this information, Spresense determines the state of the cutting machine and generates the log data. This makes it possible to link the recorded data to the machine status.
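As a concrete illustration, the state-labeling logic described above can be sketched as follows. This is a minimal Python sketch, not the actual Spresense firmware (which is written with the Arduino IDE); the signal names, state labels, and log format are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class SequencerSignals:
    """The four signals received from the sequencer (names assumed)."""
    start: bool              # machining start signal
    tracing: bool            # tracing (machining in progress) signal
    chip_replacement: bool   # tip replacement signal
    measurement_done: bool   # measurement completion signal

def machine_state(sig: SequencerSignals) -> str:
    """Map the four sequencer signals to a machine-state label."""
    if sig.chip_replacement:
        return "tip_replaced"
    if sig.start and sig.tracing:
        return "cutting"
    if sig.measurement_done:
        return "idle_measured"
    return "idle"

def log_entry(seq_no: int, sig: SequencerSignals) -> str:
    """One log line linking a recording's sequential number to the state."""
    return f"{seq_no:06d},{machine_state(sig)}"
```

For example, `log_entry(12, SequencerSignals(True, True, False, False))` would yield a line linking recording 000012 to the "cutting" state, so each WAV file can later be matched to its machine status.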

Preliminary Analysis of Cutting Sounds
Sound data is usually represented by a waveform graph with time on the horizontal axis and amplitude on the vertical axis. However, with this waveform graph it is difficult to grasp the characteristics of the sound, such as which frequencies it contains. Therefore, we decided to perform frequency analysis in order to grasp the characteristics of the sound. The system analyzes the power spectrum based on the Fourier transform. The spectrum analysis of the actual recorded data is shown in Figure 3.
In Figure 3, the upper graph shows time on the horizontal axis and amplitude on the vertical axis, while the lower graph shows frequency on the horizontal axis and the power spectrum on the vertical axis. In this sound data, the time period enclosed by the light blue squares contains the cutting sound. However, the graph shows that the recorded data includes not only the cutting sound but also various other noises, such as the machine's own fan, noise from surrounding machines, and vibration noise, which occur regardless of whether the machine is machining. Figure 4 shows the spectrum analysis of the sound data for each blade tip condition, from early to later.
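The power-spectrum computation used in this preliminary analysis can be sketched in a few lines. This is a hedged NumPy sketch, not the MATLAB Signal Processing Toolbox code actually used; the 48 kHz sampling rate and the Hann window are assumptions.

```python
import numpy as np

def power_spectrum(x: np.ndarray, fs: float):
    """Return frequency bins and power spectrum of a real-valued signal."""
    n = len(x)
    window = np.hanning(n)                 # taper to reduce spectral leakage
    X = np.fft.rfft(x * window)            # FFT of the real signal
    power = (np.abs(X) ** 2) / n           # power at each frequency bin
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, power

# Example: a pure 1 kHz tone should peak near 1 kHz.
fs = 48_000
t = np.arange(fs) / fs                     # one second of samples
tone = np.sin(2 * np.pi * 1_000 * t)
freqs, power = power_spectrum(tone, fs)
peak_freq = freqs[np.argmax(power)]        # close to 1000 Hz
```

Applied to a real recording, the noise floor from fans and surrounding machines appears spread across the spectrum, while the cutting sound contributes concentrated peaks.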

Generation of Spectrogram Images
Since the results of the spectral analysis showed that the sound is deeply related to predicting tip failure, we decided to improve recognition accuracy by generating spectrogram images. A spectrogram is a three-dimensional representation of the time, intensity, and frequency of a sound, produced by performing frequency analysis along the time axis and expressing the intensity of the sound at each frequency using color (6). From a single such image, it is possible to recognize how the sound changes over time, as well as the three elements of timbre, pitch, and loudness simultaneously.
In order to generate spectrogram images, we decided to segment the recorded sound at regular intervals. The reason is that the types of sounds included in the data vary depending on the time interval over which the spectrogram image is generated. Therefore, instead of capturing features from the entire recorded cutting sound, we divide the data into segments at regular intervals and capture the common features of each segment, achieving highly accurate recognition. Figure 5 shows the process of segmenting the cutting sound data and generating spectrogram images. All images generated by the procedure shown in Figure 5 were labeled as early, middle, or later.
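The segmentation-and-spectrogram pipeline above can be sketched with a short-time Fourier transform. This is a NumPy-only sketch under stated assumptions: the 1-second segment length, the 8 kHz stand-in sampling rate, and the STFT parameters (256-sample Hann window, 128-sample hop) are all illustrative, not the paper's actual settings.

```python
import numpy as np

def segment_signal(x, fs, seg_seconds=1.0):
    """Split a recording into fixed-length segments (remainder dropped)."""
    seg_len = int(fs * seg_seconds)
    return [x[i:i + seg_len] for i in range(0, len(x) - seg_len + 1, seg_len)]

def stft_spectrogram(x, nperseg=256, hop=128):
    """Power spectrogram via a short-time Fourier transform (Hann window)."""
    win = np.hanning(nperseg)
    frames = [np.abs(np.fft.rfft(x[s:s + nperseg] * win)) ** 2
              for s in range(0, len(x) - nperseg + 1, hop)]
    return np.array(frames).T              # shape: (freq_bins, time_frames)

# Each segment's spectrogram would then be rendered as a color image
# and labeled early / middle / later for CNN training.
fs = 8_000
rng = np.random.default_rng(0)
recording = rng.standard_normal(4 * fs)    # 4 s of stand-in audio
specs = [stft_spectrogram(seg) for seg in segment_signal(recording, fs)]
```

Each entry of `specs` corresponds to one segment, so one recording yields multiple labeled training images, as described above.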

CNN Model for Screening of Tip Condition
In order to create a tip condition screening model, we examined what kind of machine learning model would be best. Machine learning is a technology in which a computer learns from a large amount of data to perform tasks such as prediction and classification. In this learning process, the computer can automatically find rules, but a human must clarify what features to focus on. However, it is difficult to specify such features when implementing a tip condition screening model from cutting sound data. This is because, as mentioned above, the recorded sound is not clean cutting sound data: it contains various ambient sounds, such as fans and workers' voices, as noise, and it changes easily. Therefore, the system employs deep learning. Deep learning is a learning method with a deep hierarchical structure that can automatically acquire feature representations from data, making it possible to capture features without a human specifying what to focus on.
In particular, image recognition is one of the fields in which the accuracy of deep learning has improved remarkably. This improvement is evident in the image recognition competition ILSVRC (ImageNet Large Scale Visual Recognition Challenge), where the error rate has decreased rapidly year by year, starting with an improvement of about 10% over conventional analysis methods when deep learning was first used. However, in order to improve recognition accuracy, it is important to prepare a large number of images for deep learning, as well as images from which the defects are easy to recognize. In this paper, the system therefore applies deep learning to image classification of spectrograms.

Design and Comparison of CNN Models
In general, the deeper the layers in a deep learning model, the more expressive and thus more accurate it becomes (6). However, this holds only as long as the vanishing gradient problem does not occur, so careful consideration is required in the design of the model.
In the field of image recognition, excellent models are released every year through the aforementioned ILSVRC competition. These models have been optimized on about one million images. By designing a model based on them, we can develop an effective model in a shorter period of time. Figure 6 shows a graph comparing these models (7).
As with the number of layers, it can be seen for all models that as the number of parameters increases, the accuracy increases because the expressive power increases. However, as the number of parameters increases, the amount of computation also increases, and thus the training and inference time increases. A model called EfficientNet addresses this problem, reducing the number of parameters to about 1/8 and improving the speed by a factor of about 6, while maintaining accuracy comparable to that of the conventional models that achieved excellent accuracy in ILSVRC. We thought that basing our model on EfficientNet would allow us to develop a model with both high accuracy and high execution speed. Fig. 7 shows the layer structure of EfficientNet's B0 model. EfficientNet proposes an optimal scaling method using three parameters: the number of perceptrons per layer, the layer depth, and the input image resolution. In EfficientNet, the smallest scale is called B0 and the largest is called B7, and Figure 6 shows that the accuracy improves step by step. Figure 7 shows that EfficientNet is designed to take a 224×224-pixel image as input. As the length of one side of the input image increases, the memory size required for training increases exponentially. Therefore, the system reduces the input image size for EfficientNet to 160×160 pixels, and we created a layer structure that outputs three classes: early, middle, and later.
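The model structure described above can be sketched with Keras. This is a minimal sketch using the stock `tf.keras.applications.EfficientNetB0` with a 160×160 input and a three-class head; the classification head shown here (global average pooling plus softmax) is an assumption, and the paper's actual model (the conclusion mentions B4 with fine-tuning) may differ in detail.

```python
import tensorflow as tf

def build_tip_classifier(input_size: int = 160, n_classes: int = 3) -> tf.keras.Model:
    """EfficientNet-B0 backbone with a three-class softmax head."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False,
        weights=None,                    # "imagenet" would enable fine-tuning
        input_shape=(input_size, input_size, 3),
    )
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(base.input, outputs)

model = build_tip_classifier()           # classes: early, middle, later
```

Shrinking the input from 224×224 to 160×160, as the text describes, reduces the training memory footprint at the cost of some spectral-image detail.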

Results of Precision and Recall on the CNN Model
The system generated about 10,000 spectrogram images from the data collected via the sequencer. About 8,000 images were used as training data, and the remaining 2,000 images were used as validation data. The reason for separating the training data from the validation data is to prevent overfitting, in which the model over-fits the training data and cannot handle unknown data.
After training the model on about 8,000 images, the system validated it on about 2,000 images, resulting in an overall correctness rate of 90.5%. Figure 9 shows the confusion matrix of the precision (accuracy rate) of the model. The precision indicates, for example, how many of the images predicted by the model to be in the later period (i.e., pre-breakage) really are later. Figure 10 shows the confusion matrix of the recall rate of the model. The recall rate indicates, for each true condition, the percentage of images that the model predicted correctly. Figures 9 and 10 show that the tip condition screening model has a very high accuracy rate in predicting the later condition. On the other hand, although the rates for the early and middle conditions do not drop below 80%, their correctness rates are lower than that for the later condition. We believe this is because the boundary between early and middle has not been fully learned. Therefore, we believe that collecting and learning from additional data will enable us to develop a tip condition screening model whose accuracy and recall for the early and middle conditions match those for the later condition.
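The per-class precision and recall discussed above are derived from the confusion matrix in the standard way. The following NumPy sketch uses made-up counts for illustration, not the paper's actual results:

```python
import numpy as np

def precision_recall(cm: np.ndarray):
    """cm[i, j] = number of images of true class i predicted as class j."""
    tp = np.diag(cm).astype(float)        # correct predictions per class
    precision = tp / cm.sum(axis=0)       # per predicted class (columns)
    recall = tp / cm.sum(axis=1)          # per true class (rows)
    return precision, recall

# Illustrative 3x3 matrix for classes (early, middle, later).
cm = np.array([[80, 15,  5],
               [10, 82,  8],
               [ 2,  3, 95]])
precision, recall = precision_recall(cm)
overall_accuracy = np.trace(cm) / cm.sum()
```

With these made-up counts, the later class shows the highest precision and recall, mirroring the tendency reported above where the boundary between early and middle is hardest to learn.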
These results indicate that applying spectrograms to CNN models is effective for machine state classification. The reason is that a spectrogram can carry multiple types of information: in addition to distinguishing the intensity of the sound at a specific time and frequency by color, it also makes it easy to capture characteristics such as changes and periodicity at nearby times and frequencies.

Conclusions
This paper described the development of an IoT-based AI predictive maintenance system that uses an embedded microcontroller called Spresense to record the cutting sound of a cutting machine in high resolution. The high-resolution sound was converted into spectrogram images for AI predictive maintenance. The system employs EfficientNet (B4) with fine-tuning. As a result, we obtained an average accuracy of 90.5%. Future work includes deploying this predictive maintenance system on the cloud and developing an AI predictive maintenance system that incorporates other data (vibration, temperature, humidity).