Driving Warning System in Edge Computing Environment

With the steady development of the social economy, the number of motor vehicles has increased rapidly, and the number of traffic accidents continues to rise; fatigue driving is one of the key factors behind frequent traffic accidents. Internet of Things (IoT) technology based on cloud computing is receiving more and more attention, and the demands placed on IoT edge devices have also increased. The amount of data generated by devices has exploded, placing higher demands on transmission bandwidth. Therefore, this paper proposes an edge intelligent driving warning system based on the Gradient Boosting Decision Tree (GBDT), YOLOv4, and Deep Simple Online and Realtime Tracking (Deep SORT) algorithms. The system does not need to transmit data to the cloud center for processing; it realizes dual monitoring inside and outside the vehicle locally, effectively alleviating the pressure on transmission bandwidth. The results show that the edge system can detect the driver's fatigue status and object categories outside the vehicle in real time, which has a positive effect on reducing traffic accidents and developing intelligent transportation.


Introduction
With the consistent development of the social economy, the number of vehicles in China has increased year by year, which brings convenience to people but also increases traffic pressure and traffic safety accidents [1]. According to statistical analysis, 80%~90% of traffic accidents are caused by human factors [2]. Among the four elements of traffic management, the driver is the most active element. Drivers are prone to traffic accidents due to inattention, excessive fatigue, emotional instability, and so on [3]. In recent years, machine vision technology has gradually been applied to intelligent transportation systems. Object detection technology can help analyze the impact of other vehicles on one's own vehicle and plays a valuable role in assisting the driver. The centralized data processing model represented by cloud computing has a large overhead in terms of resource requirements and is overly dependent on the network bandwidth of the cloud computing center [4]. The increasing number of IoT edge devices has led to explosive growth in data volume. Cloud computing struggles to meet the resulting transmission requirements, and edge computing has emerged in response. Deep learning algorithms in machine vision perform well in complex traffic scenarios, but they place high demands on the hardware of edge devices, so it is particularly important to establish a lightweight edge intelligent driving warning system.

Related Work
Researchers in many countries attach great importance to the application prospects of vehicle early warning systems in the prevention of traffic accidents, and have conducted research on fatigue detection and vehicle detection.
At present, fatigue detection methods mainly fall into two categories: those based on the driver's physiological and psychological characteristics, and those based on computer vision. The former requires sensors in direct contact with the driver's body, which is likely to cause discomfort and affect operation, and the signal is also susceptible to noise interference. Xing et al. designed and implemented a Bluetooth wearable device based on the STM8 chip to detect fatigue by collecting and displaying human physiological signals in real time [5]. The latter extracts fatigue cues from video without contact. For vehicle detection, one line of work quantized the network parameters of YOLOv3-tiny to reduce computational complexity and proposed a real-time vehicle detection system suitable for embedded devices [12]. The above fatigue detection methods generally take only a single facial part to detect the fatigue state, which is not comprehensive as a multi-factor representation of facial features. The driving environment outside the car is complex and changeable, and object occlusion and overlap cause problems such as object loss. The above vehicle detection methods do not adapt well to these changes and are not deployed at the edge. Therefore, this paper proposes a fatigue detection method based on the fusion of multiple facial features, and combines YOLOv4 with Deep Simple Online and Realtime Tracking (Deep SORT) to detect objects. It realizes dual monitoring inside and outside the vehicle at the edge, which plays a positive role in preventing traffic accidents.

Proposed Work
The overall structure of the driving warning system in this paper is shown in Fig. 1. It mainly includes two parts: driver fatigue detection, and object detection and tracking outside the vehicle.

Multi-factor fatigue state detection based on key points of human face
The extraction of key points of the human face is based on the Gradient Boosting Decision Tree (GBDT) algorithm. The GBDT algorithm builds a cascade of residual regression trees that step the face shape from the current estimate toward the real shape, and finally integrates the results of all residual trees to obtain the key point positions [13]. From the eye key points, the Eye Aspect Ratio (EAR) can be calculated. When the human eye is closed, the EAR drops rapidly and theoretically approaches 0. In order to distinguish sleepy eye closure from normal blinking and other similar actions, this paper adopts the Percentage of Eyelid Closure over the Pupil over Time (PERCLOS) [15] as the physical quantity for determining sleepy eye closure. PERCLOS has been repeatedly tested and validated. It is defined as the proportion of time per unit time (1 minute) during which the eyes are more than 80% closed:

P = N_close / N_total

In the above formula, P is the PERCLOS value, N_close is the number of frames per unit time in which the eyes are more than 80% closed, and N_total is the total number of frames; the eye key points involved are the reference points in Fig. 2.
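As a concrete illustration, the following sketch computes the EAR from six eye landmarks and the PERCLOS ratio over a window of frames. The EAR formulation used here is the widely cited one based on the dlib-style six-point eye contour; the exact variant used in the paper is an assumption, and the thresholds follow the values given later in the experiments (0.1 for closed eyes, 0.2 for PERCLOS).

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six eye landmarks in dlib order (p1..p6).

    Uses the widely cited formulation EAR = (|p2-p6| + |p3-p5|) / (2*|p1-p4|);
    assumed here, since the paper's exact variant is not spelled out.
    """
    eye = np.asarray(eye, dtype=float)
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distance p2-p6
    v2 = np.linalg.norm(eye[2] - eye[4])   # vertical distance p3-p5
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance p1-p4
    return (v1 + v2) / (2.0 * h)

def perclos(ear_series, ear_threshold=0.1):
    """Fraction of frames in the window whose EAR is below the closed-eye
    threshold; the result is compared against the PERCLOS threshold (0.2)."""
    ear_series = np.asarray(ear_series, dtype=float)
    return float(np.mean(ear_series < ear_threshold))

# Toy landmark sets: a wide-open eye and a nearly closed one.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
```

A window with 15 of 60 frames below the EAR threshold gives a PERCLOS of 0.25, which exceeds the 0.2 fatigue threshold.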
When the mouth is closed, the Mouth Aspect Ratio (MAR) is close to 0, and when the driver yawns, the MAR rises rapidly. Here, the PERCLOS formula is borrowed to calculate the ratio of the time the driver spends yawning per unit time (1 minute) as the physical quantity for determining a yawn.
Among the head pose angles, the pitch angle changes most obviously when the driver nods sleepily, so the pitch angle is used as the index for determining a sleepy nod. Here, the PERCLOS formula is again borrowed to calculate the ratio of the time occupied by the driver's sleepy nodding per unit time (1 minute) as the physical quantity for determining sleepy nodding.
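A minimal sketch of the pitch-based nod cue: extracting the pitch angle from a head-pose rotation matrix (the ZYX yaw-pitch-roll Euler convention is an assumption; conventions vary between pose-estimation pipelines) and computing the PERCLOS-style nod ratio with the 25° threshold reported later in the experiments.

```python
import numpy as np

def pitch_from_rotation(R):
    """Pitch angle in degrees from a 3x3 head-pose rotation matrix,
    assuming the common ZYX (yaw-pitch-roll) Euler convention."""
    R = np.asarray(R, dtype=float)
    return np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))

def nod_ratio(pitch_series, pitch_threshold=25.0):
    """PERCLOS-style ratio: fraction of frames in the window whose pitch
    magnitude exceeds the nod threshold (25 degrees per the experiments)."""
    p = np.asarray(pitch_series, dtype=float)
    return float(np.mean(np.abs(p) > pitch_threshold))

# Pure pitch rotation of 30 degrees (rotation about the Y axis).
theta = np.radians(30.0)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
```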
(c) Comprehensive determination of fatigue driving. In order to detect accurately and in time whether the driver is fatigued, the algorithm fuses facial features such as the eyes, mouth, and head posture, using the ratios of the time occupied by closed eyes, yawning, and nodding as a feature vector. The trained SVM model can then be used to determine the driver's fatigue state.

Object detection and tracking based on YOLOv4 and Deep SORT
You Only Look Once (YOLO) is currently one of the most advanced and widely used real-time object detection systems, especially in autonomous driving [16]. Based on the object detection architecture of YOLOv3, the YOLOv4 algorithm introduces optimizations in data processing, the backbone network, network training, the activation function, the loss function, and more, achieving better detection speed and accuracy [17].
In object tracking, the SORT algorithm is a very fast and accurate baseline that combines the commonly used Kalman filter and the Hungarian algorithm [18]. The Kalman filter predicts the current position of the object from its position at the previous moment, and can estimate the object's position more accurately than the object detector alone. The SORT algorithm matches the bounding box (BBox) predicted by the Kalman filter with the BBox from object detection, and selects the appropriate BBox for tracking at the next moment. When an object is occluded for a long time, the uncertainty of the Kalman filter prediction grows greatly, and accuracy decreases under continuous prediction.
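The growth of prediction uncertainty under occlusion can be illustrated with the predict step of a constant-velocity Kalman filter. This is a simplified sketch: the state here is a 4-dimensional box centre plus velocity, whereas Deep SORT's actual filter tracks an 8-dimensional centre/aspect/height state; the process-noise value is an arbitrary assumption.

```python
import numpy as np

def kalman_predict(x, P, dt=1.0, q=1e-2):
    """Predict step of a constant-velocity Kalman filter.

    State x = [cx, cy, vx, vy] (box centre and velocity) -- a simplified
    stand-in for Deep SORT's 8-dimensional state, for illustration only.
    """
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    Q = q * np.eye(4)                  # process noise (assumed isotropic)
    x_pred = F @ x                     # move the box along its velocity
    P_pred = F @ P @ F.T + Q           # uncertainty grows at every predict,
    return x_pred, P_pred              # which is why long occlusions hurt

# Two consecutive predictions with no measurement update: the position
# drifts with the velocity and the covariance keeps growing.
x = np.array([100.0, 50.0, 5.0, -2.0])
P = np.eye(4)
x1, P1 = kalman_predict(x, P)
x2, P2 = kalman_predict(x1, P1)
```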
During tracking, occlusion occurs between vehicles, and when a vehicle is tracked again it receives a new ID. Deep SORT uses the Kalman filter and Hungarian algorithm, and adds a cascade matching strategy that gives priority to more recently seen objects. Experiments have shown that the Deep SORT algorithm reduces the ID switches of the SORT algorithm by 45%, effectively mitigating object loss under occlusion, and it performs well on high-rate video streams [19].
In a real-time system, object detection locates and recognizes objects within a single frame of a scene. All pixels in the image must be processed, so the amount of computation is very large. Object tracking, by contrast, is not static per-frame detection but dynamic: it predicts the motion trajectory of the object and only needs to examine pixels near the predicted position. It also does not care about the category of the tracked object, only its motion characteristics. This paper combines the object detection results of YOLOv4 with the Deep SORT algorithm; the specific process is shown in Fig. 3. YOLOv4 detects object positions in the current frame, which are cascade-matched with the positions predicted by the Kalman filter in Deep SORT. During matching, tracks are layered by occlusion time: the shorter the occlusion time, the higher the priority and the easier the track is to match. The Hungarian algorithm is used to obtain the unique matching with the largest IoU, and matching pairs whose matching value is below a threshold are deleted. Finally, the matched BBoxes of this frame are used to update the Kalman filter.
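The matching step above can be sketched as Hungarian assignment on an IoU cost matrix, with low-IoU pairs discarded. This is a minimal illustration, not the paper's implementation: the 0.3 IoU gate is an assumed value, and the cascade layering by occlusion time is omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match(tracks, detections, iou_threshold=0.3):
    """Hungarian matching on a (1 - IoU) cost matrix; pairs whose IoU
    falls below the threshold are deleted, as in the text above."""
    if not tracks or not detections:
        return []
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)   # minimise total (1 - IoU)
    return [(r, c) for r, c in zip(rows, cols)
            if 1.0 - cost[r, c] >= iou_threshold]

# Two predicted track boxes and two detections, deliberately out of order:
# matching recovers the correct track-to-detection pairing.
tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11)]
pairs = match(tracks, detections)
```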

Experiment And Discussion
In order to verify the feasibility of the edge system, extensive experiments and evaluations were carried out. The computer used in this experiment has an Intel Core i7-9750H processor, an NVIDIA GTX 1660 Ti graphics card, 16 GB of memory, and a 512 GB hard disk; the operating system is Windows 10, the framework is TensorFlow, and the implementation language is Python.

Fig. 4. The impact of the MAR threshold and PERCLOS threshold on detection accuracy.

Fatigue testing
(a) Threshold setting. For vision-based methods without wearable devices, PERCLOS is still considered the most effective measurement parameter for determining driver fatigue. The EAR threshold corresponding to eye closure exceeding 80% is 0.1, and the PERCLOS threshold dividing the eye fatigue state is 0.2; that is, when the percentage of frames with EAR less than 0.1 out of the total number of frames per unit time exceeds 0.2, the driver is determined to be fatigued [6]. In order to study how the mouth MAR threshold and its corresponding time-ratio threshold affect yawn detection accuracy, and how the pitch angle threshold and its corresponding time-ratio threshold affect nod detection accuracy, we randomly extracted a fatigue driving video. In the experiment, based on calculations of MAR, pitch angle, yawning time [20], etc. for multiple people in different states, we found that the MAR value lies between 0 and 0.2 under normal conditions and is generally greater than 0.4 when yawning. In order to obtain the optimal threshold, the initial range of the MAR threshold was set to 0.3~0.6 with an interval of 0.05. Similarly, the initial range of the pitch angle was set to 10°~40° with an interval of 5°.
The PERCLOS threshold for eye fatigue was borrowed to set the initial range of the two corresponding time-ratio thresholds: with 0.2 as the median value, the initial range was set to [0.1, 0.2, 0.3].
Taking the mouth MAR threshold as an example, accuracy is the ratio of the number of detected yawns to the labeled data. As shown in Fig. 4, detection accuracy differs under different MAR threshold and PERCLOS value settings, so this paper chooses the values with the highest accuracy: an MAR threshold of 0.4 and a corresponding PERCLOS threshold of 0.3. YawDD [7] is a public yawn detection dataset that records three or four videos for each driver, covering different mouth movements; it can be used to verify face detection, facial feature extraction, yawn detection, and other algorithms.
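The threshold sweep described above can be sketched as a small grid search: for each (MAR threshold, time-ratio threshold) pair, classify each labeled window and keep the pair with the highest accuracy. The windows below are synthetic stand-ins for the labeled video data, not values from the paper.

```python
import numpy as np

def yawn_detected(mar_series, mar_thr, ratio_thr):
    """A window counts as a yawn when the fraction of frames with MAR
    above the MAR threshold exceeds the time-ratio threshold."""
    mar_series = np.asarray(mar_series, dtype=float)
    return float(np.mean(mar_series > mar_thr)) > ratio_thr

def grid_search(windows, labels, mar_grid, ratio_grid):
    """Sweep both thresholds (0.3~0.6 in 0.05 steps and [0.1, 0.2, 0.3]
    in the paper) and keep the pair with the highest accuracy."""
    best = (None, None, -1.0)
    for m in mar_grid:
        for r in ratio_grid:
            preds = [yawn_detected(w, m, r) for w in windows]
            acc = float(np.mean([p == l for p, l in zip(preds, labels)]))
            if acc > best[2]:
                best = (m, r, acc)
    return best

# Synthetic labeled windows: one yawning (MAR spikes to 0.5), one normal.
yawn_window = [0.5] * 30 + [0.1] * 30
normal_window = [0.1] * 60
best = grid_search([yawn_window, normal_window], [True, False],
                   [0.3 + 0.05 * i for i in range(7)], [0.1, 0.2, 0.3])
```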
In this paper, the YawDD video dataset is annotated with whether the driver is yawning, and the selected MAR and PERCLOS thresholds are used to test the effectiveness of the algorithm in detecting yawns. Comparing the algorithm in this paper with others in the literature, the detection accuracy for the driver's yawns is shown in Table 1. All algorithms are tested on the YawDD dataset. This paper achieves an accuracy of 93.1% on yawn detection, an improvement over the detection method using subtle facial action recognition [7] and the method using a CNN [8]. Using the same procedure, the pitch angle threshold with the highest accuracy is 25°, with a corresponding PERCLOS threshold of 0.2.
(b) Fatigue driving detection model based on a support vector machine. Labels are created according to the driver's normal or fatigued state to construct positive and negative samples, and the SVM classifier is then trained until it converges, completing the training of the fatigue driving detection model. The input is the feature vector composed of the closed-eye, yawning, and nodding time ratios, and the output is whether the driver is fatigued. The NTHU-DDD dataset [21] includes video data of 36 subjects of different ethnicities in simulated driving scenes, with a total duration of about 9.5 h, covering normal driving under different lighting conditions, yawning, speaking, slow blinking, frequent nodding, etc. In this paper, clips of the same duration as the fatigue driving video dataset are cropped from the NTHU-DDD video dataset, and the SVM model is evaluated on them. At present, fatigue detection algorithms based on facial feature points generally consider only eye and mouth features. The algorithm in this paper also integrates the driver's head posture, which further improves the accuracy of fatigue detection. As shown in Table 2, the detection accuracy for fatigue driving is improved by 1.4% compared with the detection method using a lightweight AlexNet classifier [9].
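The SVM training step can be sketched as follows. The feature vectors and labels here are synthetic stand-ins for the ratios computed from real driving video, and the RBF kernel is an assumption; the paper does not specify the kernel.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Feature vector per sample: [closed-eye ratio, yawn ratio, nod ratio].
# Synthetic clusters: alert drivers have low ratios, fatigued ones high.
normal = rng.uniform(0.0, 0.1, size=(100, 3))
fatigued = rng.uniform(0.3, 0.8, size=(100, 3))
X = np.vstack([normal, fatigued])
y = np.array([0] * 100 + [1] * 100)          # 1 = fatigued

clf = SVC(kernel="rbf")                      # kernel choice is assumed
clf.fit(X, y)                                # train until convergence

# A clearly fatigued feature vector should be classified as fatigued.
pred = clf.predict([[0.5, 0.4, 0.3]])
```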

Object detection
In this paper, considering the specific driving environment, six object categories are trained: pedestrians, bicycles, cars, motorcycles, buses, and trucks. In order to verify the effectiveness of the Deep SORT algorithm, we performed real-time object detection at intersections with relatively complex traffic conditions. The system used the YOLOv4 object detection architecture, and the SORT and Deep SORT algorithms were compared in terms of tracking effect. The ID switches corresponding to the two algorithms are compared in Table 3.
It can be seen from Table 3 that over the whole real-time detection video, the ID switches of the Deep SORT algorithm are reduced by 18.1% and 16.8% compared with the SORT algorithm for the object categories 'car' and 'truck', respectively. This shows that the Deep SORT algorithm is superior to the SORT algorithm in detecting and tracking overlapping or occluded objects, and has greater advantages in solving the problem of object loss.
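For clarity, the ID-switch metric compared above can be sketched as follows: for each ground-truth object, count every frame where the tracker assigns it a different ID than the last time it was seen. This is a simplified version of the MOT-challenge IDSW count, shown on toy data.

```python
def count_id_switches(assignments):
    """Count ID switches over a sequence of per-frame assignments, where
    each frame is a dict mapping ground-truth object -> tracker ID.
    Simplified version of the MOT-challenge IDSW metric."""
    last_seen = {}
    switches = 0
    for frame in assignments:
        for gt_id, trk_id in frame.items():
            if gt_id in last_seen and last_seen[gt_id] != trk_id:
                switches += 1        # object re-acquired under a new ID
            last_seen[gt_id] = trk_id
    return switches

# Car 'a' keeps ID 1 throughout; car 'b' is lost during an occlusion and
# comes back with a new ID, producing exactly one ID switch.
frames = [{"a": 1, "b": 2}, {"a": 1}, {"a": 1, "b": 7}]
```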

Edge deployment
The vehicle-mounted device selected in this paper is the Jetson Xavier NX, a GPU edge computing device launched by NVIDIA. It is small and provides high operating performance for the edge system: with accelerated computing power of up to 21 Tera Operations Per Second (TOPS), it can run modern neural networks in parallel and process data from multiple high-resolution sensors. The operating system of the Jetson Xavier NX is an ARM-based Ubuntu 18.04. The physical operation status is shown in Fig. 7: the SVM classifier in the edge system determined that the driver was fatigued and displayed the red warning "sleep!!" to the driver. Real-time operation of the object detection system has certain computing power requirements, so it is necessary to verify whether the system can run in real time on the device. As shown in Fig. 8, the object detection system based on YOLOv4 and Deep SORT achieves more than 62 FPS on a computer with a GTX 1660 Ti GPU. On the edge device Jetson Xavier NX it reaches 20 FPS and detects object categories smoothly. As shown in Table 4, compared with the detection method using YOLOv4 and the Camshift algorithm [10], the FPS of the edge system is increased by 4, further meeting the real-time requirements of vehicle-mounted edge devices.
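One plausible way to obtain such FPS figures is to time the per-frame pipeline over a run, discarding a few warm-up frames so one-off initialization cost does not skew the result. The detector below is a trivial stand-in for the real YOLOv4 + Deep SORT pipeline; the warm-up count is an arbitrary assumption.

```python
import time

def measure_fps(process_frame, frames, warmup=5):
    """Average FPS over a run, ignoring the first few warm-up frames."""
    for f in frames[:warmup]:
        process_frame(f)                     # warm-up, not timed
    start = time.perf_counter()
    for f in frames[warmup:]:
        process_frame(f)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

# Stand-in workload in place of the real detection-and-tracking pipeline.
def fake_detector(frame):
    return sum(frame)

fps = measure_fps(fake_detector, [list(range(100))] * 105)
```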

Conclusions
This paper proposes a driving warning system at the edge. The system eliminates the need to transmit image data to the cloud and receive returned results, effectively alleviating the pressure of cloud data transmission. Based on the GBDT algorithm, the system detects the driver's fatigue status in real time through facial key points, with detection accuracy increased by 1.4%. At the same time, it integrates the YOLOv4 and Deep SORT algorithms to achieve object detection and tracking.
Compared with the SORT algorithm, ID switches are reduced by 18.1%. The system FPS is also improved, which helps assist drivers in preventing traffic accidents.