A survey on crop image segmentation methods

Image processing technology is now applied across many industries and has achieved good results in agriculture. Image segmentation is the foundation and key step of image processing. To survey the application of image segmentation technology in the agricultural field, this article systematically reviews the mainstream image segmentation methods. It first introduces segmentation methods based on thresholding, clustering, edges, graph theory, and superpixels, then introduces segmentation methods based on deep learning, and finally discusses future research trends.


Introduction
With the rapid development of information technology, image processing can be used to identify crop diseases and insect pests accurately from field images. Current image processing pipelines mostly consist of three steps: image segmentation, feature extraction, and classification and recognition (see Figure 1), and image segmentation is a key part of this process. Image segmentation subdivides an image into multiple sub-regions according to certain rules and is the basis for the subsequent processing steps. In the agricultural field, whether crops can be segmented accurately against a complex background is the prerequisite for the subsequent identification of pests and diseases. Image segmentation has long been a difficult and actively studied problem in image processing. This article reviews the image segmentation methods applied to crop images in recent years and discusses future development trends.

Traditional image segmentation method
Early research on image segmentation mostly relied on the surface information of the image. These methods are relatively simple and perform well. As a key link in the image processing pipeline, a well-chosen segmentation method provides a good foundation for subsequent feature extraction and improves the accuracy of classification and recognition. Traditional image segmentation methods mainly include methods based on thresholding, clustering, edges, graph theory, and superpixels. Each has its own advantages and disadvantages and suits different scenarios; Table 1 summarizes the performance of several common classical methods. For a specific task, a suitable method must be selected in order to segment the image efficiently. Table 1. Performance comparison of several traditional image segmentation methods in the agricultural field.

Threshold-based segmentation method
The threshold segmentation method [1] partitions the gray levels of an image according to one or more thresholds, assigning pixels with similar gray values to the same class according to certain rules. It is a grayscale image segmentation method.
Threshold segmentation is one of the earliest methods used in image segmentation. As early as 1962, the P-tile method studied by Doyle [3] was in fact a threshold segmentation algorithm. For many years, threshold segmentation methods [4-6] have been widely used in image processing of crop diseases and insect pests.
For the original image, let (x, y) denote a pixel and f(x, y) the gray value of pixel (x, y). Given a threshold T, the pixels in the image are divided into two classes, background and target, and the processed image h(x, y) is output as

h(x, y) = 1 if f(x, y) > T, and h(x, y) = 0 otherwise (1)

where h(x, y) = 0 indicates that the pixel belongs to the background, and h(x, y) = 1 indicates that it belongs to the target.
The chosen threshold clearly affects the result of the algorithm, so selecting the most appropriate threshold is the key to the threshold segmentation method. As shown in Figure 2, for the classic test image "Pepper" (Figure 2(a)) from the USC-SIPI database [2], gray thresholds T of 4, 5, and 6 are used to segment the same gray-value image, producing three different segmentation results (Figure 2(b)-(d)). As the gray threshold increases, the target is separated from the background more clearly, and the image becomes darker.
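As an illustration, the rule h(x, y) above can be sketched in a few lines of NumPy (the function name and the sample values are illustrative, not taken from the original experiments):

```python
import numpy as np

def threshold_segment(gray, T):
    """Binarize a grayscale image: pixels above T become target (1),
    the rest background (0), following the h(x, y) rule in the text."""
    return (gray > T).astype(np.uint8)

# Tiny synthetic example: a bright "object" on a dark background.
img = np.array([[10, 12, 200],
                [11, 210, 205],
                [9, 13, 198]], dtype=np.uint8)
mask = threshold_segment(img, T=128)
# mask marks the bright pixels as target, the dark ones as background
```

In practice the threshold T would be chosen from the image histogram (for example by Otsu's method) rather than fixed by hand.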

Cluster-based segmentation method
The clustering segmentation method [7-10] uses a clustering algorithm to segment the image. The idea is to group pixels with similar characteristics, iterating repeatedly until a termination condition is met, so that all pixels are finally assigned to different classes and the image is segmented by region. At present, the K-means clustering algorithm and the fuzzy C-means algorithm are widely used in crop disease and insect pest image recognition. The idea of the K-means algorithm is: 1) given a value of k, randomly select k samples from all samples and initialize them as the cluster centers; 2) calculate the distance between each sample and the k centers, assign it to the cluster whose center is closest according to certain rules, and then recalculate the center of each cluster; 3) repeat step 2) until the center of each cluster no longer changes.
For image data, the K-means clustering algorithm groups pixels with similar colors into one class to complete the segmentation. The key to the K-means algorithm is how to measure the distance between each sample point and the centers; the squared difference is the most commonly used measure, as in equations (2) and (3):

c(i) = argmin_j || x(i) - mu_j ||^2 (2)

mu_j = ( sum over i with c(i) = j of x(i) ) / |{ i : c(i) = j }| (3)

where there are n pixels in total, x(i) is the i-th pixel, i = 1, 2, ..., n, c(i) is the class most similar to x(i), and mu_j is the center of class j.
Clustering is an unsupervised learning algorithm. Compared with other traditional segmentation methods, clustering-based segmentation has greater advantages in processing crop images with complex backgrounds or blurred edges.
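Steps 1)-3) above can be sketched as a minimal NumPy implementation (the function name, the sample pixels, and the random initialization are illustrative assumptions):

```python
import numpy as np

def kmeans_segment(pixels, k, iters=10, seed=0):
    """Plain k-means on pixel colors: squared-distance assignment,
    mean update, as in steps 1)-3) of the text."""
    rng = np.random.default_rng(seed)
    # step 1: pick k pixels at random as the initial cluster centers
    centers = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        # step 2: assign each pixel to its nearest center (squared distance)
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # step 3: recompute each cluster center as the mean of its members
        for j in range(k):
            if (labels == j).any():
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers

# Two obvious color groups: dark-green and bright-red pixels.
pix = np.array([[10, 200, 10], [12, 198, 11],
                [250, 10, 10], [248, 12, 9]], dtype=float)
labels, _ = kmeans_segment(pix, k=2)
# the two green pixels share one label, the two red pixels the other
```

For a real image, `pixels` would be the image reshaped to an (n, 3) array of colors, and `labels` reshaped back to the image grid gives the segmentation.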

Edge-based segmentation method
As a basic feature of an image, an edge reflects the difference in gray value between regions and the abrupt change of image features at region boundaries, so it can serve as a basis for segmentation. Edge detection and segmentation methods fall into two categories: serial edge detection and parallel edge detection. In practice, parallel edge detection using differential operators is more commonly used and generally performs better.
Many edge detection operators have been developed; depending on the task, choosing different operators allows the image to be segmented more flexibly. The most widely used operators currently include the Canny [11], Prewitt [12], Sobel [13], Roberts [14], and LOG [15] operators. Taking Figure 3 as an example, applying different differential operators to the R-G image of red tomatoes [16] yields five different results (Figure 3(b)-(f)). As Figure 3 shows, each differential operator segments the contour of the tomato from the background reasonably well, achieving an initial segmentation effect. Fig. 3. R-G image segmentation results of the parallel edge detection method using different differential operators [16].
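As a minimal sketch of one such differential operator, the standard 3x3 Sobel masks can be applied with plain NumPy (the helper names and the synthetic step-edge image are illustrative):

```python
import numpy as np

# The standard 3x3 Sobel masks for horizontal and vertical gradients.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
KY = KX.T

def sobel_magnitude(gray):
    """Gradient magnitude via 'valid' 2-D correlation with KX and KY."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx = (patch * KX).sum()   # horizontal gradient response
            gy = (patch * KY).sum()   # vertical gradient response
            out[i, j] = np.hypot(gx, gy)
    return out

# A vertical step edge: left half dark, right half bright.
img = np.zeros((5, 6))
img[:, 3:] = 100.0
mag = sobel_magnitude(img)
# the response is large only in the columns straddling the step
```

Thresholding `mag` then yields a binary edge map; the other operators listed above differ only in their masks and post-processing.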

Superpixel segmentation method
Superpixel segmentation [17-19] is a kind of over-segmentation that divides the original image into multiple sub-regions according to some image feature; the collection of these sub-regions constitutes the original image. The pixels within each sub-region are strongly coherent in some feature, which helps the segmentation preserve object boundaries. Some traditional superpixel methods, such as Normalized Cuts and other graph-based approaches, tend to have high time complexity. Newer algorithms such as SLIC, LSC, and SEEDS perform well compared with the traditional ones in both time complexity and segmentation quality.
SLIC is a widely used superpixel segmentation method. Its basic idea is to compute distances between pixels and generate superpixels through clustering. The algorithm proceeds as follows: 1) map the original RGB image to the Lab color space (l, a, b), where l represents brightness, a the color range from magenta to green, and b the color range from yellow to blue; through this mapping the image has richer color characteristics; 2) for each pixel, combine its coordinate (x, y) with the corresponding color feature (l, a, b) to form a new vector (l, a, b, x, y), and compute, by equations (4) and (5), the color distance d_c and spatial distance d_s between pixels i and j:

d_c = sqrt( (l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2 ) (4)

d_s = sqrt( (x_j - x_i)^2 + (y_j - y_i)^2 ) (5)

then compute D to measure the final distance:

D = sqrt( d_c^2 + (d_s / S)^2 * m^2 )

where S represents the maximum spatial distance within a class, S = sqrt(N / K), N is the total number of pixels, K is the number of superpixel blocks to be generated, N / K is the size of a superpixel block, and S is also the spacing between adjacent seed points; m represents the maximum color distance and takes a constant value (m in [1, 40]).
In summary, in the SLIC algorithm the similarity between two pixels is measured by the distance between their (l, a, b, x, y) vectors: the smaller the distance, the more similar the two pixels, and vice versa. Pixels can then be clustered by this similarity to complete the superpixel segmentation of the image.
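The combined distance D above can be computed directly; the following NumPy sketch (function name and sample vectors are illustrative) evaluates it for two pixels:

```python
import numpy as np

def slic_distance(p1, p2, S, m):
    """Combined SLIC distance between two (l, a, b, x, y) vectors:
    D = sqrt(d_c**2 + (d_s / S)**2 * m**2), with d_c the Lab color
    distance, d_s the spatial distance, S the seed-grid interval and
    m the compactness weight."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d_c = np.linalg.norm(p1[:3] - p2[:3])   # color distance, eq. (4)
    d_s = np.linalg.norm(p1[3:] - p2[3:])   # spatial distance, eq. (5)
    return np.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)

# Same color, 5 pixels apart spatially, with S = 10 and m = 10:
d = slic_distance([50, 0, 0, 0, 0], [50, 0, 0, 3, 4], S=10, m=10)
```

Larger m weights spatial proximity more heavily, producing more compact, regular superpixels; smaller m lets superpixels follow color boundaries more closely.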

Segmentation method based on graph theory
The segmentation method based on graph theory is a top-down global method that transforms the segmentation problem into the partitioning of a graph and completes the segmentation by optimizing an objective function [21]. The main idea is to map the image onto a weighted undirected graph G = (V, E), where V is the set of vertices and E the set of edges. Each vertex in the graph corresponds to a pixel in the image, the adjacency between pixels is represented by the corresponding edges, and the similarity or difference between pixels is represented by the edge weights. Commonly used methods include Graph Cuts [22], GrabCut [23, 24], and One Cut [25].
The Graph Cuts algorithm is one of the most commonly used methods. It was proposed by Boykov et al. [26] in 2006. In Graph Cuts, a cut is a set of edges, including both kinds of edges described above, whose removal separates the graph into disconnected parts. A cut whose edge weights sum to the smallest possible value is a minimum cut. The minimum cut divides the vertices of the graph into two disjoint subsets S and T, with the source s in S, the sink t in T, and S united with T equal to V, which is equivalent to completing the image segmentation.
In Graph Cuts, P denotes the set of all pixels, and N denotes the set of pairs of adjacent pixels in P. A binary vector A = (A_1, ..., A_p, ..., A_|P|) is specified, where A_p indicates which region pixel p belongs to. The energy to be minimized is:

E(A) = lambda * R(A) + B(A)

R(A) = sum over p in P of R_p(A_p)

B(A) = sum over adjacent pairs {p, q} in N of B_{p,q} * delta(A_p, A_q)

where R_p(A_p) represents the cost of assigning label A_p to pixel p; B_{p,q} represents the cost of discontinuity between p and q, which becomes larger when p and q are more similar and tends to 0 otherwise; E(A) is the objective function, and the ultimate goal of the Graph Cuts algorithm is to minimize it.
The Graph Cuts algorithm uses both the grayscale information of the image and region boundary information, and can obtain a good segmentation result through optimization. However, it is computationally expensive and requires high similarity within regions of the image.
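The minimum cut described above can be computed with any max-flow algorithm. The following is a minimal Edmonds-Karp sketch on a toy graph, not the full Graph Cuts energy construction; all names and capacities are illustrative:

```python
from collections import deque

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max-flow. Returns (flow value, source-side vertex set).
    `cap` is a dict-of-dicts of edge capacities. By the max-flow/min-cut
    theorem the flow value equals the weight of the minimum cut, the
    quantity the Graph Cuts algorithm minimizes."""
    # Build residual capacities, including zero-capacity reverse edges.
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u in cap:
        for v in cap[u]:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        # Find the bottleneck along the path, then push flow.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= aug
            res[v][u] += aug
        flow += aug
    # Vertices still reachable from s form the source side of the min cut.
    return flow, set(parent)

# Toy graph: s -> a -> t and s -> b -> t, with a weak s -> a edge.
cap = {'s': {'a': 1, 'b': 3}, 'a': {'t': 3}, 'b': {'t': 2}, 't': {}}
flow, source_side = max_flow_min_cut(cap, 's', 't')
# the min cut severs s->a (1) and b->t (2), total weight 3
```

In an actual Graph Cuts segmentation, the vertices would be pixels plus the two terminals, with edge capacities derived from R_p and B_{p,q}, and the source side of the cut gives the target region.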

Image segmentation method based on deep learning
Crop images carry not only surface information but also rich semantic information. Traditional segmentation methods cannot exploit this semantic information and therefore cannot meet practical needs. With the continuous development of deep learning, progress has been made in applying it to image segmentation. Applying convolutional neural networks to segmentation makes full use of the semantic information of the image and realizes semantic segmentation. After years of research, a series of deep-learning-based segmentation methods have been proposed, enabling segmentation algorithms to handle more and more scenes. Among them, the classic methods are FCN [27-31] and DeepLab [32-35].

FCN
FCN [27] is one of the earlier deep learning methods applied to image segmentation. Its main difference from a standard CNN is that it is fully convolutional: after multi-layer convolution, the feature maps are upsampled by deconvolution, and the image is then classified pixel-wise through a SoftMax layer.
The FCN method converts the three fully connected layers of the VGG16 network [28] into convolutional layers, removes the softmax layer, and adds deconvolution layers after the pool3 and pool4 layers, using bilinear upsampling to convert the coarse output into a dense output. As shown in Figure 4, after multiple convolutions in the VGG16 model the feature map becomes much smaller than the input image, and much of the low-level image information is lost. Therefore the skip-layer method is used: the upsampling stride is reduced at the shallow layers, the coarse and fine layers are fused, and the result is upsampled to produce the output, so that the semantic information and the position information of the image can be used together. Fig. 4.
As shown in Figure 5, the input image is convolved and pooled repeatedly to obtain feature maps at different levels. The Conv7 layer, obtained after seven convolution stages, is upsampled and classified directly, giving the segmentation result of the FCN-32s model. The Pool4 layer, obtained after four pooling stages, is fused with the Conv7 layer processed by bilinear interpolation, and the fusion is upsampled to give the result of the FCN-16s model. The Pool3 layer, obtained after three pooling stages, is fused with the Pool4 and Conv7 layers processed by bilinear interpolation, and the fusion is upsampled and classified to give the result of the FCN-8s model. By combining deep features with shallow information and restoring to the resolution of the original image, a more accurate segmentation is obtained; according to the pooling layer used, the variants are called FCN-8s, FCN-16s, and FCN-32s [27]. FCN can thus achieve pixel-level prediction. An ordinary convolutional network downsamples, so its output is smaller than the input, whereas FCN converts the fully connected layers of traditional networks such as VGG and AlexNet [29] into convolutional layers, so only fine-tuning is required; this maximizes the use of pre-trained networks and improves training efficiency. However, FCN still has shortcomings: some detailed structure of objects may be lost during processing, and it can only handle semantic targets at a single scale.
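The FCN-8s fusion arithmetic above can be sketched with plain NumPy, using nearest-neighbour upsampling as a stand-in for the learned deconvolution layers; all array sizes and values here are illustrative:

```python
import numpy as np

def upsample(fmap, factor):
    """Nearest-neighbour upsampling, standing in for FCN's learned
    bilinear deconvolution layers."""
    return np.kron(fmap, np.ones((factor, factor)))

# Illustrative single-channel feature maps for a 32x32 input
# (random values standing in for real network activations).
rng = np.random.default_rng(0)
pool3 = rng.random((4, 4))   # stride 8  -> 32/8
pool4 = rng.random((2, 2))   # stride 16 -> 32/16
conv7 = rng.random((1, 1))   # stride 32 -> 32/32

# FCN-8s style fusion: upsample, add the skip feature map, repeat.
fuse16 = upsample(conv7, 2) + pool4    # now at stride 16
fuse8 = upsample(fuse16, 2) + pool3    # now at stride 8
out = upsample(fuse8, 8)               # back to the input resolution
```

FCN-32s would upsample `conv7` by 32 directly, and FCN-16s would stop after the first fusion; the extra skip connections are what recover finer spatial detail.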

DeepLab
The core of the DeepLab model is atrous (dilated) convolution [32]: holes are inserted into the convolution kernel to enlarge its receptive field. The network produces a coarse score map, which is upsampled by bilinear interpolation, and a CRF is introduced as post-processing. This combination effectively suppresses noise and improves segmentation accuracy, while integrating more feature information and paying more attention to the edge features of the target.
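The effect of inserting holes into the kernel can be sketched in one dimension with NumPy (the function name and inputs are illustrative):

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution: the kernel taps are spaced
    `rate` samples apart, enlarging the receptive field without
    adding parameters."""
    k = len(kernel)
    span = (k - 1) * rate + 1          # effective receptive field
    out = np.zeros(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * rate] for j in range(k))
    return out

x = np.arange(10, dtype=float)            # 0, 1, ..., 9
y1 = atrous_conv1d(x, [1, 1, 1], rate=1)  # taps at i, i+1, i+2
y2 = atrous_conv1d(x, [1, 1, 1], rate=2)  # taps at i, i+2, i+4
```

With rate 1 this is an ordinary convolution; with rate 2 the same three taps cover a span of five samples, which is exactly how DeepLab widens its receptive field without increasing the number of kernel weights.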
DeepLab-v2 [33] extends the atrous convolutional layer of the DeepLab model into the atrous spatial pyramid pooling (ASPP) module: multi-scale atrous convolutional layers are applied and their feature maps fused, and the fully connected CRF is retained as post-processing.
In DeepLab-v3 [34], shown in Figure 6, the original image is first convolved and pooled so that its size is reduced; through successive convolution, rectified linear unit, and pooling operations, the feature map is downsampled by factors of 8 and 16. After the Block4 module, atrous convolutions with different rates are fused, then integrated with a 1x1 convolution layer and a global pooling layer, so that the feature map is reduced by a factor of 16, and the segmentation map is finally obtained by classification prediction.
The DeepLab-v3+ model [35] adopts an encoder-decoder structure, as shown in Figure 7: it combines the spatial pyramid module of DeepLab-v3 with an encoder-decoder architecture that gradually restores the boundary information of the image. The encoder uses ResNet-101 as the backbone network; after feature extraction, the ASPP-fused and convolved feature map captures the multi-scale information of the image. The decoder fuses the encoder's feature map with ResNet-101's intermediate features: the upsampled ASPP feature map provides semantic information, ResNet-101's intermediate downsampled features provide detail information, and convolution and upsampling operations then produce the semantic segmentation result. Applied to grape leaves, the DeepLab-v3+ model produced good segmentation in each scene, with clear target edge contours, indicating that the model can segment grape leaf images. Fig. 6. Structure diagram of DeepLab-v3 [34].

Conclusions
Crop image segmentation will remain a research hotspot and a difficult problem in the field of image processing, owing to the complexity of the segmentation problem itself. At the same time, image segmentation is a crucial step in image recognition: only when segmentation is solved can feature extraction and classification be completed well. Crop image segmentation in complex environments has always been a research difficulty in the field and needs further study. In addition, segmentation of crop disease and insect pest images is still limited to the laboratory and cannot yet be applied effectively in actual production; future work must consider how to put research results into practice. Moreover, because segmentation places high demands on the original image in terms of clarity, resolution, and background complexity, sufficiently high-quality images must be collected for analysis. Finally, current image segmentation methods are not universal: most are studied for one particular crop and may not work well enough for others. This, too, is something researchers need to study and think about carefully in the future.