Image Analysis and Clustering for Defect Inspection using Sparse Representation and Affinity Propagation

This paper proposes a big-data analysis system for defective images that can be extended to automated visual inspection and to the diagnosis of intelligent manufacturing processes. As a statistical machine learning approach, the first step of the proposed system is to collect an adequate number of training samples, including defective images and false positives, from the production lines. Then, a sparse representation technique is applied to restore the images to better visual quality and to generate a customized golden sample through a dictionary learning process. The differential image is estimated directly from the original image and its restored image. Finally, an unsupervised learning method based on the affinity propagation algorithm is adopted to categorize the samples into a few representative clusters. The proposed system outputs the exemplars of the major clusters to help engineers or operators on the production lines adjust the manufacturing process or repair the equipment. A real case of quartz blank inspection is presented in the experimental results to demonstrate the effectiveness of the proposed methods. Several common but imperceptible defects can be identified for further analysis.


Introduction
Automated visual inspection (AVI) is an image processing method that has been widely applied in the modern production lines of manufacturing industries, usually to inspect products or semi-finished products. AVI has been developed for decades with the goal of inspection automation. Early works adopted computer vision and pattern recognition techniques to improve the accuracy and efficiency of AVI systems [1,2,3]. In addition, to extend the range of applications, some research has focused on developing multi-scale AVI systems for micro lenses or micro-electromechanical systems [4].
However, few studies have reported further analysis of the defects or reasoned about their causes directly in the image domain to improve production line efficiency. After an adequate number of images of defective products has been collected, an intelligent image understanding system based on statistical learning algorithms is required for manufacturing process verification.
In this paper, we present a defective image analysis and reasoning system that can improve the similarity measurements in the pattern matching process and broaden the applications of automated visual inspection. The proposed system is divided into two major steps: image restoration based on sparse representation, and defective image clustering by affinity propagation.
The image restoration process aims to enhance the quality of inspected images. The restored images can improve the performance of similarity measurements in pattern matching inspection. The differential image, which is calculated from the original defective image and its corresponding restored image, is the input of the affinity propagation clustering system. After the similarity matrix is estimated, the major categories, namely those containing the most samples in the affinity propagation clustering, provide the most important exemplars, which may describe several important defect types. The proposed method utilizes novel computer vision, digital image processing, and unsupervised learning algorithms to solve a practical engineering problem in automated visual inspection systems.

Proposed Methods
The proposed defective image analysis system includes two major steps. First, an image restoration process based on sparse representation is used to improve the image quality and estimate the differential images. Then, a similarity matrix, which records the similarity between each pair of samples, is calculated for affinity propagation clustering.
DOI: 10.12792/iciae2016.060

Problem Definition and Learning Samples
This paper focuses on developing a defective image analysis system on top of the visual inspection system used in precision manufacturing industries. The inspection tasks for the electronic components at T Corp. are carried out by an AVI system based on sum of absolute differences (SAD) localization and rule-based identification. A large number of learning samples can be collected from the non-stop production lines.
Besides defect clustering and analysis, decreasing the false positive rate of the visual inspection system is another challenging problem on practical production lines. Fig. 1 shows that the non-qualified products can be classified into true defective products and false positives. False positives are decisions that falsely reject normal products.

Sparse Representation Algorithms
Sparse coding algorithms [5][6][7][8][9][10] express the data as a linear combination of pre-learned basis signals, which are defined as the representative vocabularies in dictionary learning. The basic concept of sparse representation is to represent a signal or image by a few linear components, where the components are the vocabularies of a dictionary learned from training samples. The dictionary is the result of analyzing the features of the input data after they are transformed into the sparsity space.
Let $x \in \mathbb{R}^m$ be an input signal and $D = [d_1, d_2, \ldots, d_p] \in \mathbb{R}^{m \times p}$ be the dictionary, which consists of a set of normalized basis vectors (also called vocabularies), where $m$ is the number of pixels in the training patterns and $p$ is the number of vocabularies in the dictionary. If $x$ can be approximately represented by a few elements of the dictionary $D$, we say that there exists a sparse representation $\alpha \in \mathbb{R}^p$ such that $x \approx D\alpha$. The sparse decomposition problem can be viewed as minimizing the reconstruction error between $x$ and $D\alpha$ together with a sparsity-inducing regularization term $\lambda\phi(\alpha)$. The construction of the dictionary, which aims at learning compact representations of the inputs, is a critical step in sparse representation. To restrict the size of $D$, a convex set of matrices $C$ is introduced as a constraint, and the dictionary learning process searches for the dictionary within $C$.
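As an illustration of the sparse decomposition step, the following minimal sketch computes a sparse code with the iterative shrinkage-thresholding algorithm (ISTA). It assumes that the regularizer is the l1 norm, so the objective is 0.5 * ||x - D alpha||^2 + lambda * ||alpha||_1. The paper does not state which regularizer or solver was used, and the function names, the step-size rule, and the toy dictionary are hypothetical. Plain Python lists keep the example free of dependencies.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code_ista(x, D, lam=0.1, n_iter=200):
    """Approximately minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 over a.

    x : list of m floats (the input signal)
    D : list of p atoms, each a list of m floats (the columns of the dictionary)
    """
    p, m = len(D), len(x)
    # The squared Frobenius norm upper-bounds the squared spectral norm of D,
    # so its inverse is a safe (if conservative) gradient step size.
    step = 1.0 / sum(v * v for atom in D for v in atom)
    alpha = [0.0] * p
    for _ in range(n_iter):
        # Residual of the current reconstruction: D alpha - x.
        recon = [sum(alpha[j] * D[j][i] for j in range(p)) for i in range(m)]
        resid = [recon[i] - x[i] for i in range(m)]
        # Gradient of the quadratic term: D^T (D alpha - x).
        grad = [sum(D[j][i] * resid[i] for i in range(m)) for j in range(p)]
        # Gradient step followed by soft-thresholding (the l1 proximal step).
        alpha = [soft_threshold(alpha[j] - step * grad[j], step * lam)
                 for j in range(p)]
    return alpha


# Toy example: three orthonormal atoms in R^3 (a hypothetical dictionary).
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = [2.0, 0.05, 0.0]
alpha = sparse_code_ista(x, D, lam=0.1)
```

For an orthonormal dictionary the minimizer is the soft-thresholded input, so `alpha` comes out close to [1.9, 0.0, 0.0]: the small coefficient is suppressed and only one vocabulary remains active.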

Inspected Image Restoration
The inspected image of the AVI system is usually captured by a CCD camera, and the image content is then processed by the matching algorithms. However, the captured images are sometimes unclear because of high-speed transportation or the limitations of the image capture equipment. To prevent blurred images from affecting the inspection result, image enhancement techniques are suggested to improve the image quality and make the images more suitable for display or further processing. Image restoration is a branch of image enhancement that aims at recovering a clear version of a noisy or blurred image. A degraded image $y$ can be represented as a clear image $x$ plus the degradation $w$:
$y = x + w$ (4)
To recover the original image, we use the inspection images, including accepted products and defective products, to construct the dictionary described in the previous section. After the dictionary is generated, each sparse representation $\hat{\alpha}_{ij}$ is computed, and the closed-form solution of the optimization process, which is based on the optimal $\hat{\alpha}_{ij}$, gives the restored image. Any input inspection image of the AVI system can thus be transformed into a clearer, less noisy image by the sparse representation technique. This transformation can reduce the false alarms or misjudgments caused by the degradations or the limitations of the CCD camera.
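The paper does not spell out the closed-form expression at this point. The following is a minimal sketch, assuming the patch-averaging form commonly used in sparse-representation denoising, in which each restored pixel is a weighted average of the degraded pixel and all restored patches that cover it. The weight `lam`, the function names, and the toy data are hypothetical, and the restored patches would in practice come from the dictionary and the sparse codes. The second function computes the differential image used later for clustering.

```python
def aggregate_patches(y, patches, positions, lam=0.5):
    """Blend restored patches into one image by per-pixel weighted averaging.

    y         : degraded image as a list of rows of floats
    patches   : restored patches, each a list of rows (e.g. reshaped D * alpha_hat)
    positions : (row, col) of the top-left corner of each patch inside y
    lam       : weight given to the degraded input y (hypothetical default)
    """
    if lam <= 0:
        raise ValueError("lam must be positive so every pixel has a nonzero weight")
    h, w = len(y), len(y[0])
    num = [[lam * y[r][c] for c in range(w)] for r in range(h)]
    den = [[lam for _ in range(w)] for _ in range(h)]
    for patch, (r0, c0) in zip(patches, positions):
        for dr, row in enumerate(patch):
            for dc, value in enumerate(row):
                num[r0 + dr][c0 + dc] += value   # accumulate patch estimates
                den[r0 + dr][c0 + dc] += 1.0     # count how many patches cover the pixel
    return [[num[r][c] / den[r][c] for c in range(w)] for r in range(h)]


def differential_image(y, x_hat):
    """Pixel-wise absolute difference between the input and the restored image."""
    return [[abs(a - b) for a, b in zip(row_y, row_x)]
            for row_y, row_x in zip(y, x_hat)]


# Toy example: a flat 2x3 image with one bright outlier pixel.
y = [[1.0, 1.0, 1.0],
     [1.0, 5.0, 1.0]]
flat = [[1.0, 1.0], [1.0, 1.0]]          # hypothetical restored 2x2 patch
x_hat = aggregate_patches(y, [flat, flat], [(0, 0), (0, 1)], lam=1.0)
diff = differential_image(y, x_hat)
```

In this toy case the outlier pixel is pulled from 5 toward the patch estimates (to 7/3), while undamaged pixels stay at 1, so the differential image is nonzero only where the defect was.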
With the constructed dictionary, every image can be represented as a linear combination of vocabularies; therefore, the weights $\hat{\alpha}_{ij}$ of the vocabularies can be used for further study. To extract precise information from the vocabulary weights, unsupervised learning methods are adopted to find the characteristics of the different linear combinations.

Machine Learning and Affinity Propagation Algorithm
The affinity propagation (AP) algorithm, proposed by Frey and Dueck [11] in 2007, is adopted as an unsupervised learning method for image classification. The main concept is to find the most representative exemplars among all data points and to assign the data points to several groups. The similarities between samples serve as the input of the algorithm, and a message network is generated based on the similarity relationships. After message exchange and network convergence, the exemplars can be identified.
In our work, the input data $d_i$ is defined as the difference between the original image $x_i$ and the restored image $\hat{x}_i$:
$d_i = \lVert x_i - \hat{x}_i \rVert$ (7)
After the difference calculation, the similarity $s(i, k)$ is defined as the negative squared Euclidean distance between the data points $d_i$ and $d_k$:
$s(i, k) = -\lVert d_i - d_k \rVert^2$ (8)
In the affinity propagation algorithm, two kinds of messages are passed between the data points to update the structure of the relationship network. The "availability" $a(i, k)$ is the message sent from candidate exemplar $k$ to data point $i$. It represents the accumulated fitness of point $k$ as the exemplar of data point $i$, and its default value is zero. The "responsibility" $r(i, k)$ is the message sent from data point $i$ to candidate exemplar $k$. It represents the degree to which data point $i$ supports point $k$ as its exemplar, and its default value is $s(i, k)$. The two messages are passed through the network and updated iteratively until the network converges. Several representative exemplars are selected at the end of the updating process. Every data point $d_i$ and its corresponding exemplar $d_k$ are strongly connected by $a(i, k)$ and $r(i, k)$. The relationships between the data points help to generate the classification results.
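The message-passing procedure can be sketched as follows. The update rules in the comments are the standard ones from Frey and Dueck [11]. The damping factor, the fixed iteration count, the preference value placed on the diagonal of the similarity matrix, and the helper names are assumptions made for illustration, not settings reported in the paper.

```python
def similarity_matrix(points, preference):
    """s(i, k) = -||d_i - d_k||^2, with s(k, k) set to a shared preference."""
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
          for k in range(n)] for i in range(n)]
    for k in range(n):
        S[k][k] = preference
    return S


def affinity_propagation(S, damping=0.5, n_iter=200):
    """Return labels, where labels[i] is the exemplar index chosen by point i."""
    n = len(S)
    R = [row[:] for row in S]              # responsibilities start at s(i, k)
    A = [[0.0] * n for _ in range(n)]      # availabilities start at zero
    for _ in range(n_iter):
        # r(i,k) <- s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) <- sum over i' != k of max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]      # positive support from all i' != k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point selects the k that maximizes a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy example: six one-dimensional "difference" features in two tight groups.
diffs = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
S = similarity_matrix(diffs, preference=-1.0)
labels = affinity_propagation(S)
```

In this toy example the two groups are assigned to different exemplars. A more negative preference yields fewer clusters, so in practice this value would need tuning to obtain a useful number of defect categories.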

Experimental Results
In this section, several experiments are designed to verify the effectiveness of the proposed learning methods. First, a higher-quality image is recovered from the input image of the AVI system to enhance inspection accuracy.
The second experiment illustrates the results of the unsupervised learning technique based on the affinity propagation algorithm. The differential images of defective products can be categorized into four groups, which suggests that there are four major defect types.

Image restoration
To avoid misjudgments caused by degraded input images, we adopted sparse representation to recover higher-quality images that are more suitable for subsequent image processing. The training data for dictionary construction include inspection images of both accepted and defective products. In total, 46800 images are used in the dictionary learning process, and 400 representative vocabularies are extracted after 1000 iterations. The size of each vocabulary image is 25x25 pixels.

Fig. 3. Inspected images and restored images of defective products
Based on the dictionary shown in Fig. 2, each input image can be represented as a linear combination of the vocabularies. Fig. 3 shows the restoration results of the same product with different noise or defects. The left image is the original input image. The defective regions are extracted and enlarged in the middle image, and the right image is the image restored by sparse representation. However, some results are not as expected. Several reasons may explain why the restored images still preserve parts of the defects or are degraded. First, the number of learning samples is not large enough to represent all possible conditions. Second, the parameters of dictionary learning, such as the number of iterations or the image size, may affect the learning results. Third, the degraded part may not be recoverable by the dictionary.
To verify the restoration effect, both the original and the restored images were used as input data for the AVI system. In the inspection process, the SAD and normalized cross-correlation (NCC) algorithms are applied for similarity measurement. The experimental samples include three different quartz blank products, and the average matching results of the SAD and NCC measurements are listed in Table 1.
Table 1. Matching results of two different inputs under the SAD and NCC algorithms
In the SAD algorithm, the similarity value is closer to 0 when the two images are more similar. In contrast, in the NCC algorithm, the similarity value indicates how closely the two images follow the same pattern. If the two images are almost the same, the value is close to 1; if one image follows the inverse pattern of the other, the value is close to -1, and unrelated images give values near 0. Product 2 shows the best performance under both SAD and NCC: the pixel difference is reduced from 21.2478 to 17.8092, and the similarity measurement is improved from 0.8261 to 0.8812. For the other two products, image restoration reduces the SAD pixel difference by about 2, and the NCC similarity measurements are improved to above 0.9, which is promising for product inspection.
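The two similarity measures can be written compactly as follows. This is a minimal sketch on flattened gray-scale images. The NCC shown is the zero-mean form, which matches the stated range of -1 to 1. Whether the reported SAD values are sums or per-pixel means is not stated, so the plain sum is returned here, and the function names and toy pixel values are illustrative.

```python
import math


def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(a, b))


def ncc(a, b):
    """Zero-mean normalized cross-correlation, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [p - mean_a for p in a]
    db = [q - mean_b for q in b]
    denom = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    if denom == 0.0:
        return 0.0  # convention chosen here: a constant image has no pattern
    return sum(p * q for p, q in zip(da, db)) / denom


template = [10.0, 20.0, 30.0, 40.0]
brighter = [20.0, 30.0, 40.0, 50.0]   # same pattern, uniformly brighter
inverted = [40.0, 30.0, 20.0, 10.0]   # opposite pattern

sad_shift = sad(template, brighter)
ncc_shift = ncc(template, brighter)
ncc_inverse = ncc(template, inverted)
```

A uniform brightness shift changes the SAD (40 in this toy case) but leaves the NCC at 1, while the inverted pattern gives an NCC of -1. This difference in sensitivity is one reason to report the two measures side by side.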

Clustering by affinity propagation algorithm
In this section, an unsupervised learning method, affinity propagation, is adopted to automatically classify possible defect types based on the difference between the original image and the restored image. The input data are categorized into several groups, and each group has a representative exemplar. It is assumed that the same defect type produces a similar pattern after the subtraction of the two images. The five major categories, which contain the most samples, are selected as our analysis targets and are shown in Fig. 4. The five exemplars represent the defect types of noise on part of the image, noise on all regions of the image, incomplete tapping, and two types of scratches. The upper image is the original image, the middle one is the restored image, and the bottom image is the difference between the two. Since the images are all gray-scale, the difference can be displayed as values between 0 and 1.
After the classification, diverse defect types are identified, and each exemplar is the representative image of its defect type. Based on these results, potential problems in the equipment and manufacturing process can be surveyed systematically. The yield rate can also be increased effectively, while the workforce and time required for quality control can be reduced.
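Selecting the major categories amounts to counting how many samples chose each exemplar and keeping the largest groups. The following is a minimal sketch, assuming that `labels[i]` holds the exemplar index assigned to sample i, as an affinity propagation implementation would typically return. The function name and the toy labels are illustrative.

```python
from collections import Counter


def major_clusters(labels, top_k=5):
    """Return (exemplar_index, sample_count) pairs for the top_k largest clusters."""
    return Counter(labels).most_common(top_k)


labels = [3, 3, 3, 7, 7, 0, 3, 7, 9]      # toy exemplar assignments
top = major_clusters(labels, top_k=2)     # [(3, 4), (7, 3)]
```

The exemplar indices returned this way point back to the differential images that best represent each major defect type.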

Conclusion
Automated optical inspection (AOI) plays an important role in industrial automation, as it achieves more efficient and accurate inspection than manual inspection at lower cost. The German government is promoting Industry 4.0, which aims to increase the computerization of the manufacturing industry and make production lines more intelligent and efficient. This paper proposed a learning-based defective image analysis method using sparse representation and affinity propagation. Sparse representation is used to construct a dictionary that restores defect images to higher-quality images, and affinity propagation is used to cluster the resulting differential images into representative defect types. The experimental results for the constructed dictionary, the restored images, and the defect clusters show the effectiveness and practicality of the proposed methods. In the future, we expect that the proposed methods can be applied to the surface inspection of wafers or other semiconductors and help production lines identify facility problems.

Fig. 4. The exemplars: (a) noise on part of the image, (b) noise on all regions of the image, (c) incomplete tapping, (d) scratches in the upper region, (e) scratches in the lower-left region