Elimination of signal-resembling anomalies in detected plates

This article addresses the problem of eliminating signal-resembling blobs in the region produced by the plate detection stage of an automatic license plate recognition (ALPR) system. The proposed method amplifies the slight differences between non-character blobs (anomalies) and character blobs (true signal) so that the anomalies become tractable. The method rests on two propositions: 1) anomalies usually lie around the true signal; 2) suspected anomaly blobs should be given less emphasis when computing a reference point. The performance of the method is evaluated on both its capability and its consistency in removing certain types of anomalies.


Introduction
Automatic license plate recognition (ALPR) is a system that combines a camera with video analytics algorithms to capture, detect and recognize vehicle registration plates [1]. Many applications make use of ALPR technology [2]. For instance, ALPR is implemented in parking facilities for security and access control [3]; it is frequently applied in tolling systems to collect toll fares [4]; and it is an effective tool for traffic law enforcement, spotting and recording cars that violate the law [5]. ALPR technology is thus indispensable for integrating an intelligent transport system [6] into the transportation infrastructure of any city. The overall ALPR system consists of the following main stages [7]: the plate detection stage, which extracts the plate from the frame captured by the camera [8][9][10][11]; the plate segmentation stage, which extracts the characters from the plate detected in the first stage [12][13][14]; the plate recognition stage, which recognizes the characters segmented in the second stage [15][16][17][18]; and finally an analysis stage, in which the recognized characters are verified before being displayed on screen and stored in a database. The proposed method focuses on plate segmentation, the second stage in Figure 1. The plate segmentation stage can be further divided into a series of procedures, illustrated in Figure 2: noise removal, which discards non-character objects; transformation correction, which deskews and untilts the plate to the desired orientation; binarization; and character segmentation, which extracts the characters from the binarized plate image. The proposed method falls into the noise removal procedure.
In ALPR, the license plate is first detected and then cropped from the video frame. The common subsequent step after obtaining the plate region is to convert it to gray scale and binarize it (a procedure that replaces all of the image's pixels with only two intensity values, normally 255 and 0, or equivalently 1 and 0) before each individual character in the plate is segmented and eventually sent to a classifier for recognition. Owing to imperfections in the plate detection algorithm or in the binarization of the detected region, information loss or distortion occurs. Information here refers to the structural information of the characters contained in the detected plate. These imperfections, though they affect only a small percentage of plates (usually 2 to 3%), are crucial to an advanced ALPR system that strives for accuracy approaching perfection.
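As a minimal illustration of the binarization step described above (the text does not fix a thresholding scheme, so a simple global threshold is assumed here):

```python
import numpy as np

def binarize(gray, threshold=128):
    """Binarize a grayscale plate region: pixels become 1 (white) or 0 (black).

    The threshold value is an illustrative assumption; in practice an
    adaptive scheme (e.g. Otsu) would typically be used instead.
    """
    return (gray >= threshold).astype(np.uint8)

# Toy 2x3 grayscale "plate region".
region = np.array([[10, 200, 90],
                   [250, 30, 140]])
binary = binarize(region)
```

Every downstream step of the method operates on such a 0/1 image.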
To be more specific, the image region passed from the detection stage to the segmentation stage might not contain only the plate but other anomalies as well (which is why the words "image region" are used here instead of "detected plate"). Anomalies are formed by binarized edges, such as the side panel of a tailgating car, a logo or sticker on the plate, or the frame enclosing the plate. In addition, noise caused by uneven illumination leaves white blobs on the detected region after binarization. Therefore, after binarization and before character segmentation, a process that eliminates these anomalies or unwanted blobs is necessary. Nonetheless, not every type of anomaly is difficult to remove. Anomalies that are distinct from character blobs in terms of some measurable property (such as height, width, perimeter, diameter, white-pixel density, shape regularity or number of white pixels) can be removed easily by a simple rule-based method. However, removing anomalies that resemble (but are not identical to) character blobs in terms of these properties after binarization is a daunting task.
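For anomalies that are clearly distinct in some measurable property, the simple rule-based removal mentioned above is trivial. A minimal sketch follows; the threshold values and field names are illustrative assumptions, not taken from the text:

```python
def rule_based_filter(blobs, min_h=10, max_h=30, min_w=3, max_w=20):
    """Keep only blobs whose bounding box plausibly matches a character.

    `blobs` is a list of dicts with 'height' and 'width' entries; the
    threshold values here are illustrative assumptions.
    """
    kept = []
    for b in blobs:
        if min_h <= b["height"] <= max_h and min_w <= b["width"] <= max_w:
            kept.append(b)  # plausible character blob
    return kept

blobs = [{"height": 18, "width": 9},    # character-like blob: kept
         {"height": 4,  "width": 40}]   # e.g. a frame edge: removed
survivors = rule_based_filter(blobs)
```

Signal-resembling anomalies defeat exactly this kind of filter, because their measured properties fall inside the same bounds as the characters'.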
To prevent these anomalies from remaining in the detected region after binarization, we propose a method that eliminates them effectively and at low computational cost. In this method, the properties used to differentiate anomalies from the true signal (the character blobs) remain the same; in other words, we do not compute complicated properties for each blob. The simple rule-based method can still be adopted; we do not adopt complicated rules to track the anomalies. The only difference is the methodology we adopt to compute a reference point. This reference point estimates the average property values of the true signal. Note that a straightforward average of the properties is not effective here, because its value can be distorted by the anomalies. Note also that the anomalies resemble, but are not identical to, the true signal: slight differences exist between them, but these differences are too small to be tracked down directly. The proposed method amplifies these slight differences so that the properties of anomalies and true signal become distinct enough to be separated by a simple rule-based method.

Methodology
The proposed anomaly elimination method, a procedure that removes non-character blobs after binarization of the region produced by the car plate detection stage of an ALPR system, consists of three modules: 1) fit an emphasis density distribution on the detected region; 2) compute the reference point; 3) identify and eliminate the anomalies. The details of each module are explained in the following sections.

Creating and fitting an Emphasis Density Distribution (EDD) on detected region
The first module of the overall process, illustrated in Fig. 1, is a series of steps that assigns a value to each pixel. This value quantifies the relative degree of emphasis of the pixel; in other words, it expresses how reliably the pixel belongs to a character blob. The more probable it is that a pixel belongs to a character blob (true signal) rather than a non-character blob (anomaly), the higher the value assigned to it.
The purpose is to increase the involvement of the true signal, and decrease the involvement of the anomalies, in computing the reference point, so that the reference point approximates the true signal better. The weight distribution function that indicates the extent to which each pixel's involvement in computing the reference point is emphasized is hereafter referred to as the emphasis density distribution (EDD). The EDD is designed according to our first proposition: anomalies usually, if not always, scatter around the true signal. Three steps are needed: pre-select a continuous parametric distribution model, compute the parameters of the model from the plate properties, and discretize the continuous distribution to assign a value to each pixel of the plate.
The EDD is chosen as a bivariate Gaussian:

G(C, L) = (1 / (2π |Σ|^(1/2))) exp(−(1/2) (X − μ)ᵀ Σ⁻¹ (X − μ))    (1)

where G(C, L) is the EDD value at coordinate (C, L), with C the column and L the row of the rectangular detected region R; X = [C L] is a 1-by-2 vector of coordinates in R; and Σ, the covariance of X, is a symmetric positive-definite matrix. (Σ is defined analogously for any choice of vector X; for example, if two blob properties P1 and P2 were chosen, then X = [P1 P2].) The μ in (1) is selected as the centre point of the plate, whose x-coordinate is (plate width)/2 and whose y-coordinate is (plate height)/2. The centre is chosen as μ so that pixels closer to the centre receive higher weight, based on the aforesaid assumption: the central pixels are more "reliable" than pixels near the four sides of the plate, because anomalies, if they exist, often scatter around the centre. It is important to note that even if no anomaly exists, the algorithm still works well, because side pixels are more susceptible to uneven illumination. As for correlation, an advanced version of this method could use a variable correlation determined by analysing the blob directions; however, in the method described here we assume the input image is a well-deskewed detected region, so the correlations are set to 0. The variances along the two sides, vertical and horizontal, are designed to depend on the plate height and width respectively. Fig. 2 shows an example of the emphasis density distribution (EDD) for a single-line detected region with height 20 pixels and width 60 pixels.
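The discretized EDD of equation (1) can be sketched as follows. The side-proportional standard deviations (via the factor k = 0.8, taken from the Fig. 2 caption), the zero correlation, and the rescaling so that the centre pixel gets emphasis 1 are assumptions of this sketch:

```python
import numpy as np

def emphasis_density(height, width, k=0.8):
    """Discretized EDD: a separable (zero-correlation) 2D Gaussian.

    Assumptions not fixed by the text: standard deviations proportional
    to the plate sides via k, and weights rescaled to peak at 1.
    """
    mu_c, mu_l = width / 2.0, height / 2.0        # centre of the plate
    sigma_c, sigma_l = k * width, k * height      # side-dependent spreads
    cols = np.arange(width)
    rows = np.arange(height)
    gc = np.exp(-0.5 * ((cols - mu_c) / sigma_c) ** 2)
    gl = np.exp(-0.5 * ((rows - mu_l) / sigma_l) ** 2)
    edd = np.outer(gl, gc)                        # zero correlation: separable
    return edd / edd.max()                        # centre emphasis = 1

edd = emphasis_density(20, 60)  # single-line region, as in Fig. 2
```

Pixels near the plate centre receive weights close to 1, while pixels near the four sides, where anomalies tend to appear, receive the smallest weights.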

Computation of reference point
The reference point is a point in a D-dimensional feature space that acts as an estimate of central tendency. It serves as the reference value that we expect a character blob to take when its D-dimensional properties or features are computed. In other words, the reference point acts as an approximation of the true signal, or more exactly, of the expected true signal.
An intuitive way to compute a central tendency is to take the mean, median or mode of all the feature points. However, each has its own drawback: the mean includes the anomalies in the calculation; the median ignores every data point except the median point; and the mode always fails on data sets with low standard deviation, in the context of anomaly elimination for ALPR. The motivation for computing a reference point is therefore to generate a new point whose distance to the anomalies is larger than the distance between a conventional central point (such as the mean) and the anomalies, in order to amplify the discrepancy between the central tendency and the anomalies. The following explains how the EDD of Section 3.1 is fitted on the detected region to compute a reference point that approximates the true signal better than a conventionally computed one.
Assume that two blob features or properties, P1 and P2, are used to compute the reference point (a two-dimensional feature space in this case). The input image must be a binarized image in which white pixels have value 1 and black pixels value 0. To explain the process, we introduce the term "blob": a blob is a group of connected pixels in the detected region. The second term to introduce is I_B(C, L), where the subscript B is the blob index, B ∈ [1, 2, 3, …, N_B], C is the column or x-coordinate and L the row or y-coordinate in the two-dimensional region R. For example, I_2(23, 44) denotes the pixel value at image coordinate (23, 44), this pixel belonging to the second blob; if that pixel is white, then I_2(23, 44) = 1, and otherwise I_2(23, 44) = 0.
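A sketch of the EDD-weighted reference point follows. The aggregation rule, weighting each blob's feature vector by the mean EDD value over its pixels, as well as all names, are assumptions of this sketch; the text specifies only that anomaly pixels should contribute less:

```python
import numpy as np

def reference_point(blobs, features, edd):
    """EDD-weighted reference point in feature space.

    blobs    : list of (rows, cols) index arrays, one per blob
    features : N_B x D array, one feature vector (e.g. [P1, P2]) per blob
    edd      : emphasis map fitted on the detected region

    Assumed rule: each blob's weight is the mean EDD value over its white
    pixels, so centre blobs (likely characters) dominate the average while
    border blobs (likely anomalies) barely count.
    """
    weights = np.array([edd[rows, cols].mean() for rows, cols in blobs])
    features = np.asarray(features, dtype=float)
    return (weights[:, None] * features).sum(axis=0) / weights.sum()

# Toy emphasis map: high in the centre, low at the border.
h, w = 20, 60
rows, cols = np.mgrid[0:h, 0:w]
edd = np.exp(-0.5 * (((cols - w / 2) / (0.8 * w)) ** 2
                     + ((rows - h / 2) / (0.8 * h)) ** 2))

centre_blob = (np.array([9, 10, 11]), np.array([29, 30, 31]))  # near centre
border_blob = (np.array([0, 0, 1]), np.array([0, 1, 0]))       # corner anomaly
ref = reference_point([centre_blob, centre_blob, border_blob],
                      [[12, 6], [12, 6], [20, 20]], edd)
```

Here the plain mean of the P1 values would be (12 + 12 + 20)/3 ≈ 14.67, pulled toward the corner anomaly; the weighted reference point lands closer to the character value 12, which is exactly the distortion the method is designed to avoid.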

Identification of the anomalies using reference point
The last procedure identifies blobs suspected to be anomalies. One of the challenges here is that the existence, the number and the type of the anomalies are all unknown. In other words, there is no clue whether an anomaly occurs in the detected region at all, nor how many or of what type the anomalies are if they do exist. This is a challenge because, if we were certain that anomalies existed in every detected region, the identification algorithm would become much simpler: the task would merely be to locate them. Similarly, the procedure would be much simpler if the number of anomalies were known; we would merely pick that number of the most unlikely-to-be-character blobs. Knowing the type of the anomalies, whether they are extreme anomalies (whose feature values differ greatly from those of character blobs) or signal-resembling anomalies, would also aid in designing an effective algorithm, but this information too remains unknown. The absence of all this prior knowledge makes the identification of anomalies a non-trivial problem.
With the previously computed reference point, we design a framework that deals with these unknown factors through a hierarchical setting in which the extreme anomalies are eliminated first, to avoid their masking effect on the signal-resembling anomalies. The framework also ensures that if the plate has no anomaly, no character blob is eliminated (no false positive). We propose an anomaly identification method based on Chebyshev's inequality. We first summarize the steps involved in identifying anomalies using the reference point, then explain the details of each step.
The first step of identification is to compute the discrepancy (as a Euclidean distance) between each two-dimensional feature point (P_B1, P_B2) of the B-th blob and the reference point (R_1, R_2) in the feature space.
Let D_B denote the real-valued random variable representing the computed Euclidean distance, with B ∈ [1, 2, 3, …, N_B]. The Euclidean distance between each feature vector and the reference point is computed as

D_B = sqrt((P_B1 − R_1)² + (P_B2 − R_2)²).

Next, the expected value of D, E(D), and a measure of spread, S(D), are defined as

E(D) = (1/N_B) Σ_B D_B,    S(D) = sqrt((1/N_B) Σ_B (D_B − E(D))²).

The last step is an analysis using the one-sided Chebyshev inequality. This analysis eliminates remarkable anomalies that would otherwise mask the signal-resembling anomalies. Anomalies that differ greatly from the true signal in the detected region are eliminated via the inequality

P(D − E(D) ≥ k·S(D)) ≤ 1 / (1 + k²),

which holds without assuming any underlying probability distribution of the data; since the number of discrepancies is always small, the law of large numbers does not justify assuming a normal distribution of the discrepancies. The decision rule with C% confidence then eliminates any blob B for which D_B exceeds E(D) + k·S(D), with k chosen from the confidence level C.

Experiments

An experiment was set up to examine the robustness of the proposed anomaly removal framework. The robustness is assessed from two perspectives: 1) the ability of the framework to remove anomalies when one or more are present in the detected region coming from the detection stage; 2) the ability of the framework to retain all the characters, whether anomalies are present or absent in that region.
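Before turning to the results, the identification steps above (discrepancies, spread, and the one-sided Chebyshev rule) can be sketched as follows. The mapping from the confidence level C to the factor k, namely k = sqrt(C/(1 − C)) so that the Cantelli bound 1/(1 + k²) equals 1 − C, and all names, are assumptions of this sketch:

```python
import numpy as np

def eliminate_anomalies(features, ref, confidence=0.75):
    """Flag blobs whose distance to the reference point exceeds the
    one-sided Chebyshev (Cantelli) threshold E(D) + k*S(D).

    Cantelli: P(D - E(D) >= k*S(D)) <= 1/(1 + k^2); choosing
    k = sqrt(confidence / (1 - confidence)) caps the chance of flagging
    a true character blob at (1 - confidence), with no distributional
    assumption. The C-to-k mapping is an assumption of this sketch.
    """
    features = np.asarray(features, dtype=float)
    d = np.linalg.norm(features - np.asarray(ref), axis=1)  # D_B
    mean, spread = d.mean(), d.std()                        # E(D), S(D)
    if spread == 0:
        return np.zeros(len(d), dtype=bool)   # identical blobs: keep all
    k = np.sqrt(confidence / (1.0 - confidence))
    return d > mean + k * spread              # True -> suspected anomaly

feats = [[12, 6], [11, 6], [13, 7], [12, 5], [40, 30]]  # last one extreme
flags = eliminate_anomalies(feats, ref=[12, 6])
```

Only the extreme blob is flagged here; removing it first shrinks E(D) and S(D), which is what lets a second pass catch the subtler signal-resembling anomalies without masking.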
The experimental data set consists of 1140 randomly selected detected regions, each of which contains a license plate. These data are extracted from the set of detected regions that failed to be recognized correctly because of anomalies; the experiment examines the extent to which the designed framework resolves such failures. Of these detected regions, 900 contain no anomaly (450 double-row license plates and 450 single-row license plates). These 900 regions serve as a control set for assessing the second ability: if no anomaly exists in the detected region, does the proposed framework wrongly classify true signal as an anomaly (a false positive)? The remaining 240 detected regions, 180 single-row and 60 double-row license plates, contain one or more anomalies.
The experimental results indicate that in 92.91% of the plates containing at least two anomalies, all anomalies were removed. Of these plates, 97.69% retained all their characters. This shows that the proposed framework removes anomalies without removing the required information. The result for the control set shows that 98.22% of the plates without anomalies retained all their characters, demonstrating that the framework is stable when the detected region contains no anomaly. The performance of the proposed framework is consistent across single-row and double-row plates, confirming its applicability to several common plate patterns.

Conclusions
This paper first presented the common architectural design of ALPR and then identified an opportunity to further improve the performance of this system by designing a filtering process that eliminates anomalies. The filtering framework is designed to compensate for an imperfect plate detection algorithm. We have demonstrated the detailed methodology of the framework, from assigning pixel weights through identifying and eliminating anomalies. The experimental results show that the proposed framework is practical and useful to implement in an ALPR system to boost accuracy, as long as the frequency of anomaly occurrence stays within a range determined by the false-positive and true-positive rates of the proposed algorithm.

Fig. 2 Example of an Emphasis Density Distribution (EDD) prepared for a single-line detected region with k = 0.8, plate height 20 pixels and plate width 60 pixels.