NOISE TYPE IDENTIFICATION USING MACHINE LEARNING

In this paper, we propose a new technique for automatically identifying the type of noise in digital images. Our machine learning based noise type identification scheme uses well-known statistical parameters to distinguish between different types of noise. Local features of a 3x3 window are used to train the machine learning based classifier. We cater for 2 types of noise (salt & pepper and random-valued) in this paper. Experiments show that the proposed technique gives promising results and can be extended into a generic noise identification system covering further noise types.


Introduction
Noise detection and removal from digital images is a primary task in most digital image processing applications. Images corrupted with noise are first analyzed to find the type of noise, and then a noise detection and removal algorithm specific to that type is applied. Different applications assume the type of expected noise depending on their environment and apply an algorithm for detecting and removing that noise type. With the growth of digital image processing applications in versatile and dynamic environments, the assumption of a specific type of noise is no longer valid. Nowadays, image processing applications are used in a variety of ways and in almost every discipline, so images can get corrupted with different types of noise at different times in a dynamic environment. A mechanism is needed to automatically identify the type of noise, so that the algorithm specific to that type can be applied.
Noise type identification is a very new research topic and very little work has been done on it. In [1], a noise identification method using local histograms is proposed, which consists of roughly segmenting and labeling the noisy image. The image of labels is then used for the selection of homogeneous regions.
In [2], a neural network based technique for identifying the type of noise present in a noisy image is proposed. The method has a fast training process and does not require any assumptions about the given images, such as the presence of homogeneous areas. Its accuracy drops as the noise density increases. The methods in [3, 4, 5] extract statistical features and apply a simple pattern classification scheme to these features to identify the noise type present in an image. This approach first applies noise removal filters for all types of noise, subtracts the resulting image from the original image to obtain the noise, and then tries to identify it.
In [14], Gonzalez and Woods give some methods for noise type identification that are based on histogram analysis. These methods take a global perspective and can be used to estimate the occurrence of a noise type in an image. They work only under certain limitations or assumptions, e.g. they require the imaging system to be available, the location of the noise to be known, or a single type of noise to be present in the image [14].
All the techniques available in the literature only report the presence of a certain type of noise in an image and cannot tell where the noise is located. In this paper we propose a generic noise type identification method that identifies the noise type based on a local window and can thus report the noise type of each corrupted pixel individually. Our technique not only works well over the whole range of noise density but also performs very well for mixed noise.

Major Contributions:
In order to identify the noise type, we propose a machine learning based approach which uses well-known statistical features to identify different types of noise. The main contributions of the proposed technique are:
• A novel approach for identification of noise type using statistical features and a machine learning algorithm.
• Our method not only identifies the type of noise but also gives the location of each type of noise.
• Our technique identifies noise type on a pixel-by-pixel basis, so it can also detect multiple types of noise present in different parts of a single image.
• The proposed technique identifies noise type with high accuracy without any prior knowledge of the degradation process or the original image.
The rest of the paper is organized as follows: Section 3 explains the proposed technique. Section 4 presents the experimental results and discussion. Section 5 gives the conclusion and future work.

Proposed Technique:
The proposed technique uses well-known local statistical features of images and the Decision Tree algorithm (C4.5) [12] to solve the problem of noise type identification. A standard artificially generated training image [6], [7], [13] is used to train the machine learning (ML) algorithm, which is then tested over a database of images.
Ten well-known statistical features (discussed below) are used to train the algorithm.
Figure 1 shows the proposed system architecture. The detailed procedure, comprising several steps, is given below.

3.1. Training Data Generation:
Selection of a good training image is the most vital part of a trainable noise identification system. A synthetically created training image, which has better generalization capability, is used in [6], [7], and [13]. The training and target images are shown in Figure 2.

3.2. Feature Set:
We use statistical features that are well known in image processing for impulse noise detection.
We apply 10 well-known statistical functions to the considered N x N window. The N x N window is converted to a one-dimensional 1 x N^2 vector and these functions are then applied to it. The details and significance of each statistical function are given below.

ROAD:
The ROAD factor, a very valuable feature for distinguishing between noisy and non-noisy pixels, is proposed in [8]. The value of the ROAD factor is low for non-noisy pixels and high for noisy pixels. It is calculated in the following steps. First, the absolute differences between the central pixel and each of the other pixels in the considered vector are calculated (the central pixel itself is excluded, so for a 3x3 window this gives eight values).
In the next step, this vector of differences is sorted in increasing order. The ROAD factor is the sum of the first four values of the sorted vector.
The ROAD value is calculated for each pixel using its N x N window.
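The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments; the function name `road`, the list-of-rows window representation, and the parameter `m` (the number of smallest differences summed, four in the text) are assumptions made for the example.

```python
def road(window, m=4):
    """ROAD of the central pixel of a square window given as a list of rows."""
    n = len(window)
    c = n // 2
    center = window[c][c]
    # Absolute differences between the centre and every other pixel
    # (the centre itself is excluded, giving 8 values for a 3x3 window).
    diffs = [abs(window[r][k] - center)
             for r in range(n) for k in range(n)
             if (r, k) != (c, c)]
    # Sort in increasing order and sum the m smallest differences.
    diffs.sort()
    return sum(diffs[:m])


flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
impulse = [[10, 12, 11], [9, 255, 10], [11, 10, 12]]
print(road(flat), road(impulse))  # 0 974
```

The flat window gives a ROAD of 0, while the window with an impulse at its centre gives a large value, which matches the low/high behaviour described above.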

MAD:
MAD (median of absolute deviations from the median) is a robust order-statistic estimate of the local variance [10], calculated according to the following equation:

$$\mathrm{MAD} = \mathrm{median}\left(\lvert x - \mathrm{med} \rvert\right)$$

where x is the considered vector and med is the median of the considered vector.
MAD is a robust estimator and can accurately estimate the spread of the distribution even when a large share of the samples in the window, up to nearly 50%, is corrupted.
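As a small illustration of the MAD equation, the following Python sketch computes it for a flattened window. The function name `mad` and the sample values are assumptions made for the example.

```python
from statistics import median


def mad(values):
    """Median of absolute deviations from the median of `values`."""
    med = median(values)
    return median([abs(v - med) for v in values])


# A flattened 3x3 window with one impulse (255) at the centre.
window = [10, 12, 11, 9, 255, 10, 11, 10, 12]
print(mad(window))  # 1
```

The single outlier barely affects the result, which illustrates why MAD is treated as a robust estimate.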

STANDARD DEVIATION:
The standard deviation of the considered vector is used as a measure of the spread of the pixel values within the window.

VARIANCE:
Variance is the square of the standard deviation.

MEDIAN:
Median is the value at the middle index of a sorted vector. Here it is taken from the sorted difference vector dx of size n, where dx is the difference of the considered vector from the central pixel.

MEAN:
Mean is the average of the elements of dx, where dx is the difference of the considered vector from the central pixel.

MIN:
Min returns the smallest value of the difference vector dx.

MAX:
Max returns the largest value of the difference vector dx.
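The standard deviation, variance, median, mean, min, and max features can be sketched together as below. Several details are not fixed by the text and are assumptions here: the standard deviation and variance are taken over the raw window vector with population (divide-by-n) normalization, dx is the signed difference from the central pixel, and the central pixel's own zero difference is kept in dx. The function name `difference_features` is also hypothetical.

```python
from statistics import mean, median, pstdev, pvariance


def difference_features(window):
    """Simple statistical features of a square window given as a list of rows."""
    n = len(window)
    center = window[n // 2][n // 2]
    # Flatten the N x N window into a 1 x N^2 vector.
    x = [p for row in window for p in row]
    # ASSUMPTION: signed difference from the central pixel, centre included.
    dx = [p - center for p in x]
    return {
        # ASSUMPTION: population normalisation, computed on the raw vector x.
        "std": pstdev(x),
        "variance": pvariance(x),
        "median": median(dx),
        "mean": mean(dx),
        "min": min(dx),
        "max": max(dx),
    }


features = difference_features([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(features["min"], features["max"], features["median"])  # -4 4 0
```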

ENTROPY:
Entropy is a statistical measure of randomness. It is calculated from $p_k$, the histogram counts for the $k$-th gray level, over the total number of gray levels in the image.
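A minimal sketch of an entropy feature is given below. It assumes the usual Shannon form, with the histogram counts normalized to probabilities and a base-2 logarithm; neither the normalization nor the base is stated in the text, and the function name `entropy` is hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the gray-level histogram of `values`."""
    counts = Counter(values)   # histogram counts per gray level
    total = len(values)
    # ASSUMPTION: counts are normalised to probabilities and log base 2 is used.
    # Gray levels that do not occur contribute nothing to the sum.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(entropy([0, 0, 255, 255]))  # 1.0
```

A window with a single gray level has zero entropy, while two equally frequent gray levels give exactly one bit.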

ROLD:
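ROLD is proposed in [9] and is a very good feature for detecting random-valued impulse noise. It can be defined as:

$$\mathrm{ROLD}_m(y_{i,j}) = \sum_{k=1}^{m} R_k(y_{i,j})$$

where $R_k$ is the $k$-th smallest $D_{st}(y_{i,j})$ for all $(s,t) \in \Omega_N^0$, and $D_{st}(y_{i,j})$ is defined as

$$D_{st}(y_{i,j}) = 1 + \frac{\max\left\{\log_2 \lvert y_{i+s,j+t} - y_{i,j} \rvert,\; -5\right\}}{5}, \quad \forall (s,t) \in \Omega_N^0$$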
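The ROLD computation can be sketched as follows. This is an illustration under stated assumptions, not the implementation of [9]: pixel values are scaled from [0, 255] to [0, 1] before taking differences, the m = 4 smallest values are summed, and a zero difference is mapped to the truncation value -5 because log2(0) is undefined. The function name `rold` is hypothetical.

```python
import math


def rold(window, m=4):
    """Sum of the m smallest truncated log-distances to the central pixel."""
    n = len(window)
    c = n // 2
    # ASSUMPTION: 8-bit pixel values are scaled to [0, 1].
    center = window[c][c] / 255.0
    distances = []
    for r in range(n):
        for k in range(n):
            if (r, k) == (c, c):
                continue  # the centre is excluded from its own neighbourhood
            diff = abs(window[r][k] / 255.0 - center)
            # log2(0) is undefined; the truncation at -5 covers that case.
            log_diff = math.log2(diff) if diff > 0 else -5.0
            distances.append(1.0 + max(log_diff, -5.0) / 5.0)
    distances.sort()
    return sum(distances[:m])


flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
impulse = [[10, 12, 11], [9, 255, 10], [11, 10, 12]]
print(rold(flat), rold(impulse))
```

With this scaling each distance lies in [0, 1], so the flat window gives 0 and the impulse window gives a value close to m.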

3.3. Training of Classifiers:
This is a 3-class classification problem, which is solved using a machine learning based classifier.
After training, the classifier generates a model or set of rules to be used later for noise detection. In the case of C4.5 [12], a set of rules is generated once training is complete. These rules take the IF-THEN-ELSE form and can be used directly in real time. To detect the noise type in a given image, the same set of features (given in Section 3.2) is calculated for the current window and passed through the rule set. The central pixel of the current window is non-noisy if the output is 0, has random-valued noise if the output is 1, and has salt & pepper noise if the output is 2.
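The per-pixel use of a rule set can be sketched as below. The function `toy_rule` is a hypothetical stand-in with made-up thresholds; it is not the rule set produced by C4.5 in the experiments and only shows the IF-THEN-ELSE shape and the 0/1/2 output convention. Leaving border pixels labelled 0 is also an assumption, since the text does not say how borders are handled.

```python
def classify_image(image, classify_window):
    """Label map: 0 = non-noisy, 1 = random-valued, 2 = salt & pepper."""
    h, w = len(image), len(image[0])
    # ASSUMPTION: border pixels keep label 0 because they lack a full 3x3 window.
    labels = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = [row[j - 1:j + 2] for row in image[i - 1:i + 2]]
            labels[i][j] = classify_window(window)
    return labels


def toy_rule(window):
    """Hypothetical IF-THEN-ELSE rule on a 3x3 window (not the learned C4.5 rules)."""
    center = window[1][1]
    neighbours = [p for r, row in enumerate(window)
                  for k, p in enumerate(row) if (r, k) != (1, 1)]
    road = sum(sorted(abs(p - center) for p in neighbours)[:4])
    if road < 80:                 # made-up threshold
        return 0
    elif center in (0, 255):
        return 2
    else:
        return 1


image = [[100] * 5 for _ in range(5)]
image[2][2] = 255   # salt impulse
image[1][3] = 180   # random-valued impulse
labels = classify_image(image, toy_rule)
print(labels[2][2], labels[1][3], labels[1][1])  # 2 1 0
```

Because each pixel receives its own label, the resulting map gives both the type and the location of the noise.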

Experimental Results:
We have performed comprehensive experiments to show the performance of the proposed technique. First of all, we generated training data from the training image (Fig. 3). For the FP rate of class x: "In the matrix, this is the column sum of class x minus the diagonal element, divided by the rows sums of all other classes" [11]. "Accuracy is defined as the total number of correctly classified instances for all classes divided by the total number of instances in the test set" [11].

Conclusion:
We proposed a novel method for identifying the type of noise in digital images. Experiments show promising results, with a minimum accuracy of 96% over the whole range of noise density. Further experimentation is required to make this scheme more generic and reliable. Some options for future work are:
• In this paper, we have used a fixed 3x3 window. In future, the window size can be changed to 5x5, 7x7, etc.
• We have used C4.5 (Decision Tree) for classification, which is not a specialized classifier for multi-class problems. In future, an evolutionary algorithm based classifier can be used; such classifiers are known to perform better on multi-class problems.
• In future, we will use more advanced features for noise type identification.
• Here we have considered only two types of noise (salt & pepper and random-valued). In future, we will add more noise types to make this a generic noise identification system.
Figure 2 shows the training and target images used by [6], [7], and [13]. Figure 2(a) is a 128x128 pixel image consisting of 4x4 pixel square boxes. All pixels in a box share the same gray level, which is chosen randomly in [0, 255]. We add 50% impulse noise to this first image to obtain the training image, Figure 2(b). Figure 2(c) is the target image for our noise identification system; it contains white and black spots denoting the existence and absence of noise. Since the pixel values of the training image are chosen randomly, training on this data provides more accurate results for all classes of images.
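The construction of the three images can be sketched as follows. This is an illustration only: the function names, the fixed seeds, the uniform random-valued impulse model, and the choice of white (255) for noisy pixels in the target image are assumptions. A replaced pixel may by chance receive its original value and is still marked as noisy in this sketch.

```python
import random


def make_block_image(size=128, block=4, seed=0):
    """Image of block x block squares, each with one random gray level in [0, 255]."""
    rng = random.Random(seed)
    image = [[0] * size for _ in range(size)]
    for bi in range(0, size, block):
        for bj in range(0, size, block):
            level = rng.randint(0, 255)
            for i in range(bi, bi + block):
                for j in range(bj, bj + block):
                    image[i][j] = level
    return image


def add_impulse_noise(image, density=0.5, seed=1):
    """Return (noisy image, target image) for the given noise density."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]
    target = [[0] * len(row) for row in image]     # black = absence of noise
    for i, row in enumerate(image):
        for j in range(len(row)):
            if rng.random() < density:
                # ASSUMPTION: random-valued impulse, uniform over [0, 255].
                noisy[i][j] = rng.randint(0, 255)
                target[i][j] = 255                 # white = noise present
    return noisy, target


clean = make_block_image()
noisy, target = add_impulse_noise(clean)
print(len(clean), len(clean[0]))  # 128 128
```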
We take the training image and add 50% random-valued impulse noise. A 3x3 window is taken from the start of the image and moved across the image. The feature vector of each window is calculated and used for training. The target value (class) of a feature vector is 0 if the central pixel of the window is noise-free and 1 if it is corrupted by random-valued impulse noise.
The same training image is then taken again and 50% salt & pepper noise is added to it. The same process is repeated, except that the class is set to 2 for salt & pepper noise. The feature vectors and classes from both runs are merged to form the training data, which therefore contains feature vectors for salt & pepper noise, random-valued noise, and noise-free pixels.
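The two-pass procedure above can be sketched as follows. The helper names, the `features` callback (any function mapping a 3x3 window to a feature vector), the noise models, and the skipping of border pixels are assumptions made for illustration; the sketch also assumes that noise is added to the clean base image in both passes.

```python
import random


def corrupt(image, density, kind, rng):
    """Return (noisy image, boolean mask of corrupted pixels)."""
    noisy = [row[:] for row in image]
    mask = [[False] * len(row) for row in image]
    for i, row in enumerate(image):
        for j in range(len(row)):
            if rng.random() < density:
                mask[i][j] = True
                if kind == "salt_pepper":
                    noisy[i][j] = rng.choice((0, 255))   # pepper or salt
                else:
                    noisy[i][j] = rng.randint(0, 255)    # random-valued
    return noisy, mask


def window_samples(noisy, mask, noise_class, features):
    """Slide a 3x3 window over interior pixels and label each central pixel."""
    samples = []
    for i in range(1, len(noisy) - 1):
        for j in range(1, len(noisy[0]) - 1):
            window = [row[j - 1:j + 2] for row in noisy[i - 1:i + 2]]
            label = noise_class if mask[i][j] else 0     # 0 = noise-free
            samples.append((features(window), label))
    return samples


def build_training_data(clean, features, density=0.5, seed=0):
    """Merge the samples of the random-valued (1) and salt & pepper (2) passes."""
    rng = random.Random(seed)
    data = []
    for kind, noise_class in (("random_valued", 1), ("salt_pepper", 2)):
        noisy, mask = corrupt(clean, density, kind, rng)
        data += window_samples(noisy, mask, noise_class, features)
    return data


clean = [[128] * 10 for _ in range(10)]
data = build_training_data(clean, lambda w: [w[1][1]])
print(len(data))  # 128
```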
The classifier is trained on the training data generated as discussed in Section 3. For testing purposes, we have used standard test images. Results are reported on 4 standard images, i.e. Baboon, Lena, Parrots, and Peppers. These images have very different texture patterns and dynamic ranges, so the results reported here are representative of a wide range of image types. The performance measures used in the experiments are True Positive Rate, False Positive Rate, and Accuracy [11]. These measures are widely used in the literature to gauge the performance of a classifier and are described as follows.
"The True Positive (TP) rate is the proportion of examples which were classified as class x, among all examples which truly have class x, i.e. how much part of the class was captured. In the confusion matrix, this is the diagonal element divided by the sum over the relevant row" [11]. "The False Positive (FP) rate is the proportion of examples which were classified as class x, but belong to a different class, among all examples which are not of class x" [11].

Table 1 Noise Type Identification Accuracy
Tables 2 to 5 show the TP and FP rates at different noise densities for the three classes, i.e. non-noisy, random-valued noise, and salt & pepper noise. The TP rate for the noise-free and random-valued classes is high over the whole range of noise density, but at 90% noise density the detection of noise-free pixels degrades. Similarly, at low noise density, some salt & pepper noisy pixels are misclassified as noise-free. The FP rates for the random-valued class remain good and stable over the whole range of noise density. Table 1 shows the overall accuracy of the classifier over the whole range of noise density. The overall accuracy is above 96% for the whole range and for all test images shown here.
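The TP rate, FP rate, and accuracy reported in these tables follow the definitions quoted from [11] and can be computed from a confusion matrix as sketched below. The function name `rates` is hypothetical and the confusion matrix counts are made up purely for illustration; they are not results from the experiments. Rows are taken to be true classes and columns predicted classes.

```python
def rates(confusion):
    """confusion[i][j] = number of class-i examples classified as class j."""
    n = len(confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_sums)
    # TP rate: diagonal element divided by the sum over the relevant row.
    tp = [confusion[x][x] / row_sums[x] for x in range(n)]
    # FP rate: column sum minus the diagonal, divided by the row sums of all
    # other classes. (Both rates assume every class has at least one example.)
    fp = [(col_sums[x] - confusion[x][x]) / (total - row_sums[x]) for x in range(n)]
    # Accuracy: correctly classified instances over all instances.
    accuracy = sum(confusion[x][x] for x in range(n)) / total
    return tp, fp, accuracy


# Made-up counts for classes 0 (non-noisy), 1 (random-valued), 2 (salt & pepper).
example = [[90, 5, 5],
           [10, 80, 10],
           [0, 20, 80]]
tp, fp, accuracy = rates(example)
print(tp, fp)  # [0.9, 0.8, 0.8] [0.05, 0.125, 0.075]
```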