The Study of Detection Method of Echizen Jellyfish Using Images from the Internet

This study proposes a method for detecting Echizen Jellyfish in images that were anonymously uploaded to the internet. Outbreaks of Echizen Jellyfish seriously damage the fishery of Japan. In this work, a machine learning algorithm, an SVM with the HOG feature, was employed. All of the test images were gathered from the internet. Because no underwater camera needs to be installed, this method can reduce the cost of detection. Direct observation also requires considerable human resources and time, whereas the proposed method requires no extra personnel and can detect the jellyfish from internet images in a timely manner. The precision of this method is lower than that of direct observation, but such errors can be canceled out because what we need is the relative appearance frequency over a limited time span. The system is evaluated using the F-measure, which is a comprehensive evaluation of precision and recall. The final goal of this study is to detect Echizen Jellyfish from the internet automatically, without human interaction or extra cost. As a preliminary step, this work discusses the details of the detection algorithm.


Introduction
In the past, outbreaks of Echizen Jellyfish were observed in the 1920s, in 1958, and in 1995, and the phenomenon was thought to occur about every 40 years. Recently, however, outbreaks occurred every year from 2002 to 2005. Given this fact, it is difficult to predict outbreaks of Echizen Jellyfish from their past periodicity. Outbreaks mainly harm fishing in Japan. For example, fishing nets are broken by the heavy weight of the jellyfish, and catches of other fish decrease because of a shortage of plankton. Thus Echizen Jellyfish should be monitored before an outbreak happens.
In this study, we hypothesize that an increase in the number of images containing Echizen Jellyfish reflects an increase in the actual number of Echizen Jellyfish. Relying on images from websites rather than on direct observation introduces a bias into the detection accuracy. It is therefore important to increase the recall and precision of the proposed system by choosing an appropriate number and combination of images for the learning step, which improves the performance of the system when only a limited image set is available.
This study aims to grasp the occurrence of jellyfish efficiently by using valid data from websites, which hold an enormous amount of information and can be used by anyone.

Machine Learning
Machine learning is a field of study that began as a branch of artificial intelligence in the 1960s. This technology gives computers a learning ability like that of humans. Machine learning has two main learning models, supervised learning and unsupervised learning. In this study, we use supervised learning. Supervised learning learns from teaching cases, each consisting of a pair of data, an input and an output, and then predicts the output for a given input. By supplying many teaching cases, we can obtain correct output data. In some cases, however, the system performs well on the teaching cases but poorly on unknown cases, which is called overfitting.
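The gap between performance on teaching cases and on unknown cases can be illustrated with a small sketch. The example assumes a hypothetical one-dimensional dataset with noisy labels and a learner that simply memorizes the teaching cases (nearest-neighbour lookup); neither is part of the method used in this study.

```python
import random

def nearest_label(x, examples):
    """Predict the label of x by copying the label of the closest teaching case."""
    return min(examples, key=lambda ex: abs(ex[0] - x))[1]

def accuracy(cases, examples):
    """Fraction of (input, output) pairs whose output is predicted correctly."""
    hits = sum(nearest_label(x, examples) == y for x, y in cases)
    return hits / len(cases)

def make_cases(n, rng, noise=0.2):
    """Hypothetical data: the label is 1 when x > 0.5, flipped with probability `noise`."""
    cases = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        cases.append((x, y))
    return cases

rng = random.Random(0)
teaching = make_cases(40, rng)   # teaching cases seen during learning
unknown = make_cases(40, rng)    # unknown cases never seen during learning

train_acc = accuracy(teaching, teaching)  # every teaching case is its own nearest neighbour
test_acc = accuracy(unknown, teaching)    # unknown cases inherit the memorized label noise
print(train_acc, test_acc)
```

The learner reproduces every teaching case exactly, so its score on the teaching cases says little about how it behaves on unknown cases, which is why a separate image set is needed for evaluation.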

Support Vector Machine (SVM)
SVM uses a technique called the kernel method, which constructs a non-linear discriminant function. The SVM builds a two-class pattern classifier from linear threshold elements, and it achieves high performance in identifying unknown cases. This performance comes from learning the parameters of the linear threshold elements on the basis of margin maximization. However, the amount of calculation grows as the amount of learning data increases. The algorithm of SVM is shown in Fig. 2.

Kernel method
The kernel method maps the data to a high-dimensional feature space in order to extract non-linear information and higher-order moments. The calculation in this space can be carried out easily.
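The reason the calculation stays easy can be seen in a small sketch. It assumes a degree-2 polynomial kernel on two-dimensional inputs, chosen only for illustration; the kernel actually used in this study is not stated. The kernel value computed in the input space equals the inner product after an explicit mapping `phi` into a higher-dimensional space, so the mapping never has to be carried out.

```python
import math

def phi(v):
    """Explicit map of a 2-D point into a 3-D feature space."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def poly_kernel(a, b):
    """Degree-2 polynomial kernel, evaluated directly in the input space."""
    return dot(a, b) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))   # inner product in the feature space
implicit = poly_kernel(x, z)     # same value without building phi(x) or phi(z)
print(explicit, implicit)
```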

Margin maximization
In the case of linear separation, there are countless hyperplanes that divide the sample data, and raising the dimension raises the degree of freedom. The best hyperplane is considered to be the one that lies in the middle between the two classes. This hyperplane is the one that maximizes the margin.
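As a minimal sketch of this idea, the following example compares three hyperplanes that all separate a hypothetical two-class toy dataset and measures the margin of each, that is, the distance from the hyperplane to the closest sample. The data and the candidate hyperplanes are assumptions made for illustration.

```python
import math

def margin(w, b, points):
    """Smallest signed distance from the labelled points to the hyperplane w.x + b = 0.
    A negative value would mean that some point lies on the wrong side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in points)

# Hypothetical toy data: class -1 on the left, class +1 on the right.
points = [((0.0, 0.0), -1), ((0.0, 1.0), -1), ((4.0, 0.0), 1), ((4.0, 1.0), 1)]

# The vertical lines x1 = c all separate the two classes; only the margin differs.
candidates = {c: margin((1.0, 0.0), -c, points) for c in (1.0, 2.0, 3.0)}
best = max(candidates, key=candidates.get)
print(candidates, best)
```

Among the candidates, the line in the middle of the two classes is the one with the largest margin.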

The HOG feature
The HOG feature is computed with the following procedure.
1. Calculating the gradient direction and gradient strength of brightness.
2. Creating a histogram.
3. Normalization by block area.
Because the histogram is created from the gradient direction of brightness in a local area, the feature responds to the shape of the object. In other words, this method is suitable for the detection of Echizen Jellyfish. In addition, because it is based on gradient information, this algorithm can be applied to images of different sizes. This means that the method is suitable for detecting images from websites. Fig. 3 is a sample image of Echizen Jellyfish learned with the HOG feature.
Fig. 2. The algorithm of SVM.
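The HOG procedure (gradient calculation, histogram creation, block normalization) can be sketched as follows. This is a simplified illustration, not the implementation used in this study: the cell size of 4 pixels, the 9 direction bins, the 2x2 block of cells, and the test image are all assumptions.

```python
import math

def hog(image, cell=4, bins=9):
    """Minimal HOG sketch for a grayscale image given as a list of rows.
    Returns one L2-normalized histogram vector per 2x2 block of cells."""
    h, w = len(image), len(image[0])
    rows, cols = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]

    # Steps 1 and 2: gradient strength and direction, voted into cell histograms.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            strength = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned direction
            b = min(int(angle / (180.0 / bins)), bins - 1)
            cy, cx = y // cell, x // cell
            if cy < rows and cx < cols:
                hist[cy][cx][b] += strength

    # Step 3: normalize over overlapping 2x2 blocks of cells.
    features = []
    for cy in range(rows - 1):
        for cx in range(cols - 1):
            block = (hist[cy][cx] + hist[cy][cx + 1]
                     + hist[cy + 1][cx] + hist[cy + 1][cx + 1])
            norm = math.sqrt(sum(v * v for v in block)) + 1e-6
            features.append([v / norm for v in block])
    return features

# Hypothetical 8x8 image: dark left half, bright right half (one vertical edge).
image = [[0.0] * 4 + [255.0] * 4 for _ in range(8)]
features = hog(image)
print(len(features), len(features[0]))
```

Because the block is normalized, multiplying every pixel of the image by a constant leaves the feature almost unchanged, which is one reason the feature tolerates images taken under different conditions.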

Experiment
We manually collected as many images containing Echizen Jellyfish from websites as possible. As a result, we obtained 57 images, including images that contain more than one jellyfish. In this study, each image is counted only by whether or not Echizen Jellyfish are present in the photo. These images are divided randomly into CV set images and training set images. We have to consider how many images, and which combination of images, constitute the most suitable system. These points need to be clarified experimentally.

Experiment 1: How many images constitute the most suitable system
We made 5 datasets, each including 15 images selected randomly, and varied the number of images used from each dataset. From this, we investigated the detection accuracy of the classifier trained on each dataset with a different number of images. We use the average number of detections over these 5 datasets. The result of this experiment is shown in Table 1.
In this experiment, 11 images is considered the better number for constituting the system.
Table 1. The result of experiment 1.

The number of training images | The total number of detected | The average number of detected
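The procedure of experiment 1 can be sketched as follows. The image names, the range of training-set sizes (1 to 15), and the function `placeholder_count` are hypothetical; in particular, `placeholder_count` only stands in for "train a classifier and count the detected images" and does not reproduce any result of this study.

```python
import random

def average_detected(datasets, sizes, count_detected):
    """For each training-set size, average the detection count over all datasets."""
    averages = {}
    for n in sizes:
        totals = [count_detected(ds[:n]) for ds in datasets]
        averages[n] = sum(totals) / len(totals)
    return averages

rng = random.Random(0)
images = [f"img_{i:02d}" for i in range(57)]            # the 57 collected images
datasets = [rng.sample(images, 15) for _ in range(5)]   # 5 random datasets of 15 images

def placeholder_count(training_images):
    """Stand-in for training a classifier and counting detections on the CV set."""
    return len(training_images) % 4  # arbitrary value, NOT an experimental result

averages = average_detected(datasets, range(1, 16), placeholder_count)
print(averages)
```

Replacing `placeholder_count` with a real training and detection step would give the average number of detections for each training-set size.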

Experiment 2: Which images constitute the most suitable system
First, we named the images for ease of experiment. We made 4 datasets that are unrelated to the datasets in experiment 1 and that share no images with one another. Each dataset includes 11 images, the number decided by experiment 1.
(a) Rearrange the first half and second half of the datasets. In this way we constituted datasets in 16 ways. The rearrangement in this experiment is shown in Fig. 5. From these datasets, we created classifiers. The number of detected images in this experiment is shown in Table 2.

Table 2. The results of experiment 2(a).
combination: [a][a] [b][b] [c][c] [d][d]
The number of detected: 0 2 0 0
The number of detected: 3 1 3 3

(b) Rearrange the odd and even images of the datasets. As in Table 2, we constituted datasets, this time in 12 ways. The rearrangement in this experiment is shown in Fig. 6. From these datasets, we created classifiers. The number of detected images in this experiment is shown in Table 3.
From the above, classifiers were created in this experiment from a total of 28 dataset arrangements.

Table 3. The results of experiment 2(b).
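The two rearrangements can be enumerated with a short sketch. Several points here are assumptions rather than statements of the original procedure: the image names are hypothetical, the 16 ways of (a) are read as every first half paired with every second half (4 x 4), the 12 ways of (b) are read as ordered pairs of two different datasets (4 x 3), and the primed labels are used for (b) only by analogy with the dataset names that appear later in the text.

```python
from itertools import permutations, product

def halves(ds):
    """Split a dataset into its first half and second half."""
    mid = len(ds) // 2
    return ds[:mid], ds[mid:]

def odd_even(ds):
    """Split a dataset into its odd-numbered and even-numbered images (1-based)."""
    return ds[0::2], ds[1::2]

# Four hypothetical, mutually disjoint datasets of 11 image names each.
datasets = {name: [f"{name}{i:02d}" for i in range(11)] for name in "abcd"}

# (a) First half of one dataset joined with the second half of any dataset.
combos_a = {}
for p, q in product(datasets, repeat=2):
    combos_a[f"[{p}][{q}]"] = halves(datasets[p])[0] + halves(datasets[q])[1]

# (b) Odd images of one dataset joined with the even images of a different one.
combos_b = {}
for p, q in permutations(datasets, 2):
    combos_b[f"[{p}'][{q}']"] = odd_even(datasets[p])[0] + odd_even(datasets[q])[1]

print(len(combos_a), len(combos_b), len(combos_a) + len(combos_b))
```

Under these assumptions every rearranged dataset again contains 11 images, and the two experiments together give 28 arrangements.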

Precision and recall of system
We collected images that do not include Echizen Jellyfish from websites, and investigated whether the classifiers with high detection counts detect them as jellyfish. In this experiment, there are four events, which branch according to the combination of input and output. In detail, the input image is true or not, and the output is positive or negative. These four events are shown in Table 4.

The study of learning results
The classifier created by machine learning must be evaluated. In machine learning, Precision (P) and Recall (R) are used as measures for the evaluation. These measures are given by equation (1).
Table 4. The events about input and output.
These equations are intended to evaluate the usefulness of the system, and both must be considered. Therefore, we employed the F-measure, which is the harmonic mean of Precision and Recall. The F-measure can be expressed by the following equation.
$$F = \frac{2PR}{P + R}$$
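As a minimal sketch of the evaluation, assume that the four events are counted in the usual way: true positives (TP) are jellyfish images detected as jellyfish, false positives (FP) are other images detected as jellyfish, and false negatives (FN) are jellyfish images that are not detected. Under that assumption, precision is TP / (TP + FP), recall is TP / (TP + FN), and the F-measure is their harmonic mean. The counts in the example are hypothetical and are not taken from the tables of this study.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from event counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure

# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f, 3))
```

Because the harmonic mean is pulled toward the smaller of its two arguments, a classifier cannot obtain a high F-measure by raising only precision or only recall.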
Referring to Tables 1 and 2, datasets [c'][d'], [a'][c'], and [b'][d'] detect Echizen Jellyfish at a high rate, so we determine the F-measure of these classifiers. First of all, the data of each classifier are shown in Tables 5~7. Each measure can be calculated from Tables 4~6, and each measure is calculated in equations (2) ~ (10).

Table 5. Detection of dataset [c'][d'].
Detection of dataset [b'][d'].

About the dataset [c'][d'].

In these experiments, some true positive images and false positive images were obtained. Samples of these images are shown in Fig. 7 and Fig. 8.

Conclusion
In this research, a high-performance classifier was created by using dataset [b'][d']. Its F-measure, from equation (10), is 0.69. Based on the F-measure calculated in this study, a better classifier still needs to be created. The incorrectly detected images include some images that have a clear circular contour, and we think that the precision of the classifier can be improved by additional processing of such images.