Time Trend Analysis of Scenic Leaves and Blossoms Viewing Places

With the spread of GPS-enabled digital cameras and camera phones, photo sharing websites have become exceptionally popular. Admiring leaves and blossoms is a traditional seasonal activity, and a tremendous number of photos of them have been uploaded. The goal of our paper is to detect the most scenic leaves and blossoms viewing places from Flickr based on social and image features, and to perform their time trend analysis based on statistical and control engineering methods. First, methods for social interestingness, color percentage, edge rate, sharpness, and ranking score are proposed. Second, methods for time trend analysis based on the chi square test of independence and subspace identification are provided. The proposed methods have been validated and verified with experimental results for maple leaves in Kyoto.


Introduction
Photo sharing websites (Flickr, Panoramio, etc.) have become exceptionally popular with the spread of GPS-enabled digital cameras and camera phones. On these websites, users can upload, tag, and share geo-tagged photos, which can be used for landmark recognition, trip recommendation, event detection, etc. (1,2).
Admiring plum blossoms, cherry blossoms, tulips, pink moss, wisteria, hydrangea, irises, sunflowers, lavender, autumn leaves, etc. is a traditionally very popular seasonal activity, especially in Japan. Weather forecast associations try to estimate the best viewing times, while travel sites recommend the best spots and their average best viewing times all around the country.
A tremendous number of photos of leaves and blossoms have been uploaded to photo sharing websites. The spread of digital cameras and camera phones has led to a huge number of photos being taken, and often a whole photo collection is uploaded at once, which can result in low-quality images, misleading tags, etc. One goal of our paper is to detect and rank the most scenic seasonal "must go places" for autumn leaves, cherry blossoms, plum blossoms, etc.
The viewing season for leaves and blossoms is sometimes longer and sometimes shorter, it starts earlier or later than usual, and the colors are more or less intense. This variance can be influenced by many climate and other factors and by their complicated interactions. Another goal of our paper is to perform time trend analysis by identifying and analyzing the influencing climate factors.
The paper is organized as follows. Section 2 covers the background and related work. Section 3 detects and ranks scenic leaves and blossoms viewing places from Flickr based on social and image features. Section 4 performs their time trend analysis based on statistical and control engineering methods. Finally, the last section reports the conclusions and future work.

Background and Related Work
This section reviews the background and research efforts related to this work. To detect and rank scenic leaves and blossoms viewing places, the interestingness algorithm in Section 2.1 serves as a basis for social network analysis (see Section 3.1), while edge detection and image gradient based blurriness in Sections 2.2 and 2.3 serve as a basis for image processing (see Section 3.2).
Then, the chi square test of independence (a statistical method) in Section 2.4 and subspace identification (a control engineering method) in Section 2.5 are used for the time trend analysis of the detected scenic leaves and blossoms viewing places (see Section 4).

Interestingness Algorithm
The interestingness algorithm of Flickr (3) is a media ranking algorithm, covered by US patents, that provides an additional metric for search results. It is a function of views, faves, comments, tags, time, user, network relationships, etc.
This paper does not attempt to reconstruct the patented algorithm; rather, it takes social features into account besides image features in order to detect and rank scenic leaves and blossoms viewing places. Thus, a social interestingness metric based on elapsed days, views, faves, and comments is proposed (see Section 3.1).

Edge Detection
Edges indicate object boundaries between an object and the background or between overlapping objects. Edge detection identifies sharp discontinuities in an image, namely, abrupt changes in pixel intensity.
There are several well-known edge detection operators, such as Canny, Laplacian, Roberts, Prewitt, and Sobel, each designed to be sensitive to certain types of edges. Many papers deal with their comparative analysis (4). Edge detection can be used for object recognition, image segmentation, etc. in numerous application fields such as traffic, industry, medicine, and the military.
In this paper, we only need to detect the existence of leaves or blossoms of a given color (see Section 3.2). The Laplacian function of the OpenCV environment has been applied for edge detection (5), and the InRangeS function for color filtering.

Image Gradient Based Blurriness
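Image gradient can be considered as a directional change in the intensity of an image, given by the derivatives in the horizontal and vertical directions. It can be used to extract different kinds of information from the image. With $g_x = \partial f / \partial x$ and $g_y = \partial f / \partial y$ denoting the horizontal and vertical derivatives of the image intensity $f$, the gradient magnitude and direction are defined as follows:
$$|\nabla f| = \sqrt{g_x^2 + g_y^2}, \qquad \theta = \tan^{-1}\left(\frac{g_y}{g_x}\right)$$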
Image gradient can be used, for example, for blurriness detection. A complete study (6) classifies images as locally or globally blurred and also identifies the cause of blur.
The gradient magnitude histogram of a clear image is concentrated on small magnitudes, but several large magnitudes also occur. In contrast, the histogram of a blurry image is empty at large magnitudes; only small magnitudes occur. The gradient direction histogram of a clear image shows that every direction has almost the same probability, whereas for a blurry image some directions are markedly more probable and others less.
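As a minimal illustration of the histogram behavior described above, the following Python sketch approximates the horizontal and vertical derivatives by central differences on a grayscale image stored as a list of rows, and compares a sharp vertical step edge with the same edge blurred into a linear ramp. The image representation, the bin width, and the number of bins are assumptions for illustration; this is not the OpenCV-based implementation used in the paper.

```python
import math

def gradients(img):
    """Gradient magnitude and direction for interior pixels, using central differences."""
    h, w = len(img), len(img[0])
    mags, dirs = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0  # horizontal derivative
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0  # vertical derivative
            mags.append(math.hypot(gx, gy))             # sqrt(gx^2 + gy^2)
            dirs.append(math.atan2(gy, gx))             # direction in radians
    return mags, dirs

def magnitude_histogram(mags, bin_width=25.0, bins=8):
    """Histogram of gradient magnitudes; the last bin also collects larger values."""
    hist = [0] * bins
    for m in mags:
        hist[min(int(m // bin_width), bins - 1)] += 1
    return hist

# Sharp step edge (0 -> 255) versus the same transition blurred into a linear ramp.
sharp = [[0] * 4 + [255] * 4 for _ in range(5)]
blurred = [[x * 255 / 7 for x in range(8)] for _ in range(5)]

print(magnitude_histogram(gradients(sharp)[0]))    # [12, 0, 0, 0, 0, 6, 0, 0]
print(magnitude_histogram(gradients(blurred)[0]))  # [0, 18, 0, 0, 0, 0, 0, 0]
```

The sharp image keeps a few large magnitudes while the blurred one has only small magnitudes, which is the property a blurriness check can exploit.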
In this paper, we only need to decide whether a photo is blurred; hence, a simpler solution is implemented (see Section 3.2).

Chi Square Test of Independence
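The chi square test (7) of independence determines whether two probability variables are independent. The null hypothesis (H0) states that the probability variables are independent. The alternative hypothesis (H1) states that the probability variables are dependent.
The chi square statistic is defined by the following equations, where Oij is the observed frequency and Eij is the expected frequency under the null hypothesis, obtained by multiplying the row total (ki.) by the column total (k.j) and dividing by the grand total (N):
$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{k_{i.}\,k_{.j}}{N}$$
H0 is rejected at a given significance level if the chi square statistic exceeds the corresponding critical value of the chi square distribution with (r - 1)(c - 1) degrees of freedom, where r and c are the numbers of rows and columns of the contingency table.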
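The statistic can be computed directly from a contingency table. The following Python sketch uses a small hypothetical 2 x 2 table of observed counts (not data from this paper) and returns the statistic, the expected frequencies Eij = ki. k.j / N, and the degrees of freedom.

```python
def chi_square(observed):
    """Chi square statistic, expected frequencies and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]        # k_i.
    col_totals = [sum(col) for col in zip(*observed)]  # k_.j
    n = sum(row_totals)                                # grand total N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, expected, dof

chi2, expected, dof = chi_square([[10, 20], [20, 10]])
print(round(chi2, 3), expected, dof)  # 6.667 [[15.0, 15.0], [15.0, 15.0]] 1
```

With one degree of freedom, the critical value at the 5% significance level is about 3.841, so in this toy example the null hypothesis of independence would be rejected.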
In our previous work (8), a performance factor identification method based on the chi square test of independence has been proposed in order to identify novel performance factors of web-based software systems.

Subspace Identification
The state space representation of a deterministic, discrete time, linear, time invariant system is defined by the following difference equations, where xk is the state vector, uk is the input vector, and yk is the output vector at time k, and A is the state matrix, B is the input matrix, C is the output matrix, and D is the feedthrough matrix:
$$x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k + D u_k$$
The aim is to determine the system matrices from input-output data by subspace identification.
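To make the difference equations x_{k+1} = A x_k + B u_k and y_k = C x_k + D u_k concrete, the following Python sketch simulates them for matrices given as nested lists. The two inputs loosely mirror the two climate inputs used in Section 4, but the matrices and input values are hypothetical and chosen only so that the output is easy to follow by hand.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vadd(a, b):
    """Element-wise sum of two vectors."""
    return [p + q for p, q in zip(a, b)]

def simulate(A, B, C, D, inputs, x0):
    """Return the outputs y_0, y_1, ... of the discrete time state space model."""
    x, outputs = list(x0), []
    for u in inputs:
        outputs.append(vadd(matvec(C, x), matvec(D, u)))  # y_k = C x_k + D u_k
        x = vadd(matvec(A, x), matvec(B, u))              # x_{k+1} = A x_k + B u_k
    return outputs

A = [[0.5, 0.0], [0.0, 0.2]]  # hypothetical state matrix
B = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical input matrix (two inputs)
C = [[1.0, 1.0]]              # hypothetical output matrix (one output)
D = [[0.0, 0.0]]              # no direct feedthrough in this example
print(simulate(A, B, C, D, inputs=[[1, 0], [0, 1], [0, 0]], x0=[0, 0]))  # [[0.0], [1.0], [1.5]]
```

Subspace identification solves the inverse problem: given such input and output sequences, it recovers A, B, C, and D (up to a similarity transformation of the state).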
The main ideas of subspace identification (9) are as follows. Hankel matrices (reflecting the history of the input-output data) are constructed from input-output data. The state sequence plays an important role in derivation and interpretation. The state and output equations can be written using extended versions of the controllability and observability matrices together with a lower block triangular Toeplitz matrix. In the geometrical interpretation, the output lies in the vector space spanned by the union of the state and input row spaces, and the state sequence can be estimated by projecting the output row space onto the orthogonal complement of the input row space. The rank (system order) can be determined using singular value decomposition. Finally, the system matrices can be estimated in the least squares sense from the following equation, where X_i and X_{i+1} are the estimated state sequences and U_{i|i} and Y_{i|i} are single block rows of the input and output Hankel matrices:
$$\begin{bmatrix} X_{i+1} \\ Y_{i|i} \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} X_i \\ U_{i|i} \end{bmatrix}$$
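The last step above is an ordinary least squares problem. The following Python sketch shows only that step for a hypothetical system with one state and one input, assuming the state sequence is already known (in subspace identification it would first be estimated via projections and singular value decomposition, which is omitted here). It simulates data from the hypothetical values A = 0.8, B = 0.5, C = 2.0, D = 0.1 and recovers them by solving the normal equations; it is not the MATLAB implementation used in Section 4.

```python
def solve_2x2(a11, a12, a22, b1, b2):
    """Solve [[a11, a12], [a12, a22]] @ [p, q] = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def estimate_abcd(x, u, y):
    """Least squares fit of x[k+1] = A x[k] + B u[k] and y[k] = C x[k] + D u[k]."""
    xs, us = x[:-1], u[:len(x) - 1]          # regressors x_k, u_k
    sxx = sum(a * a for a in xs)
    sxu = sum(a * b for a, b in zip(xs, us))
    suu = sum(b * b for b in us)
    A, B = solve_2x2(sxx, sxu, suu,          # target: next state x_{k+1}
                     sum(a * t for a, t in zip(xs, x[1:])),
                     sum(b * t for b, t in zip(us, x[1:])))
    C, D = solve_2x2(sxx, sxu, suu,          # target: output y_k
                     sum(a * t for a, t in zip(xs, y)),
                     sum(b * t for b, t in zip(us, y)))
    return A, B, C, D

# Generate noise-free data from a known system with a simple periodic input.
u = [((7 * k) % 5) - 2 for k in range(30)]
x, y = [0.0], []
for uk in u:
    y.append(2.0 * x[-1] + 0.1 * uk)
    x.append(0.8 * x[-1] + 0.5 * uk)

print([round(v, 6) for v in estimate_abcd(x, u, y)])  # [0.8, 0.5, 2.0, 0.1]
```

With noise-free data the least squares solution reproduces the true matrices; with measured data the same step returns the best fit in the least squares sense.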
Subspace identification has been successfully applied in various application fields. In our previous work (10), a performance modeling and prediction method based on subspace identification has been proposed in order to model the identified factors and predict the performance of web-based software systems.

Scenic Leaves and Blossoms Viewing Places
Flickr contains a tremendous amount of text and image information in the form of tags and images. A goal of our paper is to establish a complete framework in order to detect and rank the most scenic leaves and blossoms viewing places with scenic cultural heritage or landscape based on social and image features. Table 1 summarizes the notation of the proposed methods.
The flowchart in Fig. 1 demonstrates how the system works. Photos of an area are crawled from Flickr for leaves- or blossoms-related tags; the collected photos are called core photos. Three checks are then applied to the core photos. First, their social interestingness is analyzed, and images that are not interesting enough for the community are discarded. Second, their color percentage and edge rate are investigated, and images that do not contain any leaves or blossoms of the given color are discarded. Third, their sharpness is analyzed, and blurry images are discarded. The three branches can be executed in parallel. An image discarded on any branch is discarded altogether, and an image is kept only if it passes the checks of all branches. The remaining images are called resulted photos; these are the detected scenic leaves and blossoms viewing places. Finally, the resulted photos are ranked in order to detect the most scenic places.
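The branch logic of the flowchart can be sketched as follows in Python. Each branch is modeled as a predicate over precomputed metrics, the branches run concurrently, and a photo is kept only if every branch keeps it. The Photo fields and the threshold values T_I, T_C, T_E, T_S are hypothetical placeholders (Table 1 names only the sharpness threshold tS); in the paper the actual checks are Algorithms 1 to 3.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Photo:
    photo_id: str
    interestingness: float   # social branch
    color_percentage: float  # color and edge branch
    edge_rate: float         # color and edge branch
    sharpness: float         # sharpness branch

# Hypothetical thresholds, for illustration only.
T_I, T_C, T_E, T_S = 1.0, 0.5, 50.0, 5.0

CHECKS = [
    lambda p: p.interestingness >= T_I,                          # social interestingness
    lambda p: p.color_percentage >= T_C and p.edge_rate >= T_E,  # colored leaves or blossoms
    lambda p: p.sharpness >= T_S,                                # not blurred
]

def resulted_photos(core_photos):
    """Run all checking branches in parallel and keep photos that pass every branch."""
    def run_branch(check):
        return {p.photo_id for p in core_photos if check(p)}
    with ThreadPoolExecutor(max_workers=len(CHECKS)) as pool:
        kept_per_branch = list(pool.map(run_branch, CHECKS))
    kept = set.intersection(*kept_per_branch)
    return [p for p in core_photos if p.photo_id in kept]

photos = [
    Photo("a", 3.0, 2.0, 180.0, 12.0),  # passes all branches
    Photo("b", 3.0, 2.0, 180.0, 1.0),   # blurred
    Photo("c", 0.2, 2.0, 180.0, 12.0),  # not interesting for the community
]
print([p.photo_id for p in resulted_photos(photos)])  # ['a']
```

Intersecting the per-branch results implements the rule that an image discarded on any branch is discarded altogether, independently of the order in which the branches finish.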
All three checking branches are important. Social interestingness is analyzed because Flickr is a social network site, color percentage and edge rate are investigated to find leaves and blossoms of the given color, and sharpness is analyzed to ensure image quality. Section 3.1 analyzes social features, Section 3.2 performs image processing, Section 3.3 ranks the resulted photos, and Section 3.4 demonstrates experimental results.

Social Features
Geo-tagged and tagged photos have been crawled from Flickr for an area with leaves- or blossoms-related tags (e.g., maple leaves, cherry blossoms, plum blossoms).
The crawled photos include many misleading ones that do not contain any maple trees or leaves, or whose quality is low; thus, further analysis is necessary.
Many core photos are not interesting enough for the community. For example, a photo may have been uploaded a long time ago but have few views, with nobody having faved or commented on it. The aim is to discard such images by Algorithm 1. For this purpose, a social interestingness metric defined by Eq. 8 has been proposed based on elapsed days, views, faves, and comments.
A user may view several photos; if he/she really likes some of them, he/she may fave them or write a comment. In other words, faves and comments are more significant, since they usually express liking, while views are related to popularity. Elapsed days are also relevant, since the same number of views within fewer days is more substantial.
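Eq. 8 itself is not reproduced here. As a hedged illustration of the reasoning above, the following Python sketch uses a hypothetical metric in the same spirit: views plus weighted faves and comments, divided by elapsed days. The weights, the functional form, and the threshold are assumptions, not the proposed Eq. 8 or Algorithm 1.

```python
from datetime import date

FAVE_WEIGHT = 10.0     # hypothetical: a fave counts like 10 views
COMMENT_WEIGHT = 10.0  # hypothetical: a comment counts like 10 views
THRESHOLD = 1.0        # hypothetical discard threshold

def social_interestingness(views, faves, comments, uploaded, today):
    """Hypothetical engagement per elapsed day (NOT the paper's Eq. 8)."""
    elapsed_days = max((today - uploaded).days, 1)  # avoid division by zero on upload day
    return (views + FAVE_WEIGHT * faves + COMMENT_WEIGHT * comments) / elapsed_days

today = date(2015, 1, 1)
recent = social_interestingness(100, 2, 1, date(2014, 12, 22), today)
old = social_interestingness(100, 0, 0, date(2012, 1, 1), today)
print(recent, recent >= THRESHOLD)  # 13.0 True
print(old >= THRESHOLD)             # False
```

The same 100 views score far lower for the photo uploaded three years earlier without faves or comments, which is the kind of image the social branch is meant to discard. Because views, faves, comments, and elapsed days change, such a score has to be recomputed periodically.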
In this section, dynamic analysis is necessary. Since new photos are uploaded frequently, they should be collected from time to time. In addition, the variables of the proposed social interestingness metric change frequently over time; hence, the metric should be recalculated from time to time.

Image Features
Many core photos do not contain any parts of the given color at all. For example, in the case of red maple, there are many images without any red parts, namely, images with totally different content (Fig. 2a) or of green maple (Fig. 2b). The first part of the proposed Algorithm 2 is intended to discard such images (see Steps 1-5). An example of red part detection can be seen in Fig. 3. First, the original RGB image (Fig. 3a) is converted to HSV in Step 1. Then, the red parts are detected by red color filtering in Step 2 (Fig. 3b).
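Steps 1-2 and the color percentage can be illustrated in plain Python. The sketch below converts each RGB pixel to HSV with the standard colorsys module and marks it red if its hue is within about 10 degrees of 0, with a minimum saturation and value. These thresholds are hypothetical (the paper uses OpenCV's color conversion and InRangeS with values not given here), and the color percentage is assumed to be the share of red pixels in the image.

```python
import colorsys

def is_red(rgb, hue_margin=10 / 360, min_sat=0.4, min_val=0.3):
    """Hypothetical red filter in HSV space; hue in colorsys is in [0, 1)."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (h <= hue_margin or h >= 1 - hue_margin) and s >= min_sat and v >= min_val

def red_mask(image):
    """Binary mask (1 = red pixel) for an image given as rows of (R, G, B) tuples."""
    return [[1 if is_red(px) else 0 for px in row] for row in image]

def color_percentage(mask):
    """Percentage of pixels that passed the color filter."""
    total = sum(len(row) for row in mask)
    return 100.0 * sum(map(sum, mask)) / total

image = [[(200, 30, 30), (30, 160, 40)],   # red maple leaf, green leaf
         [(60, 0, 0), (255, 255, 255)]]    # too dark, white sky
mask = red_mask(image)
print(mask, color_percentage(mask))  # [[1, 0], [0, 0]] 25.0
```

Red hue wraps around 0 in HSV, which is why the filter accepts hues near both ends of the range.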
Sometimes an image contains many nearby maple trees or leaves, but sometimes it is a landscape image with only one maple tree in the background. Thus, the threshold in Step 4 should be set sufficiently low. After the detection of parts of the given color in Steps 1-5, the remaining images definitely contain some parts of the desired color, but sometimes not the desired blossoms or leaves. For example, in the case of red maple, there are many images with totally different content (Fig. 4a and 4b) or with maple leaves that are too close (Fig. 4c). The last one shows beautiful maple leaves, but such a photo could be taken anywhere there is a maple tree, even in a factory; it is not a landscape photo with scenic cultural heritage or landscape. The second part of the proposed Algorithm 2 is intended to discard such images (see Steps 7-11).
Fig. 3c shows an example of red edge detection: in Step 7, red edges are detected by applying the Laplacian to the color-filtered image of Fig. 3b. On landscape photos with leaves and blossoms, there are many short edges of the given color because of the shape of small leaves and blossoms. How can this behavior be detected? Measuring the lengths of edges would be excessive; it is sufficient to detect the existence of many short edges. The main idea of the proposed method is to calculate the edge rate for the given color, namely, the ratio of edge pixels of the given color to pixels of the given color (Fig. 3c versus Fig. 3b in Step 8).
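A minimal sketch of Steps 7-8 under stated assumptions: the color-filtered image is treated as a binary mask, OpenCV's Laplacian is replaced by a 4-neighbour Laplacian written in plain Python (pixels outside the image count as 0), and the edge rate is expressed as a percentage of colored pixels, one reading under which values above 100 are possible. The example contrasts a single small red dot, as in a distant landscape, with a large compact red region.

```python
def laplacian_edge_pixels(mask):
    """Count pixels where the 4-neighbour Laplacian of a binary mask is non-zero."""
    h, w = len(mask), len(mask[0])

    def at(y, x):
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0  # zero padding

    count = 0
    for y in range(h):
        for x in range(w):
            lap = at(y - 1, x) + at(y + 1, x) + at(y, x - 1) + at(y, x + 1) - 4 * at(y, x)
            if lap != 0:
                count += 1
    return count

def edge_rate(mask):
    """Edge pixels of the given color relative to pixels of the given color, in percent."""
    colored = sum(map(sum, mask))
    return 100.0 * laplacian_edge_pixels(mask) / colored if colored else 0.0

dot = [[1 if (y, x) == (2, 2) else 0 for x in range(5)] for y in range(5)]
block = [[1 if 1 <= y <= 10 and 1 <= x <= 10 else 0 for x in range(12)] for y in range(12)]
print(edge_rate(dot), edge_rate(block))  # 500.0 76.0
```

Many small scattered colored regions produce a high edge rate, while a few large compact regions produce a low one, without ever measuring edge lengths.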
For landscape photos with leaves and blossoms, the edge rate is very high for distant images, frequently more than 100, and still quite high for closer images. For photos with different content or with maple leaves that are too close, the edge rate is usually lower.
Photos without leaves or blossoms of the given color can be discarded with the proposed Algorithm 2.
There are many blurry images among the core photos. With digital cameras, users usually take plenty of photos, several of which have low quality (Fig. 5a) because of camera shake, missed focus, moving objects, night scenes, etc. In addition, some images are very artistic (Fig. 5b); these photos are usually beautiful, but they are not landscape photos with scenic cultural heritage or landscape. The proposed method is aimed at discarding such images (see Algorithm 3). All low quality blurry photos can be discarded with the proposed method. In this section, the proposed metrics do not change over time, so static analysis is sufficient: they are calculated only once, and recalculation is not necessary.
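Eq. 9 and Algorithm 3 are not reproduced here. Following the histogram observation of Section 2.3 (blurry images lack large gradient magnitudes), the Python sketch below uses a hypothetical sharpness score: the percentage of interior pixels whose central-difference gradient magnitude reaches a hypothetical level. It flags a photo as blurred when the score is below a hypothetical threshold playing the role of tS.

```python
import math

HIGH_MAGNITUDE = 50.0  # hypothetical: what counts as a "large" gradient magnitude
T_S = 5.0              # hypothetical sharpness threshold (role of tS in Table 1)

def sharpness(img):
    """Hypothetical score: percentage of interior pixels with a large gradient magnitude."""
    h, w = len(img), len(img[0])
    large = total = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            total += 1
            if math.hypot(gx, gy) >= HIGH_MAGNITUDE:
                large += 1
    return 100.0 * large / total

def is_blurred(img):
    """A photo is treated as blurred when its sharpness score is below T_S."""
    return sharpness(img) < T_S

sharp = [[0] * 4 + [255] * 4 for _ in range(5)]                # hard edge
blurred = [[x * 255 / 7 for x in range(8)] for _ in range(5)]  # same edge as a soft ramp
print(round(sharpness(sharp), 2), is_blurred(sharp))      # 33.33 False
print(round(sharpness(blurred), 2), is_blurred(blurred))  # 0.0 True
```

Since the score depends only on the image content, it fits the static analysis described above: it is computed once per photo.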

Ranking
Using the proposed methods of the previous two sections, many photos can be discarded that are not interesting enough for the community, do not contain any leaves or blossoms of the given color, or are blurred. The aim of this section is to rank the remaining resulted photos with scenic cultural heritage or landscape. A metric based on social and image features has been proposed for ranking scenic leaves and blossoms viewing places (see Eq. 10). To express the values of a series as percentages, a value is usually divided by the maximum value of the series and multiplied by 100. However, for these metrics the maximum values can be extremely high compared to the average; thus, average values are used for dividing. The total scale factor of 100 is distributed among the metrics according to their relevance. Sharpness is less significant, since it is related to the quality of the photos (αS = 20). Interestingness and colorful leaves and blossoms have around the same significance (αI = 40, and αC = 20 plus αE = 20).
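The exact form of Eq. 10 is not reproduced here. Reading the description above as a weighted sum of each metric divided by its average over the resulted photos, with the stated scale factors αI = 40, αC = 20, αE = 20, αS = 20, a hedged Python sketch could look as follows; the additive form and the example metric values are assumptions.

```python
from statistics import mean

# Scale factors from the text: alpha_I, alpha_C, alpha_E, alpha_S (sum to 100).
ALPHA = {"I": 40.0, "C": 20.0, "E": 20.0, "S": 20.0}

def ranking_scores(metrics):
    """metrics maps photo id -> {"I": interestingness, "C": color %, "E": edge rate, "S": sharpness}."""
    averages = {k: mean(m[k] for m in metrics.values()) for k in ALPHA}
    return {pid: sum(ALPHA[k] * m[k] / averages[k] for k in ALPHA)
            for pid, m in metrics.items()}

metrics = {  # hypothetical resulted photos
    "a": {"I": 10, "C": 20, "E": 300, "S": 40},
    "b": {"I": 30, "C": 40, "E": 100, "S": 20},
}
scores = ranking_scores(metrics)
print(sorted(scores, key=scores.get, reverse=True))  # ['b', 'a']
print(round(scores["a"], 2), round(scores["b"], 2))  # 90.0 110.0
```

Under this reading, a photo whose metrics all equal the averages scores exactly 100, and dividing by the average rather than the maximum keeps a single extreme value from compressing everyone else's score.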
Since social interestingness has to be recalculated over time, the ranking score has to be recalculated as well.

Experimental Results
In this section, the experimental environment is introduced and the experimental results are demonstrated.
The development environment is Microsoft Visual Studio 2012 with the C# language. The developed framework has a three-tier architecture. The presentation tier is implemented in ASP.NET web forms, the business logic layer in C# classes invoking external libraries, the data access layer using ADO.NET and stored procedures, and the database layer in SQL Server 2012.
Photos have been crawled from Flickr using the Flickr.NET API Library (11). For image processing, OpenCV has been applied via a .NET interface, namely OpenCV.NET (12). For showing maps, a Google Maps control for ASP.NET has been used, namely GoogleMaps.Subqurim.NET (13).
The experimental results have been demonstrated for maple trees and leaves in Kyoto. Around 7200 photos have been crawled from Flickr for a circular area of Kyoto for maple leaves related tags. The three checking branches can be executed in parallel; if an image is discarded on one branch, further checking on the other branches is unnecessary. The social interestingness of the photos has been analyzed, and around 2400 images have been found interesting for the community. Additionally, the color percentage has been investigated, and around 4800 images have been found to contain red parts. After the color percentage check, the edge rate of these 4800 photos has been investigated, and around 4600 images have been found to contain maple trees or leaves. In addition, the sharpness of the photos has been analyzed, and around 4600 images have been found not to be blurred. After the checks of all three branches, 1100 photos remained; the others have been discarded. These resulted photos have been ranked. Fig. 6 depicts 20 example photos from the top 100 scenic maple leaves viewing places in Kyoto; they are beautiful photos with scenic cultural heritage or landscape.

Time Trend Analysis
Monthly and daily climate statistics, such as average temperature and total precipitation, have been downloaded from the homepage of the Japan Meteorological Agency from 2005 until 2015 for the Kyoto station (WMO Station ID 47759).
First, the relationship between monthly climate statistics and the taken dates of resulted photos of scenic leaves and blossoms is analyzed by the chi square test of independence, a statistical method. Then, the relationship between daily climate statistics and the taken dates of resulted photos is investigated by subspace identification, a control engineering method.

Chi Square Test of Independence
The relationship between monthly climate statistics and the taken dates of resulted photos (best viewing times for leaves and blossoms) has been analyzed by the chi square test of independence.
One probability variable is a factor candidate, such as the average temperature in November. The other probability variable is the metric for the number of taken photos. Both are discrete probability variables, and their categories have to be exhaustive events. The values of each variable have been sorted in increasing order and classified into categories containing the same number of values (see the left and top headlines in Table 2).
Fig. 6. 20 example photos from top 100 photos.
The null hypothesis states that the given factor candidate and the metric are independent, i.e., there is no relationship between them and the factor candidate does not influence the metric. The alternative hypothesis states that they are dependent, i.e., there is a relationship between them and the factor candidate influences the metric; in this case, an influencing factor is identified.
An example contingency table filled with observed frequencies, row (ki.), column (k.j), and grand (N) totals is shown in Table 2.
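The categorization described above (values sorted in increasing order and split into categories with the same number of values) and the filling of the contingency table can be sketched in Python as follows. The pairing of one factor value with one photo count per observation and the toy values are assumptions for illustration, not the paper's data.

```python
def equal_frequency_categories(values, n_categories):
    """Assign category 0..n_categories-1 so each category holds (about) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_categories // len(values)
    return labels

def contingency_table(row_labels, col_labels, n_rows, n_cols):
    """Observed frequencies O_ij for paired category labels."""
    table = [[0] * n_cols for _ in range(n_rows)]
    for i, j in zip(row_labels, col_labels):
        table[i][j] += 1
    return table

factor = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]  # toy factor candidate values
photos = [30, 10, 40, 20, 50, 80, 25, 60]         # toy photo counts
table = contingency_table(equal_frequency_categories(factor, 2),
                          equal_frequency_categories(photos, 2), 2, 2)
print(table)  # [[4, 0], [0, 4]]
```

Row, column, and grand totals of such a table are exactly the ki., k.j, and N used to compute the expected frequencies of the test.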
The detailed results of the executed chi square tests are shown in Table 3. Each null hypothesis is rejected at every acceptable significance level, because the chi square statistic is larger than the critical value belonging to each acceptable significance level. Six influencing factors have been identified and proven for maple leaves: the average temperature and the total precipitation in November, in December, and for the whole year.
It has been shown that influencing climate factors for leaves and blossoms can be identified and proven by the chi square test of independence.

Subspace Identification
The relationship between daily climate statistics and the taken dates of resulted photos (best viewing times for leaves and blossoms) has been investigated by subspace identification. The subspace identification method has been implemented in MATLAB.
The influencing factors, such as daily average temperature and daily total precipitation in a given year, have been used as inputs. The output is the metric for the daily photo count in a given year, smoothed by a central moving average: for a given day, the average is calculated over the five days from the day before yesterday until the day after tomorrow, since almost the same scenic photo could probably have been taken a day earlier or later, as the condition of leaves and blossoms usually does not change sharply from day to day.
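The five-day central moving average can be sketched in Python as below. How the paper handles the first and last two days of a series is not stated; truncating the window at the ends is an assumption of this sketch.

```python
def central_moving_average(daily_counts, half_window=2):
    """Average each day with up to two days before and after it (window truncated at the ends)."""
    n = len(daily_counts)
    smoothed = []
    for k in range(n):
        window = daily_counts[max(0, k - half_window):min(n, k + half_window + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed

print([round(v, 2) for v in central_moving_average([0, 0, 10, 0, 0])])  # [3.33, 2.5, 2.0, 2.5, 3.33]
```

A single day with many photos is spread over its neighbors, which reflects the assumption that a similar scenic photo could have been taken a day or two earlier or later.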
The relation between the two combined input factors and the output metric has been modeled by subspace identification, which provides state space models. For example, the system matrices of the state space model for the year 2014 are shown in Fig. 7.
The goodness of fit is computed from the L2 norm of the difference between ymeasured, the measured output metric (namely, the smoothed daily photo count), and ymodel, the output calculated using the provided state space model. For the state space model shown in Fig. 7, the goodness of fit is 70.84%, namely, the relation between the two combined input factors and the output metric is strong. With average temperature as a single input, the goodness of fit is 47.2%, and with total precipitation as a single input, it is 32.99%. Thus, average temperature has a stronger influence than total precipitation. However, combining these two inputs yields a higher goodness of fit, 70.84%, i.e., a stronger relationship.
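The exact goodness of fit formula is not reproduced in this text. A common definition that matches the description (L2 norm of the error between ymeasured and ymodel) is the normalized form reported by MATLAB's compare function, fit = 100 (1 - ||ymeasured - ymodel|| / ||ymeasured - mean(ymeasured)||). Assuming that definition, a Python sketch is:

```python
import math

def goodness_of_fit(measured, model):
    """Fit in percent, assuming the normalized L2 form: 100 * (1 - ||y - y_hat|| / ||y - mean(y)||)."""
    mean_y = sum(measured) / len(measured)
    error_norm = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, model)))
    spread_norm = math.sqrt(sum((m - mean_y) ** 2 for m in measured))
    return 100.0 * (1.0 - error_norm / spread_norm)

measured = [1.0, 2.0, 3.0]
print(goodness_of_fit(measured, [1.0, 2.0, 3.0]))  # 100.0
print(goodness_of_fit(measured, [2.0, 2.0, 2.0]))  # 0.0
```

Under this definition, 100% means a perfect match and 0% means the model is no better than predicting the mean of the measured output.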

Conclusions
Photo sharing websites have become exceptionally popular with the spread of GPS-enabled digital cameras and camera phones. Admiring autumn leaves, cherry blossoms, plum blossoms, etc. is a traditional seasonal activity, and a tremendous number of photos of leaves and blossoms have been uploaded to photo sharing websites. In this paper, a complete framework has been established in order to detect and rank the most scenic leaves and blossoms viewing places based on social and image features, and to perform their time trend analysis based on statistical and control engineering methods.
First, social features have been investigated: photos have been crawled from Flickr for an area for leaves- or blossoms-related tags, and images that are not interesting for the community have been discarded. Then, image features have been analyzed, and images that do not contain any leaves or blossoms of the given color or that are blurred have been discarded. In addition, the remaining photos with scenic cultural heritage or landscape have been ranked based on the proposed social interestingness, color percentage, edge rate, and sharpness metrics.
Moreover, time trend analysis has been performed. The relationship between monthly climate statistics and resulted photos has been analyzed by the chi square test of independence, showing that monthly and annual average temperature and total precipitation are influencing factors. The relationship between daily climate statistics and resulted photos has been modeled by subspace identification, and state space models have been provided for combined inputs of influencing factors.
The proposed methods have been validated and verified with experimental results for maple trees and leaves in Kyoto.

Table 1. Notation of proposed methods in Section 3.
Symbol  Meaning
αE      scale factor for edge rate (constant)
Eavg    average edge rate of resulted photos
Sp      sharpness of photo p, Eq. 9
tS      threshold for sharpness (constant)
αS      scale factor for sharpness (constant)
Savg    average sharpness of resulted photos
Rp      ranking score of p, Rp(Ip, Cp, Ep, Sp), Eq. 10

Fig. 2. (a) Different content (b) Green maple

Table 3. Detailed results of chi square tests.
Fig. 7. System matrices of state space model.