Analysis of Water Quality Conditions of Lake Hachiroko, Japan, with Fuzzy C-means Using Terra ASTER Data

Lake Hachiroko, Japan, has many water quality issues, evident from phenomena such as green algae blooms. It is therefore necessary to understand the surface water quality of the lake in detail and how it is affected by the seasons. In our previous studies, we conducted fuzzy regression analysis of remote sensing data and direct measurements of water quality. The results showed that water quality estimation maps could be created using only five data points of water quality parameters. To obtain maps that are in good agreement with experimental data, remote sensing data and water quality values should be acquired simultaneously. However, the need for such simultaneous observations hampers the preparation of the water quality estimation maps. We overcame this obstacle by using fuzzy c-means clustering (FCM). With this method, we considered the effects of specific disturbances and uncertainties on the remote sensing data. The results indicated that FCM was particularly effective in determining suspended solids during water quality analysis. However, the results of FCM depend on the initial values given to its classes. Thus, it is necessary to study how to set initial values that yield good estimation results. In this study, we evaluated the setting of the initial values of FCM. In addition, based on the water quality data obtained from the study area, we examined the generation of estimation maps with preset ranges of water quality levels. To evaluate the usefulness of FCM, we compared the estimation maps with the water quality data and with the results of the fuzzy regression analysis.


Introduction
Changes in water quality in rivers and lakes caused by water pollution are generally investigated by extracting water samples directly from several locations. However, this method is not appropriate for understanding the water quality conditions over a large area.
Therefore, remote sensing techniques have been used to analyze water quality, and these techniques are especially suitable for understanding the water quality conditions over a large area (1)-(4). In our previous studies, to estimate the water quality parameters in Lake Hachiroko, Japan, surface information, such as information on poor water quality, was obtained at different observation points. Because remote sensing data may have uncertainties, we applied a fuzzy regression model (3) and a fuzzy multiple regression model (4) for analysis. Conventional methods for estimating water quality use water quality monitoring data sampled from a large number of observation points. In contrast, we considered the possibility of applying fuzzy regression analysis to water quality monitoring data sampled from only a few observation points, using the Advanced Visible and Near Infrared Radiometer type-2 on the Advanced Land Observing Satellite (3). The results showed that water quality estimation maps can be created using only a small amount of water quality data. Because fuzzy regression analysis considers only one feature of remote sensing data, the analysis may be difficult depending on the specific bands and the water quality data. To solve this problem, we applied a method that uses a fuzzy multiple regression model that considers two features of remote sensing data, band data and water quality data (4). To obtain maps that show good agreement with experimental data using the fuzzy regression model, remote sensing data and water quality values should be acquired simultaneously. However, the need for simultaneous observations hampers the construction of water quality estimation maps. We addressed this obstacle by using fuzzy c-means clustering (FCM) to estimate water quality from remote sensing data alone (5). This method considers the effects of specific disturbances and uncertainties on remote sensing data. The results indicated that FCM was effective in estimating the level of pollution in Lake Hachiroko. In addition, we used multi-temporal remote sensing data to identify the water quality parameters that are strongly related to the estimation maps obtained by FCM (6). As a result, it became clear that the estimation maps obtained by FCM were particularly useful for determining suspended solid (SS) amounts in the water quality analysis.
However, the results of FCM depend on the initial values given to its classes. Thus, it is necessary to study how to set initial values that yield good estimation maps. Moreover, in previous studies, we used FCM to classify Lake Hachiroko into classes with low and high degrees of pollution, and the initial values were set using histogram information obtained from remote sensing data. For this reason, the evaluation of water conditions in the estimation maps obtained by FCM remained relative.
In this study, we examined the setting of the initial values in FCM. In addition, based on the water quality data obtained from the study area, we examined the generation of estimation maps with preset ranges of water quality levels. To evaluate the usefulness of FCM, we compared the estimation maps with the water quality data and with the results of the fuzzy regression analysis (3).

Study Area
Lake Hachiroko is located approximately 20 km north of Akita, Japan, and is a freshwater lake with a center latitude of 40°N and longitude of 140°E. The average depth of the lake is approximately 2.8 m and the maximum depth is 10 m. The lake has a surface area of 4732 ha and consists of an east waterway, a west waterway, and an adjustment pond, which are collectively known as Lake Hachiroko. More than 20 rivers flow into the lake. The annual water flow, including rainfall, into the lake is approximately 1.2 km³, and almost the same volume of water flows out from the floodgate to the Sea of Japan. Water pollution has become a significant problem in the lake because of incoming agricultural drainage, and green algae blooms occur during the summer (7). Thus, since 2007, the lake has been designated under the Act on Special Measures Concerning Conservation of Lake Water Quality (8). It is therefore necessary to understand the details of the surface water quality of the lake and how these are affected by the seasons. Fig. 1 shows an outline of Lake Hachiroko and the water quality measurement sites, denoted as St. 1 to St. 5. According to experts on water quality in the study area, the east side of the adjustment pond, which receives polluted water from nearby rivers, is likely to become polluted.

ASTER Data
The ASTER data used for analysis comprise three spectral bands at visible (green and red) and near-infrared wavelengths and have a resolution of 15 m. The approximate scene size is a 60 km wide swath. In this study, data recorded on August 5, 2012, were used for analysis, as shown in Fig. 2.

Water Quality Measurements
Environmental quality parameters are constantly reviewed and updated by the Akita Prefectural Government. From these, five water quality parameters were used for our analysis, namely biochemical oxygen demand (BOD), chemical oxygen demand (COD), SS, total nitrogen (T-N), and total phosphorus (T-P). We selected water quality measurements recorded on August 5, 2012 (8), as listed in Table 1. These measurements were taken on the same day as the remote sensing data described above. There was no rainfall when the water quality was measured and the remote sensing data were collected (10).

Data Analysis
First, we performed atmospheric correction to remove atmospheric effects from the ASTER data. Subsequently, mask processing was performed to detect the water areas in the study area. We then applied FCM to the extracted area to calculate the degrees of belonging to each class. Before performing FCM, the initial values in FCM were set using ASTER data and water quality data. To obtain the estimation maps, we applied level slice processing to the obtained degrees of belonging. In addition, to evaluate the usefulness of FCM, we compared the estimation maps with the water quality data and with the results of the fuzzy regression analysis (3). Fig. 3 shows the process of generating estimation maps by FCM.

Atmospheric Correction
Scattering and absorption effects of the atmosphere affect approximately 90% of the luminance values of remote sensing data acquired in the visible wavelength range. Therefore, atmospheric correction (11) of the remote sensing data obtained in the visible wavelength region was performed based on the assumption of negligible water-leaving radiance in near-infrared wavelengths.

Mask Processing
Mask processing is used in image processing to extract only a specific part of an image and exclude the rest. By performing mask processing on all data used for analysis, the target water areas were extracted.
Fig. 3. Flowchart of generating estimation maps by FCM.
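The text does not state how the water areas were detected, so the following is only a minimal sketch of mask processing under one common assumption: water appears dark in the near-infrared band, and pixels below a threshold are treated as water. The function names, the toy arrays, and the threshold value are hypothetical and are not taken from the study.

```python
import numpy as np

def water_mask(nir, threshold):
    """Return True where a pixel is treated as water (NIR DN below threshold)."""
    return nir < threshold

def apply_mask(band, mask):
    """Keep DN values inside the mask and set all other pixels to NaN."""
    return np.where(mask, band.astype(float), np.nan)

# Toy 3x3 scene (made-up DN values): low NIR values stand in for water pixels.
nir = np.array([[10, 12, 80], [11, 90, 85], [9, 10, 95]])
red = np.array([[30, 31, 60], [29, 70, 65], [28, 30, 75]])

mask = water_mask(nir, threshold=50)   # threshold is an assumption
red_water = apply_mask(red, mask)      # the same mask is applied to every band
```

Applying one mask to all bands keeps the pixel sets identical across bands, which matches the statement that mask processing was performed on all data used for analysis.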

FCM
Remote sensing data include specific disturbances such as noise from the measurement system, so the fuzziness in the data must be taken into account. FCM, a clustering method, was used to account for this fuzziness. A flowchart of the FCM is shown in Fig. 4. We classified the study area into two classes (Class 1 and Class 2) using FCM. C1 and C2 denote the center values of Class 1 and Class 2, respectively. The weight coefficient was set to 5, a value within the range from 2 to 10 over which it was varied at intervals of 0.1. The clustering process stops when the number of pixels moving between the two classes falls to 1% or less of the total number of pixels.
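The procedure described above can be sketched for a single band as follows. This is a minimal illustration that assumes the standard fuzzy c-means membership and center-update formulas with exponent 2/(m - 1); the exact expressions in the authors' flowchart are not reproduced in the text, so the function name `fcm_two_class`, the `max_iter` guard, and the toy DN values are assumptions.

```python
import numpy as np

def fcm_two_class(x, c1, c2, m=5.0, stop_fraction=0.01, max_iter=100):
    """Two-class, one-dimensional fuzzy c-means (standard formulation, assumed)."""
    x = np.asarray(x, dtype=float)
    eps = 1e-12                      # avoids division by zero at a class center
    prev_labels = None
    for _ in range(max_iter):
        d1 = np.abs(x - c1) + eps    # dissimilarity to the center of Class 1
        d2 = np.abs(x - c2) + eps    # dissimilarity to the center of Class 2
        # Degree of belonging; the two memberships sum to 1 for each pixel.
        u1 = 1.0 / (1.0 + (d1 / d2) ** (2.0 / (m - 1.0)))
        u2 = 1.0 - u1
        labels = (u2 > u1).astype(int)   # 0: Class 1, 1: Class 2
        # Update the class centers as membership-weighted means.
        c1 = np.sum(u1 ** m * x) / np.sum(u1 ** m)
        c2 = np.sum(u2 ** m * x) / np.sum(u2 ** m)
        # Stop when 1% or fewer of the pixels changed class.
        if prev_labels is not None:
            moved = np.count_nonzero(labels != prev_labels)
            if moved <= stop_fraction * x.size:
                break
        prev_labels = labels
    return u1, u2, c1, c2

# Made-up DN values forming two groups; initial centers as in Fig. 5.
dn = [20.0, 21.0, 19.0, 30.0, 31.0, 32.0]
u1, u2, c1, c2 = fcm_two_class(dn, c1=20.300, c2=30.840)
```

The returned degrees of belonging to Class 2 (`u2`) are the quantity that the level slice processing later converts into map levels.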

The Initial Values Setting in FCM
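To perform classification by FCM, initial values must be set for C1 and C2. The procedure for setting the initial values of FCM is as follows:
(1) Convert the water quality data to slice levels based on the environmental standard value (8).
(2) Select SC1 and SC2, the water quality measurement sites with the minimum and maximum slice levels.
(3) Sample DN values at 50 points from each of SC1 and SC2 in the ASTER data.
(4) Calculate the means of the sampled DN values, defined as the mean DN.
(5) Set the mean DN values as the initial values of C1 and C2.
Slice levels are listed in Table 2. SC1 was set to Floodgate, where all water quality parameters belonged to the lowest slice level. SC2 was set to East of adjustment pond because, among the three sites where all water quality parameters belonged to the highest slice level, its initial value yielded the best estimation maps.
An example of setting the initial values in FCM is shown in Fig. 5.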
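The five steps of the initial value setting can be sketched as below. The slice boundaries, site names, water quality values, and DN samples are placeholders invented for illustration; they are not the environmental standard values or the measurements used in the study. Ties between sites with the same slice level are resolved here by dictionary order, which is a simplification of the authors' choice of the site that gave the best estimation maps.

```python
import numpy as np

def to_slice_level(value, boundaries):
    """Map a measured value to a 1-based slice level (ascending boundaries)."""
    return int(np.searchsorted(boundaries, value, side="right")) + 1

def initial_centers(levels, dn_samples):
    """Pick SC1/SC2 as the sites with the lowest/highest slice level and
    return the mean DN of their samples as the initial C1 and C2."""
    sc1 = min(levels, key=levels.get)   # ties resolved by dict order (simplification)
    sc2 = max(levels, key=levels.get)
    c1 = float(np.mean(dn_samples[sc1]))
    c2 = float(np.mean(dn_samples[sc2]))
    return sc1, sc2, c1, c2

# Placeholder inputs, not the values measured in the study.
boundaries = [1.0, 2.0, 3.0, 5.0, 8.0]                 # hypothetical level boundaries
values = {"site_A": 0.5, "site_B": 4.0, "site_C": 9.0}  # hypothetical measurements
levels = {site: to_slice_level(v, boundaries) for site, v in values.items()}

# 50 sampled DN values per site (synthetic).
dn_samples = {
    "site_A": np.linspace(19.0, 21.0, 50),
    "site_B": np.linspace(24.0, 26.0, 50),
    "site_C": np.linspace(29.0, 33.0, 50),
}
sc1, sc2, c1, c2 = initial_centers(levels, dn_samples)
```

Tying the initial centers to sites with known slice levels is what allows the resulting map levels to be read against the water quality scale rather than only relative to each other.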

Level Slice Processing
To obtain estimation maps, level slice processing was performed. The degree of belonging of Class 2 was divided into LC1 to LC2 at equal intervals, where LC1 and LC2 are the slice levels in SC1 and SC2.
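One possible reading of this step is sketched below: the degree of belonging to Class 2, which lies between 0 and 1, is split into equal-width bins that are numbered from LC1 to LC2. The function name `level_slice` and the handling of the upper edge (a degree of exactly 1 is assigned to LC2) are assumptions, and the input is assumed to contain no masked (NaN) pixels.

```python
import numpy as np

def level_slice(u2, lc1, lc2):
    """Map degrees of belonging to Class 2 onto integer levels lc1..lc2."""
    u2 = np.asarray(u2, dtype=float)
    n_levels = lc2 - lc1 + 1
    # Equal-width bins over [0, 1]; a degree of exactly 1 falls in the top bin.
    idx = np.minimum((u2 * n_levels).astype(int), n_levels - 1)
    return lc1 + idx

# Example with LC1 = 3 and LC2 = 6, the levels reported for SS.
degrees = [0.0, 0.2, 0.3, 0.6, 0.8, 1.0]
levels = level_slice(degrees, lc1=3, lc2=6)
# levels -> [3, 3, 4, 5, 6, 6]
```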

Analysis of Water Quality Conditions
Table 3 lists the mean DN values that were set as the initial values of C1 and C2 in the FCM. For band 1, the initial value of C1 was higher than that of C2, although, as shown in Table 1, the water quality values at Floodgate were lower than those at East of adjustment pond. Information on green algae on the surface of Lake Hachiroko could not be obtained from band 1 because of the effects of reflectance from the water surface. Therefore, band 1 data were removed from consideration.
Fig. 4. Flowchart of the FCM (pixels are classified as Class 1 or Class 2, and the class centers are then updated).
To evaluate the estimation maps by FCM, we compared the slice levels in the maps with the water quality data at each measurement site. When a measurement site in an estimation map had the same slice level as the water quality data, we judged the map to be "in agreement" at that site. Table 4 lists the results of comparing the estimation maps by FCM with the water quality data. In Table 4, "A" denotes agreement with the water quality data, and "B" denotes disagreement.
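The agreement rule can be written as a small comparison, sketched here with placeholder site names and slice levels that are not the values in Table 4. The function name `agreement_table` is hypothetical.

```python
def agreement_table(estimated, measured):
    """Return 'A' where the map level equals the measured level, else 'B'."""
    return {
        site: ("A" if estimated[site] == measured[site] else "B")
        for site in measured
    }

# Placeholder slice levels for three hypothetical sites.
measured = {"site_A": 3, "site_B": 5, "site_C": 6}
estimated = {"site_A": 3, "site_B": 4, "site_C": 6}
result = agreement_table(estimated, measured)
# result -> {"site_A": "A", "site_B": "B", "site_C": "A"}
```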
The estimation maps for SS using Band 3 data are shown in Fig. 6. The LC1 and LC2 used to obtain the estimation maps of the water quality parameter SS are three and six, respectively. The estimation maps are in agreement with the water quality data at Ogata bridge, East of adjustment pond, West of adjustment pond, and Floodgate. Near the estuary at East of adjustment pond, which receives polluted water, the estimation map shows slice level six, in agreement with the actual pollution condition.
The estimation maps for BOD and T-P using Band 3 data are shown in Fig. 7. The LC1 and LC2 used to obtain the estimation maps of BOD are four and six, respectively, and those used for T-P are five and six, respectively. These estimation maps are in agreement with the water quality data at Ogata bridge, East of adjustment pond, West of adjustment pond, and Floodgate. These results show that the estimation maps obtained by FCM agree with the water quality conditions in Lake Hachiroko even when different slice levels are set for the FCM classes.
According to the above results and Table 4, the estimation maps by FCM are in agreement with the measured BOD, SS, and T-P at four measurement sites. This finding indicates that the water quality parameters during green algae occurrences and the DN values are strongly correlated.

Comparison with Fuzzy Regression Analysis Results
The estimation maps by fuzzy regression analysis (3) for T-P, SS, and BOD using Band 3 data are shown in Fig. 8. The estimation map for T-P by fuzzy regression analysis is in agreement with the water quality data at Ogata bridge, East of adjustment pond, West of adjustment pond, and Floodgate. However, the estimation maps for SS and BOD by fuzzy regression analysis show a lower level than the water quality data at West of adjustment pond. Moreover, the estimation maps show a higher level than the water quality data at Floodgate. In contrast, the estimation maps for SS and BOD by FCM are in agreement with the water quality data at West of adjustment pond and Floodgate. Therefore, the estimation maps generated by FCM are well associated with the water quality conditions. The above result indicates that the estimation maps generated by FCM reveal the detailed water quality conditions in Lake Hachiroko.
Fig. 8. Estimation maps by fuzzy regression analysis (3) (Band 3 data, Min problem). Panel (c): BOD.

Conclusions
In this research, we studied the setting of the initial values of FCM. Additionally, to evaluate the usefulness of FCM, we compared the estimation maps with the water quality data and with the results of the fuzzy regression analysis. The following conclusions were reached:
(1) The estimation maps generated by FCM were in agreement with the actual pollution conditions in Lake Hachiroko.
(2) The estimation maps generated by FCM had a strong relationship with the BOD, SS, and T-P measured in Lake Hachiroko.
(3) The water quality parameters at occurrences of green algae and the DN values showed a strong correlation.
(4) The estimation maps generated by FCM showed detailed water quality conditions in Lake Hachiroko.

Fig. 2. ASTER data (RGB: Bands 2, 3, and 1). (ASTER-VA image courtesy of NASA/METI/AIST/Japan Spacesystems, and U.S./Japan ASTER Science Team)

Notation in Fig. 4. C1: center of Class 1; C2: center of Class 2; IMAX: number of data processed; Xi: input data; dissimilarity $d_{1i} = |X_i - C_1|$: distance from C1; m: weight coefficient used in calculating the degree of belonging.

Table 4. Result of water quality analysis. (a) Band 2 data. A: Estimation results are in agreement with water quality data. B: Estimation results are not in agreement with water quality data.

Fig. 5. Example of setting the initial values in FCM. (1) Change the water quality data at each site to slice levels based on the environmental standard value (water quality value → slice level). (2) Select SC1 and SC2, the water quality measurement sites with the minimum and maximum slice levels (SC1: Floodgate; SC2: East of adjustment pond). (3) Take a sample of DN values at 50 points. (4) Calculate the means of the sampled DN values (mean DN obtained from SC1: 20.300; mean DN obtained from SC2: 30.840). (5) Set the mean DN values as the initial values of C1 and C2 (C1: 20.300; C2: 30.840).

Table 3. Initial values of C1 and C2 for each band.