Feature-Based Opinion Mining on Smart-Phone Reviews

Feature-based opinion mining is a technique that identifies positive and negative polarity with respect to individual features of an object. It differs from traditional opinion mining, which examines and summarizes only the overall opinion of each review. In the smart-phone market, an overall review may not be a practical guide for customers choosing a phone, so it is essential to summarize users' opinions on each feature of a smart-phone. Such summaries benefit both customers who want to buy smart-phones and smart-phone companies, which can use the information to improve the features of their products. In this paper, we propose a method for mining opinions in smart-phone reviews written in Thai. The method summarizes the positive and negative polarity of each smart-phone feature. First, smart-phone reviews are collected from smart-phone pages on Facebook using the Facebook Graph API. Second, the review dataset is cleaned, and word segmentation is performed using a Thai word segmentation technique. Third, similar words are identified for each feature, since the same feature may be written with different words. Finally, each feature is classified as either positive or negative by considering the polarity of the words that follow it. The experimental results show that the proposed method achieves an accuracy of 70.17%.


Introduction
Smart-phones are now part of daily life. Thai people spend much of their day with their smart-phones because they use them for many purposes, such as photography, communication, travel, and information searching. InsightExpress, a market research company in the US, reports that 98% of Thai people habitually use smart-phones. The number of Thai people using smart-phones has increased by 29%. In response to this growing demand, many smart-phone companies have tried to produce new smart-phones that meet user demands. When a new smart-phone is released, the company advertises it, and the most popular advertising channels are social networks such as Facebook, Twitter, and LinkedIn. Reviewers can also write their opinions on these social network websites, and most of these opinions concern products and services. Reviews of smart-phones are usually about specific features, for example, "beautiful camera", "runs out of battery so fast", and "the machine gets hot and it will explode". These opinions express attitudes, sentiments, and emotions that are very useful to both smart-phone companies and customers: companies use the reviews to improve their products, and people who want to buy a smart-phone usually read existing reviews on phone pages to help them make a decision. Unfortunately, the number of existing reviews is very large, which makes it difficult for a human to summarize opinions on each smart-phone feature. Therefore, this paper proposes feature-based opinion mining on smart-phone reviews written in Thai. The reviews are summarized to identify whether the opinion on each smart-phone feature is positive or negative.
The remainder of the paper is organized as follows. Related work is discussed in section 2. The proposed methodology is explained in section 3. The experimental results are presented in section 4. Finally, conclusions are given in section 5.

Related Works
Opinion mining, or sentiment analysis (7)(8)(9), was proposed to analyze people's sentiments, emotions, and attitudes toward topics of interest. Opinions are normally mined from text documents, which must first pass through several text pre-processing steps. First, text cleaning corrects erroneous text into correct text or words. English text can be cleaned with a spelling correction algorithm (10), but such an algorithm cannot be used for Thai text; instead, Thai text cleaning can be performed using a dictionary. Word segmentation (9) is the process of separating the words in a document. English words can be separated by spaces, periods (.), and commas (,). Thai text, however, is typically written as long runs of consecutive words without spaces, periods, or commas between them, so these characters cannot be used to segment Thai sentences. Therefore, a well-known algorithm called the Longest Matching algorithm (6) was proposed for word segmentation of Thai text. Next, feature selection is used to select significant features and reduce a large number of features. Feature selection methods used in opinion mining include Information Gain, Chi-square, and Gain Ratio.
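The Longest Matching idea can be sketched as a greedy left-to-right scan that, at each position, takes the longest dictionary word starting there. This is a minimal illustration only, not LexTo or the implementation from (6); the function name, the toy dictionary, and the single-character fallback for unknown text are assumptions, and an English string with its spaces removed stands in for unspaced Thai script.

```python
def longest_matching(text: str, dictionary: set) -> list:
    """Segment unspaced text by greedy left-to-right longest matching."""
    # The longest dictionary entry bounds how far ahead we need to look.
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback for unknown text: emit one character
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        words.append(match)
        i += len(match)
    return words


toy_dictionary = {"the", "cat", "sat", "on", "mat", "them", "at"}
print(longest_matching("thecatsatonthemat", toy_dictionary))
# prints ['the', 'cat', 'sat', 'on', 'them', 'at']
```

The example also shows the main weakness of a purely greedy matcher: "themat" is split as "them" + "at" rather than "the" + "mat", because the longer dictionary word wins at that position.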
After pre-processing, the data is normally classified as expressing a positive or negative opinion on a topic using machine learning techniques, e.g., Decision Tree (DT), Naïve Bayes (NB), Support Vector Machine (SVM), and Maximum Entropy (ME). Haddi et al. presented opinion classification of a movie review dataset using SVM to classify the data into two classes, positive and negative. Bakliwal et al. (2011) (2) presented sentiment classification with a machine learning approach for movie and product review datasets using NB, Multi-layer Perceptron (MLP), and SVM. The output of the opinion classification is positive or negative. The experimental results showed that MLP was the best approach, with an accuracy of 78.32%. Diayan et al. (3) applied machine learning techniques to camera reviews (comprising 41,000 reviews) and compared two algorithms, NB and SVM. The experimental results showed that SVM achieved 83.12% accuracy and NB achieved 81.12% accuracy. Khainar and Kinikar (2013) (4) presented simple data and classification algorithms based on NB, ME, and SVM.
Previous studies applied machine learning approaches to summarize overall opinions, which cannot fully represent consumer needs, because machine learning classification reports only a positive or negative opinion without considering opinions on each feature of the object of interest. Thus, feature-based opinion mining was proposed to summarize opinions on each feature. The output of feature-based opinion mining is more detailed and useful in many real applications. For example, a machine learning technique would classify the sentence "The actor is not good, but the story of this movie is very good" as either positive or negative for the movie as a whole, without indicating which feature is positive and which is negative. Ideally, the summary should state that the actor is negative and the story is positive, which gives more information for decision making. Therefore, feature-based opinion mining should be used to summarize opinions.
Wu et al. (5) analyzed customer opinions from a dataset collected from www.tripadvisor.com and selected features of interest from hotel properties, such as rooms, cleanliness, service, and location, so that opinions on each feature could be summarized. Prombrut (6) presented opinion classification of mobile phone reviews written in Thai. The reviews were collected from websites and blogs containing comments about mobile phones. Part-of-speech (POS) tags are used to identify mobile phone features (properties of a mobile phone) and polarity words: if a word is a noun, it is extracted as a feature; if a word is a verb, adjective, or adverb, it is extracted as a polarity word. Lui (11) proposed feature-based opinion classification of phone properties; the phone features in that research are voice, screen, battery, size, and weight.
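The POS rule described for (6) can be sketched as follows. This assumes the words have already been tagged by some Thai POS tagger; the tag names NOUN, VERB, ADJ, and ADV and the function name `extract_by_pos` are hypothetical and are not taken from (6).

```python
# Hypothetical tag names; a real Thai POS tagger would use its own tag set.
FEATURE_TAGS = {"NOUN"}
POLARITY_TAGS = {"VERB", "ADJ", "ADV"}


def extract_by_pos(tagged_words):
    """Split (word, tag) pairs into feature candidates and polarity words."""
    features, polarity_words = [], []
    for word, tag in tagged_words:
        if tag in FEATURE_TAGS:
            features.append(word)  # nouns are treated as features
        elif tag in POLARITY_TAGS:
            polarity_words.append(word)  # verbs/adjectives/adverbs carry polarity
        # words with any other tag are ignored
    return features, polarity_words


tagged = [("battery", "NOUN"), ("drains", "VERB"), ("fast", "ADV")]
print(extract_by_pos(tagged))  # (['battery'], ['drains', 'fast'])
```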

The Research Methodology
Feature-based opinion mining in this paper consists of the process shown in fig. 1. The details of each step are explained in the following sections.

Data collection
The dataset consists of 133 smart-phone reviews collected from smart-phone pages on Facebook between 10 December 2014 and 22 February 2015 using Graph API version 2.3. All reviews are written in Thai. In addition, this paper compiles words into three dictionaries, as follows:
(a) Dictionary A contains the misspelled words most commonly written on social networks, together with their correct spellings. It consists of 69 misspelled words (with their corresponding correct spellings) collected from a survey of Thai teenagers who commonly use social networks (12) (13).
(b) Dictionary B defines polarity words collected from "Mining Opinion in Product Reviews: A case Study of Mobile Phone Reviews" (6) and from social networking websites. It contains 40 positive and 27 negative polarity words.
(c) Dictionary C contains 11 feature words and their similar words, collected from a website describing the features of smart-phones on the market. Similar words are collected because some feature words may be written in different ways, such as "screen" and "monitor".
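One convenient way to hold Dictionary C is to map each feature to its similar words and then invert that mapping, so that any variant found in a review can be looked up directly. The entries below are hypothetical English placeholders, not the paper's 11 Thai feature words, and `build_variant_index` is an illustrative name.

```python
# Hypothetical excerpt of Dictionary C: each feature lists the
# variant words that reviewers may use for it.
DICTIONARY_C = {
    "screen": ["screen", "monitor", "display"],
    "battery": ["battery", "batt"],
    "camera": ["camera", "cam"],
}


def build_variant_index(dictionary_c):
    """Invert feature -> variants into variant -> feature for direct lookup."""
    index = {}
    for feature, variants in dictionary_c.items():
        for variant in variants:
            index[variant] = feature
    return index


variant_to_feature = build_variant_index(DICTIONARY_C)
print(variant_to_feature["monitor"])  # screen
```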

Data pre-processing
(a) Text cleaning: in this process, misspelled words are corrected (to obtain consistent data) before further processing. To achieve this, Dictionary A is used. The text cleaning process is as follows:
Step 1: Read the text file to find misspelled words.
Step 2: Replace misspelled words with the correct words by using dictionary A.
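Because cleaning happens before word segmentation, Thai text at this stage has no word boundaries, so the two steps above amount to substring replacement on raw text. The sketch below assumes Dictionary A is a simple misspelling-to-correction mapping; the function name and the English toy entries are hypothetical. Longer misspellings are tried first so that a short entry cannot pre-empt a longer one.

```python
import re


def clean_text(text, dictionary_a):
    """Replace every misspelling listed in dictionary_a with its correct form."""
    if not dictionary_a:
        return text
    # Try longer misspellings first so a short entry cannot pre-empt a longer one.
    alternatives = sorted(dictionary_a, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in alternatives))
    return pattern.sub(lambda match: dictionary_a[match.group(0)], text)


toy_dictionary_a = {"gud": "good", "phne": "phone", "kool": "cool"}
print(clean_text("this phne is gud", toy_dictionary_a))  # this phone is good
```

A caveat of matching without word boundaries is that a misspelling which happens to occur inside a longer, correctly spelled word would also be replaced, so entries in such a dictionary need to be chosen with that in mind.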
(b) Word segmentation: in this process, sentences are segmented into words using LexTo, a software tool provided by NECTEC. LexTo is based on the Longest Matching algorithm and uses the NECTEC standard dictionary of 42,222 Thai words. An example output of this process is shown in table 1.

Feature and polarity identification
Feature identification is performed using the third dictionary (Dictionary C). The segmented words of each sentence are compared with the feature words in Dictionary C to identify which features appear in the sentence. Matched features are represented by numbers, as listed in table 2, and these numbers are then used to examine polarity. For polarity identification, the segmented words of each sentence are compared with the polarity words in Dictionary B to identify positive and negative words. A positive word is represented by (+) and a negative word by (-). All remaining words in the sentence are then removed. An example of feature and polarity identification is shown in table 3.
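The mapping step can be sketched as a single pass over the segmented tokens: feature words become their feature numbers, polarity words become "+" or "-", and everything else is dropped. The feature numbers and polarity entries below are hypothetical stand-ins, not the contents of table 2 or Dictionary B; in a fuller version, `feature_ids` would be built from all similar words in Dictionary C.

```python
def identify_features_and_polarity(tokens, feature_ids, polarity):
    """Keep only feature numbers and polarity signs; drop all other tokens."""
    sequence = []
    for token in tokens:
        if token in feature_ids:
            sequence.append(feature_ids[token])  # feature word -> its number
        elif token in polarity:
            sequence.append(polarity[token])  # polarity word -> "+" or "-"
    return sequence


# Hypothetical feature numbers and polarity entries (not the paper's tables).
feature_ids = {"camera": 1, "battery": 2}
polarity = {"beautiful": "+", "drains": "-"}
tokens = ["camera", "beautiful", "but", "battery", "drains", "quickly"]
print(identify_features_and_polarity(tokens, feature_ids, polarity))  # [1, '+', 2, '-']
```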

Opinion sentence identification
Opinion sentence identification is performed by considering the (+) and (-) symbols that follow each feature number. An example of opinion sentence identification is shown in table 4.
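The exact pairing rule is not spelled out here, so the sketch below assumes the simplest reading: each feature number takes the first polarity sign that follows it, and a sign with no preceding unpaired feature is ignored. The function name is hypothetical, and the sketch does not handle negation (e.g. "not good").

```python
def assign_opinions(sequence):
    """Pair each feature number with the first polarity sign that follows it.

    Assumed rule: a sign applies to the most recent feature not yet paired;
    signs with no preceding unpaired feature are ignored.
    """
    opinions = []
    current_feature = None
    for item in sequence:
        if isinstance(item, int):
            current_feature = item  # a new feature replaces any unpaired one
        elif current_feature is not None:
            opinions.append((current_feature, item))
            current_feature = None
    return opinions


print(assign_opinions([1, "+", 2, "-"]))  # [(1, '+'), (2, '-')]
```

Under this rule, in a sequence such as `[1, 2, "+"]` only feature 2 receives an opinion; other readings, such as letting one sign apply to every preceding unpaired feature, are equally plausible.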


Summary generation
This step summarizes the opinions on each smart-phone feature by counting the positive and negative opinions for each feature. The summary obtained from the experiment is shown in table 5.
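The counting step can be sketched with the standard library, assuming the previous step produced (feature number, sign) pairs; the function name and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict


def summarize(opinions):
    """Count positive and negative opinions per feature."""
    counts = defaultdict(Counter)
    for feature, sign in opinions:
        counts[feature][sign] += 1
    # Counter returns 0 for signs never seen, so both keys are always present.
    return {feature: {"+": c["+"], "-": c["-"]} for feature, c in counts.items()}


opinions = [(1, "+"), (2, "-"), (1, "+"), (1, "-")]
print(summarize(opinions))  # {1: {'+': 2, '-': 1}, 2: {'+': 0, '-': 1}}
```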

Experimental Evaluation
Experiments were performed to evaluate the accuracy of the proposed method on 133 smart-phone reviews containing 171 references to smart-phone features. The accuracy rate is computed using equation (1), and the summary is shown in figure 2.
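Equation (1) itself is not reproduced in this text. Assuming it is the usual accuracy ratio over the evaluated feature opinions, it would read:

```latex
\text{Accuracy} = \frac{\text{number of feature opinions identified correctly}}{\text{total number of feature opinions}} \times 100\%
```

Under that assumption, with the 171 feature references as the denominator, the reported 70.17% corresponds to about 120 correctly identified feature opinions (120/171 ≈ 0.7018).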
The accuracy rate of this experiment is 70.17%. Users can use the results in figure 2 to support their decisions. The experiment also found that some features could not be extracted, either because they are not in the dictionary or because the segmentation algorithm could not segment the feature words correctly.

Conclusions
This paper proposes feature-based opinion mining on smart-phone reviews written in Thai. The results of this mining are presented as the degree of positive and negative polarity for each feature, which can benefit both consumers and companies. In this study, the opinion mining process consists of data pre-processing and feature-based opinion classification. The experimental results show an accuracy rate of 70.17%. Future work will focus on automatic generation of the polarity dictionary. In addition, an online opinion classification application is planned as future work.

Fig. 1. Process of feature-based opinion mining on smart-phone reviews.

Table 2. Feature and feature representation.

Table 3. Feature and polarity identification.

Table 5. Summarization of the smart-phone.