Single Versus Multiple Measures for Fuzzy Association Rule Mining

Data mining can identify patterns in data and find relationships within the data in order to predict outcomes or future data trends. Association rule mining is a part of data mining that has been applied in many fields. The performance of association rule mining depends on the support and confidence measures. In this research, we compare the number of rules produced under different kinds of measures. We propose using a combination of measures (such as support*confidence or confidence*lift) instead of considering only the support or the confidence value. The main objective is to reduce the number of rules in the process of fuzzy association rule mining, where a compact rule set is important to the real application of a fuzzy system. The experimental results reveal that combining multiple measures for fuzzy association rule mining can significantly reduce the number of association rules.


Introduction
At present, data mining is very popular because it helps with extraction, search, prediction, and many other knowledge-intensive tasks. Its usefulness comes from employing existing data to predict future trends, extracting patterns from the data, and finding relationships within a data group. These benefits have led data mining to be widely applied. In medicine, it is used to analyze patient data to see whether or not a patient is likely to become ill from a disease. In business, data mining is used to analyze the buying patterns of consumers.
Data mining techniques can be practiced in various ways. A widely adopted technique is association rule mining, which looks at incidents that occur together and uses them to create association rules.
Finding association rules was proposed by Agrawal (1) for use in data mining. To find association rules efficiently, a criterion for calculating the support of a rule is necessary. A confidence criterion is also needed to reduce the number of rules.
The research most relevant to ours reduces the number of rules in different ways, such as the work of F. P. Pach et al. (2). They proposed reducing the number of fuzzy association rules with a pruning technique, which makes the set of association rules smaller. The research of M. M. Ballesteros et al. (3) aimed to improve the algorithm, and they compared the fitness function obtained from the various developed algorithms as well as from quite a few traditional algorithms. However, research work that compares the number of rules based on different measures, especially a multi-criteria measure, is still lacking. Therefore, in this research we compare the number of rules for different measures and also apply more than one measure to reduce the number of discovered association rules.

Association Rule Mining
Association rule mining is a process that has gained much popularity in data mining. The process of finding relationships hidden in a dataset can be explained with the following example.
Table 1 is a set of ten transactions; the frequency of purchases of each product type in association with other products is shown in Table 2.
DOI: 10.12792/iciae2015.050
From Table 2, one can create instances of relationships that can be represented as "if antecedent then consequent" (or "antecedent ⇒ consequent") as follows:
If a customer buys milk, then they also buy bread.
If a customer buys beer, then they also buy potatoes.
If a customer buys beans, then they also buy potatoes.

Different Measures of Association Rule Mining
(a) Support measure
The support measure is the frequency of each purchased item set. It indicates the usefulness of an association rule: the higher the support, the more frequently the association occurs. The support measure ranges from 0 to 1 and can be computed as in the following equation:
$$\mathrm{support}(X \Rightarrow Y) = \frac{n(X \cap Y)}{N} \tag{1}$$
where n(X∩Y) is the number of times that item X occurs together with Y in the same transaction, and N is the total number of transactions.
Example 1: finding the support of a rule with the support measure.
From Table 1, when using the support measure to find the support of the rule that customers who buy milk also buy bread, the computation is as follows:
$$\mathrm{support}(\mathrm{milk} \Rightarrow \mathrm{bread}) = \frac{5}{10} = 0.5$$
That means the support of the rule "if milk then bread" is 0.5, or 50% of all transactions.
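The support computation in Example 1 can be sketched in a few lines of Python. The ten transactions below are hypothetical stand-ins (they are not the contents of Table 1); they were chosen only so that the counts agree with the values quoted in the examples. The function name `support` is likewise an illustrative choice.

```python
# Hypothetical transactions (NOT the paper's Table 1), chosen so that milk
# appears in 6 of 10 baskets and bread in 5, always together with milk.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "potatoes"},
    {"milk", "bread"},
    {"milk", "bread", "beer", "potatoes"},
    {"milk", "bread"},
    {"milk", "beans", "potatoes"},
    {"beer", "potatoes"},
    {"beans", "potatoes"},
    {"beer", "potatoes"},
    {"potatoes"},
]


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


print(support(transactions, {"milk", "bread"}))  # 0.5
print(support(transactions, {"milk"}))           # 0.6
```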

(b) Confidence measure
The confidence measure indicates the reliability of the rule. The confidence measure ranges from 0 to 1 and can be computed as follows:
$$\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(X \Rightarrow Y)}{\mathrm{support}(X)} \tag{2}$$
where support(X⇒Y) is the frequency with which both items X and Y occur in the same transaction, and support(X) is the support of X counted over all transactions.
Example 2: finding the confidence of a rule with the confidence measure.
From Table 1, when using the confidence measure to find the confidence of the rule that customers who buy milk also buy bread, the computation is as follows:
$$\mathrm{confidence}(\mathrm{milk} \Rightarrow \mathrm{bread}) = \frac{0.5}{0.6} = 0.833$$
That means the confidence of the rule that if milk is bought then bread is also bought is 0.833, or 83% confidence.
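As a minimal sketch, the confidence of Example 2 can be computed directly from the two support values quoted in the text (0.5 for milk and bread together, 0.6 for milk alone). The function name and the guard against a zero denominator are illustrative additions.

```python
def confidence(support_xy, support_x):
    """confidence(X => Y) = support(X => Y) / support(X)."""
    if support_x == 0:
        raise ValueError("support(X) must be positive")
    return support_xy / support_x


# Values quoted in the milk => bread example.
print(round(confidence(0.5, 0.6), 3))  # 0.833
```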

(c) Lift measure
The lift measure ranges from 0 to infinity. If the value is less than 1, X and Y are related in a negative way. If the value equals 1, X and Y are independent. If the value is greater than 1, X and Y are correlated in a positive way. The computation is shown in the following equation:
$$\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{support}(X \Rightarrow Y)}{\mathrm{support}(X) \cdot \mathrm{support}(Y)} \tag{3}$$
where support(X⇒Y) is the frequency with which X and Y occur in the same transaction, support(X) is the support of X over all transactions, and support(Y) is the support of Y over all transactions.
Example 3: finding the lift of a rule with the lift measure.
From Table 1, when using the lift measure for the rule that customers who buy milk also buy bread, the lift computation can be illustrated as follows:
$$\mathrm{lift}(\mathrm{milk} \Rightarrow \mathrm{bread}) = \frac{0.5}{0.6 \times 0.5} = 1.667$$
That means the lift of the rule milk ⇒ bread is 1.667; that is, milk and bread have a positive relationship.
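The lift computation and its three-way interpretation can be sketched as follows. The helper `interpret_lift` and its tolerance parameter are hypothetical names introduced for illustration; the numeric inputs are the ones quoted in Example 3.

```python
import math


def lift(support_xy, support_x, support_y):
    """lift(X => Y) = support(X => Y) / (support(X) * support(Y))."""
    return support_xy / (support_x * support_y)


def interpret_lift(value, tol=1e-9):
    """Map a lift value to the relationship described in the text."""
    if math.isclose(value, 1.0, abs_tol=tol):
        return "independent"
    return "positive" if value > 1.0 else "negative"


value = lift(0.5, 0.6, 0.5)
print(round(value, 3), interpret_lift(value))  # 1.667 positive
```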

(d) Conviction measure
The conviction measure ranges from 0 to infinity. If the value is less than 1, X and Y are related in a negative way. If the value equals 1, X and Y are independent. If the value is greater than 1, X and Y are correlated in a positive way. The conviction measure can be computed as in the following equation:
$$\mathrm{conviction}(X \Rightarrow Y) = \frac{1 - \mathrm{support}(Y)}{1 - \mathrm{confidence}(X \Rightarrow Y)} \tag{4}$$
where support(Y) is the support of Y over all transactions, and confidence(X⇒Y) is the reliability of X and Y co-occurring in the same transaction.
Example 4: finding the conviction of a rule with the conviction measure.
From Table 1, when using the conviction measure for the rule that customers who buy milk also buy bread, the computation is as follows:
$$\mathrm{conviction}(\mathrm{milk} \Rightarrow \mathrm{bread}) = \frac{1 - 0.5}{1 - 0.833} = 2.99$$
That means the conviction of the rule that if milk is bought then bread is also bought is 2.99, so milk and bread have a positive relationship.
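A short sketch of the conviction computation is given below. Returning infinity when the confidence equals 1 is an assumption made here to avoid a division by zero; the text does not say how that case is handled. With the rounded confidence 0.833 the result matches the 2.99 of Example 4, while the unrounded confidence 0.5/0.6 gives a value of about 3.

```python
import math


def conviction(support_y, confidence_xy):
    """conviction(X => Y) = (1 - support(Y)) / (1 - confidence(X => Y))."""
    if confidence_xy >= 1.0:
        # Assumption: a rule that never fails gets infinite conviction.
        return math.inf
    return (1.0 - support_y) / (1.0 - confidence_xy)


print(round(conviction(0.5, 0.833), 2))  # 2.99
```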

(e) Gain measure
The gain measure is used to compute the added value or change of a rule. The gain measure ranges from -0.5 to 1. Its computation is as follows:
$$\mathrm{gain}(X \Rightarrow Y) = \mathrm{confidence}(X \Rightarrow Y) - \mathrm{support}(Y) \tag{5}$$
where support(Y) is the support of Y over all transactions, and confidence(X⇒Y) is the reliability of X and Y co-occurring in the same transaction.
Example 5: finding the gain of a rule with the gain measure.
From Table 1, when using the gain measure to compute the usefulness of the rule that customers who buy milk also buy bread, the computation is as follows:
$$\mathrm{gain}(\mathrm{milk} \Rightarrow \mathrm{bread}) = 0.833 - 0.5 = 0.333$$
That means the gain of the rule that if milk is bought then bread is also bought is 0.333.
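For completeness, the gain computation of Example 5 is a single subtraction; the sketch below uses the rounded confidence 0.833 and the bread support 0.5 quoted in the text. The function name is an illustrative choice.

```python
def gain(confidence_xy, support_y):
    """gain(X => Y) = confidence(X => Y) - support(Y)."""
    return confidence_xy - support_y


print(round(gain(0.833, 0.5), 3))  # 0.333
```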

(f) Leverage measure
The leverage measure indicates the strength of the rule. If the leverage is less than 0, X and Y are related in a negative way. If the value is equal to 0, X and Y are independent, and any value greater than 0 indicates that X and Y are positively dependent. The computation is as follows:
$$\mathrm{leverage}(X \Rightarrow Y) = \mathrm{support}(X \Rightarrow Y) - \mathrm{support}(X) \cdot \mathrm{support}(Y) \tag{6}$$
where support(X⇒Y) is the support of X and Y co-occurring in the same transaction, support(X) is the support of X over all transactions, and support(Y) is the support of Y over all transactions.
Example 6: finding the leverage of a rule with the leverage measure.
From Table 1, when using the leverage measure to evaluate the rule milk ⇒ bread, the computation is as follows:
$$\mathrm{leverage}(\mathrm{milk} \Rightarrow \mathrm{bread}) = 0.5 - (0.6 \times 0.5) = 0.2$$
That means the leverage of the rule that if milk is bought then bread is also bought is 0.2, which implies milk and bread have a positive relationship.
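The leverage computation of Example 6 can be sketched as below. It uses the same three support values as the lift measure but takes their difference rather than their ratio, so independence corresponds to 0 instead of 1. The function name is an illustrative choice.

```python
import math


def leverage(support_xy, support_x, support_y):
    """leverage(X => Y) = support(X => Y) - support(X) * support(Y)."""
    return support_xy - support_x * support_y


value = leverage(0.5, 0.6, 0.5)
print(round(value, 3))  # 0.2
```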
Figure 1 A research framework for comparing the effect of different measures on fuzzy association rules.
Figure 2 Frequent fuzzy item set algorithm (2).
Figure 3 Fuzzy association rule-base generation algorithm (2).

Fuzzy Association Rule Generation
Input: Frequent item sets
Output: Fuzzy association rules
Method:
1. Create association rules from all item sets.
2. Calculate the correlation with the measure.
3. Select the association rules whose support of the rule is more than the minimum.
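The three steps of the rule generation procedure listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm of (2): the function `generate_rules`, the dictionary layout, and the use of "greater than or equal to" for the threshold test are all hypothetical choices, and the supports are the crisp milk and bread values from the earlier examples rather than fuzzy supports.

```python
from itertools import combinations


def generate_rules(itemset_support, measure, min_value):
    """Enumerate rules X => Y from frequent item sets and keep those whose
    measure value reaches `min_value`.

    `itemset_support` maps frozensets to supports and is assumed to contain
    every non-empty subset of each item set it lists.
    """
    rules = []
    for itemset, support_xy in itemset_support.items():
        if len(itemset) < 2:
            continue
        # Step 1: every non-empty proper subset is a candidate antecedent.
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                y = itemset - x
                # Step 2: compute the chosen measure for the rule.
                value = measure(support_xy, itemset_support[x], itemset_support[y])
                # Step 3: keep the rule only if it meets the minimum.
                if value >= min_value:
                    rules.append((x, y, value))
    return rules


def confidence(support_xy, support_x, support_y):
    return support_xy / support_x


supports = {
    frozenset({"milk"}): 0.6,
    frozenset({"bread"}): 0.5,
    frozenset({"milk", "bread"}): 0.5,
}
for x, y, value in generate_rules(supports, confidence, 0.9):
    print(sorted(x), "=>", sorted(y), round(value, 3))
```

Passing a different `measure` function (lift, leverage, and so on) changes which rules survive without touching the enumeration.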

Methodology
We designed the process for comparing single-measure and multi-measure discovery of fuzzy association rules as shown in Figure 1.
The steps of this process are as follows:
- Input data partitions: In associative-based classification, an attribute whose value is continuous must be partitioned. This step transforms continuous values into discrete intervals based on the concept of fuzzy sets.
- Frequent fuzzy item set generation: In this step, we search for and count the frequency of items and combinations of items. These items are to be used in the creation of rules, regardless of the support of the rule. The procedure is shown in Figure 2.
- Fuzzy association rule generation: In this step, we create rules that are screened by different measures, and we vary the minimum support threshold in order to create various final rule sets. The procedure works as shown in Figure 3.
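To make the partitioning and counting steps more concrete, here is a minimal sketch under explicit assumptions. The text does not specify the membership functions or how memberships are combined, so the evenly spaced triangular fuzzy sets, the labels "low", "medium" and "high", the product used to combine memberships, and the function names `fuzzify` and `fuzzy_support` are all hypothetical choices made for illustration.

```python
def fuzzify(x, lo, hi, labels=("low", "medium", "high")):
    """Membership of x in evenly spaced triangular fuzzy sets on [lo, hi].

    Assumption: peaks are equally spaced, so memberships sum to 1 on [lo, hi].
    """
    step = (hi - lo) / (len(labels) - 1)
    return {
        label: max(0.0, 1.0 - abs(x - (lo + i * step)) / step)
        for i, label in enumerate(labels)
    }


def fuzzy_support(records, itemset, lo, hi):
    """Average, over all records, of the product of the memberships of each
    (attribute, label) pair in `itemset` (product combination is assumed)."""
    total = 0.0
    for record in records:
        degree = 1.0
        for attribute, label in itemset:
            degree *= fuzzify(record[attribute], lo, hi)[label]
        total += degree
    return total / len(records)


# Two made-up records with attribute values on a 1..10 scale.
records = [{"a": 1.0, "b": 10.0}, {"a": 3.25, "b": 10.0}]
print(fuzzify(3.25, 1, 10))
print(fuzzy_support(records, [("a", "low"), ("b", "high")], 1, 10))  # 0.75
```

A record can belong partly to two neighbouring fuzzy sets, which is why the second record contributes 0.5 rather than 0 or 1 to the support.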

Experimental Results
This study used the Breast Cancer Wisconsin data obtained from the UCI Machine Learning Repository. This dataset has 699 instances with 10 attributes and a target attribute of two classes. We used this data to find fuzzy association rules with various single measures and different combinations of two measures. The support of the rules was set to 90%, 80%, 70%, 60%, and 50%. The setting of each measure value is summarized in Table 4. The experimental results are shown in Tables 5-7.
Table 5 compares the number of rules obtained from each single measure with the support of the rule set to 90%, 80%, 70%, 60%, and 50%. It can be seen that each single measure gives a different number of fuzzy association rules at each threshold value. Some measures used for screening fuzzy association rules appeared to produce rules whose values fall below the minimum support of the rule. We noticed that the number of rules is not equal across measures. When using more than one measure to determine the number of fuzzy association rules, we set the support of the rule from 70% downward, because higher thresholds were not able to give any rule.
It can be seen from Table 6 that using two measures, with support as the main measure and the support of the rule varied from 70% down to 60% and 50%, produces a small set of fuzzy association rules. Using more than a single measure as the condition for selecting rules results in fewer fuzzy association rules, because combining the support of the rule with another criterion decreases the number of rules that satisfy the threshold.
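The reduction effect described above can be illustrated with a small sketch. It assumes that "combining" means a rule must meet the minimum for every listed measure; the text does not spell out the exact combination scheme, so this is only one plausible reading. The two rules and their values come from the milk and bread example, and the thresholds 0.5 and 0.9 are arbitrary illustrative choices, not the experimental settings.

```python
# Measure values for the two rules derivable from the milk/bread example
# (support 0.5, support(milk) 0.6, support(bread) 0.5).
rules = [
    {"rule": "milk => bread", "support": 0.5, "confidence": 0.5 / 0.6},
    {"rule": "bread => milk", "support": 0.5, "confidence": 0.5 / 0.5},
]


def select(rules, thresholds):
    """Keep rules whose value meets the minimum for every listed measure."""
    return [
        r["rule"]
        for r in rules
        if all(r[name] >= minimum for name, minimum in thresholds.items())
    ]


single = select(rules, {"support": 0.5})
combined = select(rules, {"support": 0.5, "confidence": 0.9})
print(single)    # ['milk => bread', 'bread => milk']
print(combined)  # ['bread => milk']
```

Under this reading, adding a second condition can only remove rules, so the combined rule set is always a subset of the single-measure rule set.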
When we use more than a single measure to screen rules, with confidence as the main measure and the support of the rule set to 70%, 60%, and 50%, the results are as shown in Table 7.
It can be seen that the number of fuzzy association rules obtained with confidence as the main measure differs from that obtained with support as the main measure. Using confidence as the main measure provides a slightly larger number of fuzzy association rules than using support as the main measure. This may be because confidence is a more relaxed criterion than support.

Conclusions
Association rule mining is a data mining technique that has been widely used in a number of applications. For numeric and continuous values, association rule mining cannot be applied directly because of the enormous number of possible values. The fuzzy set concept can be applied to handle this situation, and it gives rise to a sub-area called fuzzy association rule mining. A small number of rules produced by the fuzzy association rule mining process is important to justifying the performance of the process. In this work we report findings from our experiments showing that a combination of measures for selecting the final rule set gives a better result than a single measure. Moreover, we found that two-measure combinations such as support*lift and confidence*leverage yield a reasonable set of fuzzy association rules.

Table 1
Transactional database of customer's purchases.

Table 2
Frequency of each combination of two product types.

Table 3
Measures for finding support of rules.

Table 4
Percentage value for each measure used in the experimentation compared with the boundary value for each measure.

Table 5
Number of rules and average values obtained from each single measure (SOR is minimum support of rule).

Table 6
Number of rules and average values obtained from a combination of support and other five criteria (SOR is minimum support of rule).

Table 7
Number of rules and average values obtained from a combination of confidence and other criteria (SOR is minimum support of rule).