Implementation of Decision Tree Classifier using WEKA tool

Practical - 8
Implementation of Decision Tree Classifier using WEKA tool.



J48 Classifier
C4.5 is an algorithm, developed by Ross Quinlan, for generating decision trees; J48 is WEKA's implementation of it. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier. It became quite popular after ranking #1 in the "Top 10 Algorithms in Data Mining" paper published in the Springer journal Knowledge and Information Systems in 2008.

Pseudocode
In pseudocode, the general algorithm for building decision trees is:
1. Check for the base cases (for example, all samples belong to the same class, or no attribute provides any information gain).
2. For each attribute a, find the normalized information gain (gain ratio) from splitting on a.
3. Let a_best be the attribute with the highest normalized information gain.
4. Create a decision node that splits on a_best.
5. Recur on the sublists obtained by splitting on a_best, and add those nodes as children of the node.
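
The five steps above can be sketched in a few lines of Python. This is a simplified illustration, not J48 itself: it handles nominal attributes only, does no pruning, and omits C4.5's extra checks. The function names (entropy, gain_ratio, build_tree) and the hard-coded nominal weather-style table are assumptions made for this example.

```python
from collections import Counter
from math import log2

# Illustrative nominal weather-style data (assumed for this sketch).
COLUMNS = ["outlook", "temperature", "humidity", "windy", "play"]
DATA = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""
ROWS = [dict(zip(COLUMNS, line.split(","))) for line in DATA.strip().splitlines()]


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attr, target):
    """Information gain of splitting on attr, divided by the split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, attrs, target):
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Step 1: base cases (pure node, or nothing left to split on).
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Steps 2 and 3: pick the attribute with the highest gain ratio.
    ratios = {a: gain_ratio(rows, a, target) for a in attrs}
    a_best = max(ratios, key=ratios.get)
    if ratios[a_best] <= 0:
        return majority  # base case: no attribute provides any gain
    # Step 4: create a decision node; step 5: recurse on each sublist.
    node = {"split_on": a_best, "children": {}}
    remaining = [a for a in attrs if a != a_best]
    for value in sorted({row[a_best] for row in rows}):
        subset = [row for row in rows if row[a_best] == value]
        node["children"][value] = build_tree(subset, remaining, target)
    return node


tree = build_tree(ROWS, COLUMNS[:-1], "play")
print(tree)
```

Leaves are plain class labels and internal nodes are dictionaries, which keeps the recursion easy to follow; a real implementation would also store instance counts for pruning.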

Implementation
1. Begin by loading the weather.numeric.arff file, as seen in the figure.
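
For readers unfamiliar with the ARFF format, the following Python sketch parses a small ARFF-style text into a relation name, an attribute list, and data rows. The embedded excerpt is an assumption about what the file looks like, and parse_arff is a hypothetical helper that covers only the simplest parts of the format (no quoting, sparse data, or missing values).

```python
# Assumed excerpt in ARFF style; the real file may differ in detail.
SAMPLE_ARFF = """\
% comment lines start with a percent sign
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
"""


def parse_arff(text):
    """Parse a minimal subset of ARFF: header declarations plus comma-separated rows."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation, [name for name, _ in attributes], len(rows))
```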




2. Select the "Classify" tab and click the "Choose" button to select the J48 classifier, as depicted in the figure. Note that J48 (WEKA's implementation of the C4.5 algorithm) does not require discretization of numeric attributes.
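
Discretization is unnecessary because a C4.5-style learner can split a numeric attribute with a binary test of the form "value <= threshold". The sketch below shows one simple way to find such a threshold by trying the midpoints between consecutive sorted values and keeping the one with the highest information gain. The six humidity readings are hypothetical, and the use of midpoints is a simplification that may not match J48's exact threshold choice.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information gain) for the best binary split value <= threshold."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no boundary between identical values
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical readings, not taken from the data file.
humidity = [65, 70, 75, 80, 85, 90]
play = ["yes", "yes", "yes", "no", "no", "no"]
threshold, gain = best_threshold(humidity, play)
print(threshold, gain)
```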




3. We can specify various parameters by clicking in the text box to the right of the "Choose" button, as depicted in the figure. Here, we accept the default values. The default configuration does perform some pruning (using the subtree raising approach), but does not perform reduced-error pruning.



4. Under "Test options" in the main panel, we select 10-fold cross-validation as our evaluation approach. Since we do not have a separate evaluation data set, this is necessary to get a reasonable idea of the accuracy of the generated model. We now click "Start" to generate the model. The ASCII version of the tree, as well as the evaluation statistics, will appear in the right panel when model construction is complete.
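
To make the idea of 10-fold cross-validation concrete, here is a minimal Python sketch of how instance indices can be divided into folds so that every instance is used for testing exactly once. The helper k_fold_indices and its seed parameter are assumptions for illustration; WEKA additionally stratifies the folds by class, which this sketch does not attempt.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation over n instances."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 14 instances is an assumed size for a small weather-style data set.
splits = list(k_fold_indices(14, k=10))
for train, test in splits:
    print(len(train), "training instances,", len(test), "test instances")
```

A model is trained on each training part and scored on the matching test part; the reported accuracy is then computed over all test predictions combined.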



5. WEKA also lets us view a graphical rendition of the classification tree. This can be done by right-clicking the last result set in the "Result list" panel and selecting "Visualize tree" from the pop-up menu. The tree for this example is depicted in the figure.



6. We will now use our model to classify new instances. A portion of the ARFF file containing the new instances is depicted in the figure.



7. In the main panel, under "Test options", click the "Supplied test set" radio button, and then click the "Set..." button. This pops up a window that allows you to open the file containing the test instances, as in the figure. In this case, we open the file "weather.numeric.arff" and return to the main window.



8. Click the "Start" button. This once again generates the model from our training data, but this time it applies the model to the new unclassified instances in the "weather.numeric.arff" file in order to predict the value of the "play" attribute. The result is depicted in the figure.
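
Applying a trained tree to a new instance amounts to walking from the root to a leaf. The Python sketch below illustrates this with a hand-written tree. Both the tree structure (including the humidity threshold of 75) and the three new instances are assumptions for illustration, not output copied from WEKA.

```python
# Hand-written tree, assumed for illustration only.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity", "threshold": 75, "low": "yes", "high": "no"},
        "overcast": "yes",
        "rainy": {"attribute": "windy", "branches": {"TRUE": "no", "FALSE": "yes"}},
    },
}


def classify(node, instance):
    """Follow the tests from the root until a leaf (a plain class label) is reached."""
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        if "threshold" in node:
            # Numeric test: value <= threshold goes to the "low" branch.
            node = node["low"] if value <= node["threshold"] else node["high"]
        else:
            # Nominal test: one branch per attribute value.
            node = node["branches"][value]
    return node


# Hypothetical unclassified instances (no "play" value given).
new_instances = [
    {"outlook": "sunny", "temperature": 78, "humidity": 70, "windy": "FALSE"},
    {"outlook": "rainy", "temperature": 68, "humidity": 88, "windy": "TRUE"},
    {"outlook": "overcast", "temperature": 81, "humidity": 92, "windy": "FALSE"},
]
predictions = [classify(TREE, instance) for instance in new_instances]
print(predictions)
```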



9. Right-click the most recent result set in the left "Result list" panel. In the resulting pop-up window, select the menu item "Visualize classifier errors". This brings up a separate window containing a two-dimensional graph. These steps and the resulting window are shown in the figure.



10. We would like to save the classification results from which the graph is generated. In the new window, we click the "Save" button and save the results as the file "weather.numeric1.arff", as shown in the figure.

