Last modified: 2019-10-29
Abstract
This research applies Correlation-Based Feature Selection (CFS) with CNK (Combination of Naive Bayes Classifier and K-Nearest Neighbor) to predict the risk of hepatitis. CNK combines Naive Bayes and K-Nearest Neighbor to classify datasets that contain both categorical and numeric attributes. Ten-fold cross-validation is used to split the data into training and testing sets. The best classification performance of CFS with CNK, obtained at k = 4, is an accuracy of 0.8129, a sensitivity of 0.8699, a specificity of 0.5938, and an AUC of 0.7318. CFS appears to reduce CNK performance on the Hepatitis dataset, but performance can be improved by balancing the data with SMOTE (Synthetic Minority Over-sampling Technique). The best classification performance of SMOTE, CFS, and CNK combined, obtained at k = 3, is an accuracy of 0.8, a sensitivity of 0.7967, a specificity of 0.8125, and an AUC of 0.8046. Based on AUC, the classification produced by SMOTE, CFS, and CNK on the Hepatitis dataset can be categorized as good.
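The four reported measures are related through the confusion matrix, and a short sketch may make that relationship concrete. The code below is illustrative only and is not the authors' implementation. It rests on three assumptions that the abstract does not state: the evaluated data contain 123 instances of the positive class and 32 of the negative class, the majority class is treated as positive, and the AUC is the single-threshold value, that is, the mean of sensitivity and specificity. The confusion counts are hypothetical, chosen only because they are consistent with the reported figures. The function name `binary_metrics` is likewise made up for this example.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity, and a single-point AUC
    from the four cells of a binary confusion matrix."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Assumption: AUC of a classifier with one operating point,
    # i.e. the mean of sensitivity and specificity.
    auc = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "auc": auc,
    }


# Hypothetical counts (not taken from the paper), assuming
# 123 positive and 32 negative instances.
cfs_cnk = binary_metrics(tp=107, fn=16, tn=19, fp=13)        # CFS + CNK, k = 4
smote_cfs_cnk = binary_metrics(tp=98, fn=25, tn=26, fp=6)    # SMOTE + CFS + CNK, k = 3

for name, result in [("CFS+CNK", cfs_cnk), ("SMOTE+CFS+CNK", smote_cfs_cnk)]:
    print(name, {metric: f"{value:.4f}" for metric, value in result.items()})
```

Under these assumptions, both sets of counts agree with the reported values to within rounding. They also show the trade-off behind the reported improvement: balancing raises specificity and AUC, while sensitivity and accuracy fall slightly.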