Natalia Labuda¹, Julia Seeliger¹, Tomasz Gedrande¹, Karol Kozak¹,²
1. Medical Faculty, Dresden University of Technology, Carl Gustav Carus University Hospital Dresden.
2. Faculty of Management, Finances and Informatics, Wroclaw University of Economy.
Abstract: The k-nearest neighbours (kNN) algorithm is a simple but effective classification method. In kNN-based classification of medical data, k is the most important parameter, and the major drawback of kNN is its dependence on the selection of a “good value” for k. The value of k is usually determined by cross-validation, but if k is too large, large classes will overwhelm small ones. On the other hand, if k is too small, the advantage of the kNN algorithm will not be exhibited. A fixed k value is therefore very likely to bias the classifier towards large classes. In this paper we propose a modified k-nearest neighbours method that uses different k values for different regions of the data set, rather than a single fixed k value for the complete data set. The number of nearest neighbours is selected locally based on P-value Rate criteria. We apply the modified kNN method to the diagnosis of type II diabetes, using the Pima Indians dataset, which comprises 768 patient samples.
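To illustrate the general idea of choosing k per query point rather than once for the whole data set, here is a minimal Python sketch. It is not the authors' method: the abstract does not define the P-value Rate criteria, so the sketch substitutes a hypothetical stand-in, namely a binomial tail probability of the majority vote under the global class priors, and picks the candidate k that minimises it. The names `binomial_tail`, `local_knn_predict` and `candidate_ks` are assumptions made for this example.

```python
import math
from collections import Counter


def binomial_tail(k, count, p):
    """Return P(X >= count) for X ~ Binomial(k, p)."""
    return sum(math.comb(k, j) * p**j * (1 - p) ** (k - j)
               for j in range(count, k + 1))


def local_knn_predict(train_X, train_y, query, candidate_ks=(1, 3, 5, 7)):
    """Classify `query` with a k chosen for this query point only.

    Assumed criterion (NOT the paper's P-value Rate): for each candidate k,
    compute how surprising the majority vote would be if the k neighbours
    were labelled at random according to the global class frequencies,
    then keep the k with the smallest such probability.
    Returns (predicted_label, chosen_k).
    """
    # Global class frequencies, used as the null hypothesis.
    priors = {c: n / len(train_y) for c, n in Counter(train_y).items()}

    # Indices of training points sorted by Euclidean distance to the query.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))

    best = None  # (p_value, label, k)
    for k in candidate_ks:
        if k > len(order):
            continue  # not enough training points for this k
        votes = Counter(train_y[i] for i in order[:k])
        label, count = votes.most_common(1)[0]
        p_value = binomial_tail(k, count, priors[label])
        if best is None or p_value < best[0]:
            best = (p_value, label, k)

    if best is None:
        raise ValueError("no candidate k fits the training set size")
    return best[1], best[2]
```

Because the null hypothesis uses the class priors, a unanimous vote for a rare class counts as stronger evidence than the same vote for a common class, which is one simple way to counteract the bias towards large classes described above.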
Pages: 1 – 13