Table 1 Data set specifications. Columns “min-class” and “max-class” give the number of instances of the least-frequent class and of the most-frequent class, respectively

From: Clus-DTI: improving decision-tree classification with a clustering-based decision-tree induction algorithm

| Dataset | Instances | Numeric attributes | Nominal attributes | Missing values (%) | min-class | max-class | Classes |
|---|---|---|---|---|---|---|---|
| anneal | 898 | 6 | 32 | 0 | 0 | 684 | 6 |
| audiology | 226 | 0 | 69 | 2 | 1 | 57 | 24 |
| autos | 205 | 15 | 10 | 1.15 | 0 | 67 | 7 |
| balance-scale | 625 | 4 | 0 | 0 | 49 | 288 | 3 |
| breast-cancer | 286 | 0 | 9 | 0.34 | 85 | 201 | 2 |
| breast-w | 699 | 9 | 0 | 0.25 | 241 | 458 | 2 |
| colic | 368 | 7 | 15 | 23 | 136 | 232 | 2 |
| credit-a | 690 | 6 | 9 | 0.64 | 307 | 383 | 2 |
| credit-g | 1000 | 7 | 13 | 0 | 300 | 700 | 2 |
| diabetes | 768 | 8 | 0 | 0 | 268 | 500 | 2 |
| heart-c | 303 | 6 | 7 | 0.17 | 0 | 165 | 5 |
| heart-h | 294 | 6 | 7 | 20 | 0 | 188 | 5 |
| heart-statlog | 270 | 13 | 0 | 0 | 120 | 150 | 2 |
| hepatitis | 155 | 6 | 13 | 5.67 | 32 | 123 | 2 |
| hypothyroid | 3772 | 7 | 22 | 5.54 | 2 | 3481 | 4 |
| ionosphere | 351 | 34 | 0 | 0 | 126 | 225 | 2 |
| iris | 150 | 4 | 0 | 0 | 50 | 50 | 3 |
| kr-vs-kp | 3196 | 0 | 36 | 0 | 1527 | 1669 | 2 |
| labor | 57 | 8 | 8 | 35.74 | 20 | 37 | 2 |
| lymph | 148 | 3 | 15 | 0 | 2 | 81 | 4 |
| mushroom | 8124 | 0 | 22 | 1.38 | 3916 | 4208 | 2 |
| primary-tumor | 339 | 0 | 17 | 3.90 | 0 | 84 | 22 |
| segment | 2310 | 19 | 0 | 0 | 330 | 330 | 7 |
| sick | 3772 | 6 | 22 | 2.17 | 231 | 3541 | 2 |
| sonar | 208 | 60 | 0 | 0 | 97 | 111 | 2 |
| soybean | 683 | 0 | 35 | 9.77 | 8 | 92 | 19 |
| waveform | 5000 | 40 | 0 | 0 | 1653 | 1692 | 3 |
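
The “min-class” and “max-class” columns can be reproduced from a list of class labels with a simple frequency count. The following is a minimal sketch, not the authors' code. The function name `class_extremes` and its `declared_classes` parameter are hypothetical. It assumes that a min-class value of 0 corresponds to a class that is declared for the data set but has no instances; the source does not state this explicitly.

```python
from collections import Counter


def class_extremes(labels, declared_classes=None):
    """Return (min_class, max_class) instance counts for a sequence of labels.

    declared_classes is a hypothetical optional list of all class names the
    data set declares; classes that never occur are counted as 0 (assumption).
    """
    counts = Counter(labels)
    classes = declared_classes if declared_classes is not None else list(counts)
    per_class = [counts.get(c, 0) for c in classes]
    return min(per_class), max(per_class)


# iris: 150 instances split evenly across three classes
iris_labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
print(class_extremes(iris_labels))  # (50, 50)

# Toy labels (not from the table): class "c" is declared but never observed
toy_labels = ["a"] * 3 + ["b"] * 5
print(class_extremes(toy_labels, declared_classes=["a", "b", "c"]))  # (0, 5)
```

The iris call matches the min-class and max-class values in the table; the second call uses made-up labels purely to illustrate how a zero count could arise.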