Table 1 Data set specifications. The columns “min-class” and “max-class” give the number of instances of the least-frequent class and of the most-frequent class, respectively

From: Clus-DTI: improving decision-tree classification with a clustering-based decision-tree induction algorithm

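| Dataset | Instances | Numeric attributes | Nominal attributes | % Missing values | min-class | max-class | Classes |
|---|---:|---:|---:|---:|---:|---:|---:|
| anneal | 898 | 6 | 32 | 0 % | 0 | 684 | 6 |
| audiology | 226 | 0 | 69 | 2 % | 1 | 57 | 24 |
| autos | 205 | 15 | 10 | 1.15 % | 0 | 67 | 7 |
| balance-scale | 625 | 4 | 0 | 0 % | 49 | 288 | 3 |
| breast-cancer | 286 | 0 | 9 | 0.34 % | 85 | 201 | 2 |
| breast-w | 699 | 9 | 0 | 0.25 % | 241 | 458 | 2 |
| colic | 368 | 7 | 15 | 23 % | 136 | 232 | 2 |
| credit-a | 690 | 6 | 9 | 0.64 % | 307 | 383 | 2 |
| credit-g | 1000 | 7 | 13 | 0 % | 300 | 700 | 2 |
| diabetes | 768 | 8 | 0 | 0 % | 268 | 500 | 2 |
| heart-c | 303 | 6 | 7 | 0.17 % | 0 | 165 | 5 |
| heart-h | 294 | 6 | 7 | 20 % | 0 | 188 | 5 |
| heart-statlog | 270 | 13 | 0 | 0 % | 120 | 150 | 2 |
| hepatitis | 155 | 6 | 13 | 5.67 % | 32 | 123 | 2 |
| hypothyroid | 3772 | 7 | 22 | 5.54 % | 2 | 3481 | 4 |
| ionosphere | 351 | 34 | 0 | 0 % | 126 | 225 | 2 |
| iris | 150 | 4 | 0 | 0 % | 50 | 50 | 3 |
| kr-vs-kp | 3196 | 0 | 36 | 0 % | 1527 | 1669 | 2 |
| labor | 57 | 8 | 8 | 35.74 % | 20 | 37 | 2 |
| lymph | 148 | 3 | 15 | 0 % | 2 | 81 | 4 |
| mushroom | 8124 | 0 | 22 | 1.38 % | 3916 | 4208 | 2 |
| primary-tumor | 339 | 0 | 17 | 3.90 % | 0 | 84 | 22 |
| segment | 2310 | 19 | 0 | 0 % | 330 | 330 | 7 |
| sick | 3772 | 6 | 22 | 2.17 % | 231 | 3541 | 2 |
| sonar | 208 | 60 | 0 | 0 % | 97 | 111 | 2 |
| soybean | 683 | 0 | 35 | 9.77 % | 8 | 92 | 19 |
| waveform | 5000 | 40 | 0 | 0 % | 1653 | 1692 | 3 |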
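
As a rough illustration of how the “min-class”, “max-class” and “Classes” columns of Table 1 could be derived from a data set's class labels, here is a minimal Python sketch. The function name `class_stats` and the `declared_classes` argument are hypothetical and not taken from the paper. The sketch also assumes that a min-class value of 0 (as in the anneal row) corresponds to a class that is declared for the data set but has no instances; that reading is an assumption, not something the table states.

```python
from collections import Counter


def class_stats(labels, declared_classes):
    """Return (min_class, max_class, n_classes) for one data set.

    labels: iterable with the class label of every instance.
    declared_classes: all class values defined for the data set
        (hypothetical input; may include classes with no instances).
    """
    counts = Counter(labels)
    # Assumption: a declared class that never occurs contributes a count of 0.
    per_class = [counts.get(c, 0) for c in declared_classes]
    return min(per_class), max(per_class), len(declared_classes)


# Toy example (not one of the data sets in Table 1)
labels = ["a", "a", "b", "a", "b"]
print(class_stats(labels, ["a", "b", "c"]))  # (0, 3, 3)
```

Passing the declared classes separately, rather than inferring them from the labels, is what allows a count of zero to appear at all; if only observed labels were counted, the minimum would always be at least 1.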