Table 3 Most frequent FER confusion percentages in GMM-HMM and CNN-HTSVM models where the true phoneme was confused as being the predicted phoneme

From: Theoretical learning guarantees applied to acoustic modeling

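| GMM-HMM True | GMM-HMM Pred | GMM-HMM Conf (%) | CNN-HTSVM True | CNN-HTSVM Pred | CNN-HTSVM Conf (%) |
|---|---|---|---|---|---|
| s | z | 33.14 | s | z | 15.16 |
| ih | uw | 16.00 | ay | ae | 39.64 |
| t | ch | 17.58 | ao | aa | 26.58 |
| er | r | 32.46 | r | er | 18.84 |
| ao | l | 28.00 | sh | s | 26.01 |
| iy | y | 14.23 | aa | ae | 16.07 |
| s | sh | 10.09 | ah | ae | 14.79 |
| ae | t | 14.32 | t | s | 7.61 |
| ih | z | 10.07 | iy | ih | 6.48 |
| w | ao | 45.52 | er | r | 7.64 |
| iy | uw | 12.80 | er | r | 10.64 |
| k | eh | 11.55 | z | s | 18.01 |
| ih | t | 7.70 | ay | aa | 15.78 |
| ah | l | 16.67 | t | s | 10.61 |
| d | t | 14.46 | iy | ih | 9.48 |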