Table 3. Most frequent FER confusion percentages for the GMM-HMM and CNN-HTSVM models, where the true phoneme (True) was confused with the predicted phoneme (Pred)

From: Theoretical learning guarantees applied to acoustic modeling

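| GMM-HMM True | GMM-HMM Pred | GMM-HMM Conf (%) | CNN-HTSVM True | CNN-HTSVM Pred | CNN-HTSVM Conf (%) |
|---|---|---|---|---|---|
| s | z | 33.14 | s | z | 15.16 |
| ih | uw | 16.00 | ay | ae | 39.64 |
| t | ch | 17.58 | ao | aa | 26.58 |
| er | r | 32.46 | r | er | 18.84 |
| ao | l | 28.00 | sh | s | 26.01 |
| iy | y | 14.23 | aa | ae | 16.07 |
| s | sh | 10.09 | ah | ae | 14.79 |
| ae | t | 14.32 | t | s | 7.61 |
| ih | z | 10.07 | iy | ih | 6.48 |
| w | ao | 45.52 | er | r | 7.64 |
| iy | uw | 12.80 | er | r | 10.64 |
| k | eh | 11.55 | z | s | 18.01 |
| ih | t | 7.70 | ay | aa | 15.78 |
| ah | l | 16.67 | t | s | 10.61 |
| d | t | 14.46 | iy | ih | 9.48 |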
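
As a rough illustration of how a confusion percentage like those in the table could be derived from frame-level labels, the following Python sketch counts misclassified (true, predicted) pairs and normalizes each count by the number of frames carrying that true label. The normalization is an assumption, since this excerpt does not state the denominator used in the table. The function names `confusion_percentages` and `top_confusions` and the toy label sequences are hypothetical and are not taken from the article.

```python
from collections import Counter


def confusion_percentages(true_labels, pred_labels):
    """Return {(true, pred): percent} for misclassified frames only.

    Assumption: each percentage is normalized by the number of frames
    whose true label is `true` (a row-normalized confusion matrix).
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have equal length")
    true_counts = Counter(true_labels)
    # Count only the off-diagonal entries, i.e. actual confusions.
    pair_counts = Counter(
        (t, p) for t, p in zip(true_labels, pred_labels) if t != p
    )
    return {
        pair: 100.0 * n / true_counts[pair[0]]
        for pair, n in pair_counts.items()
    }


def top_confusions(true_labels, pred_labels, k=15):
    """Return the k largest confusions as ((true, pred), percent) tuples."""
    table = confusion_percentages(true_labels, pred_labels)
    return sorted(table.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy frame labels (hypothetical, not data from the article).
true_frames = ["s", "s", "s", "s", "z", "z", "iy", "iy", "iy", "iy"]
pred_frames = ["s", "z", "z", "s", "s", "z", "ih", "iy", "iy", "iy"]

for (t, p), pct in top_confusions(true_frames, pred_frames):
    print(f"{t} {p} {pct:.2f}")
```

Each printed row has the same True, Pred, Conf (%) shape as a row of the table. A different normalization, for example by the total number of frames, would change the values but not the procedure.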