Accessible documents

General information
Author: Zhao, Yue; Wang, Hui; Ji, Qiang
Published: InTech Open Access Publisher, 2012
Edition:
Volume:
ISBN:
Abstract:
Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although the multi-stream Dynamic Bayesian Network and the coupled HMM are widely used for audio-visual speech recognition, they fail to learn the features shared between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and to build an accurate audio-visual speech recognition model without assuming frame independence. Experimental results on Tibetan speech data from real-world environments show that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.
International Journal of Advanced Robotic Systems
Author: Ottaviano, Erika; Ceccarelli, Marco; Husty, Manfred; Yu, Sung-Hoon; Kim, Yong-Tae; Park, Chang-Woo; Hyun, Chang-Ho; Chen, Xiulong; Feng, Weiming; Sun, Xianyang; Gao, Qing; Grigorescu, Sorin M.; Pozna, Claudiu; Liu, Wanli; Zhankui, Wang; Guo, Meng; Fu, Guoyu; Zhang, Jin; Chen, Wenyuan; Peng, Fengchao; Yang, Pei; Chen, Chunlin; Ding, Rui; Yu, Junzhi; Yang, Qinghai; Tan, Min; Polden, Joseph; Pan, [...]
Published: 2004