Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction, in: International Journal of Advanced Robotic Systems

General information
Author	Zhao, Yue; Wang, Hui; Ji, Qiang
Published	InTech Open Access Publisher, 2012
Abstract	Audio‐visual speech recognition is a natural and robust approach to improving human‐robot interaction in noisy environments. Although multi‐stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio‐visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial‐temporal multimodal features from Tibetan audio‐visual speech data and to build an accurate audio‐visual speech recognition model without assuming frame independence. Experimental results on Tibetan speech data from real‐world environments show that the proposed DDBN outperforms state‐of‐the‐art methods in word recognition accuracy.

Collections

		Journal articles
		2000 and later

Superordinate work

International Journal of Advanced Robotic Systems
Author: Ottaviano, Erika; Ceccarelli, Marco; Husty, Manfred; Yu, Sung-Hoon; Kim, Yong-Tae; Park, Chang-Woo; Hyun, Chang-Ho; Chen, Xiulong; Feng, Weiming; Sun, Xianyang; Gao, Qing; Grigorescu, Sorin M.; Pozna, Claudiu; Liu, Wanli; Zhankui, Wang; Guo, Meng; Fu, Guoyu; Zhang, Jin; Chen, Wenyuan; Peng, Fengchao; Yang, Pei; Chen, Chunlin; Ding, Rui; Yu, Junzhi; Yang, Qinghai; Tan, Min; Polden, Joseph; Pan, [...]
Published: 2004

Linked items

Documents:

International Journal of Advanced Robotic Systems

Permanent links

DMG-Lib		https://www.dmg-lib.org/dmglib/handler?docum=32746009
Europeana		http://www.europeana.eu/portal/record/2020801/dmglib_handler_docum_32746009.html
PDF		Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

Data provider

Univ. Cassino

http://webuser.unicas.it/weblarm/larmindex.htm

Administrative information

Time of publication		2012
License information		This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
