Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction : Riconoscimento audio-visuale della lingua tibetana basata su Deep Dynamic Bayesian Network per una interazione naturale con umani, in: International Journal of Advanced Robotic Systems

General information
Author	Zhao, Yue; Wang, Hui; Ji, Qiang
Published	InTech Open Access Publisher, 2012
Abstract	Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although the multi-stream Dynamic Bayesian Network and the coupled HMM are widely used for audio-visual speech recognition, they fail to learn the features shared between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) that performs unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and builds an accurate audio-visual speech recognition model without the frame-independence assumption. Experimental results on Tibetan speech data from several real-world environments show that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.

Collections

		Journal articles
		From 2000 onward

Parent works

International Journal of Advanced Robotic Systems
Author: Ottaviano, Erika; Ceccarelli, Marco; Husty, Manfred; Yu, Sung-Hoon; Kim, Yong-Tae; Park, Chang-Woo; Hyun, Chang-Ho; Chen, Xiulong; Feng, Weiming; Sun, Xianyang; Gao, Qing; Grigorescu, Sorin M.; Pozna, Claudiu; Liu, Wanli; Zhankui, Wang; Guo, Meng; Fu, Guoyu; Zhang, Jin; Chen, Wenyuan; Peng, Fengchao; Yang, Pei; Chen, Chunlin; Ding, Rui; Yu, Junzhi; Yang, Qinghai; Tan, Min; Polden, Joseph; Pan, [...]
Published: 2004

Linked records

Documents:

International Journal of Advanced Robotic Systems

Permanent links

DMG-Lib		https://www.dmg-lib.org/dmglib/handler?docum=32746009
Europeana		http://www.europeana.eu/portal/record/2020801/dmglib_handler_docum_32746009.html
PDF		Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

Data provider

Univ. Cassino

http://webuser.unicas.it/weblarm/larmindex.htm

Administrative information

Publication date		2012
License information		This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
