Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction, in: International Journal of Advanced Robotic Systems

General information
Author	Zhao, Yue; Wang, Hui; Ji, Qiang
Published	InTech Open Access Publisher, 2012
Edition
Volume
ISBN
Abstract	Audio‐visual speech recognition is a natural and robust approach to improving human‐robot interaction in noisy environments. Although multi‐stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio‐visual speech recognition, they fail to learn the features shared between modalities and ignore the dependency among feature frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) that performs unsupervised extraction of spatial‐temporal multimodal features from Tibetan audio‐visual speech data and builds an accurate audio‐visual speech recognition model without assuming frame independence. Experimental results on Tibetan speech data from several real‐world environments show that the proposed DDBN outperforms state‐of‐the‐art methods in word recognition accuracy.

Collections

		Journal articles
		2000 and later

Superordinate work

International Journal of Advanced Robotic Systems
Author: Ottaviano, Erika; Ceccarelli, Marco; Husty, Manfred; Yu, Sung-Hoon; Kim, Yong-Tae; Park, Chang-Woo; Hyun, Chang-Ho; Chen, Xiulong; Feng, Weiming; Sun, Xianyang; Gao, Qing; Grigorescu, Sorin M.; Pozna, Claudiu; Liu, Wanli; Zhankui, Wang; Guo, Meng; Fu, Guoyu; Zhang, Jin; Chen, Wenyuan; Peng, Fengchao; Yang, Pei; Chen, Chunlin; Ding, Rui; Yu, Junzhi; Yang, Qinghai; Tan, Min; Polden, Joseph; Pan, [...]
Published: 2004

Linked items

Documents:

International Journal of Advanced Robotic Systems

Permanent links

DMG-Lib		https://www.dmg-lib.org/dmglib/handler?docum=32746009
Europeana		http://www.europeana.eu/portal/record/2020801/dmglib_handler_docum_32746009.html
PDF		Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

Data provider

Univ. Cassino

http://webuser.unicas.it/weblarm/larmindex.htm

Administrative information

Time of publication		2012
License information		This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
