  • Paper Information
  • Received: Feb. 5, 2020

    Accepted: Mar. 19, 2020

    Posted: Sep. 1, 2020

    Published Online: Sep. 2, 2020

    Corresponding author email: Wang Yi (koala_wy@tju.edu.cn)

    DOI: 10.3788/LOP57.181702

  • Citation

    Kailong Ren, Yi Wang, Xiaodong Chen, Huaiyu Cai. Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181702

  • Category: Medical Optics and Biotechnology
Laser & Optoelectronics Progress, Vol. 57, Issue 18, 181702 (2020)

Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control

Ren Kailong, Wang Yi*, Chen Xiaodong, and Cai Huaiyu

Author Affiliations

  • School of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China

Abstract

A long short-term memory (LSTM) recurrent neural network based on i-vector features is presented for the speech control of a laparoscopic supporter. The model recognizes short isolated-word commands spoken by a specific surgeon and needs only a small number of training samples. The LSTM recurrent neural network serves as the base model, and Mel-frequency cepstral coefficients (MFCCs) are its input features. The i-vector is introduced as deep input information: the deep features that follow the LSTM layer are spliced with it to fuse the two sets of parameters. In this way, the model accurately recognizes voice commands from the designated surgeon and rejects voice commands from anyone who is not that surgeon. The approach offers a secure and intelligent speech recognition scheme for laparoscopic surgery. A self-built speech database serves as the training library. It is used to evaluate both the recognition performance of the proposed algorithm and its rejection performance for speech not included in the training library. The proposed model was compared with dynamic time warping (DTW) and a Gaussian mixture model-hidden Markov model (GMM-HMM). In these experiments, it achieved a 99.6% correct recognition rate for voice commands of the specific speakers recorded in the training library while maintaining a false acceptance rate of 0%. For voices not included in the training library, its average false acceptance rate was 2.5%. The proposed model therefore meets the accuracy and safety requirements expected of laparoscopic supporter control.
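The fusion step described in the abstract can be illustrated with a minimal NumPy sketch of one forward pass. The pass runs an LSTM over the MFCC frames, splices its final hidden state with the speaker's i-vector, and classifies the fused vector into a command or rejects it. This is not the authors' implementation, and several details are illustrative assumptions: the single LSTM layer, the use of only the last hidden state, the dimensions (13 MFCCs, 32 hidden units, a 16-dimensional i-vector, 5 commands), and the probability-threshold rejection rule. The helper names `lstm_last_hidden`, `recognize`, and `init_params` are hypothetical. The sketch does not extract i-vectors, so a random vector stands in for one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def lstm_last_hidden(frames, W, U, b):
    """Run a single-layer LSTM over MFCC frames (T, n_mfcc); return the last hidden state.

    Assumed gate layout in W (4H, n_mfcc), U (4H, H), b (4H,): input, forget, cell, output.
    """
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in frames:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:H])              # input gate
        f = sigmoid(z[H:2 * H])         # forget gate
        g = np.tanh(z[2 * H:3 * H])     # candidate cell state
        o = sigmoid(z[3 * H:])          # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def recognize(frames, ivector, params, threshold=0.9):
    """Return (command index, or None if rejected, and the class probabilities)."""
    h = lstm_last_hidden(frames, params["W"], params["U"], params["b"])
    fused = np.concatenate([h, ivector])  # splice LSTM output with the speaker i-vector
    probs = softmax(params["V"] @ fused + params["c"])
    k = int(np.argmax(probs))
    # Hypothetical rejection rule: refuse any command whose top probability is below threshold.
    return (k if probs[k] >= threshold else None), probs

def init_params(n_mfcc, hidden, ivec_dim, n_commands, seed=0):
    """Random, untrained weights; a real system would learn these from the training library."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "W": rng.normal(0.0, s, (4 * hidden, n_mfcc)),
        "U": rng.normal(0.0, s, (4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
        "V": rng.normal(0.0, s, (n_commands, hidden + ivec_dim)),
        "c": np.zeros(n_commands),
    }

# Usage with untrained weights and synthetic inputs (illustration only).
params = init_params(n_mfcc=13, hidden=32, ivec_dim=16, n_commands=5)
rng = np.random.default_rng(1)
frames = rng.normal(size=(40, 13))  # 40 frames of 13-dimensional MFCCs
ivector = rng.normal(size=16)       # stand-in for an extracted i-vector
label, probs = recognize(frames, ivector, params)
```

Concatenating the speaker-dependent i-vector with the utterance-level LSTM output lets the classifier condition its decision on who is speaking as well as on what was said. In a trained system, this conditioning is what could push commands from a non-enrolled speaker below the acceptance threshold. The threshold value used here is arbitrary.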
