Leveraging Region based Convolutional Neural Network (RCNN) with GMM Model Assisted by Hidden Markov Model for Speaker Detection

Aditya B Dhruva

doi:10.21467/proceedings.7.4.2

Authors

Aditya B Dhruva, National Institute of Technology Tiruchirappalli

Keywords:

Gaussian Mixture model, Region based Convolutional Neural Network, Speaker Detection and Diarization

Abstract

This study proposes a novel speaker detection and identification system that combines deep learning with established statistical methods. A region-based CNN analyzes spectrograms to detect speaker regions, and its detections guide a Gaussian Mixture Model (GMM) for improved speaker clustering. The approach aims for higher accuracy and efficiency than traditional diarization methods.
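The clustering stage described above can be illustrated with a minimal sketch. It is not the author's implementation. The detected regions are hard-coded synthetic stand-ins for the output of the region-based CNN, each region is summarized by a single hypothetical scalar feature (a pitch-like value), and the number of speakers is assumed to be two. The names `fit_gmm_1d`, `assign`, and `regions` are introduced here for illustration only.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic initialization: place the means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, 1e-6)
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


def assign(x, weights, means, variances):
    """Index of the mixture component with the highest posterior for x."""
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))


# Synthetic stand-in for detector output: (start_s, end_s, feature) per region.
# Two assumed speakers alternate, with feature values near 120 and 210.
rng = random.Random(0)
regions = []
for i in range(40):
    centre = 120.0 if i % 2 == 0 else 210.0
    regions.append((i * 1.5, i * 1.5 + 1.0, rng.gauss(centre, 10.0)))

features = [f for _, _, f in regions]
weights, means, variances = fit_gmm_1d(features, k=2)
labels = [assign(f, weights, means, variances) for f in features]

for (start, end, _), label in zip(regions, labels[:4]):
    print(f"{start:5.1f}s to {end:5.1f}s: speaker {label}")
```

A real system would use multi-dimensional features (for example, embeddings or cepstral coefficients pooled over each detected region) and full or diagonal covariance matrices; the scalar feature here only keeps the sketch short.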

