Leveraging Region based Convolutional Neural Network (RCNN) with GMM Model Assisted by Hidden Markov Model for Speaker Detection

Authors

  • Aditya B Dhruva, National Institute of Technology Tiruchirappalli

DOI:

https://doi.org/10.21467/proceedings.7.4.2

Keywords:

Gaussian Mixture Model, Region based Convolutional Neural Network, Speaker Detection and Diarization

Abstract

This study proposes a novel speaker detection and identification system that combines deep learning with established statistical methods. A region-based CNN analyzes spectrograms to detect speakers, and its detections guide a Gaussian Mixture Model (GMM) for improved speaker clustering. The approach aims to achieve higher accuracy and efficiency than traditional diarization methods.
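
The clustering stage described above can be illustrated with a minimal sketch. The abstract does not specify how the CNN detections guide the GMM, so the sketch assumes one plausible reading: each detected region contributes an initial mean for one mixture component. The function name `fit_gmm`, the diagonal covariances, the feature layout, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_gmm(features, init_means, n_iter=50, var_floor=1e-3):
    """Fit a diagonal-covariance GMM with EM, seeded with one mean per speaker.

    features:   (n_frames, n_dims) acoustic feature vectors (hypothetical layout).
    init_means: (n_speakers, n_dims) initial means, assumed here to come from
                the regions proposed by a detector (an assumption, not the
                paper's stated method).
    """
    x = np.asarray(features, dtype=float)
    means = np.array(init_means, dtype=float)
    k = means.shape[0]
    # Start every component from the global variance and uniform weights.
    variances = np.tile(x.var(axis=0) + var_floor, (k, 1))
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: per-frame log-likelihood under each component.
        diff = x[:, None, :] - means[None, :, :]
        log_prob = -0.5 * np.sum(
            diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
        )
        log_prob += np.log(weights)
        log_prob -= log_prob.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_prob)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        diff = x[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None]
        variances += var_floor

    # Hard speaker labels taken from the responsibilities of the final E-step.
    return weights, means, variances, resp.argmax(axis=1)


if __name__ == "__main__":
    # Synthetic two-speaker data standing in for real spectrogram features.
    rng = np.random.default_rng(0)
    speaker_a = rng.normal(loc=-3.0, scale=1.0, size=(100, 2))
    speaker_b = rng.normal(loc=3.0, scale=1.0, size=(100, 2))
    frames = np.vstack([speaker_a, speaker_b])
    # Rough seeds, standing in for per-region averages from a detector.
    seeds = [[-2.0, -2.0], [2.0, 2.0]]
    _, fitted_means, _, labels = fit_gmm(frames, seeds)
    print(fitted_means)
    print(np.bincount(labels))
```

Seeding the mixture from detections, rather than from random starts, is one way a detector could reduce the sensitivity of EM to initialization; whether the paper does this is not stated in the abstract.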

References

[1] X. Anguera, S. Bozonnet, N. Evans, C. Fredouille, and G. Friedl, “Speaker diarization: A review of recent research,” IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 2, pp. 357–370, Feb. 2012. doi: 10.1109/TASL.2011.2143856.

[2] M. Diez, L. Burget, F. Landini, and J. Černocký, “Analysis of speaker diarization based on Bayesian HMM with eigenvoice priors,” in Proc. 2019 IEEE Autom. Speech Recognit. Understand. Workshop (ASRU), Singapore, Singapore, Dec. 2019, pp. 913–920. doi: 10.1109/ASRU46091.2019.9003823.

[3] J. I. De La Rosa and A. Becerra, “Speech recognition in a dialog system: From conventional to deep processing. A case study applied to Spanish,” Multim. Tools Appl., vol. 77, no. 10, pp. 13019–13043, May 2018. doi: 10.1007/s11042-017-5160-5.

[4] S. Nemade, Y. K. Sharma, and R. D. Patil, “To improve voice recognition systems using GMM and HMM classification models,” Int. J. Innov. Technol. Explor. Eng. (IJITEE), vol. 8, no. 11, pp. 4683–4686, Sep. 2019. doi: 10.35940/ijitee.K2204.0981119.

[5] P. C. Woodland, H. Y. Yu, and C. V. Ramamurthy, “Speaker recognition and diarization,” in The Handbook of Speech Production, M. A. Redford, Ed. Hoboken, NJ, USA: Wiley-Blackwell, 2015, pp. 461–486.

[6] R. Girshick, “Fast R-CNN,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 1, pp. 144–154, Jan. 2016. doi: 10.1109/TPAMI.2015.2437391.

Published

2025-06-30

How to Cite

[1]
A. B. Dhruva, “Leveraging Region based Convolutional Neural Network (RCNN) with GMM Model Assisted by Hidden Markov Model for Speaker Detection,” AIJR Proc., vol. 7, no. 4, pp. 12–21, Jun. 2025, doi: 10.21467/proceedings.7.4.2.