Unsupervised Machine Learning to Detect Abnormal Activities using CNN & 3D Spatial-Temporal AutoEncoder (3DSTAE)

Authors

  • Mandakini Ingle, Department of Computer Science and Engineering, Medi-Caps University, A.B. Road, Rau, Indore – 453331
  • Pinky Rane, Department of Computer Science and Engineering, Medi-Caps University, A.B. Road, Rau, Indore – 453331
  • Harshraj Sharma, Department of Computer Science and Engineering, Medi-Caps University, A.B. Road, Rau, Indore – 453331
  • Ishika Goyal, Department of Computer Science and Engineering, Medi-Caps University, A.B. Road, Rau, Indore – 453331
  • Kapil Nakum, Department of Computer Science and Engineering, Medi-Caps University, A.B. Road, Rau, Indore – 453331

DOI:

https://doi.org/10.21467/proceedings.7.6.39

Keywords:

camera, long term care, autoencoder

Abstract

Live video feeds are now essential for security, traffic monitoring, and industrial oversight. Deep learning models such as convolutional neural networks (CNNs) and autoencoders are well suited to monitoring these feeds. This paper presents an approach that combines a CNN with an autoencoder to monitor live video and detect anomalous activity. The CNN is trained to focus on the relevant features, and the autoencoder is used to identify events that deviate from the norm. The CNN learns what is normal from a large number of ordinary video clips, and the autoencoder becomes very good at reconstructing those normal clips. At test time, the autoencoder detects anomalies by comparing its reconstructed clips with the originals; a clip that it reconstructs poorly is likely to be abnormal. The experiments use the UCSD Pedestrian dataset, which contains a large number of pedestrian walking scenes. The results indicate that the system is accurate and outperforms other methods for detecting anomalies in live video. This could be valuable for security, traffic, and industrial applications where problems must be caught quickly. The study concludes that CNNs and autoencoders work well together for monitoring video and detecting abnormal activity as it happens, and that deep learning can substantially aid video analysis in a wide range of settings.
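
The test-time comparison described in the abstract can be illustrated with a minimal sketch. It assumes a per-frame mean squared error and a min-max normalised regularity score, which are common choices for reconstruction-based anomaly detection; the abstract does not state the paper's actual scoring rule or threshold. The function names, the toy frames, and the 0.5 threshold are all hypothetical, and hand-written reconstructions stand in for the output of a trained autoencoder.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one flattened grayscale frame, pixel values in [0, 1]


def reconstruction_error(original: Frame, reconstructed: Frame) -> float:
    """Mean squared error between a frame and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("frames must have the same number of pixels")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def regularity_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalise the errors to [0, 1] and invert them (1 = most regular)."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames were reconstructed equally well; treat them all as regular.
        return [1.0] * len(errors)
    return [1.0 - (e - lo) / (hi - lo) for e in errors]


def flag_anomalies(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of frames whose regularity score falls below the threshold.

    The default threshold is an arbitrary illustrative value (assumption).
    """
    return [i for i, s in enumerate(scores) if s < threshold]


# Toy data: three frames of four pixels each. The stand-in reconstructions
# match frames 0 and 1 closely but miss frame 2, mimicking an autoencoder
# that has only learned to reproduce "normal" content.
originals = [[0.0, 0.5, 1.0, 0.5], [0.0, 0.5, 1.0, 0.5], [1.0, 1.0, 0.0, 0.0]]
reconstructions = [[0.0, 0.5, 1.0, 0.5], [0.1, 0.5, 0.9, 0.5], [0.0, 0.5, 1.0, 0.5]]

errors = [reconstruction_error(o, r) for o, r in zip(originals, reconstructions)]
scores = regularity_scores(errors)
anomalous = flag_anomalies(scores)
print(anomalous)
```

In this toy run only the third frame is flagged, because it is the only one whose reconstruction differs substantially from the input. In practice the threshold would need to be tuned on held-out data rather than fixed in advance.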

Published

2025-11-21

How to Cite

[1]
M. Ingle, P. Rane, H. Sharma, I. Goyal, and K. Nakum, “Unsupervised Machine Learning to Detect Abnormal Activities using CNN & 3D Spatial-Temporal AutoEncoder (3DSTAE)”, AIJR Proc., vol. 7, no. 6, pp. 341–348, Nov. 2025, doi: 10.21467/proceedings.7.6.39.