Transfer Learning on Animal Pose Estimation Using YOLOv8 and Fine-Tuning

Authors

  • Roki Fauzi, Telkom University
  • Bedy Purnama, Telkom University
  • Bayu Erfianto, Telkom University

Abstract

Advances in image processing and artificial intelligence have opened new opportunities in image analysis, particularly for animal pose estimation. This study aims to combine the object detection strengths of YOLOv8 with accurate animal pose estimation through a transfer learning approach. By fine-tuning YOLOv8 on a dataset dedicated to animal pose estimation, the study seeks to improve the model's ability to recognize and localize the body parts of animals more precisely. If successful, the work is expected to contribute to the development of animal pose estimation and to open opportunities in animal health management, animal behavior studies, and other applications that require complex image analysis. The study has limitations, however, including its exclusive focus on animal pose estimation through transfer learning and fine-tuning.

Keywords: Stanford Dog Dataset, YOLOv8, fine-tuning, transfer learning.

Published

2024-10-21

Section

Undergraduate Informatics Study Program (Program Studi S1 Informatika)