
Deep Learning Techniques for Skeleton-Based Action Recognition: A Survey

  • Conference paper
Computational Science and Its Applications – ICCSA 2024 (ICCSA 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14814)

Abstract

Human action recognition (HAR) is the task of interpreting human behavior from fully performed actions. HAR applications are rapidly expanding into robotics, CCTV surveillance, self-driving vehicles, gaming, and video retrieval. Among the available data modalities, skeleton data offers a compact representation and computational efficiency. In recent years, much work has gone into developing robust and accurate deep-learning frameworks for skeleton-based HAR. This paper reviews state-of-the-art methods for skeleton-based HAR, summarizes their evaluation results on a large-scale benchmark dataset, and discusses trends in action recognition research.
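To make the "compact representation" claim concrete, the following back-of-the-envelope sketch compares the raw memory footprint of one skeleton clip with the same clip stored as uncompressed RGB frames. All sizes are illustrative assumptions, not figures from this paper: 300 frames per clip, Kinect v2-style skeletons with 25 joints and 3D coordinates (the joint layout used by NTU RGB+D), at most 2 persons per clip, float32 coordinates held in a (C, T, V, M) tensor as in ST-GCN-style pipelines, and 1920x1080 8-bit RGB frames. The helper names `skeleton_bytes` and `rgb_bytes` are hypothetical.

```python
# Illustrative assumptions only (not taken from the surveyed paper):
# skeleton clips are float32 tensors of shape (C, T, V, M), i.e.
# coordinates x frames x joints x persons; RGB clips are uint8 frames
# of shape (T, H, W, 3) with no video compression applied.

FLOAT32_BYTES = 4
UINT8_BYTES = 1


def skeleton_bytes(channels: int, frames: int, joints: int, persons: int) -> int:
    """Bytes needed for a (C, T, V, M) float32 skeleton tensor."""
    return channels * frames * joints * persons * FLOAT32_BYTES


def rgb_bytes(frames: int, height: int, width: int) -> int:
    """Bytes needed for uncompressed 8-bit RGB frames of shape (T, H, W, 3)."""
    return frames * height * width * 3 * UINT8_BYTES


if __name__ == "__main__":
    T = 300  # assumed clip length after padding
    skel = skeleton_bytes(channels=3, frames=T, joints=25, persons=2)
    rgb = rgb_bytes(frames=T, height=1080, width=1920)
    print(f"skeleton: {skel:,} bytes")
    print(f"RGB:      {rgb:,} bytes")
    print(f"ratio:    {rgb // skel:,}x")
```

Under these assumptions the skeleton clip takes 180,000 bytes versus 1,866,240,000 bytes for the raw frames, a ratio of 10,368x. Real videos are stored compressed, so the practical gap is smaller than this uncompressed comparison suggests.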

Acknowledgments

This research is funded by International School, Vietnam National University, Hanoi, Vietnam.

Author information

Corresponding author

Correspondence to Dinh-Tan Pham.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pham, DT. (2024). Deep Learning Techniques for Skeleton-Based Action Recognition: A Survey. In: Gervasi, O., Murgante, B., Garau, C., Taniar, D., C. Rocha, A.M.A., Faginas Lago, M.N. (eds) Computational Science and Its Applications – ICCSA 2024. ICCSA 2024. Lecture Notes in Computer Science, vol 14814. Springer, Cham. https://doi.org/10.1007/978-3-031-64608-9_29

  • DOI: https://doi.org/10.1007/978-3-031-64608-9_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-64607-2

  • Online ISBN: 978-3-031-64608-9

  • eBook Packages: Computer Science; Computer Science (R0)
