Deep Learning Techniques for Skeleton-Based Action Recognition: A Survey

Pham, Dinh-Tan

doi:10.1007/978-3-031-64608-9_29

Dinh-Tan Pham ORCID: orcid.org/0000-0003-1366-0617

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14814)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Human action recognition (HAR) is the task of interpreting human behavior from fully performed actions. HAR applications are rapidly expanding into robotics, CCTV surveillance, self-driving vehicles, gaming, and video retrieval. Among the available data modalities, skeleton data offers a compact representation and computational efficiency. In recent years, much work has gone into developing robust and accurate deep-learning frameworks for skeleton-based HAR. This paper reviews state-of-the-art methods for skeleton-based HAR, summarizes evaluation results on a large-scale benchmark dataset, and discusses trends in action recognition research.

Acknowledgments

This research is funded by International School, Vietnam National University, Hanoi, Vietnam.

Author information

Authors and Affiliations

International School, Vietnam National University, Hanoi, Vietnam
Dinh-Tan Pham

Corresponding author

Correspondence to Dinh-Tan Pham.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
School of Engineering, University of Basilicata, Potenza, Italy
Beniamino Murgante
Department of Civil and Environmental Engineering and Architecture, University of Cagliari, Cagliari, Italy
Chiara Garau
Faculty of Information Technology, Monash University, Clayton, VIC, Australia
David Taniar
Algoritmi Research Centre, University of Minho, Braga, Portugal
Ana Maria A. C. Rocha
Department of Chemistry, Biology and Biotechnology, University of Perugia, Perugia, Italy
Maria Noelia Faginas Lago

Cite this paper

Pham, DT. (2024). Deep Learning Techniques for Skeleton-Based Action Recognition: A Survey. In: Gervasi, O., Murgante, B., Garau, C., Taniar, D., Rocha, A.M.A.C., Faginas Lago, M.N. (eds) Computational Science and Its Applications – ICCSA 2024. ICCSA 2024. Lecture Notes in Computer Science, vol 14814. Springer, Cham. https://doi.org/10.1007/978-3-031-64608-9_29

DOI: https://doi.org/10.1007/978-3-031-64608-9_29
Published: 02 July 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64607-2
Online ISBN: 978-3-031-64608-9
eBook Packages: Computer Science, Computer Science (R0)
