Feature Pyramid Full Granularity Attention Network for Object Detection in Remote Sensing Imagery

  • Conference paper
Advanced Intelligent Computing Technology and Applications (ICIC 2024)

Abstract

With the rapid advancement of deep learning, and in particular the application of attention mechanisms to convolutional neural networks (CNNs), object detection in high-resolution remote sensing images has made significant progress. However, CNNs struggle to capture long-range dependencies, and attention mechanisms are computationally expensive, so object detection in remote sensing images remains challenging. To address these issues, this paper introduces a novel feature pyramid full granularity attention module (FPFGAM) designed to learn long-range dependencies, dynamically attend to strongly correlated features, and reduce GPU memory overhead. First, we adaptively filter feature regions at the coarse-grained level, which reduces the computational burden caused by weakly correlated features. Next, we perform fine-grained pixel-level queries within the few strongly correlated regions that remain, which enhances the learning of long-range dependent features. We build a feature pyramid full granularity attention network (FPFGANet) by embedding FPFGAM into the ResNet50 backbone and the feature pyramid network (FPN). FPFGAM can be easily inserted into different layers to improve object detection accuracy in remote sensing images. Finally, we evaluate our method on commonly used public remote sensing object detection datasets, including NWPU VHR-10 and DIOR. The empirical results confirm the effectiveness of our approach.
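
The two-stage idea in the abstract (coarse region filtering followed by fine pixel-level queries) can be illustrated with a minimal sketch. This is not the authors' FPFGAM implementation, which is not described in detail here. It assumes, purely for illustration, that pixels are flattened into a 1D list of feature vectors, that each region is summarized by mean pooling, that region affinity is a plain dot product, and that queries, keys, and values are the raw features with no learned projections. The names `coarse_to_fine_attention`, `region_size`, and `top_k` are hypothetical.

```python
import math


def mean_vec(vectors):
    """Average a list of equal-length feature vectors (coarse region descriptor)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_to_fine_attention(tokens, region_size, top_k):
    """Illustrative sparse attention: route regions coarsely, attend finely.

    tokens: list of feature vectors, one per flattened pixel (assumption).
    Returns (output, routed), where routed[i] lists the regions that
    region i is allowed to attend to.
    """
    if len(tokens) % region_size != 0:
        raise ValueError("number of tokens must be divisible by region_size")

    # Partition the flattened pixels into non-overlapping regions.
    regions = [tokens[i:i + region_size]
               for i in range(0, len(tokens), region_size)]

    # Coarse level: one descriptor per region (mean pooling is an assumption).
    descriptors = [mean_vec(r) for r in regions]
    scale = 1.0 / math.sqrt(len(tokens[0]))

    output, routed = [], []
    for r_idx, region in enumerate(regions):
        # Region-to-region affinity; keep only the top_k most correlated regions.
        affinity = [dot(descriptors[r_idx], d) for d in descriptors]
        keep = sorted(range(len(regions)),
                      key=lambda j: affinity[j], reverse=True)[:top_k]
        routed.append(keep)

        # Fine level: every pixel in this region queries only pixels
        # gathered from the kept regions.
        keys = [t for j in keep for t in regions[j]]
        for q in region:
            weights = softmax([dot(q, k) * scale for k in keys])
            output.append([sum(w * k[c] for w, k in zip(weights, keys))
                           for c in range(len(q))])
    return output, routed
```

Under these assumptions, each query attends to `top_k * region_size` keys rather than to every pixel, which is where the reduction in computation and memory relative to dense attention would come from.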

Author information

Corresponding author

Correspondence to Chang Liu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liu, C., Qi, X., Yin, H., Song, B., Li, K., Shen, F. (2024). Feature Pyramid Full Granularity Attention Network for Object Detection in Remote Sensing Imagery. In: Huang, DS., Zhang, C., Guo, J. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14871. Springer, Singapore. https://doi.org/10.1007/978-981-97-5609-4_26

  • DOI: https://doi.org/10.1007/978-981-97-5609-4_26

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5608-7

  • Online ISBN: 978-981-97-5609-4

  • eBook Packages: Computer Science, Computer Science (R0)
