Abstract
With the rapid advancement of deep learning, particularly the application of attention mechanisms to convolutional neural networks (CNNs), object detection in high-resolution remote sensing images has made significant progress. However, due to the inability of CNNs to capture long-range dependencies and the high computational cost of attention mechanisms, object detection in remote sensing images remains a challenging task. To address these issues, this paper introduces a novel feature pyramid full granularity attention module (FPFGAM) designed to learn long-range dependencies, dynamically attend to strongly correlated features, and reduce GPU memory overhead. First, we adaptively filter feature regions at the coarse-grained level, which reduces the computational burden caused by weakly correlated features. Then, we perform fine-grained pixel-level queries on several strongly correlated regions to enhance the learning of long-range dependent features. By embedding FPFGAM into the ResNet50 backbone and the feature pyramid network (FPN), we build the feature pyramid full granularity attention network (FPFGANet). FPFGAM can be easily inserted into different layers to improve object detection accuracy in remote sensing images. Finally, we evaluate our method on three commonly used public remote sensing object detection datasets, including NWPU VHR-10 and DIOR. The empirical results confirm the effectiveness of our approach.
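The coarse-to-fine procedure described above can be illustrated with a minimal NumPy sketch. It is modeled on bi-level routing attention as popularized by BiFormer (Zhu et al., CVPR 2023) and is not the authors' FPFGAM code. The function name `coarse_to_fine_attention`, the mean-pooled region descriptors, the `region` and `topk` parameters, the single attention head, and the use of the raw features as queries, keys, and values (no learned projections) are all illustrative assumptions. Each region first selects its `topk` most correlated regions at the coarse level. Each pixel then attends only to the pixels of those regions, so a query pixel is compared against `topk * region**2` keys rather than all `H * W`.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def coarse_to_fine_attention(feat, region=4, topk=2):
    """Hypothetical coarse-to-fine attention over an (H, W, C) feature map.

    Assumptions (not taken from the paper): single head, Q = K = V = feat
    (no learned projections), and H, W both divisible by `region`.
    Returns the attended feature map and the selected region indices.
    """
    H, W, C = feat.shape
    rh, rw = H // region, W // region          # grid of regions
    n_reg, n_pix = rh * rw, region * region
    # Partition into non-overlapping regions: (n_reg, n_pix, C).
    x = feat.reshape(rh, region, rw, region, C).transpose(0, 2, 1, 3, 4)
    x = x.reshape(n_reg, n_pix, C)

    # Coarse level: one mean-pooled descriptor per region and a
    # region-to-region affinity matrix.
    desc = x.mean(axis=1)                      # (n_reg, C)
    affinity = desc @ desc.T                   # (n_reg, n_reg)
    # Keep only the top-k most correlated regions for each query region;
    # weakly correlated regions are never visited at the fine level.
    idx = np.argsort(-affinity, axis=1)[:, :topk]  # (n_reg, topk)

    # Fine level: pixel-level attention restricted to the selected regions.
    kv = x[idx].reshape(n_reg, topk * n_pix, C)    # gathered keys/values
    scores = x @ kv.transpose(0, 2, 1) / np.sqrt(C)  # (n_reg, n_pix, topk*n_pix)
    out = softmax(scores, axis=-1) @ kv              # (n_reg, n_pix, C)

    # Undo the region partition back to (H, W, C).
    out = out.reshape(rh, rw, region, region, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C), idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((16, 16, 8))
    out, idx = coarse_to_fine_attention(feat, region=4, topk=2)
    # Each query pixel attends to topk * region**2 = 32 keys instead of H * W = 256.
    print(out.shape, idx.shape)  # (16, 16, 8) (16, 2)
```

Setting `topk` to the total number of regions keeps every key, in which case the sketch reduces to ordinary global self-attention. Smaller values trade some context for lower memory use, which matches the motivation the abstract gives for filtering out weakly correlated regions before the fine-grained step.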
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, C., Qi, X., Yin, H., Song, B., Li, K., Shen, F. (2024). Feature Pyramid Full Granularity Attention Network for Object Detection in Remote Sensing Imagery. In: Huang, DS., Zhang, C., Guo, J. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14871. Springer, Singapore. https://doi.org/10.1007/978-981-97-5609-4_26
DOI: https://doi.org/10.1007/978-981-97-5609-4_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5608-7
Online ISBN: 978-981-97-5609-4
eBook Packages: Computer Science; Computer Science (R0)