Abstract
The human visual system is highly sensitive to the contrast of image regions against their surroundings, and this intuition has motivated many contrast-based salient object detection methods in recent years. In these previous methods, the saliency of a region is generally measured as the sum of its contrast to its surroundings. We find that the spatial distribution of contrast is also an important cue for saliency: a salient region usually shows high contrast in most directions, while a background region presents high contrast in only a few directions. Inspired by this observation, we propose a salient object detection method based on the distribution of contrast. In the proposed method, the input image is first segmented into superpixels; for each superpixel, the contrast between this superpixel and its surroundings is computed in every direction, and the relative standard deviation (RSD) of contrast over all directions is used as the saliency cue. Experimental results on four benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art methods.
1 Introduction
Salient object detection aims to simulate the human visual system in order to detect the pixels or regions that are most attractive. The detection result can be used for numerous computer vision tasks such as image classification [1, 2], object detection and recognition [3, 4], image compression [5], and image segmentation [6, 7].
Motivated by increasing application demand, a number of algorithms have been proposed [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]. These methods can be divided into two categories: bottom-up (stimulus-driven) methods [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23] and top-down (goal-driven) methods [24, 25]. Most bottom-up detection methods rely on low-level visual cues such as color, intensity, and orientation; the main advantage of such methods is that they are fast and require no specific prior knowledge. In contrast, top-down methods usually learn models from training examples with manually labeled ground truth. Being built on a supervised learning framework, such methods require domain-specific prior knowledge. Comprehensive surveys of saliency detection can be found in [26, 27], and quantitative comparisons of different methods are provided in [28, 29].
Recently, bottom-up methods have made significant progress owing to their simple and fast implementation. One representative research direction is based on the contrast between image pixels or regions and their surroundings. These contrast-based methods can be roughly divided into local methods [14, 16, 30] and global methods [17, 18, 20]. Local contrast-based methods consider the contrast between pixels or regions and their local neighborhoods, whereas global contrast-based methods consider contrast relationships over the entire image.
In all of these contrast-based methods, saliency is measured as the sum of contrast between a pixel or region and its surroundings. We find that the spatial distribution of contrast is also an important cue for saliency. Figure 1(a) shows a source image with two marked _target pixels (a foreground pixel at the center of the red box and a background pixel at the center of the blue box). The spatially weighted contrast from the entire image to these two _target pixels is shown in Figs. 1(c) and 1(d), where darker green indicates higher contrast. The sums of contrast in Figs. 1(c) and 1(d) are comparable, but the spatial distributions of contrast differ greatly. If the _target pixel is regarded as the center of view, the high contrast in Fig. 1(c) is concentrated in a few directions (mainly bottom-right), while Fig. 1(d) shows high contrast in almost every direction. If saliency were simply defined as the sum of contrast over the entire image (as in the global contrast method [18]), the saliency of these two _target pixels would be comparable, as can be seen from the result of the global contrast method [18] in Fig. 1(e). In the human visual system, an object with high contrast in more directions usually has higher saliency. Inspired by this phenomenon, we propose a saliency detection method based on the distribution of contrast. Figure 1(f) shows the proposed saliency result without any post-processing: the _target pixel in the red box receives much higher saliency than the _target pixel in the blue box, which is closer to the ground-truth result in Fig. 1(b). The final saliency results of Figs. 1(e) and (f) with post-processing are shown in Figs. 1(g) and (h).
In the proposed method, the input image is first segmented into superpixels; for each superpixel, the contrast between this superpixel and its surroundings is computed in every direction, and the relative standard deviation (RSD) of contrast over all directions is used as the saliency cue. The main flowchart is shown in Fig. 2.
We evaluate the proposed approach on four datasets. Experimental results demonstrate that the proposed method performs favorably against state-of-the-art methods.
The remainder of this paper is organized as follows. Section 2 reviews contrast-based saliency detection methods. The proposed saliency detection method and the experimental results are described in Sects. 3 and 4, respectively. Finally, Sect. 5 concludes the paper and discusses future work.
2 Related Work
The human visual system is highly sensitive to the contrast of image regions against their surroundings, and many contrast-based saliency detection methods have been proposed in recent years. These methods can be roughly divided into local methods and global methods.
Local contrast-based methods investigate the rarity of image regions with respect to their local neighborhoods. Itti et al. [14] use a difference-of-Gaussians approach to extract multi-scale color, intensity, and orientation information from images, which is then used to define saliency by calculating center-surround differences. Based on this work, Harel et al. [15] propose a bottom-up visual saliency model to highlight conspicuous parts and permit combination with other importance maps. Liu et al. [31] propose multi-scale contrast by linearly combining contrast in a Gaussian image pyramid.
Global contrast-based methods evaluate the saliency of pixels or regions using contrast relationships over the entire image. Zhai and Shah [32] define pixel-wise saliency as a pixel's contrast to all other pixels. Achanta et al. [17] present a frequency-tuned algorithm that defines pixel saliency as its color difference from the average image color; this simple and very fast approach often fails on complex natural images. In [18], a region-based saliency algorithm is introduced that measures the global contrast of the _target region with respect to all other regions in the image, defining the saliency of a region as the sum of its contrast to all other regions. By avoiding the hard decision boundaries of superpixels, a soft abstraction is proposed in [33] that generates a set of large-scale perceptually homogeneous regions using histogram quantization and a global Gaussian Mixture Model (GMM); such an approach provides large spatial support and can more uniformly highlight the salient object.
3 Proposed Detection Method
3.1 Main Flowchart
Since directly computing pixel-level contrast is computationally expensive, we use superpixels to represent the image. Existing edge-preserving superpixel segmentation algorithms include [34,35,36]; in this paper we adopt the SLIC algorithm [36] for its high efficiency and apply it to over-segment the input image \( I \) into \( K \) parts (\( K = 200 \) in this work), denoted as \( R_{i} \) (\( i = 1..K \)). A source image and its superpixel segmentation are shown in Figs. 2(a) and (b).
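As a concrete illustration, the following sketch (assuming scikit-image and its slic function; the file name and variable names are ours, not from the paper) over-segments an image into roughly K = 200 superpixels and gathers the per-region mean CIE-Lab color, center coordinates, and pixel counts that the later steps rely on.

```python
import numpy as np
from skimage import io, color
from skimage.segmentation import slic

# Load the image and over-segment it into ~200 superpixels (SLIC [36]).
image = io.imread("input.jpg")                     # H x W x 3, RGB (hypothetical file)
labels = slic(image, n_segments=200, compactness=10, start_label=0)
lab = color.rgb2lab(image)                         # CIE-Lab for color distances

K = labels.max() + 1
mean_lab = np.zeros((K, 3))                        # mean Lab color per region
centers = np.zeros((K, 2))                         # (row, col) center per region
sizes = np.zeros(K)                                # number of pixels per region
for i in range(K):
    mask = labels == i
    mean_lab[i] = lab[mask].mean(axis=0)
    rows, cols = np.nonzero(mask)
    centers[i] = rows.mean(), cols.mean()
    sizes[i] = mask.sum()
```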
Region contrast is usually defined as the color distance between the _target region and other regions. Besides contrast, spatial relationships are also important in the human visual system: a nearby region with high contrast usually attracts more attention than a distant region with comparable contrast. The spatially weighted contrast between regions \( R_{i} \) and \( R_{j} \) is defined as follows:
where \( D(R_{i} ,R_{j} ) \) is the Euclidean color distance between regions \( R_{i} \) and \( R_{j} \) in the CIE-Lab color space, \( D_{s} (R_{i} ,R_{j} ) \) is the spatial Euclidean distance between regions \( R_{i} \) and \( R_{j} \), and \( \delta_{s} \) controls the strength of the spatial distance weighting.
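Since Eq. (1) is not reproduced in this excerpt, the following sketch assumes one common form: the Lab color distance attenuated by an exponential fall-off in normalized spatial distance. It continues from the previous snippet's mean_lab and centers arrays.

```python
import numpy as np

def spatially_weighted_contrast(mean_lab, centers, image_shape, delta_s=1.6):
    """Pairwise contrast matrix; entry (i, j) is the contrast from R_j to R_i.

    Assumed form of Eq. (1): Lab color distance weighted by an exponential
    fall-off in spatial distance normalized by the image diagonal.
    """
    # Color distance in CIE-Lab.
    d_color = np.linalg.norm(mean_lab[:, None, :] - mean_lab[None, :, :], axis=2)
    # Spatial distance with coordinates normalized by the image diagonal.
    diag = np.hypot(image_shape[0], image_shape[1])
    d_space = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2) / diag
    return d_color * np.exp(-d_space / delta_s)
```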
As discussed in the introduction, a region with high contrast in more directions usually has higher saliency. In order to estimate the saliency of each region \( R_{i} \), we first calculate the contrast in every direction and then need a measure that represents the distribution of contrast over all directions.
We first consider the contrast from other regions to region \( R_{i} \) in every direction. In order to reduce the computational cost, we divide all surrounding regions of the _target region \( R_{i} \) into \( N \) directions. The direction from region \( R_{j} \) to \( R_{i} \) is determined by the coordinates of the region centers. Although there may be many regions in each direction, humans usually pay most attention to the region with the maximum contrast, so this maximum is regarded as the contrast in that direction. The maximum contrast of region \( R_{i} \) in direction \( n \) is computed as follows:
An example of the maximum surrounding contrast in each direction is shown in Fig. 2(c): blue radial lines indicate the directions, darker green indicates higher contrast, and lighter green or white indicates lower contrast.
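A minimal sketch of this step, continuing from the previous snippets; partitioning the surrounding regions by binning the angle of the center-to-center vector into N equal sectors is our assumption about how Eq. (2) groups directions.

```python
import numpy as np

def max_contrast_per_direction(contrast, centers, n_directions=16):
    """For each region, the maximum contrast among regions falling in each of N sectors."""
    K = contrast.shape[0]
    max_contrast = np.zeros((K, n_directions))
    for i in range(K):
        dy = centers[:, 0] - centers[i, 0]
        dx = centers[:, 1] - centers[i, 1]
        angle = np.arctan2(dy, dx)                          # angle in (-pi, pi]
        sector = ((angle + np.pi) / (2 * np.pi) * n_directions).astype(int)
        sector = np.clip(sector, 0, n_directions - 1)       # bin into N sectors
        for j in range(K):
            if j == i:
                continue
            n = sector[j]
            max_contrast[i, n] = max(max_contrast[i, n], contrast[i, j])
    return max_contrast
```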
Since a region with high contrast in more directions usually has higher saliency, we need a measure that represents the distribution of the maximum surrounding contrast. We use the relative standard deviation (RSD), which is determined as follows:
where \( Cov(MaxContrast_{i} ) \) and \( mean(MaxContrast_{i} ) \) are the variance and mean of the maximum surrounding contrast. If a region shows high contrast in most directions, its RSD value is low; an example is shown in the top row of Fig. 2(d). On the contrary, if a region shows high contrast in only a few directions, its RSD value is high; an example is shown in the bottom row of Fig. 2(d). The RSD map of all superpixels is shown in Fig. 2(e): foreground regions show low RSD and background regions show high RSD.
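A sketch of the RSD computation in Eq. (3), using the conventional definition of RSD as the ratio of the standard deviation to the mean, which is our reading of the formula above.

```python
import numpy as np

def relative_std_dev(max_contrast, eps=1e-8):
    """RSD per region: std of the per-direction maxima divided by their mean."""
    std = max_contrast.std(axis=1)
    mean = max_contrast.mean(axis=1)
    return std / (mean + eps)
```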
Based on this property of the RSD, we define a decreasing function of RSD to estimate saliency:
where \( \lambda \) is a parameter that controls the saliency. An example saliency map of all superpixels is shown in Fig. 2(f).
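Eq. (4) itself is not reproduced here; the sketch below assumes a simple exponential decay in RSD as the decreasing function, which is one natural choice consistent with the description.

```python
import numpy as np

def saliency_from_rsd(rsd, lam=6.0):
    """Map RSD to saliency with a decreasing function (assumed exponential decay)."""
    return np.exp(-lam * rsd)
```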
3.2 Saliency Smoothing
In order to reduce noisy saliency results, we use a smoothing procedure to refine the saliency values. We first smooth the saliency at the region level as follows:
where \( W_{j} \) is the weight of region \( R_{j} \), defined by its number of pixels, \( D(R_{i} ,R_{j} ) \) and \( D_{s} (R_{i} ,R_{j} ) \) are the color and spatial distances between regions \( R_{i} \) and \( R_{j} \), and \( \delta_{smooth\_space} \) and \( \delta_{smooth\_color} \) control the strength of smoothing in the spatial and color domains.
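A sketch of the region-level smoothing in Eq. (5); the Gaussian weighting in color and normalized spatial distance is our assumption about the exact form.

```python
import numpy as np

def smooth_saliency_regions(saliency, mean_lab, centers, sizes, image_shape,
                            delta_space=0.2, delta_color=16.0):
    """Region-level smoothing: each region's saliency becomes a pixel-count-weighted
    average over all regions, with Gaussian weights in color distance and
    normalized spatial distance (assumed form of Eq. (5))."""
    d_color = np.linalg.norm(mean_lab[:, None, :] - mean_lab[None, :, :], axis=2)
    diag = np.hypot(image_shape[0], image_shape[1])
    d_space = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2) / diag
    w = sizes[None, :] * np.exp(-(d_space / delta_space) ** 2) \
                       * np.exp(-(d_color / delta_color) ** 2)
    return (w * saliency[None, :]).sum(axis=1) / w.sum(axis=1)
```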
Superpixel segmentation usually results in discontinuous saliency at superpixel boundaries. In order to obtain a refined saliency map, we smooth the saliency at the pixel level with a guided filter [37, 38]:
A proposed saliency map is shown in Fig. 3(c); the region-level smoothed saliency and the pixel-level smoothed saliency are shown in Figs. 3(d) and (e).
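For the pixel-level step, the guided filter of [37, 38] is available, for example, through OpenCV's ximgproc module (opencv-contrib-python); the radius and eps values below are illustrative, not the paper's settings.

```python
import cv2
import numpy as np

def smooth_saliency_pixels(image_bgr, region_saliency, labels, radius=8, eps=1e-3):
    """Spread region saliency to pixels, then guided-filter it with the image as guide.
    Requires the opencv-contrib build for cv2.ximgproc."""
    saliency_map = region_saliency[labels].astype(np.float32)   # H x W per-pixel saliency
    guide = image_bgr.astype(np.float32) / 255.0
    return cv2.ximgproc.guidedFilter(guide, saliency_map, radius, eps)
```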
4 Experimental Results
4.1 Datasets
We evaluate the proposed method on four benchmark datasets. The first, the MSRA-10000 dataset [18, 33], consists of 10000 images, each of which has an unambiguous salient object with pixel-wise ground-truth labeling. Since this widely used dataset includes all images of the ASD-1000 dataset [17], the evaluation on ASD-1000 [17] is omitted in this paper. The second, ECSSD-1000 [10], contains more salient objects in complex scenes, and some of its images come from the challenging Berkeley-300 dataset. The third, the DUT-OMRON dataset [23], contains 5166 challenging images with pixel-wise ground truth annotated by five users. The final PASCAL-S dataset [42] is constructed on the validation set of the recent PASCAL VOC segmentation challenge. It contains 850 natural images with multiple complex objects and cluttered backgrounds. Unlike the traditional benchmarks, PASCAL-S is believed to eliminate dataset design bias (e.g., center bias and color contrast bias).
4.2 Evaluation Metrics
(1) Precision-recall (PR) curve
For a saliency map S, we can convert it to a binary mask M and compute Precision and Recall by comparing M with ground-truth G:
where \( | \cdot | \) represents the number of non-zero entries in the mask.
A common way to binarize S is to use a fixed threshold that varies from 0 to 255. For each threshold, a pair of precision/recall scores is computed, and these pairs are combined to form a precision-recall (PR) curve that describes the model performance under different situations.
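A straightforward sketch of the PR computation, using the standard definitions Precision = |M ∩ G| / |M| and Recall = |M ∩ G| / |G| implied by the description above.

```python
import numpy as np

def precision_recall_curve(saliency, gt, thresholds=range(256)):
    """Precision/recall pairs for a saliency map (values in [0, 255]) against a
    binary ground-truth mask, swept over fixed thresholds."""
    gt = gt.astype(bool)
    precisions, recalls = [], []
    for t in thresholds:
        m = saliency >= t
        tp = np.logical_and(m, gt).sum()
        precisions.append(tp / max(m.sum(), 1))
        recalls.append(tp / max(gt.sum(), 1))
    return np.array(precisions), np.array(recalls)
```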
(2) F-measure
The F-measure is an overall performance measure computed as the weighted harmonic mean of precision and recall:
As suggested by many salient object detection works, \( \beta^{2} \) is set to 0.3 to give more importance to the precision value.
The F-measure can be computed with an adaptive or a fixed threshold. We use fixed thresholds from 0 to 255 and score the resulting PR curve by its maximal \( F_{\beta } \), which is a good summary of the detection performance.
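A sketch of the F-measure score over the PR curve, using the standard formula \( F_{\beta} = (1 + \beta^{2}) P R / (\beta^{2} P + R) \) with \( \beta^{2} = 0.3 \).

```python
import numpy as np

def max_f_measure(precisions, recalls, beta_sq=0.3):
    """Maximal F-beta over the PR curve: F = (1 + b^2) * P * R / (b^2 * P + R)."""
    f = (1 + beta_sq) * precisions * recalls / \
        np.maximum(beta_sq * precisions + recalls, 1e-8)
    return f.max()
```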
(3) Receiver operating characteristic (ROC) curve
Similarly to precision and recall, the false positive rate (FPR) and true positive rate (TPR) can be computed when the saliency map S is converted to binary masks M with a set of fixed thresholds:
where \( \overline{G} \) represents the complement of the ground truth.
The ROC curve is the plot of TPR versus FPR obtained with fixed thresholds from 0 to 255.
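A sketch of the ROC computation using TPR = |M ∩ G| / |G| and FPR = |M ∩ Ḡ| / |Ḡ|; the same routine also returns the AUC described in item (4) below, integrated with the trapezoid rule.

```python
import numpy as np

def roc_curve_and_auc(saliency, gt, thresholds=range(256)):
    """TPR/FPR at each fixed threshold, plus the area under the ROC curve."""
    gt = gt.astype(bool)
    tprs, fprs = [], []
    for t in thresholds:
        m = saliency >= t
        tprs.append(np.logical_and(m, gt).sum() / max(gt.sum(), 1))
        fprs.append(np.logical_and(m, ~gt).sum() / max((~gt).sum(), 1))
    # Sort by FPR so the area is integrated left to right.
    order = np.argsort(fprs)
    auc = np.trapz(np.array(tprs)[order], np.array(fprs)[order])
    return np.array(fprs), np.array(tprs), auc
```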
(4) AUC
This metric represents the area under the ROC curve and can effectively reflect the global properties of different algorithms.
4.3 Parameter Setting
We set the number of superpixels \( K \) = 200 in all experiments. In addition, four parameters in the proposed algorithm are chosen empirically: \( \delta_{s} \) in (1), which controls the strength of spatial distance weighting, is set to 1.6; \( \delta_{smooth\_space} \) and \( \delta_{smooth\_color} \) in (5), which control the strength of smoothing in the spatial and color domains, are set to 0.2 and 16; and the saliency control parameter \( \lambda \) in (4) is set to 6.0.
Since the saliency is derived from the maximum contrast in N directions, the number of directions N is an important parameter. To demonstrate the effect of N, we evaluate saliency on the widely used MSRA-10000 dataset [18, 33]. The PR curve, ROC curve, F-measure, and AUC for different values of N are shown in Fig. 4. Increasing N up to 16 significantly improves the performance, but larger values make little difference, which is consistent with our intuition. Consequently, we use N = 16 in all subsequent experiments.
4.4 Comparison with the State of the Art
The proposed algorithm is compared with both classic and recent state-of-the-art methods: SR [16], FT [17], RC [18], SF [20], GS [21], GC [33], MR [23], MC [39], MAP [40], MBD [41], and Water [43]. To evaluate these methods, we either use results provided by the authors or run their implementations with the available code or software.
Figure 5 shows a quantitative comparison (PR curve, ROC curve, F-measure, and AUC) between the proposed method and previous methods. On the MSRA-10000 dataset [18, 33], our method outperforms all state-of-the-art methods, including Water [43], MBD [41], and MC [39], which were the top-performing methods for saliency detection. On the DUT-OMRON [23] and PASCAL-S [42] datasets, our method outperforms most of the previous methods except MC [39]; we obtain an F-measure comparable to MC [39] but clearly better PR curve, ROC curve, and AUC. On the ECSSD [10] dataset, only MAP [40] achieves an F-measure comparable to our approach, with slightly worse PR curve, ROC curve, and AUC; our method outperforms all other methods. Note that the RC [18] and GC [33] methods are also based on region contrast but neglect the distribution of contrast; the proposed method significantly outperforms these two methods on all datasets. These experimental results confirm that the distribution of contrast is an important cue for saliency.
Figure 6 shows several saliency maps of the evaluated methods. In the first row of Fig. 6, a flower is located at the center of the input image; the sum of contrast to the petal region is larger than the sum of contrast to the pistil region, but both petal and pistil have high surrounding contrast in every direction. That is, although the sums of contrast differ greatly, the distributions of contrast at the petal and the pistil are similar. The proposed method, based on the distribution of contrast, assigns high saliency to both the petal and the pistil, whereas the RC [18] and GC [33] methods, based on the sum of contrast, assign the petal significantly higher saliency than the pistil. In the second row of Fig. 6, the proposed method shows high saliency on both the coat and the skirt, while the RC [18] and GC [33] methods give the skirt higher saliency than the coat. Moreover, some state-of-the-art methods (MC [39], MAP [40], and MR [23]) can only handle cases with homogeneous objects effectively, so the coat is regarded as background. In addition, our model can tackle even more complicated scenarios where other methods fail; two examples are presented in the sixth and seventh rows of Fig. 6.
5 Conclusions
We find that the spatial distribution of contrast is an important cue for salient object detection and propose a saliency detection method based on the distribution of contrast. Experimental results on four benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art methods. In future work, we will consider further optimizing the algorithm.
References
Siagian, C., Itti, L.: Rapid biologically-inspired scene classification using features shared with visual attention. IEEE Trans. Pattern Anal. Mach. Intell. 29(2), 300–312 (2007)
Sharma, G., Jurie, F., Schmid, C.: Discriminative spatial saliency for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3506–3513, June 2012
Walther, D., Rutishauser, U., Koch, C., Perona, P.: Selective visual attention enables learning and recognition of multiple objects in cluttered scenes. Comput. Vis. Image Underst. 100(1–2), 41–63 (2005)
Alexe, B., Deselaers, T., Ferrari, V.: Measuring the objectness of image windows. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2189–2202 (2012)
Guo, C., Zhang, L.: A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression. IEEE Trans. Image Process. 19(1), 185–198 (2010)
Wang, L., Xue, J., Zheng, N., Hua, G.: Automatic salient object extraction with contextual cue. In: Proceedings of IEEE International Conference Computer Vision, pp. 105–112, November 2011
Jung, C., Kim, C.: A unified spectral-domain approach for saliency detection and its application to automatic object segmentation. IEEE Trans. Image Process. 21(3), 1272–1283 (2012)
Shen, X., Wu, Y.: A unified approach to salient object detection via low rank matrix recovery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 853–860, June 2012
Jiang, P., Ling, H., Yu, J., Peng, J.: Salient region detection by UFO: Uniqueness, focusness and objectness. In: Proceedings of IEEE International Conference of Computer Vision, pp. 1976–1983, December 2013
Yan, Q., Xu, L., Shi, J., Jia, J.: Hierarchical saliency detection. In: Proceedings of IEEE Conference Computer Vision Pattern Recognition, pp. 1155–1162, June 2013
Li, X., Lu, H., Zhang, L., Ruan, X., Yang, M.-H.: Saliency detection via dense and sparse reconstruction. In: Proceedings of IEEE International Conference of Computer Vision, pp. 2976–2983, December 2013
Liu, R., Cao, J., Lin, Z., Shan, S.: Adaptive partial differential equation learning for visual saliency detection. In: Proceedings of IEEE Conference Computer Vision Pattern Recognition, pp. 3866–3873, June 2014
Li, N., Ye, J., Ji, Y., Ling, H., Yu, J.: Saliency detection on light field. In: Proceedings of IEEE Conference Computer Vision Pattern Recognition, pp. 2806–2813 (2014)
Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 20(11), 1254–1259 (1998)
Harel, J., Koch, C., Perona, P.: Graph-based visual saliency. In: Proceedings Advanced Neural Information Processing System, pp. 545–552 (2006)
Hou, X., Zhang, L.: Saliency detection: a spectral residual approach. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2007
Achanta, R., Hemami, S., Estrada, F., Süsstrunk, S.: Frequency-tuned salient region detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1597–1604, June 2009
Cheng, M.-M., Zhang, G.-X., Mitra, N.J., Huang, X., Hu, S.-M.: Global contrast based salient region detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 409–416, June 2011
Jiang, H., Wang, J., Yuan, Z., Liu, T., Zheng, N., Li, S.: Automatic salient object segmentation based on context and shape prior. In: Proceedings of British Machine Vision Conference, pp. 1–12 (2011)
Perazzi, F., Krahenbuhl, P., Pritch, Y., Hornung, A.: Saliency filters: contrast based filtering for salient region detection. In: Proceedings of IEEE Conference Computer Vision Pattern Recognition, pp. 733–740, June 2012
Wei, Y., Wen, F., Zhu, W., Sun, J.: Geodesic saliency using background priors. In: Proceedings of 12th European Conference on Computer Vision, pp. 29–42 (2012)
Xie, Y., Lu, H., Yang, M.-H.: Bayesian saliency via low and mid level cues. IEEE Trans. Image Process. 22(5), 1689–1698 (2013)
Yang, C., Zhang, L., Lu, H., Ruan, X., Yang, M.-H.: Saliency detection via graph-based manifold ranking. In: Proceedings of IEEE Conference Computer Vision Pattern Recognition, pp. 3166–3173, June 2013
Kanan, C., Tong, M.H., Zhang, L., Cottrell, G.W.: SUN: top-down saliency using natural statistics. Vis. Cognit. 17(6–7), 979–1003 (2009)
Yang, J., Yang, M.-H.: Top-down visual saliency via joint CRF and dictionary learning. In: Proceedings of IEEE Conference of Computer Vision Pattern Recognition, pp. 2296–2303, June 2012
Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 185–207 (2013)
Borji, A., Cheng, M.M., Jiang, H., Li, J.: Salient object detection: a survey, arXiv preprint. http://arxiv.org/pdf/1411.5878.pdf
Borji, A., Sihite, D.N., Itti, L.: Salient object detection: a benchmark. In: Proceedings of 12th European Conference Computer Vision, pp. 414–429 (2012)
Borji, A., Cheng, M.M., Jiang, H., Li, J.: Salient object detection: a benchmark. IEEE Trans. Image Process. 24(8), 5706–5722 (2015)
Goferman, S., Zelnik-Manor, L., Tal, A.: Context-aware saliency detection. In: Proceedings of IEEE Conference Computer Vision Pattern Recognition, pp. 2376–2383, June 2010
Liu, T., et al.: Learning to detect a salient object. IEEE Trans. Pattern Anal. Mach. Intell. 33(2), 353–367 (2011)
Zhai, Y., Shah, M.: Visual attention detection in video sequences using spatiotemporal cues. In: Proceedings of 14th Annual ACM International Conference Multimedia, pp. 815–824 (2006)
Cheng, M.-M., Warrell, J., Lin, W.-Y., Zheng, S., Vineet, V., Crook, N.: Efficient salient region detection with soft image abstraction. In: Proceedings of IEEE International Conference of Computer Vision, pp. 1529–1536, December 2013
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012)
He, K., Sun, J., Tang, X.: Guided image filtering. In: Proceedings of the IEEE European Conference Computer Vision, pp. 1–14 (2010)
He, K., Sun, J., Tang, X.: Guided Image Filtering. IEEE Trans. Pattern Anal. Mach. Intell. 35(6), 1397–1409 (2013)
Jiang, F., Kong, B., Adeel, A., Xiao, Y., Hussain, A.: Saliency detection via bidirectional absorbing Markov chain. In: Ren, J., et al. (eds.) BICS 2018. LNCS (LNAI), vol. 10989, pp. 495–505. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00563-4_48
Sun, J., Lu, H., Liu, X.: Saliency region detection based on Markov absorption probabilities. IEEE Trans. Image Process. 24(5), 1639–1649 (2015)
Zhang, J., Sclaroff, S., Lin, Z., Shen, X., Price, B., Mech, R.: Minimum barrier salient object detection at 80 FPS. In: Proceedings of International Conference on Computer Vision, pp. 1404–1412, December 2015
Li, Y., Hou, X., Koch, C., Rehg, J., Yuille, A.: The secrets of salient object segmentation. In: CVPR, vol. 5 (2014)
Huang, X., Zhang, Y.: Water flow driven salient object detection at 180 FPS. Pattern Recognit. 76, 95–107 (2018)