
1 Introduction

Salient object detection aims to simulate the human visual system by detecting the pixels or regions that are most attractive. The detection results can be used for numerous computer vision tasks such as image classification [1, 2], object detection and recognition [3, 4], image compression [5], and image segmentation [6, 7].

Motivated by increasing application demand, a number of algorithms have been proposed [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]. These methods can be divided into two categories: bottom-up (stimulus-driven) methods [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23] and top-down (goal-driven) methods [24, 25]. Most bottom-up detection methods rely on low-level visual cues such as color, intensity, and orientation; the main advantages of such methods are that they are fast and need no specific prior knowledge. In contrast, top-down methods usually learn models from training examples with manually labeled ground truth. Being built on a supervised learning framework, such methods require domain-specific prior knowledge. A comprehensive survey of saliency detection can be found in [26, 27], and a quantitative comparison of different methods is provided in [28, 29].

Recently, bottom-up methods have made significant progress owing to their simple and fast implementation. One representative research direction is based on the contrast between image pixels or regions and their surroundings. These contrast-based methods can be roughly divided into local methods [14, 16, 30] and global methods [17, 18, 20]. Local contrast-based methods consider the contrast between pixels or regions and their local neighborhoods, whereas global contrast-based methods consider contrast relationships over the entire image.

In all of these contrast-based methods, saliency is measured by the sum of contrast between the pixels or regions and their surroundings. We find that the spatial distribution of contrast is another important cue of saliency. Figure 1(a) is a source image with two marked target pixels (one foreground pixel at the center of the red box and one background pixel at the center of the blue box). The spatially weighted contrast from the entire image to these two target pixels is shown in Figs. 1(c) and 1(d), where darker green indicates higher contrast. The sums of contrast in Figs. 1(c) and 1(d) are comparable, but the spatial distributions of contrast differ greatly. If the target pixel is regarded as the center of view, the high contrast in Fig. 1(c) is concentrated in a few directions (mainly bottom-right), while Fig. 1(d) shows high contrast in almost every direction. If saliency is simply defined as the sum of contrast from the entire image (as in the global contrast method [18]), the saliency of these two target pixels is comparable, as can be seen from the saliency result of the global contrast method [18] in Fig. 1(e). In the human visual system, however, an object with high contrast in more directions usually has higher saliency. Inspired by this phenomenon, we propose a saliency detection method based on the distribution of contrast. Figure 1(f) is the proposed saliency result without any post-processing: the target pixel in the red box shows much higher saliency than the target pixel in the blue box, which is closer to the ground truth in Fig. 1(b). The final saliency results of Figs. 1(e) and (f) with post-processing are shown in Figs. 1(g) and (h).

Fig. 1. Motivation of the proposed method. (a) Input image with two marked target pixels, one foreground pixel at the center of the red box and one background pixel at the center of the blue box. (b) Saliency ground truth. (c) (d) Contrast from the entire image to the two target pixels; higher contrast is shown in darker green. The sums of contrast for the two target pixels are comparable, while the distributions differ greatly. (e) Saliency computed by the sum of contrast [18]: the two target pixels show comparable saliency. (f) Proposed saliency computed by the distribution of contrast. (g) (h) Final results of (e) (f) with post-processing. (Color figure online)

In the proposed method, the input image is first segmented into superpixels. For each superpixel, the contrast between this superpixel and its surroundings is computed in every direction, and the relative standard deviation (RSD) of the contrast over all directions is measured as the cue of saliency. The main flowchart is shown in Fig. 2.

Fig. 2. Main flowchart of the proposed method.

We evaluate the proposed approach on four datasets. Experimental results demonstrate that the proposed method performs favorably against state-of-the-art methods.

The remainder of this paper is organized as follows. Section 2 reviews contrast-based saliency detection methods. The proposed saliency detection method and experimental results are described in Sects. 3 and 4. Finally, Sect. 5 concludes the paper and discusses future work.

2 Related Work

The human visual system is highly sensitive to the contrast of image regions to their surroundings. Many contrast-based saliency detection methods have been proposed in recent years. These methods can be roughly divided into local methods and global methods.

Local contrast-based methods investigate the rarity of image regions with respect to their local neighborhoods. Itti et al. [14] use a difference-of-Gaussians approach to extract multi-scale color, intensity, and orientation information from images, which is then used to define saliency by calculating center-surround differences. Based on this work, Harel et al. [15] propose a bottom-up visual saliency model to highlight conspicuous parts and permit combination with other importance maps. Liu et al. [31] propose multi-scale contrast by linearly combining contrast in a Gaussian image pyramid.

Global contrast-based methods evaluate the saliency of pixels or regions using contrast relationships over the entire image. Zhai and Shah [32] define pixel-wise saliency as a pixel's contrast to all other pixels. Achanta et al. [17] present a frequency-tuned algorithm that defines pixel saliency as its color difference from the average image color; this simple and very fast approach usually fails on complex natural images. In [18], a region-based saliency algorithm is introduced that measures the global contrast between the target region and all other regions in the image: the saliency of a region is defined as the sum of its contrast to all other regions. By avoiding the hard decision boundaries of superpixels, a soft abstraction is proposed in [33] that generates a set of large-scale perceptually homogeneous regions using histogram quantization and a global Gaussian Mixture Model (GMM); this approach provides large spatial support and can more uniformly highlight the salient object.

3 Proposed Detection Method

3.1 Main Flowchart

Since directly computing pixel-level contrast is computationally expensive, we use superpixels to represent the image. Existing edge-preserving superpixel segmentation algorithms include [34,35,36]; in this paper we adopt the SLIC algorithm [36] for its high efficiency and apply it to over-segment the input image \( I \) into \( K \) (e.g., \( K = 200 \) in this work) parts denoted as \( R_{i} \) (\( i = 1, \ldots, K \)). A source image and its superpixel segmentation are shown in Figs. 2(a) and (b).
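A minimal sketch of this segmentation step, assuming scikit-image is available; the input file name, the compactness value, and the per-region statistics (mean CIE-Lab color and center coordinates, used in the following steps) are illustrative choices, not prescribed by the paper:

```python
import numpy as np
from skimage import color, io
from skimage.segmentation import slic

image = io.imread("input.jpg")  # hypothetical input path

# Over-segment into K = 200 superpixels, as in Sect. 4.3.
labels = slic(image, n_segments=200, compactness=10, start_label=0)
K = labels.max() + 1

# Per-region mean CIE-Lab color and center coordinates for Eq. (1).
lab = color.rgb2lab(image)
region_color = np.array([lab[labels == i].mean(axis=0) for i in range(K)])
ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
region_center = np.array([(xs[labels == i].mean(), ys[labels == i].mean())
                          for i in range(K)])
```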

Region contrast is usually defined as the color distance between the target region and other regions. Besides contrast, spatial relationships are also important in the human visual system: a near region with high contrast is usually more salient than a far region with comparable contrast. The spatially weighted contrast between regions \( R_{i} \) and \( R_{j} \) can be defined as follows:

$$ Contrast(R_{i}, R_{j}) = \exp \left( - \frac{D_{s}(R_{i}, R_{j})}{\delta_{s}^{2}} \right) D(R_{i}, R_{j}) $$
(1)

where \( D(R_{i}, R_{j}) \) is the Euclidean color distance between regions \( R_{i} \) and \( R_{j} \) in CIE-Lab color space, \( D_{s}(R_{i}, R_{j}) \) is the spatial Euclidean distance between regions \( R_{i} \) and \( R_{j} \), and \( \delta_{s} \) controls the strength of the spatial distance weighting.
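A sketch of Eq. (1) over all region pairs, building on the region colors and centers above; normalizing the center coordinates to [0, 1] before measuring spatial distance is our assumption, and \( \delta_{s} = 1.6 \) follows Sect. 4.3:

```python
delta_s = 1.6  # spatial weighting strength (Sect. 4.3)

# Normalize center coordinates to [0, 1] (assumption, not from the paper).
centers = region_center / region_center.max(axis=0)

# K x K spatial and CIE-Lab color distance matrices.
D_s = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
D = np.linalg.norm(region_color[:, None] - region_color[None, :], axis=2)

# Eq. (1): spatially weighted contrast between every pair of regions.
contrast = np.exp(-D_s / delta_s**2) * D
```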

As discussed in the introduction, a region with high contrast in more directions usually has higher saliency. In order to estimate the saliency of each region \( R_{i} \), we first calculate the contrast in every direction, and then need a measure that represents the distribution of contrast over all directions.

We first consider the contrast from other regions to region \( R_{i} \) in every direction. In order to reduce the computational cost, we divide all surrounding regions of the target region \( R_{i} \) into \( N \) directions. The direction from region \( R_{j} \) to \( R_{i} \) is determined by the coordinates of the region centers. Although there are many regions in each direction, humans usually pay most attention to the region with maximum contrast, so this maximum can be regarded as the contrast in that direction. The maximum contrast of region \( R_{i} \) in direction \( n \) is computed as follows:

$$ MaxContrast_{i}(n) = \max_{R_{j} \text{ in direction } n \text{ of } R_{i}} Contrast(R_{i}, R_{j}) $$
(2)

An example of the maximum contrast surrounding a region in each direction is shown in Fig. 2(c). Blue radial lines indicate the directions; darker green indicates higher contrast, while lighter green or white indicates lower contrast.
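A sketch of Eq. (2), assuming the surrounding regions are binned into N angular sectors by the angle between region centers; N = 16 follows Sect. 4.3, while the binning via arctan2 is our illustrative choice:

```python
N = 16  # number of directions (Sect. 4.3)
max_contrast = np.zeros((K, N))

for i in range(K):
    # Angle of every region center as seen from the center of R_i.
    dx = region_center[:, 0] - region_center[i, 0]
    dy = region_center[:, 1] - region_center[i, 1]
    angle = np.arctan2(dy, dx)  # in [-pi, pi]
    sector = ((angle + np.pi) / (2 * np.pi) * N).astype(int) % N

    # Eq. (2): keep the maximum contrast inside each angular sector.
    for j in range(K):
        if j != i:
            max_contrast[i, sector[j]] = max(max_contrast[i, sector[j]],
                                             contrast[i, j])
```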

Since a region with high contrast in more directions usually has higher saliency, we need a measure for the distribution of the surrounding maximum contrast. We use the relative standard deviation (RSD), defined as follows:

$$ RSD_{i} = \frac{\sigma(MaxContrast_{i})}{mean(MaxContrast_{i})} $$
(3)

where \( \sigma(MaxContrast_{i}) \) and \( mean(MaxContrast_{i}) \) are the standard deviation and the average of the surrounding maximum contrast. If a region shows high contrast in most directions, its RSD value is low; an example is shown in the top row of Fig. 2(d). On the contrary, if a region shows high contrast only in a few directions, its RSD value is high; an example is shown in the bottom row of Fig. 2(d). The RSD map of all superpixels is shown in Fig. 2(e); foreground regions show lower RSD and background regions show higher RSD.

Based on this property of the RSD, we define a decreasing function to estimate saliency:

$$ S_{i} = e^{{ - \lambda \cdot RSD_{i} }} $$
(4)

where \( \lambda \) is a parameter that controls the saliency falloff. An example saliency map of all superpixels is shown in Fig. 2(f).
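A sketch of Eqs. (3) and (4) on top of the directional maxima above; the small epsilon guarding against flat regions is our addition, and \( \lambda = 6.0 \) follows Sect. 4.3:

```python
lam = 6.0   # saliency control parameter (Sect. 4.3)
eps = 1e-8  # guards against regions with zero mean contrast (our addition)

# Eq. (3): relative standard deviation of the directional maximum contrast.
rsd = max_contrast.std(axis=1) / (max_contrast.mean(axis=1) + eps)

# Eq. (4): decreasing exponential maps low RSD to high saliency.
saliency = np.exp(-lam * rsd)
```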

3.2 Saliency Smoothing

In order to reduce noise in the saliency results, we use a smoothing procedure to refine the saliency values. We first smooth saliency at the region level as follows:

$$ S_{i}^{'} = \sum\limits_{j = 1}^{K} S_{j} \cdot W_{j} \cdot \exp \left( - \frac{D_{s}(R_{i}, R_{j})}{\delta_{smooth\_space}^{2}} - \frac{D(R_{i}, R_{j})}{\delta_{smooth\_color}^{2}} \right) $$
(5)

where \( W_{j} \) is the weight of region \( R_{j} \), defined by its number of pixels, \( D(R_{i}, R_{j}) \) and \( D_{s}(R_{i}, R_{j}) \) are the color and spatial distances between regions \( R_{i} \) and \( R_{j} \), and \( \delta_{smooth\_space} \) and \( \delta_{smooth\_color} \) control the strength of smoothing in the spatial and color domains.
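A sketch of Eq. (5), reusing the distance matrices from Eq. (1); the final rescaling to [0, 1] is our assumption, and the two smoothing parameters follow Sect. 4.3:

```python
delta_space, delta_color = 0.2, 16.0  # smoothing strengths (Sect. 4.3)

# W_j: number of pixels in each region.
W = np.bincount(labels.ravel(), minlength=K).astype(float)

# Eq. (5): weighted sum over all regions in the joint spatial-color domain.
weights = W[None, :] * np.exp(-D_s / delta_space**2 - D / delta_color**2)
saliency_region = (weights * saliency[None, :]).sum(axis=1)

# Rescale to [0, 1] (our assumption; Eq. (5) itself is unnormalized).
saliency_region /= saliency_region.max()
```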

Superpixel segmentation usually results in discontinuous saliency at superpixel boundaries. In order to obtain refined saliency, we smooth the saliency at the pixel level with a guided filter [37, 38].
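A sketch of this pixel-level refinement using the guided filter from opencv-contrib (cv2.ximgproc); the filter radius and regularization eps are illustrative values, not taken from the paper:

```python
import cv2  # requires opencv-contrib-python for cv2.ximgproc

# Spread the region-level saliency back to pixels via the label map.
pixel_saliency = saliency_region[labels].astype(np.float32)

# Guided filtering with the input image as guide preserves object edges.
guide = image.astype(np.float32) / 255.0
refined = cv2.ximgproc.guidedFilter(guide, pixel_saliency,
                                    radius=8, eps=1e-3)  # illustrative values
```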

A proposed saliency map is shown in Fig. 3(c); the region-level smoothed saliency and the pixel-level smoothed saliency are shown in Figs. 3(d) and (e).

Fig. 3. Saliency smoothing.

4 Experimental Results

4.1 Datasets

We evaluate the proposed method on four benchmark datasets. The first, the MSRA-10000 dataset [18, 33], consists of 10000 images, each of which has an unambiguous salient object with pixel-wise ground-truth labeling. Since this widely used dataset includes all images of the ASD-1000 dataset [17], the evaluation on ASD-1000 [17] is omitted in this paper. The second, ECSSD-1000 [10], contains more salient objects under complex scenes, with some images coming from the challenging Berkeley-300 dataset. The third, the DUT-OMRON dataset [23], contains 5166 challenging images with pixel-wise ground truth annotated by five users. The final PASCAL-S dataset [42] is constructed from the validation set of the recent PASCAL VOC segmentation challenge. It contains 850 natural images with multiple complex objects and cluttered backgrounds. Unlike traditional benchmarks, PASCAL-S is believed to eliminate dataset design bias (e.g., center bias and color contrast bias).

4.2 Evaluation Metrics

(1) Precision-recall (PR) curve

For a saliency map S, we can convert it to a binary mask M and compute Precision and Recall by comparing M with the ground truth G:

$$ Precision = \frac{|M \cap G|}{|M|},\;\;Recall = \frac{|M \cap G|}{|G|} $$
(6)

where \( | \cdot | \) denotes the number of non-zero entries in the mask.

A common way to binarize S is to use a fixed threshold that varies from 0 to 255. At each threshold, a pair of precision/recall scores is computed, and the pairs are combined to form a precision-recall (PR) curve that describes the model performance under different operating points.
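A sketch of this threshold sweep, assuming a uint8 saliency map and a boolean ground-truth mask; the helper name and the guard against empty masks are our choices:

```python
def pr_curve(S, G):
    """Eq. (6) swept over thresholds 0..255.

    S: uint8 saliency map, G: boolean ground-truth mask.
    """
    precisions, recalls = [], []
    for t in range(256):
        M = S >= t
        inter = np.logical_and(M, G).sum()
        precisions.append(inter / max(M.sum(), 1))  # |M ∩ G| / |M|
        recalls.append(inter / max(G.sum(), 1))     # |M ∩ G| / |G|
    return np.array(precisions), np.array(recalls)
```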

(2) F-measure

The F-measure is an overall performance measurement computed as the weighted harmonic mean of precision and recall:

$$ F_{\beta} = \frac{(1 + \beta^{2}) \, Precision \times Recall}{\beta^{2} \, Precision + Recall} $$
(7)

As suggested by many salient object detection works, \( \beta^{2} \) is set to 0.3 to give more importance to the precision value.

The F-measure can be computed with an adaptive or a fixed threshold. We use a fixed threshold that varies from 0 to 255; the resulting PR curve is scored by its maximal \( F_{\beta} \), which is a good summary of the detection performance.
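A sketch of the maximal \( F_{\beta} \) over the PR curve computed above, with \( \beta^{2} = 0.3 \) as in the paper; the epsilon in the denominator is our guard against a zero denominator:

```python
def max_f_measure(precisions, recalls, beta2=0.3):
    """Eq. (7) at every threshold; return the maximum over the curve."""
    f = ((1 + beta2) * precisions * recalls
         / np.maximum(beta2 * precisions + recalls, 1e-8))
    return f.max()
```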

(3) Receiver operating characteristic (ROC) curve

Similar to precision and recall, the false positive rate (FPR) and true positive rate (TPR) can also be computed when the saliency map S is converted to a binary mask M with a set of fixed thresholds:

$$ TPR = \frac{|M \cap G|}{|G|},\;\;FPR = \frac{{|M \cap \overline{G} |}}{{|\overline{G} |}} $$
(8)

where \( \overline{G} \) denotes the complement of the ground truth.

The ROC curve is the plot of TPR versus FPR obtained by varying a fixed threshold from 0 to 255.
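A sketch of Eq. (8) under the same threshold sweep; the helper name and the empty-mask guards are our choices:

```python
def roc_curve(S, G):
    """Eq. (8) swept over thresholds 0..255."""
    tprs, fprs = [], []
    for t in range(256):
        M = S >= t
        tprs.append(np.logical_and(M, G).sum() / max(G.sum(), 1))
        fprs.append(np.logical_and(M, ~G).sum() / max((~G).sum(), 1))
    return np.array(fprs), np.array(tprs)
```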

(4) AUC

This metric represents the area under the ROC curve and can effectively reflect the global properties of different algorithms.
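A minimal way to compute it from the ROC points above is trapezoidal integration; sorting by FPR handles the decreasing order produced by the threshold sweep:

```python
def auc(fprs, tprs):
    """Area under the ROC curve via trapezoidal integration."""
    order = np.argsort(fprs)  # threshold sweep yields FPR in decreasing order
    return np.trapz(tprs[order], fprs[order])
```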

4.3 Parameter Setting

We set the number of superpixels to \( K = 200 \) in all experiments. Besides, four parameters in the proposed algorithm are empirically chosen: \( \delta_{s} \) in (1), which controls the strength of spatial distance weighting, is set to 1.6; \( \delta_{smooth\_space} \) and \( \delta_{smooth\_color} \) in (5), which control the strength of smoothing in the spatial and color domains, are set to 0.2 and 16; and the saliency control parameter \( \lambda \) in (4) is set to 6.0.

Since the saliency is derived from the maximum contrast in N directions, the number of directions N is an important parameter. In order to demonstrate the effect of N in our approach, we evaluate saliency on the widely used MSRA-10000 dataset [18, 33]. The PR curve, ROC curve, F-measure, and AUC for different values of N are shown in Fig. 4. Increasing N up to 16 significantly improves the performance, but larger values make little difference, which is consistent with our intuition. Consequently, we use N = 16 in all subsequent experiments.

Fig. 4. Number of contrast directions.

4.4 Comparison with State-of-the-Art

The proposed algorithm is compared with both classic and recent state-of-the-art methods: SR [16], FT [17], RC [18], SF [20], GS [21], GC [33], MR [23], MC [39], MAP [40], MBD [41], and Water [43]. To evaluate these methods, we either use the results provided by the authors or run their implementations based on the available code or software.

Figure 5 shows a quantitative comparison (including PR curve, ROC curve, F-measure, and AUC) between the proposed method and previous methods. On the MSRA-10000 dataset [18, 33], our method outperforms all state-of-the-art methods, including Water [43], MBD [41], and MC [39], which were the top-performing methods for saliency detection. On the DUT-OMRON [23] and PASCAL-S [42] datasets, our method outperforms most previous methods except MC [39]; we achieve a comparable F-measure to MC [39] but a clearly better PR curve, ROC curve, and AUC. On the ECSSD [10] dataset, only MAP [40] achieves a comparable F-measure to our approach, with a slightly worse PR curve, ROC curve, and AUC; our method outperforms all the other methods. Note that the RC [18] and GC [33] methods are also based on region contrast but neglect the distribution of contrast; the proposed method significantly outperforms these two methods on all datasets. These experimental results confirm that the distribution of contrast is an important cue of saliency.

Fig. 5. Quantitative comparison between the proposed method and some previous methods. (a) Precision-recall curves. (b) ROC curves. (c) F-measure and AUC.

Figure 6 shows saliency maps of the evaluated methods on a few examples. In the first row of Fig. 6, a flower is located at the center of the input image; the sum of contrast to the petal region is larger than the sum of contrast to the pistil region, but both petal and pistil have high surrounding contrast in every direction. That is, although the sums of contrast differ greatly, the distributions of contrast at the petal and the pistil are similar. The proposed method, based on the distribution of contrast, shows high saliency in both petal and pistil regions, while the RC [18] and GC [33] methods, based on the sum of contrast, assign the petal significantly higher saliency than the pistil. In the second row of Fig. 6, the proposed method shows high saliency in both coat and skirt, while the RC [18] and GC [33] methods assign the skirt higher saliency than the coat. Moreover, some state-of-the-art methods (MC [39], MAP [40], and MR [23]) can only effectively handle homogeneous objects, so the coat is regarded as background. Besides, our model can tackle even more complicated scenarios where other methods fail; two examples are presented in the sixth and seventh rows of Fig. 6.

Fig. 6. Visual comparison between the proposed method and some state-of-the-art methods on four datasets.

5 Conclusions

We find that the spatial distribution of contrast is an important cue for salient object detection, and we propose a saliency detection method based on the distribution of contrast. Experimental results on four benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art methods. In future work, we will consider optimization algorithms to further improve the results.