A positive feedback method based on F-measure value for Salient Object Detection
Abstract
The majority of current salient object detection (SOD) models focus on designing a series of decoders based on fully convolutional networks (FCNs) or Transformer architectures and integrating them skillfully. These models have achieved remarkably high performance and made significant contributions to the development of SOD. Their primary research objective is to develop novel algorithms that can outperform state-of-the-art models, a task that is extremely difficult and time-consuming. In contrast, this paper proposes a positive feedback method based on the F-measure value for SOD, aiming to improve the accuracy of saliency prediction using existing methods. Specifically, our method takes an image to be detected and inputs it into several existing models to obtain their respective prediction maps. These prediction maps are then fed into our positive feedback method, which performs decision-level fusion of the multi-model perception results to generate the final prediction, without the need for careful decoder design or model training. Moreover, our method is adaptive and can be built on existing models without any restrictions. Experimental results on five publicly available datasets show that the proposed positive feedback method outperforms 12 recent methods on five evaluation metrics for saliency map prediction. Additionally, we conducted a robustness experiment, which shows that as long as at least one of the inserted models produces a good prediction, our approach ensures that the final result is no worse. Our approach achieves a prediction speed of 20 frames per second (FPS) on a low-end host, excluding the prediction time of the inserted models. These results highlight the effectiveness, efficiency, and robustness of the proposed approach for salient object detection. Code and saliency maps will be made available.
1 Introduction
Salient object detection (SOD) aims to mimic the human visual perception system to capture the most prominent regions in given images or videos. It can serve as a pre-processing step for other related computer vision tasks, including object tracking[1], action recognition[2], video segmentation[3], and image captioning[4].
Existing SOD models can be mainly divided into traditional algorithm-based and deep learning-based approaches. Traditional SOD methods rely on handcrafted features and use these features to predict saliency maps in a bottom-up manner. Common handcrafted features include center prior[5] and distance transform[6], which only contain low-level clues and perform poorly in complex scenes. With the emergence of deep learning techniques, SOD has made remarkable progress. Recent SOD models are mainly implemented based on fully convolutional networks (FCNs) and Transformer architectures, among which FCN is still the mainstream SOD architecture. FCNs-based SOD models mainly design a series of decoders around feature extraction, refinement or enhancement, and multi-level feature fusion, and set specific loss functions to assist model training[7][8][9][10][11][12][13][14][15][16]. Inspired by the impressive performance of the Transformer in the field of natural language processing (NLP), researchers have also carried out innovative work using the Transformer architecture in the SOD research field and achieved impressive performance improvements. These models mainly rely on the powerful long-distance feature correlation capture ability of the Transformer architecture and help the model generate more complete predictions by effectively extracting global contextual clues[17][18]. In addition, hybrid architectures of FCN and Transformer have been proposed to integrate the advantages of both models[19][20].
The works above focus on carefully designing network models around the characteristics of the SOD task and demonstrating performance beyond previously proposed models, and they have played an extremely important role in the rapid development of the SOD research field. However, carrying out such work is usually challenging and time-consuming, and surpassing the latest methods is increasingly difficult. Therefore, in contrast to the above work, this paper proposes a positive feedback approach based on the F-measure value for SOD, which builds on existing methods to obtain more accurate prediction results. The approach allows any existing method to be inserted, including traditional algorithms and various deep learning methods, and achieves performance beyond the inserted methods.
In summary, the main contributions of this paper are:
- Under the current SOD research background, the question of how to exploit existing methods has rarely been explored. This paper proposes a positive feedback approach based on the F-measure value for SOD, which allows any method to be inserted and outperforms the inserted methods. The prediction process is training-free, uses adaptive weights, and contains only one tunable hyperparameter.
- Based on the F-measure value, a positive feedback process is designed that requires no human involvement during the calculation and is completely self-updating.
2 Related Work
Existing SOD models can be mainly divided into traditional algorithm-based and deep learning-based approaches, with the latter being the current mainstream SOD method.
2.1 FCNs-based network
FCNs-based SOD models mainly design a series of decoders around feature extraction, refinement or enhancement, and multi-level feature fusion and set specific loss functions to assist model training.
The successful performance of SOD models depends on the effective integration of multi-level features. Several studies have highlighted the significant differences between the low-level and high-level features extracted from backbone networks. To address this issue, Chen et al.[21] proposed a progressive context-aware feature aggregation module, while Dai et al.[10] developed a middle-layer feature extraction module to achieve better feature fusion results. Moreover, the integration of multiscale features has been shown to improve SOD performance in various studies. For instance, Zhang et al.[15] developed a neural architecture search-based unit to automatically determine the multiscale features that need to be aggregated. Fang et al.[9] proposed a densely nested network framework to utilize multiscale high-level feature maps effectively. Meanwhile, Zhuge et al.[8] used various multiscale feature extraction methods to obtain diverse multiscale features and designed an integrity channel enhancement module to highlight salient objects. Building on [22], Wu et al.[14] proposed a dynamic pyramid convolution to extract multiscale features instead of fixed-size convolutions. Effective supervision strategies have also been employed to help models predict salient objects more accurately. For instance, Wei et al.[22] proposed a pixel position-aware loss to guide networks to pay more attention to local details, while Yang et al.[23] proposed a progressive self-guided loss to guide the network toward more complete salient regions. Additionally, Xu et al.[16] designed a knowledge review network to first roughly locate salient regions and then finely segment salient objects. Wu et al.[7] proposed a decomposition and completion network to predict the saliency, edge, and skeleton maps respectively, and then filled in the saliency map using the edge and skeleton maps.
2.2 Transformer-based network
In the field of SOD, the incorporation of global context information is extremely crucial for accurately and completely predicting salient regions. Transformer-based network architectures, renowned for their ability to capture long-range dependencies, have been successfully introduced in this domain.
To this end, Liu et al.[17] proposed a unified transformer-based model for SOD, which facilitates the propagation of global context information among image patches. Meanwhile, Zhang et al.[18] developed a generative vision transformer network that generates a pixel-level uncertainty map, effectively representing the significance confidence of salient objects. Additionally, Ren et al.[24] proposed a simple yet effective deeply-transformer network that preserves more unifying global-local representations to gradually restore spatial details.
2.3 Hybrid framework-based network
To enhance the accuracy and completeness of prediction results in SOD, it is important to effectively extract and utilize both the global context of deep features and the local context of shallow features. However, achieving this within a single network framework can be challenging. Therefore, several hybrid network frameworks have emerged.
Zhu et al.[25] proposed a deep supervised fusion transformer network, extending the applicability of FCN to the transformer architecture for the first time. They employed a transformer encoder to extract multiscale features and designed a multiscale aggregation module to aggregate these features in a coarse-to-fine manner. Similarly, Yun et al.[19] proposed a self-refined transformer network that leverages the transformer encoder to capture long-distance dependencies and designed a context refinement module. The module is employed to integrate global context with decoder features and refine and locate local details automatically. In contrast, Wang et al.[26] still used FCN as an encoder and utilized a transformer module for multi-level feature fusion to address the limited receptive field of FCN.
3 Method
3.1 Overview of the proposed Method

Figure 1 shows the structure of the proposed method, which includes a multi-branch model structure and a positive feedback prediction structure. The former hosts the sequence of existing SOD models, including traditional algorithms and supervised or unsupervised deep learning models. The positive feedback prediction structure iteratively calculates the weight of each branch according to the outputs of all branches to obtain the final saliency map, where the weight calculation process is completely adaptive. To verify our method more conveniently, we use the previous models in advance to generate saliency maps, which are denoted as $S_i$ ($i = 1, \dots, n$).
3.2 Positive feedback prediction algorithm
The positive feedback prediction algorithm is summarized in Table 1, and its conceptual principle is illustrated in Figure 2. Hereinafter, $S_i$ ($i = 1, \dots, n$) refers to the saliency maps generated by the multi-branch models and $n$ is the number of branches. $T$ is a threshold, which is set to 0.95. $\mathrm{Bin}(\cdot)$ is a binarization function and $\mathrm{Gray}(\cdot)$ is a graying function. $\mathrm{Fm}(\cdot)$ is the function for calculating the F-measure [27].
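For reference, given the precision and recall of a binarized map measured against a reference map (during the iteration below, the latest fusion result serves as the reference), $\mathrm{Fm}(\cdot)$ follows the standard form of [27], with $\beta^{2}$ commonly set to 0.3:

$$F_{\beta} = \frac{(1+\beta^{2}) \times \mathrm{Precision} \times \mathrm{Recall}}{\beta^{2} \times \mathrm{Precision} + \mathrm{Recall}}.$$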
The key steps of the algorithm:
Step1. Input: the outputs $S_i$ ($i = 1, \dots, n$) of the multi-branch model structure are used as inputs of the positive feedback prediction structure, and the binarization function is applied to obtain the binary maps $B_i = \mathrm{Bin}(S_i)$.
Step2. Initialization: the inputs are fused by pixel-level addition with equal weights for every branch, and the binarization function is applied to obtain the initial fusion map $F = \mathrm{Bin}\big(\mathrm{Gray}\big(\frac{1}{n}\sum_{i=1}^{n} S_i\big)\big)$.
Step3. Iteration: calculate the F-measure $\mathrm{Fm}(B_i, F)$ between the input of each branch and the latest fusion result $F$. Update the weight $w_i$ of each branch accordingly, recalculate the fusion result with the new weights, and apply the binarization function to obtain the new fusion map $F' = \mathrm{Bin}\big(\mathrm{Gray}\big(\sum_{i=1}^{n} w_i S_i\big)\big)$.
Step4. Judgement: calculate the F-measure between the latest and the previous fusion results and compare it with $T$. If it is greater than the threshold, the two successive results are considered similar enough, and the latest fusion result is taken as the final output; otherwise, execute Step3 again.
Input: | |
---|---|
saliency maps $S_i$ ($i = 1, \dots, n$); threshold $T$; | |
Output: | |
final prediction map $P$; | |
1: | $B_i \leftarrow \mathrm{Bin}(S_i)$, $i = 1, \dots, n$; |
2: | $F \leftarrow \mathrm{Bin}\big(\mathrm{Gray}\big(\frac{1}{n}\sum_{i=1}^{n} S_i\big)\big)$; |
3: | while True do |
4: | $w_i \leftarrow \mathrm{Fm}(B_i, F)$, $i = 1, \dots, n$; |
5: | $w_i \leftarrow w_i / \sum_{j=1}^{n} w_j$; $F' \leftarrow \mathrm{Gray}\big(\sum_{i=1}^{n} w_i S_i\big)$; |
6: | $F' \leftarrow \mathrm{Bin}(F')$; |
7: | $m \leftarrow \mathrm{Fm}(F', F)$; |
8: | if $m > T$ then |
9: | break; |
10: | else |
11: | $F \leftarrow F'$; |
12: | end if |
13: | end while |
14: | $P \leftarrow F'$; |
15: | return $P$. |
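To make the iterative process concrete, the following Python sketch implements the positive feedback fusion under the notation assumed above. The helper names (`binarize`, `f_measure`, `positive_feedback_fusion`), the twice-the-mean adaptive threshold, and the `max_iter` safety cap are illustrative assumptions of this sketch rather than the authors' released implementation.

```python
import numpy as np

BETA2 = 0.3  # beta^2 commonly used for the F-measure [27]

def binarize(gray_map):
    """Adaptive binarization; a twice-the-mean threshold is assumed here."""
    return gray_map >= 2.0 * gray_map.mean()

def f_measure(pred_bin, ref_bin, eps=1e-8):
    """F-measure of a binary map against a binary reference map."""
    tp = np.logical_and(pred_bin, ref_bin).sum()
    precision = tp / (pred_bin.sum() + eps)
    recall = tp / (ref_bin.sum() + eps)
    return (1 + BETA2) * precision * recall / (BETA2 * precision + recall + eps)

def positive_feedback_fusion(saliency_maps, T=0.95, max_iter=50):
    """Fuse n branch predictions (grayscale maps in [0, 1]) by positive feedback."""
    n = len(saliency_maps)
    binaries = [binarize(s) for s in saliency_maps]

    # Initialization: all branches share the same weight.
    fused = binarize(sum(saliency_maps) / n)

    for _ in range(max_iter):  # safety cap; the loop normally stops via the threshold test
        # Score every branch against the latest fusion result.
        weights = np.array([f_measure(b, fused) for b in binaries])
        weights = weights / (weights.sum() + 1e-8)

        # Re-fuse with the updated weights, then binarize again.
        new_fused = binarize(sum(w * s for w, s in zip(weights, saliency_maps)))

        # Stop when two successive fusion results are similar enough (Step 4).
        if f_measure(new_fused, fused) > T:
            return new_fused
        fused = new_fused
    return fused
```

Given the pre-computed maps of the inserted models, a call such as `positive_feedback_fusion([s1, s2, s3, s4])` would return the final binary prediction, where `s1`–`s4` denote hypothetical branch outputs normalized to [0, 1].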

4 Experimental results and analysis
4.1 Datasets and Evaluation Metrics
To validate the efficacy of the proposed method, we performed a series of experiments on five publicly available datasets. The datasets utilized in this study are briefly described as follows: DUTS[28] comprises 10,553 training images and 5,019 test images, with only the test set being used in this study. DUT-OMRON[29] consists of 5,168 images with complex structures and backgrounds. HKU-IS[30] contains 4,447 images with multiple salient objects. PASCAL-S[31] contains 850 natural images. ECSSD[32] comprises a collection of 1,000 images obtained from the internet.
4.2 Implementation Details
To facilitate the testing of the proposed positive feedback method, we implemented the following steps. Firstly, we generated the prediction maps of the multi-branch models using the models released by their authors and Python tools. Secondly, we evaluated the positive feedback method using Matlab tools.
To evaluate the effectiveness and characteristics of our proposed method, we conducted comparative, ablation, and robustness experiments. In the comparative experiment, we compared the performance with other state-of-the-art models; the results show that the positive feedback mechanism improves performance on multiple metrics. In the ablation experiment, we compared the positive feedback mechanism with pixel-level addition fusion on two datasets. In the robustness experiment, we manually selected good and bad prediction maps from each model and fed them into the positive feedback mechanism to observe the visual comparison results.
In total, we conducted two sets of experiments. In the first set, we used four branches, namely MSFNet (2021)[15], PAKRNet (2021)[16], ICONet (2022)[8], and SelfReformer (2022)[19]. The performance of the 2021 methods is weaker than that of the 2022 methods. In the second set, we used two branches, namely DPNet (2022)[14] and SelfReformer; the performance of these two methods is relatively close. For detailed experimental procedures and analysis, please refer to each subsection.
Methods | DUTS | ECSSD | HKU-IS | PASCAL-S | DUT-OMRON | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mF | MAE | Sm | mF | MAE | Sm | mF | MAE | Sm | mF | MAE | Sm | mF | MAE | Sm | |
DCNet[7] | .894 | .035 | .895 | .952 | .032 | .928 | .939 | .027 | .922 | .872 | .062 | .861 | .823 | .051 | .845 |
BiconNet[35] | .888 | .038 | .890 | .949 | .034 | .927 | .939 | .029 | .923 | .877 | .063 | .863 | .817 | .053 | .842 |
DNTD[9] | .892 | .033 | .891 | .946 | .034 | .922 | .938 | .028 | .920 | .878 | .064 | .857 | .803 | .051 | .828 |
EDNet[12] | .895 | .035 | .892 | .951 | .032 | .927 | .941 | .026 | .924 | .886 | .062 | .865 | .828 | .049 | .850 |
EFNet[10] | .898 | .034 | .895 | .948 | .034 | .925 | .939 | .027 | .922 | .875 | .063 | .864 | .822 | .054 | .843 |
MRINet[11] | .899 | .035 | .894 | .950 | .032 | .927 | .941 | .027 | .922 | .877 | .060 | .864 | .829 | .054 | .848 |
RCSBNet[13] | .899 | .035 | .881 | .944 | .034 | .922 | .938 | .027 | .919 | .882 | .059 | .860 | .810 | .049 | .835 |
MSFNet[15] | .878 | .034 | .877 | .941 | .033 | .915 | .927 | .027 | .908 | .863 | .061 | .852 | .799 | .050 | .832 |
PAKRNet[16] | .907 | .033 | .900 | .953 | .032 | .928 | .943 | .027 | .924 | .873 | .067 | .858 | .834 | .050 | .853 |
ICONet[8] | .893 | .037 | .892 | .951 | .031 | .931 | .942 | .027 | .925 | .884 | .060 | .870 | .830 | .059 | .846 |
DPNet[14] | .917 | .028 | .912 | .954 | .031 | .931 | .950 | .023 | .934 | .894 | .054 | .877 | .834 | .049 | .853 |
SelfReformer[19] | .916 | .027 | .911 | .958 | .027 | .936 | .947 | .024 | .931 | .894 | .051 | .881 | .837 | .043 | .861 |
Our-SSSS | .922 | .032 | .909 | .960 | .029 | .941 | .950 | .025 | .935 | .898 | .055 | .884 | .863 | .043 | .877 |
Our-SS | .931 | .028 | .919 | .962 | .028 | .943 | .954 | .023 | .939 | .899 | .052 | .889 | .859 | .046 | .869 |
4.3 Comparison with the State-of-the-arts
We compare the proposed method with 12 state-of-the-art methods, including DCNet[7], BiconNet[35], DNTD[9], EDNet[12], EFNet[10], MRINet[11], RCSBNet[13], MSFNet[15], PAKRNet[16], ICONet[8], DPNet[14], and SelfReformer[19]. To ensure a fair and objective comparison, we use the saliency maps provided by the authors and employ the same standardized evaluation function to calculate each metric.
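As a reference for how such metrics are typically computed, the NumPy sketch below evaluates the mean absolute error (MAE) and a per-image maximum F-measure. The function names and the threshold sweep are our illustrative choices; benchmark toolkits usually average precision and recall over the whole dataset at each threshold before taking the maximum.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth, both scaled to [0, 1]."""
    return np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()

def max_f_measure(pred, gt, beta2=0.3, num_thresholds=255, eps=1e-8):
    """Maximum F-measure over a sweep of binarization thresholds (per-image simplification)."""
    gt_bin = gt > 0.5
    best = 0.0
    for t in np.linspace(0.0, 1.0, num_thresholds, endpoint=False):
        pred_bin = pred > t
        tp = np.logical_and(pred_bin, gt_bin).sum()
        precision = tp / (pred_bin.sum() + eps)
        recall = tp / (gt_bin.sum() + eps)
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
        best = max(best, f)
    return best
```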
In Table 2, we present the quantitative comparison results based on MAE, maximum F-measure (mF), and S-measure (Sm). Our methods demonstrate the most comprehensive performance across all five datasets, as evidenced by superior results on all three metrics. Notably, Our-SS outperforms the second-best result in terms of mF and Sm by 1.5%, 0.4%, 0.4%, 0.5% and 0.9%, 0.8%, 0.6%, 0.9% on the DUTS, ECSSD, HKU-IS, and PASCAL-S datasets, respectively. Additionally, Our-SSSS performs exceptionally well on the ECSSD, HKU-IS, and PASCAL-S datasets, and particularly on the DUT-OMRON dataset, where it consistently outperforms SelfReformer[19], the strongest of the models inserted into the four-branch structure. Moreover, in Figure 3, we provide precision-recall (PR) curves for all five datasets. Our curves demonstrate exceptional performance across most thresholds, particularly on the DUT-OMRON and DUTS datasets.

4.4 Ablation experiment
Within this subsection, we perform ablation experiments on two datasets to demonstrate the efficacy and distinctive attributes of the proposed approach. Specifically, we replace the proposed positive feedback fusion with pixel-level addition and present the experimental outcomes in Table 3. The results reveal that our proposed method yields superior overall performance compared with direct addition. Notably, in the ablation experiment on the four-branch architecture, our approach demonstrates a considerably more substantial advantage, especially on the mean absolute error (MAE) metric, which is reduced by 7.2% and 7.7% on the two datasets, respectively.
Methods | PASCAL-S | DUT-OMRON | ||||
---|---|---|---|---|---|---|
mF | MAE | Sm | mF | MAE | Sm | |
ablation-4S | .891 | .060 | .882 | .863 | .047 | .874 |
Our-SSSS | .898 | .055 | .884 | .863 | .043 | .877 |
ablation-2S | .897 | .053 | .888 | .858 | .046 | .869 |
Our-SS | .899 | .052 | .889 | .859 | .046 | .869 |
Based on the results of the two sets of ablation experiments, we can draw several conclusions. Firstly, when there is a significant performance difference between the methods placed in the multi-branch structure, the proposed method exhibits greater advantages over direct fusion. Conversely, when the performance difference between the selected methods is minor, the proposed method exhibits only limited superiority over direct fusion. Additionally, it is important to note that the overall performance achieved through positive feedback fusion is consistently superior to that of each individual model in the multi-branch structure.
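For comparison, the direct-fusion baseline used in the ablation reduces to equal-weight pixel-level addition followed by binarization; a minimal sketch, reusing the `binarize` helper assumed in the earlier fusion example, is shown below.

```python
def pixel_addition_fusion(saliency_maps):
    """Ablation baseline: average the branch maps with equal weights, then binarize."""
    return binarize(sum(saliency_maps) / len(saliency_maps))
```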
4.5 Robustness experiment
To assess the proposed method's robustness against interference, we conducted adversarial experiments on the four-branch structure. Despite the outstanding detection performance of many existing methods, their capabilities are not consistently strong, as depicted in Figure 4. To better showcase the superiority of our approach, we manually selected several sets of input predictions with both desirable and undesirable effects and presented visual comparisons in Figure 4. This not only explains why the proposed method outperforms the methods inserted into the multi-branch structure, but also why it outperforms direct fusion: it automatically updates the weights to achieve satisfactory results instead of treating all branches equally. Furthermore, even when the four-branch structure contains no good prediction, a favorable outcome may still be obtained, as demonstrated in the seventh row.

5 Conclusion
Distinguished from most existing efforts in salient object detection (SOD), this study focuses on enhancing the accuracy of previously established algorithms. Specifically, we introduce a positive feedback approach based on the F-measure value for SOD that comprises a multi-branch model structure and a positive feedback prediction structure. The method feeds input images into the multi-branch model structure to generate their corresponding saliency maps, and then processes these maps through the positive feedback prediction structure to obtain the final result via positive feedback calculation. Notably, our method requires no model training, entails minimal hyperparameter tuning, and features automatic weight calculation. By integrating multiple existing models, our method surpasses the performance of the individual models. It achieves a prediction speed of 20 frames per second (FPS) even on a low-end host, excluding the prediction time of the inserted models. Our findings suggest that higher performance of the inserted models leads to better overall results, and that the initialization stage is crucial because of the positive feedback mechanism. When the selected models have only minor performance differences, the advantage of our approach is less pronounced. We believe that our study is of significant research value as well as practical use, and thus merits the attention and scrutiny of future researchers.
Conflict of interests
The authors declare that they have no conflict of interests.
Acknowledgments
This work was supported by Key Research and Development Projects in Zhejiang Province of China (NO.2021C03192, 2023C01032).
Author contributions
Ailing Pan wrote the paper, conceived and designed the experiments;
Chao Dai assisted in revising paper, collected the data, performed the experiments and recorded experimental results;
Chen Pan proposed the core idea of the paper and will act as the corresponding author;
Dongping Zhang and Yunchao Xu are participants in the fund projects.
All authors agree to this submission.
Code and data availability
Code and saliency maps will be available at: https://github.com/dc3234/PF/tree/main.
References
- [1] Lee, Hyemin, and Daijin Kim. Salient region-based online object tracking. Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2018: 1170-1177.
- [2] Atrish, Abhay, et al. An automated hierarchical framework for player recognition in sports image. Proceedings of the international conference on video and image processing. 2017: 103-108.
- [3] Zhang, Dingwen, et al. SPFTN: A joint learning framework for localizing and segmenting objects in weakly labeled videos. IEEE transactions on pattern analysis and machine intelligence, 42(2), 2018: 475-489.
- [4] Fang, Hao, et al. From captions to visual concepts and back. Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1473-1482.
- [5] Jiang, Zhuolin, and Larry S. Davis. Submodular salient region detection. Proceedings of the IEEE conference on computer vision and pattern recognition, 2013: 2043-2050.
- [6] Tu, Wei-Chih, et al. Real-time salient object detection with a minimum spanning tree. Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 2334-2342.
- [7] Wu, Zhe, Li Su, and Qingming Huang. Decomposition and completion network for salient object detection. IEEE Transactions on Image Processing, 30, 2021: 6226-6239.
- [8] Zhuge, Mingchen, et al. Salient object detection via integrity learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3), 2023: 3738-3752.
- [9] Fang, Chaowei, et al. Densely nested top-down flows for salient object detection. Science China Information Sciences, 65(8), 2022: 1-13.
- [10] Dai, Chao, Chen Pan, and Wei He. Feature extraction and fusion network for salient object detection. Multimedia Tools and Applications, 81(23), 2022: 33955-33969.
- [11] Dai, Chao, et al. Multiple refinement and integration network for Salient Object Detection. AI Communications, 35(1), 2022: 31-44.
- [12] Wu, Yu-Huan, et al. EDN: Salient object detection via extremely-downsampled network. IEEE Transactions on Image Processing, 31, 2022: 3125-3136.
- [13] Ke, Yun Yi, and Takahiro Tsubono. Recursive contour-saliency blending network for accurate salient object detection. Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2022: 2940-2950.
- [14] Wu, Zhenyu, et al. Salient Object Detection via Dynamic Scale Routing. IEEE Transactions on Image Processing, 31, 2022: 6649-6663.
- [15] Zhang, Miao, et al. Auto-msfnet: Search multi-scale fusion network for salient object detection. Proceedings of the 29th ACM international conference on multimedia. 2021.
- [16] Xu, Binwei, et al. Locate globally, segment locally: A progressive architecture with knowledge review network for salient object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 2021: 3004-3012.
- [17] Liu, N., Zhang, N., Wan, K., Shao, L., Han, J. Visual saliency transformer. In Proceedings of the IEEE international conference on computer vision, 2021: 4722-4732.
- [18] Zhang, J., Xie, J., Barnes, N., Li, P. Learning generative vision transformer with energy-based latent space for saliency prediction. Advances in Neural Information Processing Systems, 34, 2021: 15448-15463.
- [19] Yun, Y. K., Lin, W. SelfReformer: Self-Refined Network with Transformer for Salient Object Detection. arXiv preprint arXiv:2205.11283, 2022.
- [20] Xie, C., Xia, C., Ma, M., Zhao, Z., Chen, X., Li, J. Pyramid grafting network for one-stage high resolution saliency detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022: 11717-11726.
- [21] Chen, Z., Xu, Q., Cong, R., Huang, Q. Global context-aware progressive aggregation network for salient object detection. In Proceedings of the AAAI conference on artificial intelligence, 34(7), 2020: 10599-10606.
- [22] Wei, J., Wang, S., Huang, Q. F³Net: fusion, feedback and focus for salient object detection. In Proceedings of the AAAI conference on artificial intelligence, 34(7), 2020: 12321-12328.
- [23] Yang, Sheng, et al. Progressive self-guided loss for salient object detection. IEEE Transactions on Image Processing, 30, 2021: 8426-8438.
- [24] Ren, S., Wen, Q., Zhao, N., Han, G., He, S. Unifying global-local representations in salient object detection with transformer. arXiv preprint arXiv:2108.02759, 2021.
- [25] Zhu, H., Sun, X., Li, Y., Ma, K., Zhou, S., Zheng, Y. DFTR: Depth-supervised Fusion Transformer for Salient Object Detection. 2022.
- [26] Wang, Z., Zhang, Y., Liu, Y., Wang, Z., Coleman, S., Kerr, D. TF-SOD: a novel transformer framework for salient object detection. Neural Computing and Applications, 34(14), 2022: 11789-11806.
- [27] Achanta, R., Hemami, S., Estrada, F., Susstrunk, S. (2009, June). Frequency-tuned salient region detection. In 2009 IEEE conference on computer vision and pattern recognition (pp. 1597-1604). IEEE.
- [28] Wang, L., Lu, H., Wang, Y., Feng, M., Wang, D., Yin, B., Ruan, X. Learning to detect salient objects with image-level supervision. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017: 136-145.
- [29] Yang, C., Zhang, L., Lu, H., Ruan, X., Yang, M. H. Saliency detection via graph-based manifold ranking. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2013: 3166-3173.
- [30] Li, G., Yu, Y. Visual saliency based on multiscale deep features. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2015: 5455-5463.
- [31] Li, Y., Hou, X., Koch, C., Rehg, J. M., Yuille, A. L. The secrets of salient object segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2014: 280-287.
- [32] Shi, J., Yan, Q., Xu, L., Jia, J. Hierarchical image saliency detection on extended CSSD. IEEE transactions on pattern analysis and machine intelligence, 38(4), 2015: 717-729.
- [33] Perazzi, F., Krähenbühl, P., Pritch, Y., Hornung, A. Saliency filters: Contrast based filtering for salient region detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2012: 733-740.
- [34] Fan, D. P., Cheng, M. M., Liu, Y., Li, T., Borji, A. Structure-measure: A new way to evaluate foreground maps. In Proceedings of the IEEE international conference on computer vision, 2017: 4548-4557.
- [35] Yang, Ziyun, Somayyeh Soltanian-Zadeh, and Sina Farsiu. BiconNet: An edge-preserved connectivity-based approach for salient object detection. Pattern recognition, 121, 2022: 1-13.