
Remote Sensing Image Change Detection
With Graph Interaction

Chenglong Liu
Abstract

Modern remote sensing image change detection (CD) has witnessed substantial advancements by harnessing the potent feature extraction capabilities of CNNs and Transformers. Yet, prevailing change detection techniques consistently prioritize extracting semantic features related to significant alterations, overlooking the viability of directly interacting with bitemporal image features. In this letter, we propose a bitemporal image graph interaction network for remote sensing change detection, namely BGINet-CD. More specifically, by leveraging the concept of non-local operations and mapping the features obtained from the backbone network into the graph structure space, we propose a unified self-attention mechanism for bitemporal images. This approach enhances the information coupling between the two temporal images while effectively suppressing task-irrelevant interference. Built on a streamlined backbone, namely ResNet-18, our model demonstrates superior performance compared with other state-of-the-art (SOTA) methods on the GZ-CD dataset. Moreover, the model exhibits an improved trade-off between accuracy and computational efficiency, further enhancing its overall effectiveness.

Index Terms:
Change detection, deep learning, graph convolution, remote sensing (RS) images.

I Introduction

CHANGE detection is an important research topic in remote sensing; it aims to identify changes that occur between two images acquired at different times over the same geographical location. With the growing availability and utilization of remote sensing satellites, change detection has found widespread application in various fields. It is commonly used for monitoring urban sprawl [11], assessing damage caused by natural disasters [10], and conducting surveys of urban and rural areas [9]. Multi-temporal remote sensing images often contain a variety of interferences due to different imaging conditions and acquisition times. These interferences include spectral differences caused by varying light intensity and seasonal changes, as well as differences in viewing angles that result in varying shapes of buildings within the scene. Consequently, these factors can introduce pseudo-changes into the detection process.

A strong model should accurately identify unrelated disturbances in bitemporal images and distinguish genuine changes from complex uncorrelated ones [2]. Existing methods for change detection can be broadly categorized into two main groups: traditional change detection methods and deep learning-based methods. Traditional change detection methods encompass various approaches, including algebraic operation-based, transform-based, and classification-based methods. Algebraic operation-based methods involve direct pixel-wise comparison of multi-temporal images and the selection of an appropriate threshold to classify pixels as changed or unchanged. Image transformation techniques, such as principal component analysis (PCA) [3] and change vector analysis [4], are also commonly used. On the other hand, machine learning-based methods, such as support vector machines, random forests, and kernel regression, have emerged as alternative approaches in recent years.

Deep learning-based approaches have gained prominence due to their powerful nonlinear feature extraction capabilities. Several attention mechanisms, such as spatial attention [8], channel attention [7], and self-attention [12], have been proposed to obtain improved feature representations. Chen et al. effectively modeled the context in the spatial-temporal domain by representing the high-level concepts of interest changes [2]. Fang et al. addressed the loss of precise spatial localization caused by successive downsampling by combining DenseNet and NestedUNet [5]. Additionally, Chen et al. proposed a novel edge loss that enhances the network's attention to details such as boundary regions and small regions [6].

Although the methods above have shown promising results, none has explored the possibility of feature interaction between bitemporal images prior to extracting difference features. Drawing inspiration from non-local operations and DMINet [13], we propose a bitemporal image graph interaction network (BGINet) to facilitate feature interaction between bitemporal images. This approach extracts bitemporal features with a backbone network and routes them through a graph interaction module, enhancing the information coupling between the bitemporal images and effectively suppressing uncorrelated changes.

To demonstrate the effectiveness of our method, we utilize a simple backbone network (ResNet-18) in BGINet-CD. Initially, the features obtained from the backbone network are subjected to soft clustering, with each cluster being mapped to a vertex in the graph space. The Graph Interaction Module (GIM) captures the coupling relationship between the bitemporal images, thus enhancing the information coupling. Finally, the clusters are reprojected back to their original spatial coordinates.

The contributions of this letter are mainly as follows:

  1. We propose BGINet-CD, a graph convolutional neural network for remote sensing change detection, which effectively enhances the information coupling between bitemporal images.

  2. The introduced Graph Interaction Module (GIM) captures the coupling relationship between the bitemporal images, enhancing information coupling and suppressing uncorrelated changes.

  3. We conducted quantitative and qualitative experiments on two datasets. The results demonstrate that the proposed BGINet-CD achieves a desirable balance between accuracy and efficiency and attains state-of-the-art performance on the GZ-CD dataset. Code is available at https://github.com/JackLiu-97/BGINet.git.

Figure 1: An overview of the proposed BGINet.

II Proposed Method

II-A Overall Architecture

Figure 1 illustrates the architecture of the proposed BGINet. The network comprises two main components: a generic feature extractor and a Bitemporal Graph Interaction Module. To ensure model efficiency, we utilize the first three stages of ResNet-18 [15] as the generic feature extractor. The Graph Interaction Module (GIM) branch takes the bitemporal generic features extracted by the feature extractor as input and maps them into a graph structure space. Inspired by non-local operations, we apply self-attention in the graph space to efficiently capture long-range dependencies between the bitemporal features. The evolved bitemporal features are then integrated with the original features. Finally, a 1 × 1 convolutional layer with a sigmoid activation function is applied to obtain the final difference map, which serves as an indicator of change.
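For concreteness, the following is a minimal PyTorch-style sketch of this overall pipeline. The class and argument names, the shared-weight encoder, and the absolute-difference fusion in front of the 1 × 1 classifier are illustrative assumptions made for this letter, not the released implementation.

```python
# Minimal sketch of the BGINet forward pass (assumed interfaces, not the released code).
import torch
import torch.nn as nn
from torchvision.models import resnet18


class BGINetSketch(nn.Module):
    def __init__(self, gim: nn.Module, feat_dim: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)
        # Only the first three stages of ResNet-18 are used as the generic feature extractor;
        # both time phases share the same encoder weights (an assumption).
        self.encoder = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2, backbone.layer3,
        )
        self.gim = gim                                  # Graph Interaction Module (Section II-B)
        self.classifier = nn.Conv2d(feat_dim, 1, 1)     # 1x1 convolution producing the change map

    def forward(self, img_t1, img_t2):
        x1 = self.encoder(img_t1)                       # bitemporal generic features
        x2 = self.encoder(img_t2)
        x1, x2 = self.gim(x1, x2)                       # projection, interaction, reprojection
        diff = torch.abs(x1 - x2)                       # feature fusion by absolute difference (assumed)
        return torch.sigmoid(self.classifier(diff))     # final change probability map
```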

II-B Graph Interaction Module (GIM)

Here, we detail the Graph Interaction Module (GIM). As shown in Figure 2, GIM is composed of three operations: graph projection, graph interaction, and graph reprojection. Given bitemporal 2D feature maps $X^{1}\in\mathbb{R}^{h\times w\times c}$ and $X^{2}\in\mathbb{R}^{h\times w\times c}$, $\boldsymbol{x}_{ij}^{1},\boldsymbol{x}_{ij}^{2}\in\mathbb{R}^{d}$ denote the $d$-dimensional features at position $(i,j)$ of the two time phases. The graph embeddings are denoted as $\mathcal{G}_{1}=(\mathcal{V}_{1},\mathcal{Z}_{1},\mathcal{A}_{1})$ and $\mathcal{G}_{2}=(\mathcal{V}_{2},\mathcal{Z}_{2},\mathcal{A}_{2})$, where $\mathcal{V}_{1},\mathcal{V}_{2}$ are the sets of nodes, $\mathcal{Z}_{1},\mathcal{Z}_{2}$ are the feature matrices, and $\mathcal{A}_{1},\mathcal{A}_{2}$ are the affinity matrices between the nodes.

1) Graph Projection: To establish a correspondence between feature maps $X^{1}$ and $X^{2}$, we perform a feature mapping to obtain graphs $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$. In this mapping, pixels with similar features are assigned to the same vertex of the graph. For simplicity, consider the feature mapping of the $t_{1}$ time phase as an example. Following the approach in [14], we parameterize two matrices $\mathcal{W}\in\mathbb{R}^{|\mathcal{V}|\times d}$ and $\Sigma\in\mathbb{R}^{|\mathcal{V}|\times d}$, where $|\mathcal{V}|$ is the number of vertices, which can be pre-specified.

Each row $w_{k}$ of $\mathcal{W}$ serves as the anchor point for vertex $k$. The soft assignment $q_{ij}^{k}$ of feature vector $\boldsymbol{x}_{ij}$ to $w_{k}$ is computed as:

q_{ij}^{k}=\frac{\exp\left(-\left\|\frac{x_{ij}-w_{k}}{\sigma_{k}}\right\|_{2}^{2}/2\right)}{\sum_{k}\exp\left(-\left\|\frac{x_{ij}-w_{k}}{\sigma_{k}}\right\|_{2}^{2}/2\right)} (1)

In this equation, $\sigma_{k}$ is the $k$-th row vector of $\Sigma$ and is constrained to the range $(0,1)$ by a sigmoid function. The numerator measures the similarity between the feature vector and the anchor point, while the denominator normalizes the assignment across all vertices. Next, we encode the vertex feature matrix $\mathcal{Z}\in\mathbb{R}^{|\mathcal{V}|\times d}$ using the corresponding pixel features. For vertex $k$, we compute $z_{k}$, the weighted average of the residuals between the feature vectors $x_{ij}$ and $w_{k}$. We then normalize $z_{k}$ to obtain $z^{\prime}_{k}$, the unit vector that forms the $k$-th row of the feature matrix $\mathcal{Z}$ of graph $\mathcal{G}$:

z^{\prime}_{k}=\dfrac{z_{k}}{\|z_{k}\|_{2}},\quad z_{k}=\left(\dfrac{\sum_{ij}q_{ij}^{k}\left(\boldsymbol{x}_{ij}-w_{k}\right)}{\sum_{ij}q_{ij}^{k}}\right)/\sigma_{k} (2)

Finally, the graph affinity matrix $\mathcal{A}$ is computed as:

\mathcal{A}=\mathcal{Z}\mathcal{Z}^{\mathrm{T}}\in\mathbb{R}^{|\mathcal{V}|\times|\mathcal{V}|} (3)
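A compact sketch of this graph projection step (Eqs. 1-3) is given below; the tensor layout and the small numerical-stability constant are assumptions made for illustration rather than the released implementation.

```python
# Graph projection: soft-assign pixels to |V| learnable anchors and build vertex features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphProjection(nn.Module):
    def __init__(self, feat_dim: int, num_vertices: int):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(num_vertices, feat_dim))  # W, one anchor per vertex
        self.sigma = nn.Parameter(torch.zeros(num_vertices, feat_dim))    # Sigma (before the sigmoid)

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, h, w = x.shape
        feats = x.flatten(2).transpose(1, 2)             # (B, HW, C), one row per pixel x_ij
        sigma = torch.sigmoid(self.sigma)                # constrain sigma_k to (0, 1)
        resid = (feats.unsqueeze(2) - self.anchors) / sigma          # (B, HW, K, C): (x_ij - w_k)/sigma_k
        q = torch.softmax(-0.5 * resid.pow(2).sum(-1), dim=-1)       # Eq. (1): soft assignment q_ij^k
        z = torch.einsum('bnk,bnkc->bkc', q, resid)                  # weighted residual sum per vertex
        z = z / (q.sum(dim=1).unsqueeze(-1) + 1e-6)                  # Eq. (2): weighted average ...
        z = F.normalize(z, dim=-1)                                   # ... then L2-normalize each row
        a = torch.bmm(z, z.transpose(1, 2))                          # Eq. (3): affinity A = Z Z^T
        return z, a, q                                   # vertex features, affinity, assignments
```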
Figure 2: Graph Interaction Module (GIM).

2) Graph Interaction: The proposed GIM receives the graph embeddings $\mathcal{G}_{1}=(\mathcal{V}_{1},\mathcal{Z}_{1},\mathcal{A}_{1})$ and $\mathcal{G}_{2}=(\mathcal{V}_{2},\mathcal{Z}_{2},\mathcal{A}_{2})$, obtained from the feature mappings of the $t_{1}$ and $t_{2}$ time phases, respectively. It models the between-graph interaction and guides the inter-graph message passing from $\mathcal{V}_{1}$ to $\mathcal{V}_{2}$ and from $\mathcal{V}_{2}$ to $\mathcal{V}_{1}$. To this end, we take inspiration from non-local operations and DMINet [13] and compute inter-graph dependencies with an attention mechanism. This operation significantly reduces the number of parameters and the computational complexity while achieving better results.

GIM models the between-graph interaction and guides the inter-graph message passing from $\mathcal{Z}_{1}$ to $\mathcal{Z}_{2}$ and from $\mathcal{Z}_{2}$ to $\mathcal{Z}_{1}$. As shown in Figure 2, we use different multi-layer perceptrons (MLPs) to transform $\mathcal{Z}_{1}$ into the query graph $\mathcal{Z}_{1}^{q}$, key graph $\mathcal{Z}_{1}^{k}$, and value graph $\mathcal{Z}_{1}^{v}$, and to transform $\mathcal{Z}_{2}$ into the query graph $\mathcal{Z}_{2}^{q}$, key graph $\mathcal{Z}_{2}^{k}$, and value graph $\mathcal{Z}_{2}^{v}$. Next, we unify $\mathcal{Z}_{1}^{q}$ and $\mathcal{Z}_{2}^{q}$ as:

\mathcal{Z}^{q}=\operatorname{concat}(\mathcal{Z}_{1}^{q},\mathcal{Z}_{2}^{q}) (4)

Then, the similarity matrices $\mathcal{A}_{1\rightarrow 2}^{\text{inter}}$ and $\mathcal{A}_{2\rightarrow 1}^{\text{inter}}\in\mathbb{R}^{K\times K}$ are calculated by matrix multiplication as:

\mathcal{A}_{1\rightarrow 2}^{\text{inter}}=f_{\text{norm}}\left(\mathcal{Z}^{q\mathrm{T}}\times\mathcal{Z}_{2}^{k}\right) (5)
\mathcal{A}_{2\rightarrow 1}^{\text{inter}}=f_{\text{norm}}\left(\mathcal{Z}^{q\mathrm{T}}\times\mathcal{Z}_{1}^{k}\right) (6)

where $\mathcal{A}_{1\rightarrow 2}^{\text{inter}},\mathcal{A}_{2\rightarrow 1}^{\text{inter}}\in\mathbb{R}^{K\times K}$. After that, we can transfer semantic information from $\mathcal{Z}_{1}$ to $\mathcal{Z}_{2}$ and from $\mathcal{Z}_{2}$ to $\mathcal{Z}_{1}$ by

\mathcal{Z}_{1}^{\prime}=f_{\text{GIM}}\left(\mathcal{Z}_{2},\mathcal{Z}_{1}\right)=\left(\mathcal{A}_{2\rightarrow 1}^{\text{inter}}\times\mathcal{Z}_{1}^{v\mathrm{T}}\right)+\mathcal{Z}_{1} (7)
\mathcal{Z}_{2}^{\prime}=f_{\text{GIM}}\left(\mathcal{Z}_{1},\mathcal{Z}_{2}\right)=\left(\mathcal{A}_{1\rightarrow 2}^{\text{inter}}\times\mathcal{Z}_{2}^{v\mathrm{T}}\right)+\mathcal{Z}_{2} (8)

After performing the inter-graph interaction, we conduct intra-graph reasoning by taking $\mathcal{Z}_{1}^{\prime}$ and $\mathcal{Z}_{2}^{\prime}$ as inputs to obtain enhanced graph representations:

\widetilde{\mathcal{Z}_{1}^{\prime}}=f_{\mathrm{GCN}}\left(\mathcal{Z}_{1}^{\prime}\right)=g\left(\mathcal{A}_{2\rightarrow 1}^{\text{inter}}\mathcal{Z}_{1}^{\prime}W_{1}\right)\in\mathbb{R}^{C\times K} (9)
\widetilde{\mathcal{Z}_{2}^{\prime}}=f_{\mathrm{GCN}}\left(\mathcal{Z}_{2}^{\prime}\right)=g\left(\mathcal{A}_{1\rightarrow 2}^{\text{inter}}\mathcal{Z}_{2}^{\prime}W_{2}\right)\in\mathbb{R}^{C\times K} (10)
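The inter-graph attention and the subsequent intra-graph reasoning (Eqs. 4-10) can be sketched as follows. The letter leaves some shapes implicit, so this sketch concatenates the two query graphs along the feature dimension and projects them back to $d$, and interprets the transposes so that all attention matrices are $K\times K$; it should be read as one plausible realization under these assumptions, not the exact released code.

```python
# Graph interaction: cross-graph attention (Eqs. 4-8) followed by per-graph GCN reasoning (Eqs. 9-10).
import torch
import torch.nn as nn


class GraphInteraction(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q1, self.k1, self.v1 = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.q2, self.k2, self.v2 = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.fuse_q = nn.Linear(2 * dim, dim)   # unify Z1^q and Z2^q (Eq. 4), assumed projection
        self.w1 = nn.Linear(dim, dim)           # W1 in Eq. (9)
        self.w2 = nn.Linear(dim, dim)           # W2 in Eq. (10)
        self.act = nn.ReLU()                    # nonlinearity g

    def forward(self, z1, z2):                  # z1, z2: (B, K, C) vertex feature matrices
        q = self.fuse_q(torch.cat([self.q1(z1), self.q2(z2)], dim=-1))       # Eq. (4)
        a_12 = torch.softmax(q @ self.k2(z2).transpose(1, 2), dim=-1)        # Eq. (5), f_norm as softmax
        a_21 = torch.softmax(q @ self.k1(z1).transpose(1, 2), dim=-1)        # Eq. (6)
        z1p = a_21 @ self.v1(z1) + z1           # Eq. (7): message passing with residual
        z2p = a_12 @ self.v2(z2) + z2           # Eq. (8)
        z1t = self.act(self.w1(a_21 @ z1p))     # Eq. (9): intra-graph GCN update
        z2t = self.act(self.w2(a_12 @ z2p))     # Eq. (10)
        return z1t, z2t
```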

3) Graph Reprojection: To map the enhanced graph representations back to the original coordinate space, we reuse the pixel-to-vertex assignments $Q_{1}$ and $Q_{2}$ obtained in the graph projection step:

X_{1}^{new}=Q_{1}\widetilde{\mathcal{Z}_{1}^{\prime}}+X_{1} (11)
X_{2}^{new}=Q_{2}\widetilde{\mathcal{Z}_{2}^{\prime}}+X_{2} (12)
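A sketch of this reprojection step (Eqs. 11-12), reusing the assignment matrix $Q$ produced during graph projection, is given below; the tensor shapes follow the earlier sketches and are assumptions.

```python
# Graph reprojection: scatter enhanced vertex features back to the pixel grid with a residual.
import torch


def graph_reproject(x, q, z_tilde):
    """x: (B, C, H, W) original features; q: (B, HW, K) assignments; z_tilde: (B, K, C) vertex features."""
    b, c, h, w = x.shape
    pixel_feats = q @ z_tilde                                     # (B, HW, C): Q * Z~
    pixel_feats = pixel_feats.transpose(1, 2).reshape(b, c, h, w)
    return pixel_feats + x                                        # Eqs. (11)-(12): X_new = Q Z~ + X
```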

III Experiments and Results

III-A Experimental Dataset and Parameter Setting

The WHU Building Change Detection Dataset [16]: The dataset consists of two aerial images acquired at two different times over the same area, containing 12,796 buildings within 20.5 km², with a spatial resolution of 0.2 m and a size of 32,570 × 15,354 pixels. We crop the images into 256 × 256 patches and randomly divide them into training, validation, and test sets of 6,096/762/762 patches.

Guangzhou Dataset (GZ-CD) [17]: The dataset was collected between 2006 and 2019 and covers the suburbs of Guangzhou, China. To facilitate the generation of image pairs, the Google Earth service of the BIGEMAP software was used to collect 19 seasonally varying VHR image pairs with a spatial resolution of 0.55 m and sizes ranging from 1,006 × 1,168 to 4,936 × 5,224 pixels. We crop the images into 256 × 256 patches and randomly divide them into training, validation, and test sets of 2,876/353/374 patches.

In the experiments, the number of vertices $|\mathcal{V}|$ is set to 32. We use the AdamW optimizer with a weight decay of 1e-4 and a polynomial learning-rate schedule, with the initial learning rate set to 0.0004. The total number of iterations is set to 100. Experiments are run on an NVIDIA V100 GPU. We employ a joint loss function consisting of Focal loss and Dice loss; for the Focal loss, the parameters $\gamma$ and $\alpha$ are set to 2.0 and 0.2, respectively. The overall loss function is formulated as follows:

L_{total}=\lambda_{1}\operatorname{Focal}(GT,\sigma(p))+\lambda_{2}\operatorname{Dice}(GT,\sigma(p)) (13)

Here, $\sigma$ denotes the sigmoid activation function, $p$ the model prediction, and $\lambda_{1}$ and $\lambda_{2}$ the coefficients of the Focal loss and Dice loss, respectively. In this experiment, we set $\lambda_{1}$ and $\lambda_{2}$ to 0.5 and 1, respectively.
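A possible implementation of the joint loss in Eq. (13), with the settings stated above ($\gamma=2.0$, $\alpha=0.2$, $\lambda_{1}=0.5$, $\lambda_{2}=1$), is sketched below; the smoothing constant and the exact focal-loss formulation are assumptions.

```python
# Joint Focal + Dice loss for the binary change map (Eq. 13), with assumed implementation details.
import torch
import torch.nn.functional as F


def change_loss(logits, target, gamma=2.0, alpha=0.2, lam1=0.5, lam2=1.0, eps=1e-6):
    """logits: raw model prediction p; target: float ground-truth change mask GT (same shape)."""
    prob = torch.sigmoid(logits)
    # Focal loss: down-weight easy pixels; alpha balances changed vs. unchanged classes.
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
    p_t = prob * target + (1 - prob) * (1 - target)
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    focal = (alpha_t * (1 - p_t).pow(gamma) * bce).mean()
    # Dice loss: overlap-based term, robust to the strong class imbalance in change detection.
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return lam1 * focal + lam2 * dice
```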

III-B Experimental Results and Comparison

Our comparison experiments evaluate the trade-off between accuracy, number of parameters, and floating-point operations (FLOPs). The quantitative results on the two datasets are presented in Table I and Table II, respectively. The best-performing model in each column is highlighted in bold, while the second-best model is underlined. The tables provide a comprehensive view of the performance metrics, allowing us to analyze the accuracy and efficiency of different models.

TABLE I: QUANTITATIVE RESULTS OF THE CD METHODS ON THE WHU DATASET
Model Precision (%) Recall (%) F1-score (%) Params (M)
FC-EF [19] 91.19 85.30 88.15 1.10
FC-Siam-conc [19] 69.04 84.93 76.17 1.55
FC-Siam-diff [19] 60.66 91.24 72.87 1.35
STANet [18] 91.73 73.39 81.54 12.18
SNUNet 75.23 89.12 81.59 12.04
BIT 87.44 90.24 88.82 3.55
DMINet 93.84 86.25 88.69 6.24
BGINet 91.84 90.22 91.02 2.88
TABLE II: QUANTITATIVE RESULTS OF THE CD METHODS ON THE GZ DATASET
Model Precision (%) Recall (%) F1-score (%) Params (M)
FC-EF [19] 85.92 78.43 82.00 1.10
FC-Siam-conc [19] 87.63 83.50 85.52 1.55
FC-Siam-diff [19] 90.47 79.45 84.60 1.35
STANet [18] 88.40 78.84 83.35 12.18
SNUNet 89.61 84.40 86.92 12.04
BIT 86.38 88.60 87.48 3.55
DMINet 89.31 83.90 86.52 6.24
BGINet 88.52 88.00 88.25 2.88
Figure 3: Qualitative results of different methods on the WHU-CD and GZ-CD datasets. The top two rows correspond to the WHU-CD dataset, while the bottom two rows pertain to the GZ-CD dataset. (a) RGB image of the first date, (b) RGB image of the second date, (c) FC-EF, (d) FC-Siam-conc, (e) FC-Siam-diff, (f) STANet, (g) SNUNet, (h) BIT, (i) DMINet, (j) BGINet, and (k) ground-truth labels. True positive (TP) regions are highlighted in yellow, false positive (FP) regions in red, false negative (FN) regions in blue, and true negative (TN) regions in black. For a more detailed observation, we recommend zooming in on the figure.

In Fig. 4 and Fig. 5, we present a trade-off analysis between the F1 score and computational cost for various classical remote sensing image change detection methods and recently proposed methods (e.g., STANet, BIT, DMINet). These figures provide valuable insights into the performance and computational requirements of the different approaches. The results indicate that while the aforementioned methods perform well, they often incur a significant computational overhead. On the other hand, baseline models such as FC-EF require minimal computational resources but fall short in accuracy. Our proposed method achieves a favorable balance between accuracy and computational overhead. As the figures show, our method outperforms the baseline models in accuracy while maintaining a manageable computational cost, highlighting the effectiveness and efficiency of our approach for remote sensing image change detection.

Figure 4: Comparison of different methods in terms of FLOPs (computational cost) and F1-score on the GZ-CD dataset.
Figure 5: Comparison of different methods in terms of FLOPs (computational cost) and F1-score on the WHU-CD dataset.
TABLE III: ABLATION RESULTS OF THE PROPOSED GIM ON THE GZ-CD AND WHU-CD DATASETS
Model Precision (%) Recall (%) F1-score (%)
Baseline (GZ-CD) 87.47 85.37 86.41
BGINet (GZ-CD) 88.52 88.00 88.25
Baseline (WHU-CD) 85.70 92.20 88.80
BGINet (WHU-CD) 91.84 90.22 91.02

III-C Ablation Experiment

To validate the effectiveness of the proposed BGINet, we conducted ablation experiments on the network structure using the two publicly available datasets. The results are presented in Table III. As the baseline model, we selected ResNet-18 and utilized only the first three stages of the network. After introducing the Graph Interaction Module (GIM) on the GZ-CD dataset, precision, recall, and F1 score improve by 1.05%, 2.63%, and 1.84%, respectively. On the WHU dataset, although recall decreases by 2.02%, precision and F1 score improve by 6.14% and 2.2%, respectively. These findings highlight the positive impact of the proposed GIM in enhancing the performance of remote sensing image change detection and confirm its contribution to improving precision and overall F1 score on both datasets. Overall, the results affirm the effectiveness of our approach and its potential for advancing the field of remote sensing image change detection.

IV Conclusion

In this letter, we introduce a novel method for improving change detection accuracy in dual-temporal images. By mapping the image features into the graph space and utilizing the Graph Interaction Module (GIM), we enable effective feature interaction and mitigate the influence of pseudo-change. Our proposed approach achieves a lightweight implementation, offering a favorable tradeoff between accuracy, number of parameters, and computational complexity. Experimental results on two publicly available datasets demonstrate the effectiveness of our model, with our approach achieving state-of-the-art performance on the GZ-CD dataset.

References

  • [1] M. Liu, Z. Chai, H. Deng, and R. Liu, “A cnn-transformer network with multiscale context aggregation for fine-grained cropland change detection,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pp. 4297–4306, 2022, doi:10.1109/JSTARS.2022.3177235.
  • [2] H. Chen, Z. Qi, and Z. Shi, “Remote sensing image change detection with transformers,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022, doi:10.1109/TGRS.2021.3095166.
  • [3] T. Celik, “Unsupervised change detection in satellite images using principal component analysis and k-means clustering,” IEEE Geoscience and Remote Sensing Letters, vol. 6, no. 4, pp. 772–776, 2009, doi:10.1109/LGRS.2009.2025059.
  • [4] N. Zerrouki, F. Harrou, and Y. Sun, “Statistical monitoring of changes to land cover,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 6, pp. 927–931, 2018, doi:10.1109/LGRS.2018.2817522.
  • [5] S. Fang, K. Li, J. Shao, and Z. Li, “Snunet-cd: A densely connected siamese network for change detection of vhr images,” IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022, doi:10.1109/LGRS.2021.3056416.
  • [6] H. Chen, F. Pu, R. Yang, R. Tang, and X. Xu, “Rdp-net: Region detail preserving network for change detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–10, 2022, doi:10.1109/TGRS.2022.3227098.
  • [7] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7132–7141.
  • [8] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “Cbam: Convolutional block attention module,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 3–19.
  • [9] M. Liu, Z. Chai, H. Deng, and R. Liu, “A cnn-transformer network with multiscale context aggregation for fine-grained cropland change detection,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pp. 4297–4306, 2022, doi:10.1109/JSTARS.2022.3177235.
  • [10] A. Abuelgasim, W. Ross, S. Gopal, and C. Woodcock, “Change detection using adaptive fuzzy neural networks: Environmental damage assessment after the gulf war,” Remote Sensing of Environment, 1999.
  • [11] A. Frick and S. Tervooren, “A framework for the long-term monitoring of urban green volume based on multi-temporal and multi-sensoral remote sensing data,” Journal of geovisualization and spatial analysis, vol. 3, no. 1, p. 6, 2019.
  • [12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
  • [13] Y. Feng, J. Jiang, H. Xu, and J. Zheng, “Change detection on remote sensing images using dual-branch multilevel intertemporal network,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023.
  • [14] Y. Li and A. Gupta, “Beyond grids: Learning graph representations for visual recognition,” in Neural Information Processing Systems, 2018.
  • [15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [16] S. Ji, S. Wei, and M. Lu, “Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set,” IEEE Transactions on geoscience and remote sensing, vol. 57, no. 1, pp. 574–586, 2018.
  • [17] D. Peng, L. Bruzzone, Y. Zhang, H. Guan, H. Ding, and X. Huang, “Semicdnet: A semisupervised convolutional neural network for change detection in high resolution remote-sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 7, pp. 5891–5906, 2021, doi:10.1109/TGRS.2020.3011913.
  • [18] H. Chen and Z. Shi, “A spatial-temporal attention-based method and a new dataset for remote sensing image change detection,” Remote Sensing, vol. 12, no. 10, 2020, doi:10.3390/rs12101662. [Online]. Available: https://www.mdpi.com/2072-4292/12/10/1662
  • [19] R. Caye Daudt, B. Le Saux, and A. Boulch, “Fully convolutional siamese networks for change detection,” in 2018 25th IEEE International Conference on Image Processing (ICIP), 2018, pp. 4063–4067, doi:10.1109/ICIP.2018.8451652.