
MATA*: Combining Learnable Node Matching with A* Algorithm for Approximate Graph Edit Distance Computation

Junfeng Liu (Beihang University, Beijing, China) liujunfeng@buaa.edu.cn; Min Zhou (Huawei Noah's Ark Lab, Shenzhen, China) zhoumin27@huawei.com; Shuai Ma (Beihang University, Beijing, China) mashuai@buaa.edu.cn; and Lujia Pan (Huawei Noah's Ark Lab, Shenzhen, China) panlujia@huawei.com
(2023)
Abstract.

Graph Edit Distance (GED) is a general and domain-agnostic metric to measure graph similarity, widely used in graph search and retrieval tasks. However, exact GED computation is known to be $\mathsf{NP}$-complete. For instance, the widely used A* algorithms explore the entire search space to find the optimal solution and thus inevitably suffer from scalability issues. Learning-based methods apply graph representation techniques to learn the GED by formulating a regression task, but they cannot recover the edit path and tend to produce inaccurate GED approximations (i.e., the predicted GED is smaller than the exact one). To this end, in this work we present MATA*, a data-driven hybrid approach for approximate GED computation based on Graph Neural Networks (GNNs) and A* algorithms, which models the problem from the perspective of learning to match nodes instead of directly regressing GED. Specifically, aware of the structure-dominant property of GED computation (i.e., most edit operations are node and edge insertions/deletions), a structure-enhanced GNN is first designed to jointly learn local and high-order structural information for node embeddings used in node matching. Second, top-$k$ candidate nodes are produced via a differentiable top-$k$ operation to enable training for node matching, adhering to another property of GED, namely multiple optimal node matchings. Third, benefiting from the candidate nodes, MATA* explores only the promising search directions and reaches the solution efficiently. Finally, extensive experiments show the superiority of MATA*: it significantly outperforms combinatorial search-based, learning-based and hybrid methods, and scales well to large graphs.

combinatorial optimization; graph edit distance; graph neural networks; node matching; A* algorithm
Journal year: 2023; price: 15.00. CCS concepts: Mathematics of computing → Approximation algorithms; Computing methodologies → Supervised learning; Computing methodologies → Discrete space search.

1. Introduction

Graph-structured data are ubiquitous, ranging from chemical compounds (Carlos et al., 2019), social networks (Fey et al., 2020), computer vision (Yan et al., 2020) to programming languages (Li et al., 2019). A recurrent and pivotal task when working with these graph-structured applications is assessing how similar or different two given graphs are, among which graph edit distance (GED) is a widely used metric due to its flexible and domain-agnostic merits (Li et al., 2019; Chang et al., 2020, 2023; Bai et al., 2019). In general, GED computation refers to finding the minimum cost of edit operations (node insertion/deletion, edge insertion/deletion, and node/edge relabeling) to transform the source graph to a target one (Blumenthal et al., 2020) (see Fig. 1 for an example).

The exact GED computation guarantees optimality but is $\mathsf{NP}$-complete (Blumenthal et al., 2020). It typically treats the space of all possible edit operations as a pathfinding problem, where the A* algorithm (a best-first search method) is widely used to expand the search (Kim et al., 2019; Chen et al., 2019; Chang et al., 2020, 2023). These solutions mainly focus on pruning unpromising search spaces within the A* algorithm or on filtering dissimilar graph pairs to speed up GED computation. However, they all run in factorial time in the worst case due to the exhaustiveness of their search spaces, and thus cannot reliably compute the GED of graphs with more than 16 nodes in a reasonable time (Blumenthal and Gamper, 2020).

Figure 1. An edit path from source graph $\mathcal{G}_1$ to target graph $\mathcal{G}_2$. Different colors represent nodes with different labels. Assume the edit costs are uniform, and $\mathsf{ged}(\mathcal{G}_1,\mathcal{G}_2)=4$; that is, at least four edit operations are required to transform $\mathcal{G}_1$ to $\mathcal{G}_2$, where the node mapping corresponding to the edit path maps $\{u_1,u_2,u_3,u_4\}$ to $\{v_1,v_2,v_3,v_4\}$. (1) Essentially, there are two optimal node matchings achieving $\mathsf{ged}(\mathcal{G}_1,\mathcal{G}_2)=4$; the other node mapping maps $\{u_1,u_2,u_3,u_4\}$ to $\{v_2,v_3,v_4,v_5\}$. (2) Among the edit operations, there is one attribute operation (i.e., relabeling $u_3$) and three structure operations.

Some recent works on approximate GED computation leverage graph representation techniques and fall into two main categories: learning-based models (Li et al., 2019; Bai et al., 2019; Bai and Zhao, 2021; Bai et al., 2020; Peng et al., 2021; Zhang et al., 2021; Ranjan et al., 2022; Zhuo and Tan, 2022) and hybrid approaches (Wang et al., 2021; Yang and Zou, 2021). Learning-based models directly formulate approximate GED computation as a regression task and learn the GED as a graph similarity metric in a supervised, end-to-end manner. Although such methods alleviate the computational burden of GED, they can suffer from inaccurate GED approximation (i.e., the predicted GED is smaller than the exact one) and also fail to recover an actual edit path, which is indispensable in specific tasks, e.g., network alignment (Koutra et al., 2013) and graph matching (Cho et al., 2013; Wang et al., 2021). Hybrid methods combine deep learning and combinatorial-search techniques to optimize GED computation. Recently, (Wang et al., 2021) and (Yang and Zou, 2021) separately proposed two hybrid approaches, both of which apply Graph Neural Networks (GNNs) to guide the search directions of A* algorithms. However, the edit distance they produce typically has a large gap from the exact one, because inaccurate GED approximations accumulate in the cost-function estimation (i.e., the cost of unmatched subgraphs) of the A* algorithm. Besides, GNNs with an attention mechanism are employed to estimate the cost function, which takes $\mathcal{O}(n^2 d + d^2 n)$ time for each search extension and thus faces scalability issues (Wang et al., 2021; Yang and Zou, 2021).

It is known that GED computation is equivalent to finding the optimal node matching between the source and target graphs. Once a node matching is given, GED can be easily calculated by scanning the two graphs once (Chang et al., 2020), which reveals the intrinsic connection between GED computation and node matching. However, existing learning-based and hybrid approaches only formulate GED as a regression task over graph or subgraph pairs, and fail to explicitly consider node matching in their models. Aware of this intrinsic connection, in this work we attempt to learn the node matching corresponding to GED using GNNs. This is non-trivial, as the following two combinatorial properties are inherent to GED computation: (1) multiple optimal node matchings (i.e., different matchings that produce the GED) make it difficult to learn the node matching by direct end-to-end modeling; and (2) structure-dominant operations (i.e., most edit operations involve structure) make it challenging to incorporate structural information into learning models. See Fig. 1 for an example.

To this end, in this work we present MATA*, a data-driven hybrid approach based on Graph Neural Networks and A* algorithms, which leverages the learned candidate matching nodes to prune unpromising search directions of the A* algorithm (specifically, A*LSa (Chang et al., 2020)) for approximate GED computation.

Contributions. The main contributions are as follows.


(1) We present a hybrid approach based on GNNs and A* algorithms rather than an end-to-end model, which formulates GED computation from the perspective of node matching and exploits the intrinsic connection between GED computation and node matching.


(2) A structure-enhanced Graph Neural Network (SEGcn) is introduced to jointly learn local and high-order structural information for node embeddings w.r.t. node matchings at a fine granularity, capturing the combinatorial property of structure-dominant operations in GED computation.


(3) Further, top-$k$ candidate nodes are produced via a differentiable top-$k$ operation to account for the combinatorial property of multiple optimal node matchings; the operation is built upon two complementary learning tasks that jointly generate the candidate nodes, i.e., learning node matching and learning GED.


(4) We conduct extensive experiments on the real-life datasets Aids, Imdb, and Cancer to demonstrate the superiority and scalability of MATA* against three types of methods: combinatorial search-based, learning-based and hybrid approaches. Indeed, MATA* improves accuracy by at least (39.0%, 21.6%, 11.7%) and reduces the average discrepancy by at least (6.5%, 9.1%, 24.5%) on the three datasets (Aids, Imdb, Cancer), respectively.

2. Related Works

Computing the graph edit distance between graphs is a classical and fundamental combinatorial optimization problem, on which a vast body of literature exists across various domains (Riesen et al., 2007; Chang et al., 2020, 2023; Kim et al., 2019; Neuhaus et al., 2006; Riesen and Bunke, 2009; Fankhauser et al., 2011; Wang et al., 2021; Yang and Zou, 2021; Bai et al., 2020; Bai and Zhao, 2021; Zhang et al., 2021; Li et al., 2019; Ranjan et al., 2022; Zhuo and Tan, 2022). We next give a detailed overview of the existing literature in three categories: (1) combinatorial search-based, (2) learning-based and (3) hybrid GED computation.

Combinatorial search-based. Combinatorial search-based algorithms either directly explore the search space corresponding to GED, or relax the problem to other combinatorial problems with polynomial time complexity. (1) The solution space of exact GED is typically treated as a pathfinding problem, where best-first search (A* (Riesen et al., 2007; Hart et al., 1968)) and depth-first search (Abu-Aisheh et al., 2015; Blumenthal and Gamper, 2017) are utilized to expand the search path (Yang and Zou, 2021). Exact algorithms mainly differ in how they estimate the cost of unmatched subgraphs with theoretical guarantees to prune the search space, e.g., using label sets (Riesen et al., 2007, 2013) or subgraph structures (Chang et al., 2020, 2023; Kim et al., 2019). (2) Approximate algorithms are proposed to find sub-optimal solutions. (Neuhaus et al., 2006) explores only the most promising directions of the A* algorithm with a limited beam size. (Riesen and Bunke, 2009) and (Fankhauser et al., 2011) consider only the local structure and relax the problem to bipartite matching, which is computed in cubic time.

Learning-based GED computation. With the progress of graph representation techniques based on Graph Neural Networks (Kipf and Welling, 2016; Ying et al., 2021; Dwivedi et al., 2022), some works directly model GED computation as a regression problem and learn the approximate GED in an end-to-end manner by treating GED as a similarity score between graphs. These learning-based algorithms mainly focus on designing different GNN models for the GED computation task. (Bai et al., 2019) first presents a model using GCN (Kipf and Welling, 2016) and an attention mechanism to approximately learn GED in an end-to-end fashion. Building on (Bai et al., 2019), (Bai et al., 2020) further introduces a multi-scale node comparison technique to extract fine-grained information from the node-to-node similarity matrix. Besides, (Li et al., 2019) incorporates both node- and graph-level information through a cross-graph module to trade off accuracy and computation. (Bai and Zhao, 2021) splits the graph edit distance into different types of edit operations and applies graph aggregation layers to learn each type individually. More recently, (Peng et al., 2021) designs a GED-specific regularizer to impose the matching constraints involved in GED, where graph pairs are represented by association graphs. (Ranjan et al., 2022) designs a novel siamese graph neural network which, through a carefully crafted inductive bias, learns graph and subgraph edit distances in a property-preserving manner.

Hybrid GED computation. Recently, there has been a surge of interest in marrying learning-based approaches with combinatorial-search techniques. This interdisciplinary blend has produced several hybrid methods, particularly ones that integrate Graph Neural Networks (GNNs) with the A* search algorithm (Wang et al., 2021; Yang and Zou, 2021). Both methods leverage machine learning to enhance the performance of A* algorithms for GED computation by predicting the cost of unmatched subgraphs to optimize the search directions. (Yang and Zou, 2021) proposes graph path networks that incorporate pre-training edit-path information and cross-graph information for training the model, and (Wang et al., 2021) integrates a dynamic graph embedding network (Bai et al., 2019) to estimate the cost of unmatched subgraphs.

3. Preliminaries

We focus our discussion on labeled, undirected simple graphs. A graph is denoted by $\mathcal{G}=\{\mathcal{V},\mathcal{E},\Phi\}$, where $\mathcal{V}$ is the set of nodes, $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of undirected edges, and $\Phi$ is a label function that assigns a label to each node or edge.

Graph Edit Distance (GED). The graph edit distance between graphs $\mathcal{G}_1$ and $\mathcal{G}_2$, denoted by $\mathsf{ged}(\mathcal{G}_1,\mathcal{G}_2)$, is defined as the minimum cost of edit operations (i.e., node insertion/deletion, edge insertion/deletion, and node/edge relabeling) that transform $\mathcal{G}_1$ into $\mathcal{G}_2$ (Bai and Zhao, 2021; Yang and Zou, 2021; Chang et al., 2023). One specific constraint to note is that node deletion is restricted to isolated nodes, ensuring structural integrity and meaningful transformations between the graphs. Due to the NP-completeness of GED, an approximate edit distance, denoted by $\widetilde{\mathsf{ged}}(\mathcal{G}_1,\mathcal{G}_2)$, is often used instead, offering a balance between computational scalability and accuracy. In this work, we focus on the setting of uniform edit costs, i.e., all edit operations share the same cost (Bai and Zhao, 2021; Bai et al., 2019, 2020; Chang et al., 2020; Ranjan et al., 2022; Peng et al., 2021; Chang et al., 2023; Yang and Zou, 2021), yet the techniques presented in the following sections can also be extended to handle non-uniform edit costs.

GED computation from node matchings. We next illustrate how to compute GED from the view of node matchings. Here, a node matching refers to an injective function from the nodes $\mathcal{V}_1$ to the nodes $\mathcal{V}_2$.

Proposition 1: The $\mathsf{ged}$ between a graph pair $\mathcal{G}_1$ and $\mathcal{G}_2$ equals the minimum edit cost among all node matchings from the source graph $\mathcal{G}_1$ to the target graph $\mathcal{G}_2$ (Chang et al., 2020). $\Box$

By Proposition 1, $\mathsf{ged}(\mathcal{G}_1,\mathcal{G}_2)$ can be determined by exhaustively generating all possible matchings from $\mathcal{V}_1$ to $\mathcal{V}_2$, i.e., by identifying the node matching that incurs the least edit cost. Based on the commutativity of $\mathsf{ged}(\cdot,\cdot)$ and the uniformity of edit costs, w.l.o.g., for a graph pair $\mathcal{G}_1$ and $\mathcal{G}_2$, $\mathcal{G}_1$ always refers to the graph with fewer nodes in later sections.
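To make Proposition 1 concrete, the following sketch enumerates every injective matching from $\mathcal{V}_1$ to $\mathcal{V}_2$ and returns the minimum edit cost under uniform costs. It assumes node labels given as a list and edges as index pairs, ignores edge labels, and all names are illustrative; it runs in exponential time and is only usable on toy graphs, not a substitute for the algorithms discussed here.

```python
from itertools import permutations

def ged_bruteforce(labels1, edges1, labels2, edges2):
    """Exact uniform-cost GED by enumerating all injective node matchings
    from the smaller graph G1 into G2 (Proposition 1)."""
    n1, n2 = len(labels1), len(labels2)
    assert n1 <= n2, "arrange G1 to be the graph with fewer nodes"
    E1 = {frozenset(e) for e in edges1}
    E2 = {frozenset(e) for e in edges2}
    best = float("inf")
    for match in permutations(range(n2), n1):      # injective V1 -> V2
        relabels = sum(labels1[i] != labels2[match[i]] for i in range(n1))
        mapped = {frozenset((match[a], match[b])) for a, b in E1}
        common = len(mapped & E2)
        # relabels + deleted G1-edges + inserted G2-edges + inserted nodes
        cost = relabels + (len(E1) - common) + (len(E2) - common) + (n2 - n1)
        best = min(best, cost)
    return best
```

For example, transforming a labeled edge ('a')-('a') into a path ('a')-('a')-('b') requires one node insertion and one edge insertion, so the function returns 2.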

Moreover, finding the optimal node matching between $\mathcal{G}_1$ and $\mathcal{G}_2$ can be formulated as a constrained binary quadratic programming ($\mathsf{CBQP}$) problem (Peng et al., 2021):

(1) $$\min_{X}\ \mathrm{dist} = \sum_{\substack{u_i\in\mathcal{G}_1\\ v_k\in\mathcal{G}_2}} c_{i,k}X_{i,k} \;+\; \sum_{\substack{u_i,u_j\in\mathcal{G}_1\\ v_k,v_l\in\mathcal{G}_2}} c_{i,k,j,l}X_{i,k}X_{j,l}$$
$$\text{s.t.}\quad \sum_{v_k\in\mathcal{G}_2} X_{i,k}=1,\ \forall u_i\in\mathcal{G}_1;\qquad \sum_{u_i\in\mathcal{G}_1} X_{i,k}=1,\ \forall v_k\in\mathcal{G}_2;\qquad X_{i,k}\in\{0,1\},\ \forall u_i\in\mathcal{G}_1,\ v_k\in\mathcal{G}_2$$

where $X\in\{0,1\}^{|\mathcal{V}_1|\times|\mathcal{V}_2|}$ is a binary matrix representing the node matching between $\mathcal{G}_1$ and $\mathcal{G}_2$: $X_{i,k}=1$ if node $u_i$ in $\mathcal{G}_1$ matches node $v_k$ in $\mathcal{G}_2$. The edit cost $c_{i,k}$ denotes the cost of matching $u_i$ in $\mathcal{G}_1$ with $v_k$ in $\mathcal{G}_2$: $c_{i,k}=1$ if $u_i$ and $v_k$ have different labels, and $0$ otherwise. Similarly, $c_{i,k,j,l}$ is the edit cost of matching edge $(u_i,u_j)$ in $\mathcal{G}_1$ with edge $(v_k,v_l)$ in $\mathcal{G}_2$: $c_{i,k,j,l}=1$ if $(u_i,u_j)$ and $(v_k,v_l)$ have different labels, and $0$ otherwise.
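For illustration, the CBQP objective of Eq. (1) can be evaluated for a given binary assignment $X$ as below. Here `node_cost` and `edge_cost` are hypothetical dense arrays holding $c_{i,k}$ and $c_{i,k,j,l}$; this is a sketch of the objective only, not part of MATA* and not a solver.

```python
import numpy as np

def cbqp_cost(X, node_cost, edge_cost):
    """Objective of Eq. (1) for a binary assignment X (n1 x n2):
    node_cost[i, k] holds c_{i,k}; edge_cost[i, k, j, l] holds c_{i,k,j,l}."""
    linear = float(np.sum(node_cost * X))                      # node term
    quadratic = float(np.einsum("ik,ikjl,jl->", X, edge_cost, X))  # edge term
    return linear + quadratic
```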

Table 1. Statistics of types of edit operations. We randomly sample 1,000 graphs from each dataset and compute their edit operations, where NI, EI and ED stand for node insertions, edge insertions and edge deletions, respectively.

Datasets | Structure Operations (NI / EI / ED) | Attribute Operations (Relabeling)
Aids | 18.3% / 34.8% / 8.7% | 38.0%
Imdb | 12.1% / 61.8% / 5.2% | 0.0%
Cancer | 4.6% / 40.8% / 35.5% | 18.6%

4. The Proposed Model: MATA*

Distinct from previous works that formulate GED computation as a regression task, we suggest tracing the problem back to the node matching so that the combinatorial properties (i.e., structure-dominant operations and multiple optimal node matchings) in GED computation could be leveraged.

Figure 2. The framework of MATA*. Black arrows stand for the data flow in both the training and testing phases; red arrows denote the flow in testing only. (1) The embedding module takes graph pairs as input and extracts local and high-order structural information via SEGcn. (2) The matching module utilizes the node embeddings to build two learning tasks, i.e., learning node matching with the help of the similarity matrix and learning GED using graph representations. Further, the top-$k$ candidates and the assignment matrix are generated by Alg. 1. (3) Benefiting from the candidate nodes, MATA* explores only the promising search directions to refine these matchings using A*LSa.

4.1. Analysis of Learning to Match Nodes.

To learn node matchings of GED from the $\mathsf{CBQP}$ formulation, $X_{i,k}\in\{0,1\}$ is relaxed to be continuous in $[0,1]$, where the constraints in Eq. (1) can be modeled as a quadratic infeasibility penalty (Kochenberger et al., 2014). Under this relaxation, $X_{i,k}$ can be viewed as the confidence that node $u_i$ in $\mathcal{G}_1$ matches node $v_k$ in $\mathcal{G}_2$. In this way, we cast the problem into a linear matching paradigm by incorporating graph structure information into node embeddings, i.e., solving the following transportation problem:

(2) $$\min \sum_{i=1}^{|\mathcal{V}_1|}\sum_{j=1}^{|\mathcal{V}_2|} \mathbf{X}_{ij}\,\big\|\mathbf{h}_{1i}-\mathbf{h}_{2j}\big\|_2$$

where $\mathbf{h}_1\in\mathbb{R}^{|\mathcal{V}_1|\times d}$ and $\mathbf{h}_2\in\mathbb{R}^{|\mathcal{V}_2|\times d}$ are the node embeddings of $\mathcal{G}_1$ and $\mathcal{G}_2$, respectively. Intuitively, this objective finds the optimal way to transport one set of node embeddings $\mathbf{h}_1$ to the other $\mathbf{h}_2$ by minimizing the Euclidean distances between corresponding node pairs.
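When $\mathbf{X}$ is restricted to injective matchings, Eq. (2) becomes an assignment problem on the matrix of pairwise embedding distances. The brute-force sketch below makes this explicit for toy sizes; the function name is illustrative, and a real implementation would use an assignment solver rather than enumeration.

```python
import numpy as np
from itertools import permutations

def match_by_embeddings(h1, h2):
    """Solve Eq. (2) exactly for small graphs: among all injective matchings
    from G1's nodes to G2's nodes, pick the one minimising the summed
    Euclidean distances between matched node embeddings."""
    n1, n2 = len(h1), len(h2)
    cost = np.linalg.norm(h1[:, None, :] - h2[None, :, :], axis=-1)  # (n1, n2)
    best, best_match = float("inf"), None
    for match in permutations(range(n2), n1):
        total = cost[np.arange(n1), list(match)].sum()
        if total < best:
            best, best_match = total, match
    return best, best_match
```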

After modeling GED computation as learning node matchings via Eq. (2), we further analyze the two combinatorial properties to design the approximate GED computation framework.

Structure-dominant operations. Structure operations (node and edge insertion/deletion) are dominant among all edit operations, accounting for at least 62.0% as illustrated in Table 1. Indeed, node deletions can be interpreted as node insertions, as we arrange the source graph $\mathcal{G}_1$ and target graph $\mathcal{G}_2$ such that $|\mathcal{V}_1|\leq|\mathcal{V}_2|$ (Chang et al., 2020). Further, by reducing the $\mathsf{CBQP}$ formulation to the transportation problem in Eq. (2), the graph structure information is assumed to be embedded into the node embeddings. These observations tell us that we need a GNN that effectively learns powerful node embeddings enhanced by graph structure (Section 4.2).

Multiple optimal node matchings. Due to the combinatorial and permutation nature of node matchings, two graphs typically have multiple optimal node matchings that yield the GED. Hence, directly learning a single node correspondence from the matching confidences is extremely challenging, as it could violate the injectivity constraint or widen the gap between $\mathsf{ged}(\cdot,\cdot)$ and $\widetilde{\mathsf{ged}}(\cdot,\cdot)$. We therefore relax the constraint on the number of matched nodes and obtain candidate nodes using a flexible parameter $k$. To conclude, we (1) need a differentiable top-$k$ operation to enable training for node matchings (Section 4.3) and (2) refine the matchings from the top-$k$ candidates using A* algorithms (Section 4.4).

Thus, the proposed MATA* employs a structure-enhanced GNN (SEGcn) to learn differentiable top-$k$ candidate matching nodes, which prune the unpromising search directions of A*LSa (Chang et al., 2020) for approximate GED computation. The overview of MATA* is illustrated in Fig. 2, with the details of each module elaborated in the following subsections.

4.2. Embedding Module

As analyzed above, structural information is critical for learning the fine-grained matching of node pairs in GED computation. Here, we propose a structure-enhanced GNN, SEGcn, that jointly learns structural information from both the local and high-order views.

Degree encoding. When matching the nodes of two graphs w.r.t. GED, nodes with similar degrees are more likely to be matched. Note that the degree is not an exact measure of structural similarity, since edge insertions and deletions are involved. Hence, rather than directly encoding the degree as a one-hot vector, we assign each node a learnable embedding $d_i$ based on its degree, with the values of $d_i$ randomly initialized.

Position encoding. Nodes in similar local positions are more likely to be matched in GED computation. Shortest-path distances (Ying et al., 2021), PageRank (Mialon et al., 2021) and random walks (Dwivedi et al., 2022; Li et al., 2020) are commonly used to measure the relative position of nodes (Yang et al., 2023). For computational efficiency, we employ the probabilities of random walks after different numbers of steps as the relative position encoding $p_i\in\mathbb{R}^t$:

(3) $$p_i=\big[R_{ii}^{(1)},R_{ii}^{(2)},\cdots,R_{ii}^{(t)}\big],$$

where $R=AD^{-1}$ is the random walk operator, $t$ is the number of random walk steps, and $R_{ii}^{(t)}$ is the probability that node $i$ lands back on itself after the $t$-th step of the random walk. Although local positions are encoded via random walks, nodes with slight structural differences may be hard to match, since Eq. (3) is deterministic while the edit operations from $\mathcal{G}_1$ to $\mathcal{G}_2$ do change the graph structures.

For a more robust position encoding, perturbations are further injected into the original graphs. Specifically, we randomly insert and remove a small portion of edges (10% in our experiments) to produce the perturbed graphs $\mathcal{G}_{in}$ and $\mathcal{G}_{re}$, respectively. The random walk diffusion is then performed on $\mathcal{G}_{in}$ and $\mathcal{G}_{re}$ to obtain the perturbed local positions $p_i^{in}$ and $p_i^{re}$, respectively. Combined with Eq. (3), the positional encoding is given as follows:

(4) $$\hat{p_i}=p_i+p_i^{in}+p_i^{re}.$$
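The random-walk encoding of Eq. (3) can be sketched as follows; the perturbed encodings $p_i^{in}$ and $p_i^{re}$ of Eq. (4) would then be obtained by calling the same function on the two edge-perturbed copies of the graph and summing. The function name and the dense-adjacency input are assumptions for illustration.

```python
import numpy as np

def rw_position_encoding(A, t):
    """Eq. (3): p_i = [R_ii^(1), ..., R_ii^(t)] with R = A D^{-1}."""
    deg = A.sum(axis=0)
    R = A / np.maximum(deg, 1)        # column-normalised: R_ij = A_ij / deg_j
    n = A.shape[0]
    P = np.zeros((n, t))
    Rk = np.eye(n)
    for step in range(t):
        Rk = Rk @ R                   # (step+1)-step transition matrix
        P[:, step] = np.diag(Rk)      # return probability of each node
    return P
```

On a triangle graph, for instance, no node can return to itself in one step, while the two-step return probability is 0.5 for every node.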

Local view. By concatenating (1) the node feature $x_i$ (i.e., the attribute feature of its label), (2) the degree encoding $d_i$, and (3) the position encoding $\hat{p_i}$, the node embedding with local structure $h_i^{(0)}\in\mathbb{R}^d$ is built via a multilayer perceptron (MLP):

(5) $$h_i^{(0)}=\texttt{MLP}(x_i\oplus d_i\oplus\hat{p_i}),\ \ \forall i\in\mathcal{V}$$

High-order view. We adopt GCN (Kipf and Welling, 2016) as the backbone of SEGcn to learn higher-order neighborhood information. The embedding of each node is aggregated from the embeddings of its adjacent nodes and itself. The $l$-th iteration of aggregation is characterized as:

(6) $$h_i^{(l)}=\sigma\Big(\sum_{j\in\mathcal{N}_i}\frac{1}{c_{ij}}h_j^{(l-1)}w^{(l-1)}\Big)$$

where $h_i^{(l)}\in\mathbb{R}^d$ is the representation of node $i$ at the $l$-th GCN layer, $\mathcal{N}_i$ is the set of neighbors of node $i$, and $w^{(l)}$ is the learned weight matrix of the $l$-th layer. To reduce the bias caused by different numbers of neighbors, the embeddings aggregated from adjacent nodes are normalized by $c_{ij}$, the total number of adjacent nodes. SEGcn takes the obtained $h_i^{(0)}$ as the input embedding.
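A minimal numpy sketch of one aggregation step in the spirit of Eq. (6), taking $\sigma$ as ReLU, $c_{ij}$ as the neighbour count, and adding self-loops so a node also aggregates its own embedding; these choices are one common instantiation for illustration, not necessarily the exact configuration of SEGcn.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN-style aggregation: each node averages the embeddings of its
    neighbours (with a self-loop), applies the linear map W, then ReLU."""
    A_hat = A + np.eye(A.shape[0])                     # include the node itself
    A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)  # 1/c_ij as a mean
    return np.maximum(A_norm @ H @ W, 0.0)             # sigma = ReLU
```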

After encoding by SEGcn, the node embeddings of $\mathcal{G}_1$ from the local and high-order views are denoted as $\mathbf{h}_1^{(0)}\in\mathbb{R}^{|\mathcal{V}_1|\times d}$ and $\mathbf{h}_1^{(l)}\in\mathbb{R}^{|\mathcal{V}_1|\times d}$, respectively. The node embeddings of $\mathcal{G}_2$ are obtained similarly.

4.3. Matching Module

The local and high-order structural affinities between the two graphs have been encoded into the node embedding space by SEGcn. As such, learning to match nodes reduces to solving Eq. (2). We thus jointly learn the matchings from both the local and high-order views to obtain the differentiable top-$k$ candidates by iteratively minimizing the underlying transportation problem. In addition to the task of learning node matching, a complementary task of learning GED is also put forward, which learns the distance between graph representations and assists the node matching task.

Learning node matchings. Intuitively, we learn node matchings from fine-grained correspondences that minimize the transportation problem, so that the resultant node matchings are not mere approximations but reflect the genuine structural alignments between the two graphs.

Similarity matrix. To solve Eq. (2) in a flexible way, the similarity matrices from the local view $\mathbf{S}^{(0)}\in\mathbb{R}^{|\mathcal{V}_1|\times|\mathcal{V}_2|}$ and the high-order view $\mathbf{S}^{(l)}\in\mathbb{R}^{|\mathcal{V}_1|\times|\mathcal{V}_2|}$ are defined as:

(7) $$\mathbf{S}^{(0)}=\sigma\big({\mathbf{h}_1^{(0)}}^{\top}\mathbf{W_n}\,\mathbf{h}_2^{(0)}\big),\qquad \mathbf{S}^{(l)}=\sigma\big({\mathbf{h}_1^{(l)}}^{\top}\mathbf{W_n}\,\mathbf{h}_2^{(l)}\big)$$

where $\mathbf{W_n}\in\mathbb{R}^{d\times d}$ is a learnable weight matrix shared between $\mathbf{S}^{(0)}$ and $\mathbf{S}^{(l)}$. All elements of the similarity matrices are positive after applying the sigmoid function; $\mathbf{S}^{(0)}_{i,j}$ measures the similarity between the $i$-th node of $\mathcal{G}_1$ and the $j$-th node of $\mathcal{G}_2$ from the local view, and $\mathbf{S}^{(l)}_{i,j}$ measures it from the high-order view. Besides, $\mathbf{S}^{(0)}$ also models the cost of transforming the embeddings $\mathbf{h}_1^{(0)}$ into $\mathbf{h}_2^{(0)}$. Unlike padding the similarity matrix (Bai et al., 2020), a $|\mathcal{V}_1|\times|\mathcal{V}_2|$ matrix suffices to represent all possible matchings by Eq. (1).
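Eq. (7) amounts to a bilinear score followed by a sigmoid. With row-wise node embeddings $\mathbf{h}_1\in\mathbb{R}^{n_1\times d}$ and $\mathbf{h}_2\in\mathbb{R}^{n_2\times d}$, it can be sketched as follows (illustrative only; the function name is an assumption):

```python
import numpy as np

def similarity_matrix(h1, h2, Wn):
    """Bilinear similarity of Eq. (7): S = sigmoid(h1 Wn h2^T), with shared
    weights Wn (d x d); entries lie in (0, 1)."""
    logits = h1 @ Wn @ h2.T          # (n1, n2) bilinear scores
    return 1.0 / (1.0 + np.exp(-logits))
```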

Algorithm 1 Differentiable top-$k$ matching nodes

Input: similarity matrices $\mathbf{S}^{(0)}$, $\mathbf{S}^{(l)}$, $k$, regularization $\epsilon$
Output: assignment matrices $\mathbf{S_a}^{(0)}$, $\mathbf{S_a}^{(l)}$, candidates $M^{|\mathcal{V}_1|\times k}$

1: Build $\mathbf{D}$, $\mathbf{c}$, $\mathbf{r}$ from $\mathbf{S}^{(0)}$ by Eq. (8); $\mathbf{\Gamma}=-\mathbf{D}/\epsilon$;
2: while $\mathbf{\Gamma}$ is not converged do ▷ Sinkhorn normalization
3:     $\mathbf{\Gamma}=\operatorname{diag}(\mathbf{\Gamma}\mathbf{1}\odot\mathbf{r})^{-1}\,\mathbf{\Gamma}$;
4:     $\mathbf{\Gamma}=\operatorname{diag}(\mathbf{\Gamma}^{\top}\mathbf{1}\odot\mathbf{c})^{-1}\,\mathbf{\Gamma}$;
5: Rebuild assignment matrix $\mathbf{S_a}^{(0)}$ from $\mathbf{\Gamma}$;
6: Repeat lines 1–5 for $\mathbf{S}^{(l)}$ to obtain $\mathbf{S_a}^{(l)}$;
7: if training then return $\mathbf{S_a}^{(0)}$, $\mathbf{S_a}^{(l)}$;
8: else return $M^{|\mathcal{V}_1|\times k}$ by greedily searching top-$k$;

Top-$k$ candidate matching nodes. Inspired by (Xie et al., 2020; Wang et al., 2023), choosing the top-$k$ matches from the similarity matrices $\mathbf{S}^{(0)}$ and $\mathbf{S}^{(l)}$ is formulated as an optimal transport problem, which selects the $k$ most confident matches for each node based on the matching confidences, as shown in Alg. 1.

Specifically, we first flatten the similarity matrix $\mathbf{S}^{(0)}$ with local structure affinity into $\mathbf{d}=[d_1,d_2,\ldots,d_{|\mathcal{V}_1||\mathcal{V}_2|}]$. To find the top-$k$ matches differentiably, the optimal transport problem can be viewed as redistributing $\mathbf{d}$ to one of $d_{max}$ and $d_{min}$, where the capacities of $d_{max}$ and $d_{min}$ are $k$ and $|\mathcal{V}_1||\mathcal{V}_2|-k$, respectively. That is, the matches moved into $d_{max}$ are preserved during the redistribution and those moved into $d_{min}$ are discarded. Let $\mathbf{c}$ and $\mathbf{r}$ denote the marginal distributions, $\mathbf{D}$ the distance matrix, and $\mathbf{1}$ the all-ones vector (line 1). We have:

(8) $$\mathbf{r}=\mathbf{1}_{|\mathcal{V}_1||\mathcal{V}_2|}^{\top},\qquad \mathbf{c}=\big[|\mathcal{V}_1||\mathcal{V}_2|-k,\ k\big]^{\top}$$
$$\mathbf{D}=\begin{bmatrix} d_1-d_{\min} & d_2-d_{\min} & \cdots & d_{|\mathcal{V}_1||\mathcal{V}_2|}-d_{\min}\\ d_{\max}-d_1 & d_{\max}-d_2 & \cdots & d_{\max}-d_{|\mathcal{V}_1||\mathcal{V}_2|}\end{bmatrix}$$

Then Sinkhorn (Cuturi, 2013; Fey et al., 2020), an efficient method for solving the optimal transport problem, is adopted to learn the probabilities of the top-$k$ matchings; it is an approximate and differentiable version of the $\mathsf{Hungarian}$ algorithm. It iteratively performs row normalization and column normalization (i.e., element-wise division of each row or column by its sum) until convergence, where $\odot$ denotes element-wise division and $\operatorname{diag}(\cdot)$ builds a diagonal matrix from a vector (lines 2–4).
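Lines 1–4 of Alg. 1 can be sketched as follows: flatten $\mathbf{S}$, build the two-column distance matrix of Eq. (8), and run Sinkhorn in the entropic-regularized form $\exp(-\mathbf{D}/\epsilon)$ so that mass $k$ flows to the "keep" column. The function name, the fixed iteration count, and the default $\epsilon$ are assumptions for illustration, not the authors' exact settings.

```python
import numpy as np

def topk_transport(S, k, eps=0.1, iters=200):
    """Differentiable top-k: redistribute the flattened similarities between
    a 'discard' column (capacity |V1||V2|-k) and a 'keep' column (capacity k)
    via Sinkhorn iterations. Returns per-entry keep probability, shaped like S."""
    d = S.flatten().astype(float)
    n = d.size
    D = np.stack([d - d.min(), d.max() - d], axis=1)   # Eq. (8), shape (n, 2)
    r = np.ones(n)                                     # row marginals
    c = np.array([n - k, k], dtype=float)              # column capacities
    G = np.exp(-D / eps)                               # entropic kernel
    for _ in range(iters):
        G *= (r / G.sum(axis=1))[:, None]              # row normalisation
        G *= (c / G.sum(axis=0))[None, :]              # column normalisation
    return G[:, 1].reshape(S.shape)                    # mass in the 'keep' column
```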

After the differentiable top-$k$ operation, we reshape $\mathbf{\Gamma}$ into the assignment matrix from the local view, $\mathbf{S_{a}}^{(0)}\in\mathbb{R}^{|\mathcal{V}_{1}|\times|\mathcal{V}_{2}|}$, which essentially measures the confidence of the $i$-th node of $\mathcal{V}_{1}$ and the $j$-th node of $\mathcal{V}_{2}$ belonging to the optimal matching (line 5). The same operations are performed on the similarity matrix $\mathbf{S}^{(l)}$ with high-order structure affinity to obtain the assignment matrix $\mathbf{S_{a}}^{(l)}$ from the high-order view (line 8). Finally, Alg. 1 returns $\mathbf{S_{a}}^{(0)}$ and $\mathbf{S_{a}}^{(l)}$ during training, and the top-$k$ candidate nodes $M^{|\mathcal{V}_{1}|\times k}$ during testing (lines 6–7).

Note that, during testing, we further propose a greedy method to find the top-$k$ candidate nodes in $\mathcal{O}(kn^{2})$ time. In brief, it iteratively selects the node with the largest matching probability as a candidate from the unmatched nodes, so that the injection constraint of node matchings is also guaranteed.
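The paper does not spell the greedy procedure out in full; one plausible reading, sketched below, runs $k$ rounds and in each round gives every node of $\mathcal{V}_{1}$ a distinct (injective) candidate from $\mathcal{V}_{2}$ by descending matching probability. The function name, visiting order, and tie-breaking are our assumptions.

```python
import numpy as np

def greedy_topk(P, k):
    """Collect k candidate V2-nodes per V1-node from a matching-probability
    matrix P (|V1| x |V2|). Each round builds an injective assignment, so
    k rounds cost roughly O(k n^2)."""
    n1, _ = P.shape
    cand = [[] for _ in range(n1)]
    for _ in range(k):
        used = set()
        # serve V1 nodes in order of their current best confidence
        for i in np.argsort(-P.max(axis=1)):
            for j in np.argsort(-P[i]):    # best still-available V2 node
                if j not in used and j not in cand[i]:
                    used.add(int(j))
                    cand[i].append(int(j))
                    break
    return cand

P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.2]])
```

Each inner list is duplicate-free and each round is injective, matching the stated constraint.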

Learning GED. We further propose an auxiliary task tailored to learn the (approximate) graph edit distance that assists the node matching task by exploiting the graph-level similarity.

Algorithm 2 Mapping refinement based on $\mathsf{A^*LSa}$ [1]

Input: Graphs $\mathcal{G}_{1}$, $\mathcal{G}_{2}$, candidates $M^{|\mathcal{V}_{1}|\times k}$
Output: The approximate $\widetilde{\mathsf{ged}}(\mathcal{G}_{1},\mathcal{G}_{2})$

1: Push $(0,\emptyset,null,0)$ into $Q$;        ▷ Initialize the priority queue $Q$ with the root of the search tree.
2: while $Q\neq\emptyset$ do
3:     Pop $(i,f,pa,lb)$ with minimum $lb$ from $Q$;
4:     Compute the lower bound $lb$ using $\mathsf{A^*LSa}$ for each child $c\in M^{|\mathcal{V}_{1}|\times k}$ of $f$;
5:     for all children $c\in M^{|\mathcal{V}_{1}|\times k}$ of $f$ do
6:         if $i+1=|\mathcal{V}_{1}|$ then $\widetilde{\mathsf{ged}}(\mathcal{G}_{1},\mathcal{G}_{2})=c.lb$;  break;
7:         else Push $(i+1,c,f,lb)$ into $Q$;
8: return $\widetilde{\mathsf{ged}}(\mathcal{G}_{1},\mathcal{G}_{2})$

Intuitively, $\mathbf{h}_{1}^{(0)}$ and $\mathbf{h}_{1}^{(l)}$ capture the node features enhanced by the local and high-order graph structural information of $\mathcal{G}_{1}$, respectively. This embedding process ensures that node features are not only captured in their raw form but are also contextualized within the broader structure of the graph. Essentially, (approximate) GED measures the similarity of a graph pair at the graph level; hence, we aggregate the node embeddings of both the local and high-order views of $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$, passed through MLPs, for the GED learning task:

(9) \mathbf{h}_{1}^{g}=\texttt{MLP}(\mathbf{h}_{1}^{(0)}\oplus\mathbf{h}_{1}^{(l)}),\quad \mathbf{h}_{2}^{g}=\texttt{MLP}(\mathbf{h}_{2}^{(0)}\oplus\mathbf{h}_{2}^{(l)}),\quad d_{\mathcal{G}_{1},\mathcal{G}_{2}}=\texttt{MLP}(\mathbf{h}_{1}^{g}\oplus\mathbf{h}_{2}^{g})

That is, $d_{\mathcal{G}_{1},\mathcal{G}_{2}}$ is predicted by MLPs that gradually reduce the concatenated graph representations $\mathbf{h}_{1}^{g}$ and $\mathbf{h}_{2}^{g}$ of the graph pair. Since the raw edit distance grows with graph size, to provide a more interpretable and standardized measure, the GED is typically normalized as $\exp\{-{\mathsf{ged}}(\mathcal{G}_{1},\mathcal{G}_{2})\times 2/(|\mathcal{V}_{1}|+|\mathcal{V}_{2}|)\}$.

Loss design. MATA* is trained in a supervised manner on graph pairs $\mathcal{G}_{x}$ and $\mathcal{G}_{y}$ using the normalized ground-truth GED ${d}^{t}_{x,y}$ and its corresponding node matching $\mathcal{M}^{t}_{x,y}$. The loss function evaluates both the error of the predicted node matchings, via the local/high-order assignment matrices $\mathbf{S_{a}}^{(0)}$/$\mathbf{S_{a}}^{(l)}$, and the error of the predicted normalized GED $d_{x,y}$. For the node matching task, we jointly minimize the negative log-likelihood of the node matchings on the assignment matrices $\mathbf{S_{a}}^{(0)}$ and $\mathbf{S_{a}}^{(l)}$:

(10) \mathcal{L}_{n}=-\frac{1}{|\mathcal{D}|}\sum_{(x,y)\in\mathcal{D}}\sum_{(i,j)\in\mathcal{M}^{t}_{x,y}}\left(\log\mathbf{S_{a}}^{(0)}_{i,j}+\log\mathbf{S_{a}}^{(l)}_{i,j}\right)

Note that, different from the permutation cross-entropy loss (Wang et al., 2019) or the Hungarian loss (Yu et al., 2020) used for the graph matching task, only the node pairs belonging to a node matching are penalized by $\mathcal{L}_{n}$; the other node pairs are not penalized. The rationale is that multiple optimal node matchings typically exist, and these unmatched node pairs may also belong to other node matchings corresponding to the GED.

For the learning GED task, we minimize the MSE loss:

(11) \mathcal{L}_{g}=\frac{1}{|\mathcal{D}|}\sum_{(x,y)\in\mathcal{D}}(d_{x,y}-d^{t}_{x,y})^{2}

where 𝒟\mathcal{D} is the set of training graph pairs.

Our final loss function is a combination of the negative log-likelihood loss and the MSE loss: $\mathcal{L}=\mathcal{L}_{g}+\mathcal{L}_{n}$.
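For a single training pair ($|\mathcal{D}|=1$), the combined objective can be sketched as follows; the helper names are ours.

```python
import numpy as np

def matching_nll(Sa0, Sal, matched_pairs):
    """Eq. (10): penalize only the node pairs in the ground-truth
    matching; all other entries of the assignment matrices are free."""
    return -sum(np.log(Sa0[i, j]) + np.log(Sal[i, j])
                for i, j in matched_pairs)

def total_loss(Sa0, Sal, matched_pairs, d_pred, d_true):
    # L = L_g + L_n, with the MSE term of Eq. (11) for one pair
    return (d_pred - d_true) ** 2 + matching_nll(Sa0, Sal, matched_pairs)

Sa0 = Sal = np.full((2, 2), 0.5)
loss = total_loss(Sa0, Sal, [(0, 0), (1, 1)], d_pred=0.6, d_true=0.5)
```

Only the entries at `(0, 0)` and `(1, 1)` contribute to the matching term; pushing the off-matching entries down is left to the Sinkhorn normalization rather than the loss.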

4.4. Mapping Refinement Module

MATA* finally integrates the $\mathsf{A^*LSa}$ algorithm (Chang et al., 2020) to refine the edit distance (i.e., node matching) among the learned top-$k$ candidate matching nodes $M^{|\mathcal{V}_{1}|\times k}$, as shown in Alg. 2.

Specifically, MATA* conducts a best-first search by treating GED computation as a pathfinding problem. Such a representation is convenient because it provides a systematic and heuristic way to explore possible node matchings. To guide the search direction, a priority queue $Q$ is maintained to store the search states: the level $i$, the current partial matching $f$, the parent matching $pa$, and the lower bound $lb$. $\mathsf{A^*LSa}$ initializes $Q$ with the root of the search tree (line 1). It then iteratively pops the $(i,f,pa,lb)$ with the minimum lower bound from $Q$, and extends the current matching $f$ by computing the lower bound of each child belonging to the candidates $M^{|\mathcal{V}_{1}|\times k}$ (lines 2–7). Once a full node matching is formed, $\widetilde{\mathsf{ged}}(\cdot,\cdot)$ equals its lower bound and is returned (lines 6, 8).
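A compact sketch of this best-first loop, with one important simplification: we use only the exact cost of the matched prefix as the lower bound ($h=0$) instead of $\mathsf{A^*LSa}$'s bounded estimation on the unmatched subgraphs, and we assume unlabeled graphs with equally many nodes so that only edge mismatches are counted.

```python
import heapq

def refine(E1, E2, n, candidates):
    """Best-first search over partial matchings of V1 -> V2, restricted to
    each node's learned candidate list. Returns the smallest number of
    mismatched edges over the reachable full matchings."""
    E1 = {frozenset(e) for e in E1}
    E2 = {frozenset(e) for e in E2}

    def prefix_cost(m):
        # edges that disagree among the already-matched prefix of V1
        return sum((frozenset((a, b)) in E1) != (frozenset((m[a], m[b])) in E2)
                   for a in range(len(m)) for b in range(a))

    Q = [(0, ())]                           # (lower bound, partial matching f)
    while Q:
        lb, f = heapq.heappop(Q)
        if len(f) == n:
            return lb                       # full matching: lb is the distance
        for c in candidates[len(f)]:
            if c not in f:                  # keep the matching injective
                g = f + (c,)
                heapq.heappush(Q, (prefix_cost(g), g))

# path 0-1-2 vs. triangle: one edge insertion suffices
ged = refine([(0, 1), (1, 2)], [(0, 1), (1, 2), (0, 2)], 3,
             candidates=[[0, 1, 2]] * 3)
```

Because the prefix cost never decreases as the matching grows, the first full matching popped is optimal within the candidate-restricted space; shrinking the candidate lists is exactly what prunes the search in MATA*.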

Note that, different from the hybrid approaches (Yang and Zou, 2021; Wang et al., 2021), during the mapping refinement of MATA*, the search space is pruned using the theoretically bounded estimation of the unmatched subgraphs from $\mathsf{A^*LSa}$.

5. Experiments

5.1. Experimental Settings

Table 2. Statistics of datasets. The graph pairs are partitioned into 60%/20%/20% training, validation, and test sets, respectively.
|Graphs| |Pairs| avg($|\mathcal{E}|/|\mathcal{V}|$) min($|\mathcal{V}|$) max($|\mathcal{V}|$) avg($|\mathcal{V}|$)
Aids 700 490K 0.98 2 10 8.90
Imdb 1500 2.25M 4.05 7 89 13.00
Cancer 800 100K 1.08 21 90 30.79

Datasets. In this work, three benchmark datasets, i.e., Aids (Bai et al., 2019), Imdb (Yanardag and Vishwanathan, 2015), and Cancer (https://cactus.nci.nih.gov/download/nci/CAN2DA99.sdz), are employed. (1) Aids is a set of antivirus screen chemical compounds whose nodes are labeled with 29 types. Following (Bai et al., 2019; Wang et al., 2021), 700 graphs with no more than ten nodes are sampled as the Aids dataset. (2) Imdb consists of 1,500 ego-networks of movie actors or actresses, each of which is a non-attributed graph. (3) Cancer consists of 32,577 graphs of molecules discovered in carcinogenic tumors. To test the scalability and efficiency of MATA*, we sample 800 graphs with 21 to 90 nodes as the Cancer dataset, where the nodes are labeled with 37 types of atoms. Statistics of the three real-life datasets are shown in Table 2.

Table 3. Effectiveness evaluations. The metrics are calculated on the normalized edit distance. ↑ indicates the higher the better, and ↓ the opposite. Top-$k$ is set to 4, 6 and 8 for Aids, Imdb and Cancer, respectively. The units of the metrics ACC and MSE are % and $10^{-2}$, respectively, and – indicates memory overflow on 32GB machines or running for more than 10 minutes on one graph pair.
Datasets Methods Edit Path ACC ↑ MAE ↓ MSE ↓ p@10 ↑ p@20 ↑ ρ ↑ τ ↑
Aids $\mathsf{A^*Beam}$ (Neuhaus et al., 2006) ✓ 16.68 0.092 1.37 0.460 0.470 0.720 0.546
$\mathsf{Hungarian}$ (Riesen and Bunke, 2009) ✓ 4.19 0.194 4.77 0.293 0.328 0.541 0.386
$\mathsf{VJ}$ (Fankhauser et al., 2011) ✓ 0.95 0.216 5.64 0.215 0.273 0.543 0.387
$\mathsf{SimGNN}$ (Bai et al., 2019) × 0.01 0.036 0.22 0.470 0.540 0.886 0.725
$\mathsf{GMN}$ (Li et al., 2019) × 0.02 0.034 0.19 0.401 0.489 0.750 0.673
$\mathsf{GREED}$ (Ranjan et al., 2022) × 0.00 0.031 0.17 0.461 0.533 0.894 0.732
$\mathsf{GENN}$ (Wang et al., 2021) × 0.02 0.031 0.17 0.441 0.525 0.898 0.738
$\mathsf{GENNA^{*}}$ (Wang et al., 2021) ✓ 20.05 0.034 0.46 0.407 0.556 0.515 0.378
MATA* (Ours) ✓ 59.12 0.029 0.37 0.486 0.526 0.844 0.698
Imdb $\mathsf{A^*Beam}$ (Neuhaus et al., 2006) ✓ 23.18 0.111 5.22 0.464 0.527 0.489 0.381
$\mathsf{Hungarian}$ (Riesen and Bunke, 2009) ✓ 22.53 0.115 5.38 0.438 0.498 0.465 0.359
$\mathsf{VJ}$ (Fankhauser et al., 2011) ✓ 22.24 0.115 5.38 0.436 0.495 0.465 0.359
$\mathsf{SimGNN}$ (Bai et al., 2019) × 0.11 0.114 5.01 0.474 0.531 0.500 0.388
$\mathsf{GMN}$ (Li et al., 2019) × 0.29 0.128 5.01 0.479 0.542 0.513 0.392
$\mathsf{GREED}$ (Ranjan et al., 2022) × 0.93 0.110 5.04 0.477 0.541 0.499 0.389
$\mathsf{GENN}$ (Wang et al., 2021) × 0.22 0.108 5.04 0.476 0.533 0.495 0.384
$\mathsf{GENNA^{*}}$ (Wang et al., 2021) ✓ – – – – – – –
MATA* (Ours) ✓ 44.80 0.098 5.03 0.509 0.570 0.542 0.456
Cancer $\mathsf{A^*Beam}$ (Neuhaus et al., 2006) ✓ 44.23 0.053 1.14 0.161 0.266 0.446 0.352
$\mathsf{Hungarian}$ (Riesen and Bunke, 2009) ✓ 2.19 0.162 3.56 0.123 0.227 0.139 0.096
$\mathsf{VJ}$ (Fankhauser et al., 2011) ✓ 0.00 0.184 4.85 0.095 0.187 0.188 0.133
$\mathsf{SimGNN}$ (Bai et al., 2019) × 0.01 0.068 1.42 0.273 0.297 0.277 0.191
$\mathsf{GMN}$ (Li et al., 2019) × 0.00 0.071 1.47 0.280 0.285 0.254 0.174
$\mathsf{GREED}$ (Ranjan et al., 2022) × 0.00 0.077 1.86 0.131 0.164 0.170 0.118
$\mathsf{GENN}$ (Wang et al., 2021) × 0.00 0.069 1.44 0.285 0.264 0.300 0.207
$\mathsf{GENNA^{*}}$ (Wang et al., 2021) ✓ – – – – – – –
MATA* (Ours) ✓ 55.89 0.040 1.13 0.820 0.825 0.729 0.625

Baseline methods. Our baselines include three types of methods: combinatorial search-based algorithms, learning-based models, and hybrid approaches. (1) The first category includes three well-known approximate algorithms, $\mathsf{A^*Beam}$ (Neuhaus et al., 2006), $\mathsf{Hungarian}$ (Riesen and Bunke, 2009) and $\mathsf{VJ}$ (Fankhauser et al., 2011). (2) The second category includes three commonly-used models and one state-of-the-art learning model, i.e., $\mathsf{SimGNN}$ (Bai et al., 2019), $\mathsf{GMN}$ (Li et al., 2019), $\mathsf{GENN}$ (Wang et al., 2021) and $\mathsf{GREED}$ (Ranjan et al., 2022). (3) We choose the up-to-date model $\mathsf{GENNA^{*}}$ (Wang et al., 2021) as the representative of hybrid approaches; our MATA* also belongs to this category.

Evaluation metrics. We adopt the following metrics to evaluate the performance of the approaches: (1) Edit path, i.e., whether a method can recover the edit path corresponding to the computed edit distance. (2) Accuracy (ACC), which measures the agreement between the computed distance and the ground-truth solutions. (3) Mean Absolute Error (MAE), the average discrepancy between the computed distance and the ground truth. (4) Mean Squared Error (MSE), the average squared difference between the computed distance and the ground truth. (5) Precision at 10 (p@10) and (6) Precision at 20 (p@20), the precision of the computed top 10 and top 20 most similar graphs with respect to the ground-truth top 10 and top 20 results. (7) Spearman's Rank Correlation Coefficient ($\rho$) and (8) Kendall's Rank Correlation ($\tau$), which measure how well the computed rankings match the ground-truth rankings. (9) Time, the running time to compute the distance for one graph pair; for the methods involving learning, only the testing time is reported.
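As an illustration, p@k over a query's retrieval results can be computed as below (distance-based, so smaller means more similar); breaking ties by index is our assumption.

```python
import numpy as np

def precision_at_k(pred_dist, true_dist, k):
    """Fraction of the k graphs ranked most similar by the predicted
    distances that also appear in the ground-truth top k."""
    pred_top = set(np.argsort(pred_dist)[:k])
    true_top = set(np.argsort(true_dist)[:k])
    return len(pred_top & true_top) / k
```

With four candidate graphs and k = 2, swapping the two closest graphs leaves p@2 at 1.0, while misranking one of them halves it.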

Because exact GED computation is $\mathsf{NP}$-complete, the ground truth of Aids is produced by exact algorithms, while the ground truth of Imdb and Cancer is generated from the smallest edit distances among $\mathsf{A^*Beam}$, $\mathsf{Hungarian}$, and $\mathsf{VJ}$, following (Bai et al., 2019). Note that MATA* is able to achieve smaller edit distances, so the ground truth of Imdb and Cancer is further updated with the best results of the four approaches. Therefore, the metrics on Aids are calculated against the exact solutions, and the metrics on Imdb and Cancer against the updated ground truth. Note also that the edit distance is normalized into a similarity score in the range $(0,1]$ as explained in Section 4.3, the same as (Bai et al., 2019; Wang et al., 2021).

Parameter settings. We conduct all experiments on machines with an Intel Xeon Gold@2.40GHz CPU and an NVIDIA Tesla V100 32GB GPU. The number of SEGcn layers $l$ is set to 3, and the random walk step $t$ is set to 16 for the three datasets. During training, we set the batch size to 128 and use the Adam optimizer with a 0.001 learning rate and $5\times 10^{-4}$ weight decay for each dataset. The source code and data are available at https://github.com/jfkey/mata.

5.2. Experimental Results

In this section, we evaluate the performance of MATA* in terms of effectiveness, scalability, efficiency, an ablation study, top-$k$ comparisons, and an analysis of the assignment matrices.

Effectiveness evaluations. Table 3 shows the effectiveness of eight approaches on three real-world datasets. MATA* consistently achieves the best performance on almost every evaluation metric, which demonstrates the superiority of our hybrid method incorporating the two combinatorial properties of GED computation. We draw the following findings from the evaluations.

(1) In terms of ACC, MATA* achieves smaller edit distances on at least (58.1%, 32.1%, 53.6%) of graph pairs on (Aids, Imdb, Cancer) compared with the combinatorial search-based and hybrid approaches. Hence, the ground truth of Imdb and Cancer is further updated with these results, which reduces MAE by at least (6.5%, 9.1%, 24.5%). (2) Learning-based models alone cannot recover the edit path, as they directly learn GED as a similarity score and ignore its combinatorial nature. (3) On Imdb, all methods perform worse than on Aids and Cancer. Imdb is large, with node counts ranging widely from 7 to 89. Besides, its graphs are much denser, with $|\mathcal{E}|/|\mathcal{V}|=4.05$, and the distances of its pairs are also larger, which increases the difficulty for both combinatorial search and learning methods.

(4) The improvement of MATA* on ACC is significant, at least (39.0%, 21.6%, 11.7%), while the improvement on the other metrics is relatively less so. The rationale is that (a) MATA* models GED from the perspective of node matching and explicitly builds the task of learning node matchings, so the learned top-$k$ candidate nodes directly improve the accuracy due to the correspondence between GED and node matchings; and (b) for the few node pairs whose matchings have not been learned by MATA*, it prunes the search subtrees rooted at these node pairs, which leads to larger edit distances reflected in the other metrics.

Table 4. Efficiency evaluations: average running time for solving one graph pair on the test data (ms). The training time of the learning-based and hybrid approaches is not included.
$\mathsf{SimGNN}$ $\mathsf{GMN}$ $\mathsf{GENN}$ $\mathsf{A^*Beam}$ $\mathsf{Hungarian}$ $\mathsf{VJ}$ $\mathsf{GENNA^{*}}$ MATA*
Aids 0.3 9.0 0.4 20.4 6.7 6.7 38624 4.4
Imdb 0.7 5.9 0.4 26.6 230.8 230.7 – 35.3
Cancer 5.5 91.5 9.5 271.7 38.8 32.7 – 146.8

Scalability w.r.t. graph size. Consider the overall performance of the approaches from the small-size dataset (Aids) to the large-size ones (Imdb and Cancer). We find the following. (1) MATA* leverages the learned candidate matching nodes to directly prune unpromising search directions, so it scales well to large-size graphs and also performs better on Imdb and Cancer, as shown in Table 3. (2) Combinatorial search-based algorithms extend to large graphs with mediocre performance due to aggressive relaxations ($\mathsf{Hungarian}$ and $\mathsf{VJ}$) or pruning strategies ($\mathsf{A^*Beam}$). (3) Learning-based models add a bias to the predicted GED values to reduce the discrepancy between the prediction and the ground truth, and their scalability heavily relies on the ground truth produced by combinatorial search-based algorithms; this is why they perform worse on Imdb and Cancer than on Aids. (4) The hybrid approach $\mathsf{GENNA^{*}}$ only completes Aids, whose graphs have no more than 10 nodes, and fails to scale to Imdb and Cancer. This is because $\mathsf{GENNA^{*}}$ explores the entire search space, which is of factorial scale, and takes $\mathcal{O}(n^{2}d+d^{2}n)$ time for each search step, as explained in the introduction.

Efficiency w.r.t. running time. The computational efficiency of the eight approaches on the three real-life datasets is presented in Table 4. Owing to end-to-end learning, $\mathsf{SimGNN}$ and $\mathsf{GENN}$ achieve the best results, predicting the GED of one graph pair in a fraction of a millisecond. Though MATA* is slightly slower than the learning-based models, its running time is close to that of the combinatorial search-based algorithms, and nearly $10^{4}$ times faster than the other hybrid approach $\mathsf{GENNA^{*}}$. This marked difference underscores the impact of top-$k$ candidate node finding on the computational efficiency of MATA*.

Table 5. Ablation study. / SEGcn refers to replacing SEGcn with GCN, / LN to removing only the learning node matching task, / LG to removing only the learning GED task, and / A* to removing only the mapping refinement module.
Datasets Models ACC ↑ MAE ↓ p@10 ↑ ρ ↑
Aids MATA* 59.12 0.031 0.486 0.844
/ SEGcn 54.38 0.036 0.485 0.819
/ LN 32.13 0.057 0.397 0.783
/ LG 58.09 0.033 0.469 0.858
/ A* 9.28 0.167 0.244 0.357
Imdb MATA* 44.80 0.098 0.509 0.542
/ SEGcn 41.43 0.100 0.493 0.544
/ LN 40.01 0.102 0.488 0.541
/ LG 42.99 0.098 0.505 0.549
/ A* 30.16 0.112 0.428 0.521
Cancer MATA* 55.89 0.040 0.820 0.729
/ SEGcn 53.36 0.042 0.817 0.699
/ LN 45.85 0.046 0.790 0.582
/ LG 55.87 0.040 0.812 0.729
/ A* 10.89 0.104 0.327 0.146

Ablation study. We perform ablation studies to verify the effectiveness of each module, with the results reported in Table 5. It is observed that the performance drops dramatically if the mapping refinement module is removed. This is because multiple optimal node matchings typically exist from the node matching perspective of GED. The performance also decreases considerably if the structure-enhanced GNN (i.e., SEGcn) is replaced by a vanilla one, demonstrating that SEGcn successfully captures the combinatorial property of structure-dominant operations and learns powerful embeddings for approximate GED computation. Further, the two designed learning tasks (i.e., LN and LG) both help improve the solution quality of GED, especially the node matching task (i.e., LN). The ablation study is consistent with our analysis of approximate GED computation.

Table 6. Performance evaluations w.r.t. different top-$k$. The metrics are calculated in the same way as those in Table 3.
Datasets top-$k$ ACC ↑ MAE ↓ p@10 ↑ ρ ↑ Time ↓
Aids 5 74.85 0.018 0.586 0.796 4.3
6 84.10 0.010 0.693 0.859 4.4
7 91.79 0.005 0.787 0.910 4.4
8 95.79 0.002 0.846 0.940 4.5
9 97.76 0.001 0.912 0.959 4.8
10 100.00 0.000 1.000 1.000 5.2
Imdb 5 38.57 0.106 0.381 0.578 30.2
6 39.40 0.105 0.391 0.579 35.3
7 40.97 0.103 0.401 0.582 43.3
8 41.56 0.102 0.400 0.582 51.9
9 44.47 0.100 0.412 0.587 53.2
10 45.05 0.100 0.415 0.588 55.3
Cancer 5 5.01 0.104 0.452 0.678 129.8
6 7.37 0.091 0.486 0.709 146.8
7 10.55 0.079 0.543 0.734 153.1
8 14.37 0.069 0.569 0.758 168.0
9 22.63 0.058 0.615 0.777 176.7
10 34.19 0.048 0.684 0.794 193.2

5.3. Performance w.r.t. top-$k$ selection

We also study the performance of MATA* w.r.t. different choices of $k$ on the Aids, Imdb and Cancer datasets, and draw the following findings from Table 6.

(1) Varying $k$ emphasizes the trade-off between solution quality and running time: a larger $k$ improves the quality of the approximate GED solutions, but the running time increases, mainly due to the larger search space of $\mathsf{A^*LSa}$. (2) Aids is a small dataset with exact solutions, and MATA* achieves the optimal solutions in 5.2 ms when $k$ is set to 10. Note that MATA* degenerates to $\mathsf{A^*LSa}$ when all nodes of $\mathcal{G}_{2}$ are selected as candidate matching nodes. (3) When $k$ is set to 6 and 8 for Imdb and Cancer, the evaluation metrics are worse than those in Table 3. In fact, $k$ goes up to 10 in this test, which achieves smaller edit distances, and the ground truth of Imdb and Cancer is further updated with these solutions; the metrics are then re-calculated, producing worse values on Imdb and Cancer. To encapsulate, the choice of $k$ offers a balance between the computational scalability and accuracy of approximate GED computation.

5.4. Analysis of assignment matrices of MATA*

We offer a visual representation of four assignment matrices that encapsulate both the local and high-order perspectives. These matrices, pertaining to two pairs of graphs and generated by MATA* on Aids, are portrayed as heatmaps in Fig. 3. We observe that both the local and high-order views play a crucial role in extracting features tailored for node matchings, as is evident for specific node pairs such as (6,6) and (7,7) of graph pair (a) (the top row of the assignment matrices $\mathbf{S_{a}}^{(0)}$ and $\mathbf{S_{a}}^{(l)}$). Besides, the assignment matrix $\mathbf{S_{a}}^{(l)}$ in the high-order view typically has a more powerful capacity to learn the node correspondence than the local view $\mathbf{S_{a}}^{(0)}$. For example, $\mathbf{S_{a}}^{(0)}$ fails to capture the node pair (1,1) of pair (b), while $\mathbf{S_{a}}^{(l)}$ successfully learns it. Thus, Fig. 3 shows the importance of jointly learning the top-$k$ candidates from the local and high-order views rather than from a single one.

Figure 3. Analysis of the assignment matrices. The GED of graph pairs (a) & (b) are both equal to 2.

6. Conclusion

We have presented a data-driven hybrid approach, MATA*, based on Graph Neural Networks (SEGcn) and A* algorithms, which leverages the learned candidate matching nodes to prune the unpromising search directions of the $\mathsf{A^*LSa}$ algorithm when approximating graph edit distance. We have modeled GED computation from the new perspective of node matching and exploited the intrinsic relationship between GED computation and node matching. Besides, the design of MATA* is aware of the two combinatorial properties involved in GED computation, structure-dominant operations and multiple optimal node matchings, to learn the matching nodes from both local and high-order views. Benefiting from the candidate nodes, MATA* offers a balance between computational scalability and accuracy on real-life datasets. Finally, extensive experiments on Aids, Imdb, and Cancer demonstrate the effectiveness, scalability, and efficiency of MATA* compared with combinatorial search-based, learning-based and hybrid approaches.

Acknowledgements.
This work is supported in part by NSF of China under Grant 61925203 & U22B2021. For any correspondence, please refer to Shuai Ma and Min Zhou.

References

  • Abu-Aisheh et al. (2015) Zeina Abu-Aisheh, Romain Raveaux, Jean-Yves Ramel, and Patrick Martineau. 2015. An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In ICPRAM. 271–278.
  • Bai and Zhao (2021) Jiyang Bai and Peixiang Zhao. 2021. TaGSim: Type-aware Graph Similarity Learning and Computation. Proc. VLDB Endow. 15, 2 (2021), 335–347.
  • Bai et al. (2019) Yunsheng Bai, Hao Ding, Song Bian, Ting Chen, Yizhou Sun, and Wei Wang. 2019. SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. In WSDM. 384–392.
  • Bai et al. (2020) Yunsheng Bai, Hao Ding, Ken Gu, Yizhou Sun, and Wei Wang. 2020. Learning-Based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching. In AAAI. 3219–3226.
  • Blumenthal et al. (2020) David B. Blumenthal, Nicolas Boria, Johann Gamper, Sébastien Bougleux, and Luc Brun. 2020. Comparing heuristics for graph edit distance computation. VLDB J. 29, 1 (2020), 419–458.
  • Blumenthal and Gamper (2017) David B. Blumenthal and Johann Gamper. 2017. Exact Computation of Graph Edit Distance for Uniform and Non-uniform Metric Edit Costs. In GbRPR, Vol. 10310. 211–221.
  • Blumenthal and Gamper (2020) David B. Blumenthal and Johann Gamper. 2020. On the exact computation of the graph edit distance. Pattern Recognit. Lett. 134 (2020), 46–57.
  • Carlos et al. (2019) Garcia-Hernandez Carlos, Alberto Fernández, and Francesc Serratosa. 2019. Ligand-Based Virtual Screening Using Graph Edit Distance as Molecular Similarity Measure. J. Chem. Inf. Model. 59, 4 (2019), 1410–1421.
  • Chang et al. (2020) Lijun Chang, Xing Feng, Xuemin Lin, Lu Qin, Wenjie Zhang, and Dian Ouyang. 2020. Speeding Up GED Verification for Graph Similarity Search. In ICDE. 793–804.
  • Chang et al. (2023) Lijun Chang, Xing Feng, Kai Yao, Lu Qin, and Wenjie Zhang. 2023. Accelerating Graph Similarity Search via Efficient GED Computation. IEEE Trans. Knowl. Data Eng. 35, 5 (2023), 4485–4498.
  • Chen et al. (2019) Xiaoyang Chen, Hongwei Huo, Jun Huan, and Jeffrey Scott Vitter. 2019. An efficient algorithm for graph edit distance computation. Knowl. Based Syst. 163 (2019), 762–775.
  • Cho et al. (2013) Minsu Cho, Karteek Alahari, and Jean Ponce. 2013. Learning graphs to match. In ICCV. 25–32.
  • Cuturi (2013) Marco Cuturi. 2013. Sinkhorn distances: Lightspeed computation of optimal transport. NeurIPS 26 (2013), 2292–2300.
  • Dwivedi et al. (2022) Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, and Xavier Bresson. 2022. Graph Neural Networks with Learnable Structural and Positional Representations. In International Conference on Learning Representations.
  • Fankhauser et al. (2011) Stefan Fankhauser, Kaspar Riesen, and Horst Bunke. 2011. Speeding Up Graph Edit Distance Computation through Fast Bipartite Matching. In GbRPR, Vol. 6658. 102–111.
  • Fey et al. (2020) Matthias Fey, Jan Eric Lenssen, Christopher Morris, Jonathan Masci, and Nils M. Kriege. 2020. Deep Graph Matching Consensus. In International Conference on Learning Representations.
  • Hart et al. (1968) Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Trans. Syst. Sci. Cybern. 4, 2 (1968), 100–107.
  • Kim et al. (2019) Jongik Kim, Dong-Hoon Choi, and Chen Li. 2019. Inves: Incremental Partitioning-Based Verification for Graph Similarity Search. In EDBT. 229–240.
  • Kipf and Welling (2016) Thomas N. Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph Convolutional Networks. CoRR abs/1609.02907 (2016).
  • Kochenberger et al. (2014) Gary Kochenberger, Jin-Kao Hao, Fred Glover, Mark Lewis, Zhipeng Lü, Haibo Wang, and Yang Wang. 2014. The unconstrained binary quadratic programming problem: a survey. Journal of combinatorial optimization 28 (2014), 58–81.
  • Koutra et al. (2013) Danai Koutra, Hanghang Tong, and David Lubensky. 2013. Big-align: Fast bipartite graph alignment. In ICDM. 389–398.
  • Li et al. (2020) Pan Li, Yanbang Wang, Hongwei Wang, and Jure Leskovec. 2020. Distance encoding: Design provably more powerful neural networks for graph representation learning. NeurIPS 33 (2020), 4465–4478.
  • Li et al. (2019) Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. 2019. Graph Matching Networks for Learning the Similarity of Graph Structured Objects. In ICML, Vol. 97. 3835–3845.
  • Mialon et al. (2021) Grégoire Mialon, Dexiong Chen, Margot Selosse, and Julien Mairal. 2021. GraphiT: Encoding Graph Structure in Transformers. CoRR abs/2106.05667 (2021).
  • Neuhaus et al. (2006) Michel Neuhaus, Kaspar Riesen, and Horst Bunke. 2006. Fast Suboptimal Algorithms for the Computation of Graph Edit Distance. In IAPR Workshops, Vol. 4109. 163–172.
  • Peng et al. (2021) Yun Peng, Byron Choi, and Jianliang Xu. 2021. Graph Edit Distance Learning via Modeling Optimum Matchings with Constraints. In IJCAI. 1534–1540.
  • Ranjan et al. (2022) Rishabh Ranjan, Siddharth Grover, Sourav Medya, Venkatesan Chakaravarthy, Yogish Sabharwal, and Sayan Ranu. 2022. Greed: A neural framework for learning graph distance functions. NeurIPS 35 (2022), 22518–22530.
  • Riesen and Bunke (2009) Kaspar Riesen and Horst Bunke. 2009. Approximate graph edit distance computation by means of bipartite graph matching. Image Vis. Comput. 27, 7 (2009), 950–959.
  • Riesen et al. (2013) Kaspar Riesen, Sandro Emmenegger, and Horst Bunke. 2013. A Novel Software Toolkit for Graph Edit Distance Computation. In GbRPR, Vol. 7877. 142–151.
  • Riesen et al. (2007) Kaspar Riesen, Stefan Fankhauser, and Horst Bunke. 2007. Speeding Up Graph Edit Distance Computation with a Bipartite Heuristic. In MLG. 21–24.
  • Wang et al. (2023) Runzhong Wang, Ziao Guo, Shaofei Jiang, Xiaokang Yang, and Junchi Yan. 2023. Deep Learning of Partial Graph Matching via Differentiable Top-K. In CVPR. 6272–6281.
  • Wang et al. (2019) Runzhong Wang, Junchi Yan, and Xiaokang Yang. 2019. Learning Combinatorial Embedding Networks for Deep Graph Matching. In ICCV. 3056–3065.
  • Wang et al. (2021) Runzhong Wang, Tianqi Zhang, Tianshu Yu, Junchi Yan, and Xiaokang Yang. 2021. Combinatorial Learning of Graph Edit Distance via Dynamic Embedding. In CVPR. 5241–5250.
  • Xie et al. (2020) Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, and Tomas Pfister. 2020. Differentiable top-k with optimal transport. NeurIPS 33 (2020), 20520–20531.
  • Yan et al. (2020) Junchi Yan, Shuang Yang, and Edwin R. Hancock. 2020. Learning for Graph Matching and Related Combinatorial Optimization Problems. In IJCAI. 4988–4996.
  • Yanardag and Vishwanathan (2015) Pinar Yanardag and S. V. N. Vishwanathan. 2015. Deep Graph Kernels. In SIGKDD. 1365–1374.
  • Yang and Zou (2021) Lei Yang and Lei Zou. 2021. Noah: Neural-optimized A* Search Algorithm for Graph Edit Distance Computation. In ICDE. 576–587.
  • Yang et al. (2023) Menglin Yang, Min Zhou, Lujia Pan, and Irwin King. 2023. κHGCN: Tree-likeness Modeling via Continuous and Discrete Curvature Learning. In SIGKDD. 2965–2977.
  • Ying et al. (2021) Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and Tie-Yan Liu. 2021. Do transformers really perform badly for graph representation? NeurIPS 34 (2021), 28877–28888.
  • Yu et al. (2020) Tianshu Yu, Runzhong Wang, Junchi Yan, and Baoxin Li. 2020. Learning deep graph matching with channel-independent embedding and Hungarian attention. In ICLR.
  • Zhang et al. (2021) Zhen Zhang, Jiajun Bu, Martin Ester, Zhao Li, Chengwei Yao, Zhi Yu, and Can Wang. 2021. H2MN: Graph Similarity Learning with Hierarchical Hypergraph Matching Networks. In SIGKDD. 2274–2284.
  • Zhuo and Tan (2022) Wei Zhuo and Guang Tan. 2022. Efficient Graph Similarity Computation with Alignment Regularization. NeurIPS 35 (2022), 30181–30193.