Discourse-level Relation Extraction via Graph Pooling
General Response
We thank all the reviewers for the comments and suggestions.
Response to Reviewer 1
We thank the reviewer for the feedback and appreciation of our work. We will resolve the writing issues in our revised version.
We want to clarify that on the n-ary dataset, our method establishes a new SOTA, and this is our focused setting of learning better mention representations. This showcases the effectiveness of our method. For entity-based document-level relation extraction, we indeed do not achieve a new SOTA. This may be due to our minimal design for aggregating mention-level representations into entity embeddings and the final relation representation. For example, EoGANE (Tran et al.) and SSAN (Xu et al.) encode complex path information to update entity and relation embeddings. We leave it as future work to integrate our improved mention representations with such sophisticated methods to better learn the interaction among mentions, entities, and relations.
Response to Reviewer 2
We thank the reviewer for the feedback and appreciation of our work.
How to balance long-term and short-term?
To balance short-term and long-term information, we design residual connections that combine information at different scales (as stated in Section 2.3); a brief sketch of the idea is given below. However, as our analysis shows, this design alone is not sufficient. We really appreciate the feedback and will investigate this problem further in the future.
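For illustration, a minimal sketch of such a residual combination, assuming PyTorch; the module and argument names are illustrative rather than the exact components in the paper:

```python
import torch.nn as nn

class ScaleResidual(nn.Module):
    """Combine node representations from the original (fine) graph with
    representations recovered from a pooled (coarse) graph.
    Hypothetical sketch; names and shapes are illustrative only."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, fine_feats, unpooled_feats):
        # fine_feats:     [num_nodes, hidden_dim] short-range features (before pooling)
        # unpooled_feats: [num_nodes, hidden_dim] long-range features mapped back
        #                 from the coarsened graph
        return fine_feats + self.proj(unpooled_feats)
```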
The improvement is not significant.
On the n-ary dataset, our improvements over AGGCN (Guo et al.) and Graph LSTM (Peng et al.) are substantial, averaging over 5% absolute accuracy. Since we cannot obtain the outputs of the compared baselines, no significance test is conducted on that dataset. However, as shown in Table 2, we observe significant improvements over our GCN baselines on the CDR dataset, which verifies the effectiveness of our method.
Response to Reviewer 3
1. Many of the claimed SOTAs were very close and would need some significance testing to demonstrate convincingly.
On the n-ary dataset, our improvements over AGGCN (Guo et al.) and Graph LSTM (Peng et al.) are substantial, averaging over 5% absolute accuracy. Since we cannot obtain the outputs of the compared baselines, no significance test is conducted on that dataset; however, given the large margin, we believe the improvements are meaningful. In addition, as shown in Table 2, we observe statistically significant improvements over our GCN baselines on the CDR dataset, which verifies the effectiveness of our method.
2. Training with larger corpus.
We appreciate the suggestion to evaluate our method on larger corpora and will consider adding such experiments in the revised version. However, we want to emphasize that even if our method only shows improvements on small datasets, it is still valuable: many domains (e.g., the biomedical domain) suffer from data sparsity, and we argue that training on small corpora is itself a valuable and more practical setting.
3. It is not clear whether this pooling operation will lead to information loss/noises.
The pooling operation can indeed cause unpredictable information loss, which is why we include residual connections in our framework: the model can then learn representations from graphs of different granularities. Our strong empirical performance indicates that the pooling operation brings gains despite the potential risk of information loss.
Response to Reviewer 5
We address the concern about novelty from two aspects:
First, the key insight of our paper is to address an issue that prior work on document-level relation extraction (DRE) rarely tackles: enlarging the receptive field of GNNs. A large receptive field becomes more important as the input text grows longer, which makes it particularly relevant to DRE tasks. As mentioned in the introduction, an intuitive way to achieve this is to stack a deep GNN, yet this approach easily runs into saturation issues such as oversmoothing. This motivates us to employ a pooling-unpooling mechanism for DRE (see the sketch below). To the best of our knowledge, we are the first to draw attention to enlarging the receptive field of GNNs and to integrate graph pooling on top of GNN models for DRE tasks. Our analysis (Figure 5) verifies this idea and shows that our method indeed helps on instances that require long-range dependencies.
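As a rough illustration of how pooling-unpooling enlarges the receptive field, consider the following sketch (assuming PyTorch; the assignment matrix, GNN layers, and function names are hypothetical placeholders, not the exact operators in the paper):

```python
import torch

def pool_unpool_step(x, adj, assign, gnn_fine, gnn_coarse):
    """One pooling-unpooling step on a document graph.
    x:      [n, d] node features
    adj:    [n, n] adjacency matrix
    assign: [n, k] (soft) cluster assignment matrix, with k < n
    gnn_fine / gnn_coarse: message-passing layers for each scale
    """
    # message passing on the original graph: each hop covers direct neighbours only
    h = gnn_fine(x, adj)

    # pool: coarsen nodes into clusters, so one hop on the coarse graph
    # now spans several original nodes (i.e. a larger receptive field)
    x_c = assign.t() @ h
    adj_c = assign.t() @ adj @ assign

    # message passing on the coarse graph
    h_c = gnn_coarse(x_c, adj_c)

    # unpool: broadcast coarse features back to the original nodes
    h_up = assign @ h_c

    # residual combination of the two scales
    return h + h_up
```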
Second, we propose a novel pooling technique specifically designed for NLP applications. We point out that document graphs contain rich edge information, which is ignored if we directly apply graph pooling methods invented in social-network research (such as the HM pooling strategy). Our attempt to design a new graph pooling technique specifically for NLP tasks could inspire the community to pay attention to the properties of such graphs and foster more NLP-oriented graph operation designs.
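To make the contrast concrete, below is a hypothetical sketch of coarsening a document graph while retaining edge-type information, rather than discarding it as a node-only pooling method would; it is an illustration of the general idea under our own assumptions, not the exact pooling operator proposed in the paper:

```python
import torch

def edge_aware_coarsen(x, adj, edge_feat, assign):
    """Coarsen a document graph while keeping edge information (illustrative only).
    x:         [n, d]    node features
    adj:       [n, n]    adjacency matrix
    edge_feat: [n, n, e] edge-type embeddings (e.g. dependency or coreference
                         edges in the document graph)
    assign:    [n, k]    cluster assignment matrix
    """
    # cluster-level node features and adjacency
    x_c = assign.t() @ x
    adj_c = assign.t() @ adj @ assign

    # aggregate edge embeddings into cluster-level edges instead of dropping them,
    # as a plain node-only pooling method would do
    edge_c = torch.einsum('ip,ije,jq->pqe', assign, edge_feat, assign)

    return x_c, adj_c, edge_c
```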