Large Language Models as Zero-Shot Keyphrase Extractors:
A Preliminary Empirical Study

Mingyang Song, Xuelian Geng, Songfang Yao, Shilong Lu, Yi Feng, Liping Jing
Beijing Key Lab of Traffic Data Analysis and Mining
Beijing Jiaotong University, Beijing, China
mingyang.song@bjtu.edu.cn
  Corresponding Author
Abstract

Zero-shot keyphrase extraction aims to build a keyphrase extractor without training on human-annotated data, which is challenging due to the limited human intervention involved. Challenging but worthwhile, the zero-shot setting efficiently reduces the time and effort that data labeling takes. Recent efforts on pre-trained large language models (e.g., ChatGPT and ChatGLM) show promising performance in zero-shot settings, inspiring us to explore prompt-based methods. In this paper, we ask whether strong keyphrase extraction models can be constructed by directly prompting the large language model ChatGPT. Our experimental results show that, compared to existing state-of-the-art unsupervised and supervised models, ChatGPT still has considerable room for improvement on the keyphrase extraction task. We have released the related data at https://github.com/MySong7NLPer/ChatGPT_as_Keyphrase_Extractor.

1 Introduction

Keyphrase extraction aims to extract a set of important phrases from unstructured text into a structured format, and it is a fundamental and crucial task in natural language processing Hasan and Ng (2014); Song et al. (2023b, d). Keyphrases benefit various applications thanks to their concise and precise expression Li et al. (2023b); Song et al. (2021a); Tian et al. (2023); Song et al. (2022b); Li et al. (2023c); Song et al. (2023a); Salewski et al. (2023). However, annotating an enormous amount of data is labor-intensive and time-consuming. Hence, many organizations and companies rely on keyphrase extraction under zero- or few-shot settings to automate this manual work Liang et al. (2021); Song et al. (2023e); Kong et al. (2023); Song et al. (2023f); Ding and Luo (2022).

Recent works Agrawal et al. (2022); Wei et al. (2023) on large-scale pre-trained language models, such as GPT-3 Brown et al. (2020), InstructGPT Ouyang et al. (2022), and ChatGPT, suggest that large language models perform well on various natural language processing downstream tasks even without any parameter tuning. This raises a natural question: is it feasible to obtain a zero-shot keyphrase extractor by directly prompting a large language model? Motivated by these clues, in this paper we turn to ChatGPT and hypothesize that it possesses the abilities needed to serve as a zero-shot keyphrase extractor in an interactive mode.

In this paper, we mainly focus on verifying the performance of ChatGPT on four keyphrase extraction datasets and on its ability to understand long documents. Generally, long documents necessitate complex processing strategies Mahata et al. (2022); Li et al. (2023a); Feng et al. (2023); Pu et al. (2023). In keyphrase extraction, while improved algorithms can be designed to handle the multitude of candidate keyphrases in long documents, we believe that effectively incorporating advanced features, especially those carrying background knowledge, can help discriminate keyphrases from non-keyphrases even among a vast number of candidates. At the same time, how to encode long documents remains a topic worth exploring Hasan and Ng (2014). Consequently, we test the capacity of ChatGPT, as a general large language model, to handle long documents. Extensive experiments show that, compared to existing state-of-the-art unsupervised and supervised models, ChatGPT still has considerable room for improvement on the keyphrase extraction task.

Test Set | Domain | Type | # Doc. | Avg. # Words
Inspec Hulth (2003) | Scientific Abstract | Short | 500 | 134.6
DUC2001 Wan and Xiao (2008) | News Article | Long | 308 | 847.2
SemEval2010 Kim et al. (2010) | Scientific Article | Long | 100 | 1587.5
OpenKP Xiong et al. (2019) | Open Web Domain | Long | 6,616 | 900.4
Table 1: Statistics of the used test sets. # Doc. is the number of documents in the test set; Avg. # Words is the average number of words per document. Note that this report uses all of the test data rather than sampling a part of it.
ID | Prompt
Tp1 | Extract keywords from this text: [Document]
Tp2 | Extract keyphrases from this text: [Document]
Table 2: Two prompts designed for chatting with ChatGPT to extract keyphrases from the text document.

2 ChatGPT for Keyphrase Extraction

2.1 Evaluation Setting

We briefly introduce the evaluation setting, which mainly includes the compared baselines, the datasets, and the evaluation metrics. Note that for each new query to ChatGPT, we clear the conversation to avoid the influence of previous samples, similar to Bang et al. (2023).
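
To make this protocol concrete, the following minimal sketch shows how a single document can be queried in a fresh conversation (it assumes the `openai` Python client; the helper name, temperature setting, and template handling are our own illustrative choices, not reported implementation details):

```python
# A minimal, hypothetical sketch of the querying protocol (not the authors'
# released code): each document is sent in a brand-new, single-turn
# conversation, so no previous sample can influence the output.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TP2 = "Extract keyphrases from this text: {document}"  # Tp2 from Table 2

def query_chatgpt(document: str) -> str:
    # A fresh single-turn request is equivalent to clearing the conversation:
    # the messages list contains only the current document.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": TP2.format(document=document)}],
        temperature=0,  # assumption: deterministic decoding for reproducibility
    )
    return response.choices[0].message.content
```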

In this paper, we compare ChatGPT with several state-of-the-art unsupervised keyphrase extraction models: TF-IDF Jones (2004), YAKE Campos et al. (2018), TextRank Mihalcea and Tarau (2004), TopicRank Bougouin et al. (2013), PositionRank Florescu and Caragea (2017), MultipartiteRank Boudin (2018), EmbedRank Bennani-Smires et al. (2018), KeyGames Saxena et al. (2020), SIFRank Sun et al. (2020), AttentionRank Ding and Luo (2021), JointGL Liang et al. (2021), MDERank Zhang et al. (2022), SetMatch Song et al. (2023c), HGUKE Song et al. (2023e), AGRank Ding and Luo (2022), HyperRank Song et al. (2023f), CentralityRank Song et al. (2023i), and PromptRank Kong et al. (2023). In addition to unsupervised models, we compare ChatGPT with two state-of-the-art supervised models: HyperMatch Song et al. (2022a) and KIEMP Song et al. (2021b). Furthermore, we also select the large language models ChatGLM-6B (https://github.com/THUDM/ChatGLM-6B) and ChatGLM2-6B (https://github.com/thudm/chatglm2-6b) for comparison. By default, the results in this report come from the 2023.03.01 version of ChatGPT.

We evaluate ChatGPT and all the baselines on four datasets: Inspec Hulth (2003), DUC2001 Wan and Xiao (2008), SemEval2010 Kim et al. (2010), and OpenKP Xiong et al. (2019). Table 1 summarizes the statistics of the used test sets.

Following previous studies Song et al. (2021b, 2022a, 2023h), we adopt macro-averaged F1@K as the evaluation metric (K = 5, 10, 15 on DUC2001, Inspec, and SemEval2010, and K = 1, 3, 5 on OpenKP, as reported in Tables 3 and 4). When computing F1@K, blank keyphrases are appended to bring the number of predictions up to K if fewer than K keyphrases are predicted. Similar to previous studies Song et al. (2022c, 2023e, 2023c, 2023g), we employ the Porter Stemmer and remove keyphrases whose stemmed forms are identical.
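
A minimal sketch of this F1@K computation is given below (the exact deduplication and blank-padding details are our assumptions about the implementation; the Porter Stemmer is taken from NLTK):

```python
# A sketch of the F1@K metric described above; deduplication and
# blank-padding details are assumptions, not the authors' exact code.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_phrase(phrase: str) -> str:
    # Stem every word so that, e.g., "neural networks" matches "neural network".
    return " ".join(stemmer.stem(w) for w in phrase.lower().split())

def f1_at_k(predictions: list[str], gold: list[str], k: int) -> float:
    # Deduplicate predictions by stemmed form and keep the top k.
    preds, seen = [], set()
    for p in predictions:
        s = stem_phrase(p)
        if s and s not in seen:
            seen.add(s)
            preds.append(s)
    preds = (preds + [""] * k)[:k]  # pad with blanks: they can never match gold
    gold_set = {stem_phrase(g) for g in gold}
    correct = sum(p in gold_set for p in preds)
    precision = correct / k
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```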

2.2 Keyphrase Extraction Prompts

ChatGPT requires human-designed prompts or instructions as guiding information to trigger its ability on downstream tasks, i.e., extracting keyphrases here. Note that the style and quality of prompts can greatly affect the quality of ChatGPT's keyphrase extraction: high-quality prompts often lead to better task performance than low-quality ones. Table 2 presents the prompts designed for extracting keyphrases, which are similar to Song et al. (2023d).
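
Since ChatGPT returns its predictions as free-form text rather than a structured list, the response must be post-processed into individual keyphrases before evaluation. The following sketch illustrates one way to do this (the handled output formats, comma-separated text and numbered or bulleted lists, are assumptions based on typical ChatGPT behavior rather than reported specifics):

```python
# A hypothetical post-processing step: splitting ChatGPT's free-form answer
# into individual keyphrases.
import re

def parse_response(response: str) -> list[str]:
    phrases = []
    for item in re.split(r"[,\n]", response):
        # Strip list markers such as "1.", "2)", "-", or "*".
        phrase = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", item).strip().strip(".")
        if phrase:
            phrases.append(phrase.lower())
    return phrases
```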

2.3 Overall Performance

As illustrated in Table 4, ChatGPT with simple prompts achieves performance similar to the unsupervised method TF-IDF and far surpasses the unsupervised method TextRank. Although ChatGPT still trails the supervised methods HyperMatch and KIEMP by a clear margin, fine-tuning large models like ChatGPT on human-annotated data might make their performance far superior to existing supervised keyphrase extraction models. In addition, it is interesting that ChatGLM2, with only 6B parameters, achieves performance close to ChatGPT on several datasets.

Model | DUC2001 (F1@5 / F1@10 / F1@15) | Inspec (F1@5 / F1@10 / F1@15) | SemEval2010 (F1@5 / F1@10 / F1@15)

Statistical Models
TF-IDF Jones (2004) | 9.21 / 10.63 / 11.06 | 11.28 / 13.88 / 13.83 | 2.81 / 3.48 / 3.91
YAKE Campos et al. (2018) | 12.27 / 14.37 / 14.76 | 18.08 / 19.62 / 20.11 | 11.76 / 14.40 / 15.19

Graph-based Models
TextRank Mihalcea and Tarau (2004) | 11.80 / 18.28 / 20.22 | 27.04 / 25.08 / 36.65 | 3.80 / 5.38 / 7.65
SingleRank Wan and Xiao (2008) | 20.43 / 25.59 / 25.70 | 27.79 / 34.46 / 36.05 | 5.90 / 9.02 / 10.58
TopicRank Bougouin et al. (2013) | 21.56 / 23.12 / 20.87 | 25.38 / 28.46 / 29.49 | 12.12 / 12.90 / 13.54
PositionRank Florescu and Caragea (2017) | 23.35 / 28.57 / 28.60 | 28.12 / 32.87 / 33.32 | 9.84 / 13.34 / 14.33
MultipartiteRank Boudin (2018) | 23.20 / 25.00 / 25.24 | 25.96 / 29.57 / 30.85 | 12.13 / 13.79 / 14.92

Embedding-based Models
EmbedRank (d2v) Bennani-Smires et al. (2018) | 24.02 / 28.12 / 28.82 | 31.51 / 37.94 / 37.96 | 3.02 / 5.08 / 7.23
EmbedRank (s2v) Bennani-Smires et al. (2018) | 27.16 / 31.85 / 31.52 | 29.88 / 37.09 / 38.40 | 5.40 / 8.91 / 10.06
KeyGames Saxena et al. (2020) | 24.42 / 28.28 / 29.77 | 32.12 / 40.48 / 40.94 | 11.93 / 14.35 / 14.62
SIFRank Sun et al. (2020) | 24.27 / 27.43 / 27.86 | 29.11 / 38.80 / 39.59 | - / - / -
SIFRank+ Sun et al. (2020) | 30.88 / 33.37 / 32.24 | 28.49 / 36.77 / 38.82 | - / - / -
AttentionRank Ding and Luo (2021) | - / - / - | 24.45 / 32.15 / 34.49 | 11.39 / 15.12 / 16.66
JointGL Liang et al. (2021) | 28.62 / 35.52 / 36.29 | 32.61 / 40.17 / 41.09 | 13.02 / 19.35 / 21.72
MDERank Zhang et al. (2022) | 23.31 / 26.65 / 26.42 | 27.85 / 34.36 / 36.40 | 13.05 / 18.27 / 20.35
SetMatch Song et al. (2023c) | 31.19 / 36.34 / 38.72 | 33.54 / 40.63 / 42.11 | 14.44 / 20.79 / 24.18
HGUKE Song et al. (2023e) | 31.31 / 37.24 / 38.31 | 34.18 / 41.05 / 42.16 | 14.07 / 20.52 / 23.10
AGRank Ding and Luo (2022) | - / - / - | 34.59 / 40.70 / 41.15 | 15.37 / 21.22 / 23.72
HyperRank Song et al. (2023f) | 32.68 / 39.18 / 40.21 | 33.35 / 40.79 / 42.12 | 14.79 / 21.33 / 24.20
CentralityRank Song et al. (2023i) | 31.63 / 37.77 / 38.77 | 32.99 / 40.93 / 41.73 | 15.51 / 21.39 / 23.83
PromptRank Kong et al. (2023) | 27.39 / 31.59 / 31.01 | 31.73 / 37.88 / 38.17 | 17.24 / 20.66 / 21.35

LLM-based Models
ChatGPT (gpt-3.5-turbo) - Tp1 | 19.29 / 23.32 / 22.98 | 22.95 / 30.57 / 33.87 | 13.25 / 15.94 / 17.12
ChatGPT (gpt-3.5-turbo) - Tp2 | 21.50 / 25.03 / 24.21 | 28.07 / 34.85 / 36.69 | 13.66 / 16.11 / 16.18
ChatGLM-6B - Tp2 | 11.57 / 11.72 / 11.09 | 14.52 / 18.85 / 20.17 | 6.69 / 8.71 / 9.14
ChatGLM2-6B - Tp2 | 15.08 / 15.44 / 13.96 | 25.07 / 30.09 / 31.56 | 13.12 / 13.79 / 13.94

Table 3: Performance on the DUC2001, Inspec, and SemEval2010 test sets. The best results are in bold.
Model | OpenKP (F1@1 / F1@3 / F1@5)

Unsupervised Models
TF-IDF Jones (2004) | 19.6 / 22.3 / 19.6
TextRank Mihalcea and Tarau (2004) | 5.4 / 7.6 / 7.9

PLM-based Models
HyperMatch Song et al. (2022a) | 36.4 / 39.4 / 33.8
KIEMP Song et al. (2021b) | 36.9 / 39.2 / 34.0

LLM-based Models
ChatGPT (gpt-3.5-turbo) - Tp1 | 16.5 / 21.1 / 17.4
ChatGPT (gpt-3.5-turbo) - Tp2 | 18.0 / 21.6 / 18.7
ChatGLM-6B - Tp2 | 11.5 / 8.5 / 7.1
ChatGLM2-6B - Tp2 | 16.0 / 11.0 / 8.6

Table 4: Results of keyphrase extraction on the OpenKP dataset. F1@3 is the main evaluation metric for this dataset Xiong et al. (2019). The results of the baselines are reported in their corresponding papers. The best results are highlighted in bold.

2.4 Long Document Understanding

Generally, handling long documents poses a significant challenge in numerous natural language processing tasks, primarily due to the intricate contexts present in extended texts. This challenge is particularly pressing in keyphrase extraction, where effectively comprehending long documents is paramount. Many existing keyphrase extraction baselines, such as Sun et al. (2021), Song et al. (2021b), and Song et al. (2022a), must truncate input documents to conform to the input-length constraints imposed by their underlying backbone, such as BERT Devlin et al. (2019), resulting in a substantial loss of information. At the same time, extracting keyphrases that are contextually meaningful to the document is crucial for achieving semantic document understanding.
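
As a rough illustration of how severe this truncation is, the following sketch (assuming the Hugging Face `transformers` tokenizer; not part of the original experiments) measures the fraction of a document that survives a BERT-style 512-token input limit:

```python
# An illustrative measurement (our assumption, not the authors' code):
# how much of a document fits into a 512-subword-token model input.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def surviving_fraction(document: str, max_length: int = 512) -> float:
    # Fraction of the document's subword tokens that reach the model.
    ids = tokenizer(document, add_special_tokens=True)["input_ids"]
    return min(len(ids), max_length) / len(ids)
```

For an average SemEval2010 article of roughly 1,587 words, the subword sequence typically runs well beyond 2,000 tokens, so a BERT-based extractor sees only a fraction of the text.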

Consequently, we conduct experiments to measure ChatGPT's capability to understand lengthy documents, as shown in Tables 3 and 4. The results show that, compared with the baselines, ChatGPT's extraction results are not yet ideal. Of course, various factors affect ChatGPT's performance, and the results presented in this paper are only preliminary explorations. Notably, relative to the strongest baselines, ChatGPT's extraction results on the long-document dataset SemEval2010 are much closer to the state of the art than its results on the Inspec and DUC2001 datasets, which suggests a comparatively strong ability to understand long documents. As a next step, we plan to select longer documents, for example documents of approximately 4,096 words, to serve as test cases.

3 Conclusion

In this paper, we conduct preliminary experiments and analysis on the keyphrase extraction task. Although ChatGPT has achieved good results on various natural language processing downstream tasks, it still leaves room for improvement on keyphrase extraction. Of course, due to limited resources, we only conducted experimental verification under basic settings, which may to some extent limit ChatGPT's performance. In addition, there are many ways to improve ChatGPT's extraction results, such as designing more complex and higher-quality prompts, constructing in-context examples as auxiliary information, and introducing supervised fine-tuning.

In future research, utilizing large language models as a source of external knowledge may become a promising direction for the keyphrase extraction task.

References