Introduction

I started my Ph.D. two months ago. After some literature study and discussion, I decided to focus on “Text Summarization”. While the phrase “Text Summarization” sounds simple and straightforward, I had no idea what researchers were actually working on. So I decided to (skim-)read (almost) all the summarization papers from the recent EMNLP 2021 and tried to get an overview of the field.

In the end, I read 42 summarization papers from the main conference (long/short), Findings (long/short), and workshops. However, I didn’t read the ones about “Dialogue/Conversation summarization” or some language-specific papers. It’s not that they aren’t interesting; I simply didn’t have the time/energy. To give a clear view of what researchers are working on, I tried to spot keywords and categorize the papers by them. There are 13 keywords (some of them have only one paper), and for each keyword I listed the related papers from the conference. At the end of this post, I listed the papers with short summaries so you can decide which ones you want to read.

Keywords

Here I list the 13 keywords with short descriptions and related papers, ordered by the number of related papers.

Big Problems

Having this overview made me notice two big issues in summarization: 1) evaluation and 2) factual consistency. These two are closely related. Since we don’t know how to evaluate summaries properly, state-of-the-art models still generate summaries that contain information not aligned with the input document (hallucination). This problem prevents us from building cool summarization applications. Most of the evaluation papers in the list focus on providing ways to assess this issue.

Zhang+ propose a hybrid (machine and human) approach to evaluation. Zeng+ show that adversarial samples can help evaluate factual consistency. While these works show approaches to evaluate or ensure factual consistency, there still seems to be a big gap before real-life application.
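
As a rough illustration of why evaluation and factual consistency are tied together, consider a toy word-overlap score. This is only a minimal sketch, not a metric from any of the papers above: the function names `tokenize` and `overlap_f1` and the example sentences are made up for illustration. It assumes summaries are scored by how many words they share with a reference.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_f1(candidate, reference):
    """Toy word-overlap F1 between a candidate summary and a reference.

    Hypothetical helper for illustration only; real metrics are more involved.
    """
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Made-up example sentences (assumptions, not data from any paper).
reference = "The company reported a profit of 5 million dollars in 2020."
faithful = "In 2020 the company made a profit of 5 million dollars."
wrong = "The company reported a loss of 5 million dollars in 2020."

print(overlap_f1(faithful, reference))
print(overlap_f1(wrong, reference))
```

Both candidates share 10 of their 11 words with the reference, so this toy score rates them identically, even though the second one states the opposite fact. A score built only on surface overlap cannot tell a faithful paraphrase from a factually inconsistent summary.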

Conclusion

In this post, I tried to categorize almost all the summarization papers from EMNLP 2021 by keyword to get an overview of the summarization world. After the categorization, I listed two big problems we need to solve to bring these technologies into applications. I hope this post helps students who, like me, have just started studying summarization to find their own interests.

Appendix

My 3 personal favorite papers

List of Summaries

A Google Doc contains the papers listed above with short summaries.