Introduction
I started my Ph.D. two months ago. After some literature study and discussion, I decided to focus on "Text Summarization". While the phrase "Text Summarization" sounds simple and straightforward, I had no idea what researchers were actually working on. So I decided to (skim-)read (almost) all the papers about summarization in the fresh EMNLP 2021 proceedings and tried to get an overview.
In the end, I read 42 papers about summarization from the main (long/short), findings (long/short), and workshop tracks. However, I didn't read the ones about "Dialogue/Conversation Summarization" or some language-specific papers. It's not that they aren't interesting; I simply didn't have the time/energy. To give a clear view of what researchers are working on, I tried to spot keywords and categorize the papers. There are 13 keywords (some of them have only one paper), and for each keyword, I list the related papers from the conference. At the end of this post, I list the papers with short summaries so you can decide which ones you want to read.
Keywords
Here I list the 13 keywords with short descriptions and their related papers, ordered by the number of related papers.
- New dataset/task (7 papers)
Large datasets for DL models, new ones for specific purposes, and one that addresses a critical problem.
- Decision-Focused Summarization
- MSˆ2: Multi-Document Summarization of Medical Studies
- MassiveSumm: a very large-scale, very multilingual, news summarisation dataset
- MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization
- TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts
- SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents
- A Novel Wikipedia based Dataset for Monolingual and Cross-Lingual Summarization
- Evaluation (5 papers)
ROUGE and BERTScore aren't good enough yet; we need better ways.
- Finding a Balanced Degree of Automation for Summary Evaluation
- QuestEval: Summarization Asks for Fact-based Evaluation
- Fine-grained Factual Consistency Assessment for Abstractive Summarization Models
- Gradient-based Adversarial Factual Consistency Evaluation for Abstractive Summarization
- Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization
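The complaint that drives this line of work (n-gram overlap metrics missing factual errors) is easy to see with a toy sketch. Below is a minimal, simplified ROUGE-1 F1 (plain unigram overlap; not the official implementation, which also handles stemming, longer n-grams, and multiple references). Note how a factually wrong summary can score at least as high as a correct but shorter one:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1: unigram-overlap F1 between reference and candidate."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

ref = "the company reported record profits in 2020"
good = "the company reported record profits"          # correct but short
wrong = "the company reported record losses in 2020"  # one word flipped, factually wrong

print(f"correct summary:        {rouge1_f1(ref, good):.3f}")
print(f"factually wrong summary: {rouge1_f1(ref, wrong):.3f}")
```

The "wrong" candidate shares more unigrams with the reference than the correct one does, so pure overlap rewards it, which is exactly the gap the evaluation papers above try to close.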
- Factual consistency (4 papers)
Large pretrained language models tend to generate text that is factually inconsistent with the input; we need ways to evaluate/mitigate this problem.
- CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization
- Fine-grained Factual Consistency Assessment for Abstractive Summarization Models
- Gradient-based Adversarial Factual Consistency Evaluation for Abstractive Summarization
- MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization
- Graph Neural Networks (4 papers)
Graph neural networks work well for obtaining better representations of the input document.
- SgSum: Transforming Multi-document Summarization into Sub-graph Selection
- Multiplex Graph Neural Network for Extractive Text Summarization
- Frame Semantic-Enhanced Sentence Modeling for Sentence-level Extractive Text Summarization
- Considering Nested Tree Structure in Sentence Extractive Summarization with Pre-trained Transformer
- Multi-Document Summarization (4 papers)
Given multiple input documents, provide a short summary that covers their important points.
- Controlled Generation (4 papers)
Techniques to guide summary generation so that outputs contain user-provided words/phrases.
- Long Input Document (4 papers)
Input texts in summarization are long; we need better/more efficient ways to obtain representations.
- Enriching and Controlling Global Semantics for Text Summarization
- Topic-Guided Abstractive Multi-Document Summarization
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization
- Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems
- Low-resource / Data Augmentation (3 papers)
We always lack data; here are ways to tackle that.
- Analysis (3 papers)
We still don't know much about how deep learning models behave.
- Multilinguality (3 papers)
Most existing datasets are in English; we need ways to extend to other languages.
- Reinforcement Learning (RL) (2 papers)
With RL, you can add control over how models learn to summarize.
- Copying Mechanism (1 paper)
Humans put phrases from the input document into summaries; these models imitate that.
- Multimodality (1 paper)
Visual features can provide auxiliary information to models.
- Others
I couldn't think of a good way to categorize these, but they are still cool works.
- AUTOSUMM: Automatic Model Creation for Text Summarization
- EASE: Extractive-Abstractive Summarization End-to-End using the Information Bottleneck Principle
- Sentence-level Planning for Especially Abstractive Summarization
- Event Graph based Sentence Fusion
- Leveraging Information Bottleneck for Scientific Document Summarization
Big Problems
Having this overview made me notice two big issues in summarization: 1) evaluation and 2) factual consistency, and the two are closely related. Since we don't know how to evaluate summaries properly, state-of-the-art models still generate summaries containing information not aligned with the input document (hallucination). This problem prevents us from building cool summarization applications. Most of the evaluation papers in the list focus on providing ways to assess this issue.
Zhang+ propose a hybrid (machine and human) evaluation method. Zeng+ show that adversarial samples can help evaluate factual consistency. While these works show approaches to evaluating/ensuring factual consistency, there still seems to be a big gap before real-life application.
Conclusion
In this post, I tried to categorize almost all summarization papers from EMNLP 2021 by keyword to get an overview of the summarization world. After the categorization, I pointed out two big problems we need to solve to bring these technologies into applications. I hope this post helps students who, like me, have just started studying summarization to find their own interests.
Appendix
Personal 3 favorite papers
- Finding a Balanced Degree of Automation for Summary Evaluation
- tldr: Automated Pyramid evaluation frameworks at three different levels.
- why favorite?: A low-cost but high-quality evaluation method achieved by combining humans and machines.
- Decision-Focused Summarization
- tldr: A dataset extracted from Yelp for building a summarization model that helps decision making.
- why favorite?: A new task with the clear purpose of helping people, built on a unique idea.
- Does Pretraining for Summarization Require Knowledge Transfer?
- tldr: A T5 model pre-trained on a nonsense corpus still performs well.
- why favorite?: We still don’t know much about DL models…
List of Summaries
This Google Doc contains the list of papers above with short summaries.