Introduction
I started my Ph.D. two months ago. After some literature study and discussion, I decided to focus on "Text Summarization". While the phrase "Text Summarization" sounds simple and straightforward, I had no idea what researchers were actually working on. So I decided to (skim-)read (almost) all the papers about summarization in the fresh EMNLP 2021 proceedings and tried to get an overview.
In the end, I read 42 papers about summarization from the main (long/short), findings (long/short), and workshop tracks. However, I didn't read the ones about "Dialogue/Conversation Summarization" or some language-specific papers. It's not that they aren't interesting; I simply didn't have the time/energy. To give a clear view of what researchers are working on, I tried to spot keywords and categorize the papers. There are 13 keywords (some of them have only one paper), and for each keyword, I list the related papers from the conference. At the end of this post, I list the papers with short summaries so you can decide which ones you want to read.
Keywords
Here I list the 13 keywords with short descriptions and their related papers, ordered by the number of related papers.
- New dataset/task (7 papers)
Large datasets for DL models, new ones for specific purposes, and one that addresses a critical problem.
- Decision-Focused Summarization
- MSˆ2: Multi-Document Summarization of Medical Studies
- MassiveSumm: a very large-scale, very multilingual, news summarisation dataset
- MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization
- TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts
- SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents
- A Novel Wikipedia based Dataset for Monolingual and Cross-Lingual Summarization
- Evaluation (5 papers)
ROUGE and BERTScore aren't good enough yet; we need better ways.
- Finding a Balanced Degree of Automation for Summary Evaluation
- QuestEval: Summarization Asks for Fact-based Evaluation
- Fine-grained Factual Consistency Assessment for Abstractive Summarization Models
- Gradient-based Adversarial Factual Consistency Evaluation for Abstractive Summarization
- Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization
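The complaint that drives this line of work (n-gram overlap metrics missing factual errors) is easy to see with a toy sketch. Below is a minimal, simplified ROUGE-1 F1 (plain unigram overlap; not the official implementation, which also handles stemming, longer n-grams, and multiple references). Note how a factually wrong summary can score at least as high as a correct but shorter one:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1: unigram-overlap F1 between reference and candidate."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

ref = "the company reported record profits in 2020"
good = "the company reported record profits"          # correct but short
wrong = "the company reported record losses in 2020"  # one word flipped, factually wrong

print(f"correct summary:        {rouge1_f1(ref, good):.3f}")
print(f"factually wrong summary: {rouge1_f1(ref, wrong):.3f}")
```

The "wrong" candidate shares more unigrams with the reference than the correct one does, so pure overlap rewards it, which is exactly the gap the evaluation papers above try to close.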
- Factual consistency (4 papers)
Large pretrained language models tend to generate text that is factually inconsistent with the input; we need ways to evaluate/mitigate this problem.
- CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization
- Fine-grained Factual Consistency Assessment for Abstractive Summarization Models
- Gradient-based Adversarial Factual Consistency Evaluation for Abstractive Summarization
- MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization
- Graph Neural Networks (4 papers)
Graph neural networks work well for obtaining better representations of the input document.
- SgSum: Transforming Multi-document Summarization into Sub-graph Selection
- Multiplex Graph Neural Network for Extractive Text Summarization
- Frame Semantic-Enhanced Sentence Modeling for Sentence-level Extractive Text Summarization
- Considering Nested Tree Structure in Sentence Extractive Summarization with Pre-trained Transformer
- Multi-Document Summarization (4 papers)
Given multiple input documents, provide a short summary that covers their important points.
- Controlled Generation (4 papers)
Techniques to guide summary generation so that outputs contain user-provided words/phrases.
- Long Input Document (4 papers)
Input texts in summarization are long; we need better/more efficient ways to obtain representations.
- Enriching and Controlling Global Semantics for Text Summarization
- Topic-Guided Abstractive Multi-Document Summarization
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization
- Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems
- Low-resource / Data Augmentation (3 papers)
We always lack data; here are ways to tackle that.
- Analysis (3 papers)
We still don't know much about how deep learning models behave.
- Multilinguality (3 papers)
Most existing datasets are in English; we need ways to extend to other languages.
- Reinforcement Learning (RL) (2 papers)
With RL, you can add control over how models learn to summarize.
- Copying Mechanism (1 paper)
Humans put phrases from the input document into summaries; these models imitate that.
- Multimodality (1 paper)
Visual features can provide auxiliary information to models.
- Others
I couldn't think of a good way to categorize these, but they are still cool works.
- AUTOSUMM: Automatic Model Creation for Text Summarization
- EASE: Extractive-Abstractive Summarization End-to-End using the Information Bottleneck Principle
- Sentence-level Planning for Especially Abstractive Summarization
- Event Graph based Sentence Fusion
- Leveraging Information Bottleneck for Scientific Document Summarization
Big Problems
Having this overview made me notice two big issues in summarization: 1) evaluation and 2) factual consistency, and the two are closely related. Since we don't know how to evaluate summaries properly, state-of-the-art models still generate summaries containing information not aligned with the input document (hallucination). This problem prevents us from building cool summarization applications. Most of the evaluation papers in the list focus on providing ways to assess this issue.
Zhang+ propose a hybrid (machine and human) evaluation method. Zeng+ show that adversarial samples can help evaluate factual consistency. While these works show approaches to evaluating/ensuring factual consistency, there still seems to be a big gap before real-life application.
Conclusion
In this post, I tried to categorize almost all summarization papers from EMNLP 2021 by keyword to get an overview of the summarization world. After the categorization, I pointed out two big problems we need to solve to bring these technologies into applications. I hope this post helps students who, like me, have just started studying summarization to find their own interests.
Appendix
Personal 3 favorite papers
- Finding a Balanced Degree of Automation for Summary Evaluation
- tldr: Automated Pyramid evaluation frameworks at three different levels.
- why favorite?: A low-cost but high-quality evaluation method achieved by combining humans and machines.
- Decision-Focused Summarization
- tldr: A dataset extracted from Yelp for building a summarization model that helps decision making.
- why favorite?: A new task with the clear purpose of helping people, built on a unique idea.
- Does Pretraining for Summarization Require Knowledge Transfer?
- tldr: A T5 model pre-trained on a nonsense corpus still performs well.
- why favorite?: We still don’t know much about DL models…
List of Summaries
This Google Doc contains the list of papers above with short summaries.