EACL 2023 (at Dubrovnik, Croatia)
- PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search
- Thang Pham, Seunghyun Yoon, Trung Bui, Anh Nguyen
- TLDR: We present a dataset of noun phrases accompanied by their contextual Wikipedia pages and a suite of three tasks for training and evaluating phrase embeddings.
- Enhancing Dialogue Summarization with Topic-Aware Global- and Local-Level Centrality
- Xinnian Liang, Shuangzhi Wu, Chenhao Cui, Jiaqi Bai, Chao Bian, Zhoujun Li
- TLDR: We propose a novel topic-aware Global-Local Centrality model to help select the salient context from all sub-topics in the dialogue stream.
- Exploiting Summarization Data to Help Text Simplification
- Renliang Sun, Zhixian Yang, Xiaojun Wan
- TLDR: We propose a method to extract sentence pairs from summarization data to help simplify text.
- Shironaam: Bengali News Headline Generation using Auxiliary Information
- Abu Ubaida Akash, Mir Tafseer Nayeem, Faisal Tareque Shohan, Tanvir Islam
- TLDR: We present a novel and cost-effective system for Bengali news headline generation using auxiliary data such as image captions, topic words, and category information.
- PCC: Paraphrasing with Bottom-k Sampling and Cyclic Learning for Curriculum Data Augmentation
- Hongyuan Lu, Wai Lam
- TLDR: We propose a curriculum-aware paraphrase generation module based on bottom-k sampling and cyclic learning for curriculum data augmentation.
- A Two-Sided Discussion of Preregistration of NLP Research
- Anders Søgaard, Daniel Hershcovich, Miryam De Lhoneux
- TLDR: We present a two-sided discussion of the merits of preregistration for NLP research.
- WinoDict: Probing language models for in-context word acquisition
- Julian Eisenschlos, Jeremy Cole, Fangyu Liu, William Cohen
- TLDR: We introduce a new in-context learning paradigm to measure Large Language Models’ (LLMs) ability to learn novel words during inference.
- Sentiment as an Ordinal Latent Variable
- Niklas Stoehr, Ryan Cotterell, Aaron Schein
- TLDR: We propose a Bayesian generative model that learns a composite sentiment dictionary as an interpolation between six existing dictionaries with different scales.
- Nationality Bias in Text Generation
- Pranav Narayanan Venkit, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao Huang, Shomir Wilson
- TLDR: We explore how a text generation model, GPT-2, accentuates pre-existing societal biases about country-based demonyms.
- Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation
- Zoey Liu, Justin Spence, Emily Prud’hommeaux
- TLDR: We show that the widely used hold-speakers-out approach to ASR data partitioning can yield results that do not reflect model performance on unseen data or speakers.
- Shortcomings of Question Answering Based Factuality Frameworks for Error Localization
- Ryo Kamoi, Tanya Goyal, Greg Durrett
- TLDR: We show that QA-based frameworks fail to correctly identify error spans in generated summaries and are outperformed by trivial exact match baselines.
- Socratic Question Generation: A Novel Dataset, Models, and Evaluation
- Beng Heng Ang, Sujatha Das Gollapalli, See-Kiong Ng
- TLDR: We present SocratiQ, the first large dataset of 110K (question, context) pairs for enabling studies on Socratic Question Generation (SoQG).
- Do we need Label Regularization to Fine-tune Pre-trained Language Models?
- Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi
- TLDR: We show that KD and other label regularization techniques do not play any meaningful role over regular fine-tuning when the student model is pre-trained.
- COVID-VTS: Fact Extraction and Verification on Short Video Platforms
- Fuxiao Liu, Yaser Yacoob, Abhinav Shrivastava
- TLDR: We propose a new benchmark for fact-checking multi-modal information involving short-duration videos with COVID-19-focused information from both the real world and machine generation.
- Multimodal Graph Transformer for Multimodal Question Answering
- Xuehai He, Xin Wang
- TLDR: We propose a novel Multimodal Graph Transformer for question answering tasks that require reasoning across multiple modalities.
- Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies
- Md Rizwan Parvez, Jianfeng Chi, Wasi Uddin Ahmad, Yuan Tian, Kai-Wei Chang
- TLDR: We develop a data augmentation framework based on ensembling retriever models that captures the relevant text segments from unlabeled policy documents and expands the positive examples in the training set.
- FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric
- Maximillian Chen, Caitlyn Chen, Xiao Yu, Zhou Yu
- TLDR: We present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels.
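  To make the pairing-and-averaging step concrete, here is a minimal sketch; this is not the released FastKASSIM implementation, and `tree_kernel` is a stand-in for a fast constituency tree kernel.

  ```python
  # Sketch of the pairing-and-averaging step: average, over parse trees in
  # document A, of each tree's best kernel match among the trees of document B.
  # The released metric additionally handles direction/symmetrization details.

  def document_similarity(trees_a, trees_b, tree_kernel):
      """Average best-match tree-kernel similarity from doc A to doc B."""
      if not trees_a or not trees_b:
          return 0.0
      best = [max(tree_kernel(ta, tb) for tb in trees_b) for ta in trees_a]
      return sum(best) / len(best)
  ```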
- Friend-training: Learning from Models of Different but Related Tasks
- Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Xiabing Zhou, Dong Yu
- TLDR: We propose friend-training, a cross-task self-training framework, where models trained to do different tasks are used in an iterative training, pseudo-labeling, and retraining process to help each other for better selection of pseudo-labels.
- Understanding Transformer Memorization Recall Through Idioms
- Adi Haviv, Ido Cohen, Jacob Gidron, Roei Schuster, Yoav Goldberg, Mor Geva
- TLDR: We propose a new methodology for probing and characterizing recall of memorized sequences in transformer LMs.
- A Discerning Several Thousand Judgments: GPT-3 Rates the Article + Adjective + Numeral + Noun Construction
- Kyle Mahowald
- TLDR: We present a new method for asking GPT-3 to give acceptability judgments on rare, idiosyncratic constructions in English syntax.
- Triple-Hybrid Energy-based Model Makes Better Calibrated Natural Language Understanding Models
- Haotian Xu, Yingying Zhang
- TLDR: We propose a triple-hybrid EBM that combines the benefits of classifiers, conditional generative models, and marginal generative models to train a language model.
- A weakly supervised textual entailment approach to zero-shot text classification
- Marc Pàmies, Joan Llop, Francesco Multari, Nicolau Duran-Silva, César Parra-Rojas, Aitor Gonzalez-Agirre, Francesco Alessandro Massucci, Marta Villegas
- TLDR: We propose a novel zero-shot text classification model that learns on a weakly supervised dataset generated from traditional classification data.
- Fair Enough: Standardizing Evaluation and Model Selection for Fairness Research in NLP
- Xudong Han, Timothy Baldwin, Trevor Cohn
- TLDR: We propose a new approach to fair learning, which aims to clarify the current situation and plot a course for meaningful progress in fair learning.
- CHARD: Clinical Health-Aware Reasoning Across Dimensions for Text Generation Models
- Steven Y. Feng, Vivek Khetan, Bogdan Sacaleanu, Anatole Gershman, Eduard Hovy
- TLDR: We present a novel text generation model for health-aware reasoning across multiple clinical dimensions and present a dataset of explanations about 52 health-related conditions across three clinical dimensions.
- Prompt Tuning with Contradictory Intentions for Sarcasm Recognition
- Yiyi Liu, Ruqing Zhang, Yixing Fan, Jiafeng Guo, Xueqi Cheng
- TLDR: We propose SarcPrompt, a new approach to incorporate the prior knowledge about contradictory intentions into prompt tuning for sarcasm recognition.
- COMBO: A Complete Benchmark for Open KG Canonicalization
- Chengyue Jiang, Yong Jiang, Weiqi Wu, Yuting Zheng, Pengjun Xie, Kewei Tu
- TLDR: We present COMBO, a Complete Benchmark for Open Knowledge Graph canonicalization.
- UScore: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation
- Jonas Belouadi, Steffen Eger
- TLDR: We develop fully unsupervised evaluation metrics for machine translation that are effective on 4 out of 5 evaluation datasets.
- Assistive Recipe Editing through Critiquing
- Diego Antognini, Shuyang Li, Boi Faltings, Julian McAuley
- TLDR: We present a novel hierarchical denoising auto-encoder that edits recipes given ingredient-level critiques.
- DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer
- Shanu Kumar, Soujanya Abbaraju, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
- TLDR: We propose a novel approach for improving zero-shot cross-lingual transfer by reducing the feature incongruity between the source and the target language and increasing the generalization capabilities of pre-trained multilingual transformers.
- “John is 50 years old, can his son be 65?” Evaluating NLP Models’ Understanding of Feasibility
- Himanshu Gupta, Neeraj Varshney, Swaroop Mishra, Kuntal Kumar Pal, Saurabh Arjun Sawant, Kevin Scaria, Siddharth Goyal, Chitta Baral
- TLDR: We present a question-answering dataset that tests understanding of feasibility, a commonsense ability in language models.
- Efficient Encoders for Streaming Sequence Tagging
- Ayush Kaushal, Aditya Gupta, Shyam Upadhyay, Manaal Faruqui
- TLDR: We present a Hybrid Encoder with Adaptive Restart for streaming sequence tagging that improves streaming performance by up to 71% and outperforms unidirectional encoders for streaming predictions by up to +10% streaming exact match.
- Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing
- Akshat Shrivastava, Shrey Desai, Anchit Gupta, Ali Elkahky, Aleksandr Livshits, Alexander Zotov, Ahmed Aly
- TLDR: We present scenario-based semantic parsing, a novel approach to task-oriented parsing which uses a scenario-driven architecture to solve the task of parsing utterances.
- Document Flattening: Beyond Concatenating Context for Document-Level Neural Machine Translation
- Minghao Wu, George Foster, Lizhen Qu, Gholamreza Haffari
- TLDR: We propose a novel Document Flattening technique for document-level neural machine translation that captures long-range information from distant context.
- Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation
- Jinhui Ye, Wenxiang Jiao, Xing Wang, Zhaopeng Tu
- TLDR: We propose a Prompt based domain text Generation (PGen) approach to produce large-scale in-domain spoken language text data.
- Realistic Conversational Question Answering with Answer Selection based on Calibrated Confidence and Uncertainty Measurement
- Soyeong Jeong, Jinheon Baek, Sung Ju Hwang, Jong Park
- TLDR: Filter out inaccurate answers in the conversation history based on their estimated confidences and uncertainties from the ConvQA model, without making any architectural changes.
- PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically
- Sedrick Scott Keh, Steven Y. Feng, Varun Gangal, Malihe Alikhani, Eduard Hovy
- TLDR: We propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically.
- A User-Centered, Interactive, Human-in-the-Loop Topic Modelling System
- Zheng Fang, Lama Alqazlan, Du Liu, Yulan He, Rob Procter
- TLDR: We present a novel, interactive human-in-the-loop topic modelling system with a user-friendly interface that enables users to compare and record every step they take, and a novel topic-word suggestion feature to help users provide feedback that is faithful to the ground truth.
- A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing
- Sophie Henning, William Beluch, Alexander Fraser, Annemarie Friedrich
- TLDR: We survey methods for addressing class imbalance in deep-learning-based NLP tasks.
- Extracting or Guessing? Improving Faithfulness of Event Temporal Relation Extraction
- Haoyu Wang, Hongming Zhang, Yuqian Deng, Jacob Gardner, Dan Roth, Muhao Chen
- TLDR: We propose to improve the faithfulness of TempRel extraction models from two perspectives: providing proper uncertainty estimation and abstaining from extraction when no relation is described in the text.
- LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control
- Yilun Zhao, Zhenting Qi, Linyong Nan, Lorenzo Jaime Flores, Dragomir Radev
- TLDR: We propose a novel model for logical table-to-text generation that addresses unfaithfulness and lack of diversity simultaneously.
- PromptDA: Label-guided Data Augmentation for Prompt-based Few Shot Learners
- Canyu Chen, Kai Shu
- TLDR: We propose a novel label-guided data augmentation framework PromptDA, which exploits the enriched label semantic information for prompt-based few-shot tuning on PLMs.
- Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection
- Daniel Deutsch, Dan Roth
- TLDR: We propose a method for incorporating question-answering (QA) signals into a summarization model that generates high-quality summaries by automatically generating wh-questions that are answered by the NPs and automatically determining whether those questions are answered in the gold summaries.
- Patient Outcome and Zero-shot Diagnosis Prediction with Hypernetwork-guided Multitask Learning
- Shaoxiong Ji, Pekka Marttinen
- TLDR: We propose a hypernetwork-based approach that generates task-conditioned parameters and coefficients of multitask prediction heads to learn task-specific prediction and balance the multitask learning.
- A Kind Introduction to Lexical and Grammatical Aspect, with a Survey of Computational Approaches
- Annemarie Friedrich, Nianwen Xue, Alexis Palmer
- TLDR: We describe the concepts of stativity, telicity, habituality, and perfective and imperfective aspect, as well as influential inventories of eventuality and situation types.
- Incorporating Context into Subword Vocabularies
- Shaked Yehezkel, Yuval Pinter
- TLDR: We present SaGe, a tokenizer that tailors subwords for their downstream use by baking in the contextualized signal at the vocabulary creation phase.
- LoRaLay: A Multilingual and Multimodal Dataset for Long Range and Layout-Aware Summarization
- Laura Nguyen, Thomas Scialom, Benjamin Piwowarski, Jacopo Staiano
- TLDR: We present a collection of datasets for long-range summarization with accompanying visual/layout information, including four novel datasets for summarization in French, Spanish, Portuguese, and Korean.
- ViHOS: Hate Speech Spans Detection for Vietnamese
- Phu Gia Hoang, Canh Luu, Khanh Tran, Kiet Nguyen, Ngan Nguyen
- TLDR: We present the first human-annotated corpus containing 26k spans on 11k Vietnamese comments and provide definitions of hateful and offensive spans in Vietnamese comments as well as detailed annotation guidelines.
- Vote’n’Rank: Revision of Benchmarking with Social Choice Theory
- Mark Rofin, Vladislav Mikhailov, Mikhail Florinsky, Andrey Kravchenko, Tatiana Shavrina, Elena Tutubalina, Daniel Karabekyan, Ekaterina Artemova
- TLDR: We propose Vote’n’Rank, a framework for ranking systems in multi-task benchmarks under the principles of social choice theory.
- Combining Parameter-efficient Modules for Task-level Generalisation
- Edoardo Maria Ponti, Alessandro Sordoni, Yoshua Bengio, Siva Reddy
- TLDR: We propose a modular latent-skill model for reinforcement learning and few-shot fine-tuning of language models.
- Self-imitation Learning for Action Generation in Text-based Games
- Zijing Shi, Yunqiu Xu, Meng Fang, Ling Chen
- TLDR: We propose a confidence-based self-imitation model for reinforcement learning in text-based games.
- Investigating the Effect of Relative Positional Embeddings on AMR-to-Text Generation with Structural Adapters
- Sebastien Montella, Alexis Nasr, Johannes Heinecke, Frederic Bechet, Lina M. Rojas Barahona
- TLDR: We investigate the influence of Relative Position Embeddings (RPE) on AMR-to-Text generation and propose StructAdapt, a structure-aware adapter which injects the input graph connectivity within PLMs using Graph Neural Networks.
- On the Intersection of Context-Free and Regular Languages
- Clemente Pasti, Andreas Opedal, Tiago Pimentel, Tim Vieira, Jason Eisner, Ryan Cotterell
- TLDR: We give a generalization of the Bar-Hillel construction that extends it to finite-state automata with ε-arcs.
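  For reference, a minimal sketch of the classical (unweighted, ε-free) Bar-Hillel construction that the paper generalizes; the rule and arc encodings here are our own, not the paper's.

  ```python
  # Classical Bar-Hillel intersection of a CNF grammar with an FSA: the
  # intersection grammar uses triple nonterminals (p, X, q) spanning FSA
  # states. The paper extends this to weighted automata with epsilon-arcs.

  def bar_hillel(rules, states, arcs, start_nt, initial, finals):
      """rules: (A, (B, C)) binary or (A, (a,)) terminal rules over strings;
      arcs: iterable of (p, a, q) FSA transitions. Returns rules of a
      grammar generating L(G) intersected with L(A)."""
      out = []
      for A, rhs in rules:
          if len(rhs) == 2:                                  # A -> B C
              B, C = rhs
              out += [((p, A, r), ((p, B, q), (q, C, r)))
                      for p in states for q in states for r in states]
          else:                                              # A -> a
              (a,) = rhs
              out += [((p, A, q), rhs) for (p, sym, q) in arcs if sym == a]
      # fresh start symbol covering initial-to-final spans
      out += [("S*", ((initial, start_nt, f),)) for f in finals]
      return out
  ```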
- Social Influence Dialogue Systems: A Survey of Datasets and Models For Social Influence Tasks
- Kushal Chawla, Weiyan Shi, Jingwen Zhang, Gale Lucas, Zhou Yu, Jonathan Gratch
- TLDR: We formally define and introduce the category of social influence dialogue systems that influence users’ cognitive and emotional responses, leading to changes in thoughts, opinions, and behaviors through natural conversations.
- Aggregating Crowdsourced and Automatic Judgments to Scale Up a Corpus of Anaphoric Reference for Fiction and Wikipedia Texts
- Juntao Yu, Silviu Paun, Maris Camilleri, Paloma Garcia, Jon Chamberlain, Udo Kruschwitz, Massimo Poesio
- TLDR: We present a new corpus for anaphoric reference and coreference that is comparable in size to the largest existing corpora for anaphoric reference.
- What Makes Sentences Semantically Related? A Textual Relatedness Dataset and Empirical Study
- Mohamed Abdalla, Krishnapriya Vishnubhotla, Saif Mohammad
- TLDR: We present a dataset for semantic relatedness of English sentence pairs and show that human intuition regarding relatedness is highly reliable.
- RevUp: Revise and Update Information Bottleneck for Event Representation
- Mehdi Rezaee, Francis Ferraro
- TLDR: We propose a semi-supervised information bottleneck-based discrete latent variable model that learns to capture optional side information that is not already captured by the observed data.
- NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
- Genta Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung
- TLDR: We develop the first-ever parallel resource for 10 low-resource languages in Indonesia.
- The Functional Relevance of Probed Information: A Case Study
- Michael Hanna, Roberto Zamparelli, David Mareček
- TLDR: We show that transformer models like BERT only use the subject plurality information encoded in their representations of the subject and of the words that agree with it in number.
- Do Pretrained Contextual Language Models Distinguish between Hebrew Homograph Analyses?
- Avi Shmidman, Cheyn Shmidman, Dan Bareket, Moshe Koppel, Reut Tsarfaty
- TLDR: We present a new set of homograph disambiguation and analysis embeddings for Hebrew that are more effective than existing models for homograph segmentation and morphosyntactic features.
- Parameter-Efficient Tuning with Special Token Adaptation
- Xiaocong Yang, James Y. Huang, Wenxuan Zhou, Muhao Chen
- TLDR: Parameter-efficient tuning of Transformer-based language models with special tokens.
- Probing Power by Prompting: Harnessing Pre-trained Language Models for Power Connotation Framing
- Shima Khanehzar, Trevor Cohn, Gosia Mikolajczak, Lea Frermann
- TLDR: We propose a probing framework for power connotation in pre-trained language models, which can help to understand subtle bias in the media.
- Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation
- Mehrad Moradshahi, Sina Semnani, Monica Lam
- TLDR: We propose automatic methods that use ToD training data in a source language to build a high-quality functioning dialogue agent in another target language that has no training data (i.e., zero-shot) or a small training set (i.e., few-shot).
- Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues
- Mehrad Moradshahi, Victoria Tsai, Giovanni Campagna, Monica Lam
- TLDR: We present a new contextual semantic parsing model for multilingual dialogue datasets, which encodes the formal slots and values together with only the last agent and user utterances, as well as a new semantic parsing algorithm for dialogue datasets.
- Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers
- Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi
- TLDR: We propose a proactive knowledge distillation method called Teacher Intervention for fast converging QAT of ultra-low precision pre-trained Transformers.
- Generative Replay Inspired by Hippocampal Memory Indexing for Continual Language Learning
- Aru Maekawa, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura
- TLDR: We propose hippocampal memory indexing to enhance generative replay by controlling sample generation using compressed features of previous training samples.
- A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods
- Zhihan Zhang, Wenhao Yu, Mengxia Yu, Zhichun Guo, Meng Jiang
- TLDR: We present recent advances of multi-task learning methods in NLP, with the aim of summarizing them into two general multi-task training methods based on their task relatedness: (i) joint training and (ii) multi-step training.
- Conclusion-based Counter-Argument Generation
- Milad Alshomary, Henning Wachsmuth
- TLDR: We propose a new approach to counter-argument generation that explicitly models the argument’s conclusion and ensures that the stance of the generated counter is opposite to that conclusion.
- Question-Answer Sentence Graph for Joint Modeling Answer Selection
- Roshni Iyer, Thuy Vu, Alessandro Moschitti, Yizhou Sun
- TLDR: Graph-based approach for Answer Sentence Selection.
- Evaluating and Improving the Coreference Capabilities of Machine Translation Models
- Asaf Yehudai, Arie Cattan, Omri Abend, Gabriel Stanovsky
- TLDR: We develop an evaluation methodology that derives coreference clusters from MT output and evaluate them without requiring annotations in the target language.
- Document-Level Planning for Text Simplification
- Liam Cripwell, Joël Legrand, Claire Gardent
- TLDR: We propose a novel approach to document-level simplification that uses a sequence of labels to describe the sentence structure of the input document and uses this information to guide generation of a document-specific simplification plan.
- Efficient Hybrid Generation Framework for Aspect-Based Sentiment Analysis
- Haoran Lv, Junyi Liu, Henan Wang, Yaoming Wang, Jixiang Luo, Yaxiao Liu
- TLDR: We propose a novel framework for aspect-based sentiment analysis built on efficient hybrid generation and bipartite matching.
- What’s New? Summarizing Contributions in Scientific Literature
- Hiroaki Hayashi, Wojciech Kryscinski, Bryan McCann, Nazneen Rajani, Caiming Xiong
- TLDR: We propose a new task for disentangled paper summarization that uses the S2ORC corpus of academic articles to generate separate summaries for the paper contributions and the context of the work, making it easier to identify the key findings shared in articles.
- Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model
- Fei Xia, Yixuan Weng, Shizhu He, Kang Liu, Jun Zhao
- TLDR: We propose a two-stage method for taxonomy completion and extension that uses pre-trained language models for hypernym and hyponym recognition.
- Meta Self-Refinement for Robust Learning with Weak Supervision
- Dawei Zhu, Xiaoyu Shen, Michael Hedderich, Dietrich Klakow
- TLDR: Meta Self-Refinement is a novel noise-resistant learning framework for deep neural networks that can effectively combat label noise from weak supervision.
- Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
- Nuno M. Guerreiro, Elena Voita, André Martins
- TLDR: We show that for preventive settings, (i) previously used methods are largely inadequate, (ii) sequence log-probability works best and performs on par with reference-based methods, and (iii) sequence uncertainty-based detectors are better than uncertainty-only methods; we also propose (iv) a simple method for alleviating hallucinations at test time that significantly reduces the hallucinatory rate.
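  A minimal sketch of a sequence-log-probability detector in the spirit of (ii); the threshold value below is a made-up example, and in practice it would be tuned on held-out data.

  ```python
  # Flag a translation when its length-normalized sequence log-probability
  # falls below a threshold: low model confidence correlates with
  # hallucination in this setting.

  def seq_logprob(token_logprobs):
      """Length-normalized log-probability of the generated sequence."""
      return sum(token_logprobs) / len(token_logprobs)

  def is_flagged(token_logprobs, threshold=-4.0):  # threshold: toy example
      return seq_logprob(token_logprobs) < threshold

  print(is_flagged([-0.1, -0.3, -0.2]))   # confident output  -> False
  print(is_flagged([-6.2, -5.8, -7.1]))   # low confidence    -> True
  ```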
- Investigating UD Treebanks via Dataset Difficulty Measures
- Artur Kulmizev, Joakim Nivre
- TLDR: We analyze a large subset of treebanks annotated with Universal Dependencies using three recently proposed accuracy-free dataset analysis methods: dataset cartography, $\mathcal{V}$-information, and minimum description length.
- On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex
- Terry Yue Zhuo, Zhuang Li, Yujin Huang, Fatemeh Shiri, Weiqing Wang, Gholamreza Haffari, Yuan-Fang Li
- TLDR: We present the first empirical study on the adversarial robustness of a prompt-based semantic parser based on Codex, a state-of-the-art language model trained on code.
- Leveraging Task Dependency and Contrastive Learning for Case Outcome Classification on European Court of Human Rights Cases
- Santosh T.Y.S.S, Marcel Perez San Blas, Phillip Kemper, Matthias Grabmair
- TLDR: We present a novel approach to case outcome classification on European Court of Human Rights cases where our model first learns to identify the convention articles allegedly violated by the state from case facts descriptions, and subsequently uses that information to classify whether the court finds a violation of those articles.
- Semi-supervised Relation Extraction via Data Augmentation and Consistency-training
- Komal Teru
- TLDR: We propose a novel data augmentation and consistency training algorithm for the relation extraction task.
- Event Temporal Relation Extraction with Bayesian Translational Model
- Xingwei Tan, Gabriele Pergola, Yulan He
- TLDR: We propose Bayesian-Trans, a Bayesian learning-based method for temporal relation extraction that uses latent variables to encode uncertainty about the predictions.
- Persona Expansion with Commonsense Knowledge for Diverse and Consistent Response Generation
- Donghyun Kim, Youbin Ahn, Wongyu Kim, Chanhee Lee, Kyungchan Lee, Kyong-Ho Lee, Jeonguk Kim, Donghoon Shin, Yeonsoo Lee
- TLDR: We propose a consistent persona expansion framework that improves not only the diversity but also the consistency of persona-based responses.
- UnifEE: Unified Evidence Extraction for Fact Verification
- Nan Hu, Zirui Wu, Yuxuan Lai, Chen Zhang, Yansong Feng
- TLDR: We propose a unified evidence graph for fact verification over Wikipedia and show that it enables better decisions about which evidence should be kept.
- MiniALBERT: Model Distillation via Parameter-Efficient Recursive Transformers
- Mohammadmahdi Nouriborji, Omid Rohanian, Samaneh Kouchaki, David A. Clifton
- TLDR: MiniALBERT is a novel approach for converting the knowledge of fully parameterised language models into a compact recursive student.
- Multilingual Normalization of Temporal Expressions with Masked Language Models
- Lukas Lange, Jannik Strötgen, Heike Adel, Dietrich Klakow
- TLDR: We propose a novel neural method for normalizing temporal expressions based on masked language modeling.
- K-hop neighbourhood regularization for few-shot learning on graphs: A case study of text classification
- Niels van der Heijden, Ekaterina Shutova, Helen Yannakoudakis
- TLDR: We present FewShotTextGCN, a novel method designed to effectively utilize the properties of word-document graphs for improved learning in low-resource settings.
- What Clued the AI Doctor In? On the Influence of Data Source and Quality for Transformer-Based Medical Self-Disclosure Detection
- Mina Valizadeh, Xing Qian, Pardis Ranjbar-Noiey, Cornelia Caragea, Natalie Parde
- TLDR: We present a three-pronged investigation of medical self-disclosure, showing that models leveraging a new dataset and a data augmentation technique significantly outperform the existing state of the art for this task.
- Improving Visual-Semantic Embedding with Adaptive Pooling and Optimization Objective
- Zijian Zhang, Chang Shu, Ya Xiao, Yuan Shen, Di Zhu, Youxin Chen, Jing Xiao, Jey Han Lau, Qian Zhang, Zheng Lu
- TLDR: We propose a new adaptive pooling strategy for visual-semantic embedding that learns how to aggregate features through a combination of simple pooling methods.
- Policy-based Reinforcement Learning for Generalisation in Interactive Text-based Environments
- Edan Toledo, Jan Buys, Jonathan Shock
- TLDR: We show that by replacing commonly used value-based update methods with REINFORCE with baseline, a far more general agent is produced.
- Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning
- Hongyin Luo, James Glass
- TLDR: We show that the explicit logic learning with textual entailment can significantly reduce bias and improve the recognition of social communities, without an explicit de-biasing process.
- Entity Tracking via Effective Use of Multi-Task Learning Model and Mention-guided Decoding
- Janvijay Singh, Fan Bai, Zhen Wang
- TLDR: We propose MeeT, a Multi-task learning-enabled entity Tracking approach, which utilizes knowledge gained from general domain tasks to improve entity tracking.
- Conversational Tree Search: A New Hybrid Dialog Task
- Dirk Väth, Lindsey Vanderlyn, Ngoc Thang Vu
- TLDR: We present a new task that bridges the gap between FAQ-style information retrieval and task-oriented dialogs and show that it improves goal completion while skipping unnecessary questions.
- A Human Subject Study of Named Entity Recognition in Conversational Music Recommendation Queries
- Elena Epure, Romain Hennequin
- TLDR: We conducted a human subject study of named entity recognition on a noisy corpus of conversational music recommendation queries, with many irregular and novel named entities.
- Entity Disambiguation with Entity Definitions
- Luigi Procopio, Simone Conia, Edoardo Barba, Roberto Navigli
- TLDR: We present a novel approach to extracting expressive textual representations for Entity Disambiguation that can improve generalization over unseen patterns.
- Exploring Paracrawl for Document-level Neural Machine Translation
- Yusser Al Ghussin, Jingyi Zhang, Josef van Genabith
- TLDR: We use Paracrawl parallel webpages as parallel documents for document-level neural machine translation and show that document-level NMT models trained on Paracrawl data can improve context-aware pronoun translation.
- Poor Man’s Quality Estimation: Predicting Reference-Based MT Metrics Without the Reference
- Vilém Zouhar, Shehzaad Dhuliawala, Wangchunshu Zhou, Nico Daheim, Tom Kocmi, Yuchen Eleanor Jiang, Mrinmaya Sachan
- TLDR: We propose metric estimation, a new task for machine translation quality estimation in which automated metric scores are predicted without access to the reference.
- Integrating Translation Memories into Non-Autoregressive Machine Translation
- Jitao Xu, Josep Crego, François Yvon
- TLDR: We propose a new variant of the Levenshtein Transformer that is well suited to translation with a Translation Memory.
- Shorten the Long Tail for Rare Entity and Event Extraction
- Pengfei Yu, Heng Ji
- TLDR: We propose a new transformation module for the long-tailed learning problem that transforms infrequent candidate mention representations during evaluation using the average mention representation from the training dataset.
- Do Deep Neural Networks Capture Compositionality in Arithmetic Reasoning?
- Keito Kudo, Yoichi Aoki, Tatsuki Kuribayashi, Ana Brassard, Masashi Yoshikawa, Keisuke Sakaguchi, Kentaro Inui
- TLDR: We propose a skill tree on compositionality in arithmetic symbolic reasoning that defines the hierarchical levels of complexity along with three compositionality dimensions: systematicity, productivity, and substitutivity.
- BLM-AgrF: A New French Benchmark to Investigate Generalization of Agreement in Neural Networks
- Aixiu An, Chunyang Jiang, Maria A. Rodriguez, Vivi Nastase, Paola Merlo
- TLDR: We present a new task for learning the underlying rules of subject-verb agreement in sentences, developed in the BLM framework, a new task paradigm for neural networks inspired by the visual IQ tests known as Raven’s Progressive Matrices.
- Robustification of Multilingual Language Models to Real-world Noise in Crosslingual Zero-shot Settings with Robust Contrastive Pretraining
- Asa Cooper Stickland, Sailik Sengupta, Jason Krone, Saab Mansour, He He
- TLDR: We propose Robust Contrastive Pretraining, a novel approach to improve the robustness of multilingual language models on noisy data.
- Unsupervised Anomaly Detection in Multi-Topic Short-Text Corpora
- Mira Ait-Saada, Mohamed Nadif
- TLDR: We propose a novel method for detecting deviant data samples in a multi-topic corpus by capturing the underlying semantics of text.
- Metaphor Detection with Effective Context Denoising
- Shun Wang, Yucheng Li, Chenghua Lin, Loic Barrault, Frank Guerin
- TLDR: We propose a novel RoBERTa-based model, RoPPT, which introduces a target-oriented parse tree structure in metaphor detection.
- Low-Resource Compositional Semantic Parsing with Concept Pretraining
- Subendhu Rongali, Mukund Sridhar, Haidar Khan, Konstantine Arkoudas, Wael Hamza, Andrew McCallum
- TLDR: We present a new architecture for compositional semantic parsing that learns to adapt to new domains without any new training data.
- Made of Steel? Learning Plausible Materials for Components in the Vehicle Repair Domain
- Annerose Eichel, Helena Schlipf, Sabine Schulte Im Walde
- TLDR: We propose a novel approach to learn domain-specific plausible materials for components in the vehicle repair domain by probing Pretrained Language Models (PLMs) in a cloze task style setting to overcome the lack of annotated datasets.
- Self-Adapted Utterance Selection for Suicidal Ideation Detection in Lifeline Conversations
- Zhong-Ling Wang, Po-Hsien Huang, Wen-Yau Hsu, Hen-Hsen Huang
- TLDR: We present a novel, self-adaptive approach to detecting suicidal ideation in Lifeline conversations by identifying the most critical utterances that the NLP model can more easily distinguish.
- Can Pretrained Language Models (Yet) Reason Deductively?
- Zhangdie Yuan, Songbo Hu, Ivan Vulić, Anna Korhonen, Zaiqiao Meng
- TLDR: We show that PLMs are still far from robust deductive reasoning capabilities, even for simple deductive tasks.
- Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information
- Yen Ting Lin, Alexandros Papangelis, Seokhwan Kim, Sungjin Lee, Devamanyu Hazarika, Mahdi Namazifar, Di Jin, Yang Liu, Dilek Hakkani-Tur
- TLDR: We propose a novel approach based on PLMs and pointwise V-information to augment training data for intent detection.
- Multilingual Representation Distillation with Contrastive Learning
- Weiting Tan, Kevin Heffernan, Holger Schwenk, Philipp Koehn
- TLDR: We use contrastive learning to improve the quality of multilingual sentence representations and use them for quality estimation of parallel sentences.
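  As an illustration of the kind of training signal involved, here is an InfoNCE-style contrastive objective over a batch of aligned (e.g., parallel) sentence embeddings; the exact distillation setup and the temperature value are assumptions, not the paper's specification.

  ```python
  import torch
  import torch.nn.functional as F

  # Matched pairs sit on the diagonal of the batch similarity matrix;
  # cross-entropy pulls aligned embeddings together and pushes apart
  # the in-batch negatives.

  def contrastive_loss(z_src: torch.Tensor, z_tgt: torch.Tensor,
                       temperature: float = 0.05) -> torch.Tensor:
      """z_src, z_tgt: (batch, dim) embeddings of aligned sentence pairs."""
      z_src = F.normalize(z_src, dim=-1)
      z_tgt = F.normalize(z_tgt, dim=-1)
      logits = z_src @ z_tgt.T / temperature        # (batch, batch)
      labels = torch.arange(z_src.size(0), device=z_src.device)
      return F.cross_entropy(logits, labels)
  ```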
- On the inconsistency of separable losses for structured prediction
- Caio Corro
- TLDR: We prove that separable negative log-likelihood losses for structured prediction are not necessarily Bayes consistent, that is minimizing these losses may not result in a model that predicts the most probable structure in the data distribution for a given input.
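  Schematically (notation ours, not the paper's), a separable loss decomposes over the parts of the structure, while the joint negative log-likelihood does not:

  ```latex
  % Separable loss: a sum of local, per-part terms.
  \mathcal{L}_{\mathrm{sep}}(\theta) = -\sum_{i=1}^{n} \log p_\theta(y_i \mid x)
  % Joint (non-separable) negative log-likelihood.
  \mathcal{L}_{\mathrm{joint}}(\theta) = -\log p_\theta(y_1, \dots, y_n \mid x)
  % Minimizing the separable loss targets per-part marginal modes, which need
  % not coincide with the most probable joint structure
  % \arg\max_{y} p_\theta(y \mid x), hence the inconsistency.
  ```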
- A Systematic Search for Compound Semantics in Pretrained BERT Architectures
- Filip Miletic, Sabine Schulte Im Walde
- TLDR: Pretrained BERT is better than static word embeddings at predicting the degree of compositionality of noun compounds, as measured against human compositionality ratings.
- Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages
- Simeng Sun, Maha Elbayad, Anna Sun, James Cross
- TLDR: We present three techniques that help speed up the effective learning of new languages and alleviate catastrophic forgetting despite vocabulary and architecture mismatches.
- Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages
- Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang
- TLDR: We propose a novel approach for back-translation in unsupervised programming language translation that leverages code summarization and generation models.
- The Impacts of Unanswerable Questions on the Robustness of Machine Reading Comprehension Models
- Son Tran, Phong Do, Uyen Le, Matt Kretchmar
- TLDR: We show that training with unanswerable questions in SQuAD 2.0 can help improve the robustness of MRC models against adversarial attacks.
- FrameBERT: Conceptual Metaphor Detection with Frame Embedding Learning
- Yucheng Li, Shun Wang, Chenghua Lin, Frank Guerin, Loic Barrault
- TLDR: We propose FrameBERT, a BERT-based model that can explicitly learn and incorporate FrameNet Embeddings for concept-level metaphor detection.
- Towards More Efficient Insertion Transformer with Fractional Positional Encoding
- Zhisong Zhang, Yizhe Zhang, Bill Dolan
- TLDR: We propose a novel reusable positional encoding scheme for Insertion Transformers that allows outputting multiple tokens in a single generation step.
- SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models
- Haozhe An, Zongxia Li, Jieyu Zhao, Rachel Rudinger
- TLDR: We propose SODAPOP, a new approach for automatic social bias discovery in social commonsense question-answering.
- Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
- Wenhu Chen, Pat Verga, Michiel De Jong, John Wieting, William Cohen
- TLDR: We present a new open-domain question-answering system that augments a text-to-text model with a large memory of question-answer pairs, and a new pre-training task for the latent step of question retrieval.
- Gold Doesn’t Always Glitter: Spectral Removal of Linear and Nonlinear Guarded Attribute Information
- Shun Shao, Yftah Ziser, Shay B. Cohen
- TLDR: We present a simple and effective method for removing private or guarded information from neural representations.
- CTC Alignments Improve Autoregressive Translation
- Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe
- TLDR: We propose using CTC alignments to counteract several key weaknesses of pure-attention translation models during training and decoding.
- Modelling Temporal Document Sequences for Clinical ICD Coding
- Boon Liang Clarence Ng, Diogo Santos, Marek Rei
- TLDR: We propose a hierarchical transformer architecture that uses text across the entire sequence of clinical notes in each hospital stay for ICD coding, and incorporates embeddings for text metadata such as their position, time, and type of note.
- LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
- Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo
- TLDR: We present LongEval, a set of guidelines for human evaluation of faithfulness in long-form summaries that addresses the following challenges: (1) How can we achieve high inter-annotator agreement on faithfulness scores? (2) How do we minimize annotator workload while maintaining accurate faithfulness score? and (3) Do humans benefit from automated alignment between summary and source snippets?
- Cluster-Guided Label Generation in Extreme Multi-Label Classification
- Taehee Jung, Joo-kyung Kim, Sungjin Lee, Dongyeop Kang
- TLDR: We propose to guide label generation using label cluster information to hierarchically generate lower-level labels.
- Empathy Identification Systems are not Accurately Accounting for Context
- Andrew Lee, Jonathan Kummerfeld, Larry An, Rada Mihalcea
- TLDR: We show that current systems are not making meaningful progress on empathetic rationale extraction and do not adequately account for the conversational context of utterances.
- Enhancing Multi-Document Summarization with Cross-Document Graph-based Information Extraction
- Zixuan Zhang, Heba Elfardy, Markus Dreyer, Kevin Small, Heng Ji, Mohit Bansal
- TLDR: We propose a novel approach to improve multi-document summarization by using structured information extraction graphs and a novel alignment loss to reduce inconsistencies between the input and output.
- What happens before and after: Multi-Event Commonsense in Event Coreference Resolution
- Sahithya Ravi, Chris Tanner, Raymond Ng, Vered Shwartz
- TLDR: We propose a model that extends event mentions with temporal commonsense inferences.
- Multi-Modal Bias: Introducing a Framework for Stereotypical Bias Assessment beyond Gender and Race in Vision–Language Models
- Sepehr Janghorbani, Gerard De Melo
- TLDR: We provide a new benchmark for evaluating bias in multimodal models and show that these models are biased toward certain groups.
- CylE: Cylinder Embeddings for Multi-hop Reasoning over Knowledge Graphs
- Chau Nguyen, Tim French, Wei Liu, Michael Stewart
- TLDR: We propose unbounded cylinder embeddings for logical queries over Knowledge Graphs, which can handle logical negation operations.
- Fiction-Writing Mode: An Effective Control for Human-Machine Collaborative Writing
- Wenjie Zhong, Jason Naradowsky, Hiroya Takamura, Ichiro Kobayashi, Yusuke Miyao
- TLDR: We explore the idea of incorporating concepts from writing skills curricula into human-machine collaborative writing scenarios, focusing on adding writing modes as a control for text generation models.
- Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
- Mengnan Du, Subhabrata Mukherjee, Yu Cheng, Milad Shokouhi, Xia Hu, Ahmed Hassan Awadallah
- TLDR: We show that the compressed models are significantly less robust than their PLM counterparts on OOD test sets although they obtain similar performance on in-distribution development sets for a task.
- Don’t Blame the Annotator: Bias Already Starts in the Annotation Instructions
- Mihir Parmar, Swaroop Mishra, Mor Geva, Chitta Baral
- TLDR: We show that bias in NLU benchmarks can already originate in the crowdsourcing annotation instructions, leading to over-representation of similar examples in the collected data.
- Performance Prediction via Bayesian Matrix Factorisation for Multilingual Natural Language Processing Tasks
- Viktoria Schram, Daniel Beck, Trevor Cohn
- TLDR: Bayesian matrix factorisation for performance prediction for NLP.
- Unified Neural Topic Model via Contrastive Learning and Term Weighting
- Sungwon Han, Mingi Shin, Sungkyu Park, Changwook Jung, Meeyoung Cha
- TLDR: We present a unified neural topic model that combines contrastive learning and term weighting to improve performance.
- Don’t Mess with Mister-in-Between: Improved Negative Search for Knowledge Graph Completion
- Fan Jiang, Tom Drummond, Trevor Cohn
- TLDR: We propose several novel means of finding more informative negatives, based on searching for candidates with high lexical overlaps, from the dual-encoder model and according to knowledge graph structures.
- Semantic Frame Induction with Deep Metric Learning
- Kosuke Yamada, Ryohei Sasano, Koichi Takeda
- TLDR: We propose a model that uses deep metric learning to fine-tune a contextualized embedding model, and we apply the fine-tuned contextualized embeddings to perform semantic frame induction.
- The Devil is in the Details: On Models and Training Regimes for Few-Shot Intent Classification
- Mohsen Mesgar, Thy Tran, Goran Glavaš, Iryna Gurevych
- TLDR: We present a unified framework to evaluate the key components of Few-Shot Intent Classification (FSIC) and show that episodic meta-learning consistently yields the best performance.
- Iterative Document-level Information Extraction via Imitation Learning
- Yunmo Chen, William Gantt, Weiwei Gu, Tongfei Chen, Aaron White, Benjamin Van Durme
- TLDR: Iterative extraction of complex relations from documents using Markov decision processes.
- CLICK: Contrastive Learning for Injecting Contextual Knowledge to Conversational Recommender System
- Hyeongjun Yang, Heesoo Won, Youbin Ahn, Kyong-Ho Lee
- TLDR: We propose a Contrastive Learning approach for Injecting Contextual Knowledge from Reddit data to the CRS task, which facilitates the capture of a context-level user preference from a dialogue context, regardless of the existence of preferred item-entities.
- LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation
- Zhuoyuan Mao, Tetsuji Nakagawa
- TLDR: We propose LEALLA, a lightweight model that learns language-agnostic sentence embeddings through knowledge distillation.
- Synthesizing Human Gaze Feedback for Improved NLP Performance
- Varun Khurana, Yaman Kumar, Nora Hollenstein, Rajesh Kumar, Balaji Krishnamurthy
- TLDR: We propose a novel model for generating human scanpaths over text that approximate meaningful cognitive signals in human gaze patterns.
- Memory-efficient Temporal Moment Localization in Long Videos
- Cristian Rodriguez, Edison Marrese-Taylor, Basura Fernando, Hiroya Takamura, Qi Wu
- TLDR: We propose a new method for temporal moment localization that can process long videos at a constant memory footprint.
- Extracting Victim Counts from Text
- Mian Zhong, Shehzaad Dhuliawala, Niklas Stoehr
- TLDR: We present a novel method for extracting fine-grained counts of injured, displaced, or abused victims from text.
- ConEntail: An Entailment-based Framework for Universal Zero and Few Shot Classification with Supervised Contrastive Pretraining
- Haoran Zhang, Aysa Xuemo Fan, Rui Zhang
- TLDR: We propose ConEntail, a new framework for universal zero and few shot classification with supervised contrastive pretraining and universal evaluation.
- Guide the Learner: Controlling Product of Experts Debiasing Method Based on Token Attribution Similarities
- Ali Modarressi, Hossein Amirkhani, Mohammad Taher Pilehvar
- TLDR: We propose a novel method for improving out-of-distribution inference performance by reducing dataset biases.
- Task and Sentiment Adaptation for Appraisal Tagging
- Lin Tian, Xiuzhen Zhang, Myung Hee Kim, Jennifer Biggs
- TLDR: We propose novel task and sentiment adapters based on language models for appraisal tagging, along with a novel sequence tagging algorithm.
- DREEAM: Guiding Attention with Evidence for Improving Document-Level Relation Extraction
- Youmi Ma, An Wang, Naoaki Okazaki
- TLDR: We propose DREEAM, a memory-efficient approach that adopts evidence information as the supervisory signal, thereby guiding the attention modules of the DocRE system to assign high weights to evidence.
- Span-based Named Entity Recognition by Generating and Compressing Information
- Nhung Nguyen, Makoto Miwa, Sophia Ananiadou
- TLDR: We propose to combine the two objectives of the information bottleneck (IB) principle, generating and compressing information, into one system to enhance Named Entity Recognition (NER).
- An In-depth Analysis of Implicit and Subtle Hate Speech Messages
- Nicolas Ocampo, Ekaterina Sviridova, Elena Cabrio, Serena Villata
- TLDR: We provide a systematic analysis of 7 standard benchmarks for hate speech detection and show that, while such benchmarks are valid, models trained on them fail to detect implicit and subtle hateful content.
- MTEB: Massive Text Embedding Benchmark
- Niklas Muennighoff, Nouamane Tazi, Loic Magne, Nils Reimers
- TLDR: We present the first comprehensive text embedding benchmark, measuring the state of the art across a large set of embedding models and a wide range of tasks.
- Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks
- Piotr Gaiński, Klaudia Bałazy
- TLDR: We propose a novel gradient-based attack against transformer-based language models that searches for an adversarial example in a continuous space of token probabilities.
- TwiRGCN: Temporally Weighted Graph Convolution for Question Answering over Temporal Knowledge Graphs
- Aditya Sharma, Apoorv Saxena, Chitrank Gupta, Mehran Kazemi, Partha Talukdar, Soumen Chakrabarti
- TLDR: We propose a novel, intuitive and interpretable scheme to modulate the messages passed through a KG edge during convolution based on the relevance of its associated period to the question.
- ZELDA: A Comprehensive Benchmark for Supervised Entity Disambiguation
- Marcel Milich, Alan Akbik
- TLDR: We present ZELDA, a novel entity disambiguation benchmark that includes a unified training data set, entity vocabulary, candidate lists, as well as challenging evaluation splits covering 8 different domains.
- GLADIS: A General and Large Acronym Disambiguation Benchmark
- Lihu Chen, Gael Varoquaux, Fabian Suchanek
- TLDR: We present a new acronym dictionary and a language model for general acronym disambiguation.
- Probing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders
- Ivan Vulić, Goran Glavaš, Fangyu Liu, Nigel Collier, Edoardo Maria Ponti, Anna Korhonen
- TLDR: We probe the cross-lingual lexical knowledge stored in multilingual sentence encoders and propose a simple yet efficient method for exposing it.
- Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
- Philipp Sadler, David Schlangen
- TLDR: We present Pento-DIARef, a diagnostic dataset in a visual domain of puzzle pieces where referring expressions are generated by a well-known symbolic algorithm (the Incremental Algorithm) and a targeted data generation scheme.
- Mitigating Exposure Bias in Grammatical Error Correction with Data Augmentation and Reweighting
- Hannan Cao, Wenmian Yang, Hwee Tou Ng
- TLDR: We propose a novel data augmentation method for sequence-to-sequence GEC, which improves performance on the benchmark GEC datasets.
- Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
- Wenliang Dai, Zihan Liu, Ziwei Ji, Dan Su, Pascale Fung
- TLDR: We show that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination.
- Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim?
- Shivam Sharma, Atharva Kulkarni, Tharun Suresh, Himanshi Mathur, Preslav Nakov, Md. Shad Akhtar, Tanmoy Chakraborty
- TLDR: We propose a novel multi-modal model for the task of identifying the role of entities in harmful memes.
- Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing
- Jivnesh Sandhan, Laxmidhar Behera, Pawan Goyal
- TLDR: We present a new approach for low-resource dependency parsing for Universal Dependency languages.
- Compositional Generalisation with Structured Reordering and Fertility Layers
- Matthias Lindemann, Alexander Koller, Ivan Titov
- TLDR: We present a flexible end-to-end differentiable neural model that composes two structural operations: a fertility step, which we introduce in this work, and a reordering step based on previous work (Wang et al., 2021).
- Investigating Multi-source Active Learning for Natural Language Inference
- Ard Snijders, Douwe Kiela, Katerina Margatina
- TLDR: We show that uncertainty-based active learning schemes perform poorly due to the acquisition of collective outliers, i.e., hard-to-learn instances that hamper learning and generalisation.
- Towards a Unified Multi-Domain Multilingual Named Entity Recognition Model
- Mayank Kulkarni, Daniel Preotiuc-Pietro, Karthik Radhakrishnan, Genta Winata, Shijie Wu, Lingjue Xie, Shaohua Yang
- TLDR: We propose a novel setup for NER which includes multi-domain and multilingual training and evaluation across 13 domains and 4 languages.
- Do Neural Topic Models Really Need Dropout? Analysis of the Effect of Dropout in Topic Modeling
- Suman Adhya, Avishek Lahiri, Debarshi Kumar Sanyal
- TLDR: We analyze the consequences of dropout in the encoder and decoder of neural topic models and show that it is not merely a regularization trick but can harm both predictive performance and topic quality.
- A Psycholinguistic Analysis of BERT’s Representations of Compounds
- Lars Buijtelaar, Sandro Pezzelle
- TLDR: We study the semantic representations learned by BERT for compounds, that is, expressions such as sunlight or bodyguard, whose overall meaning depends, to varying extents, on the semantics of the constituent words.
- Measuring Normative and Descriptive Biases in Language Models Using Census Data
- Samia Touileb, Lilja Øvrelid, Erik Velldal
- TLDR: We investigate how distributions of occupations with respect to gender are reflected in pre-trained language models.
- UDAPTER - Efficient Domain Adaptation Using Adapters
- Bhavitvya Malik, Abhinav Ramesh Kashyap, Min-Yen Kan, Soujanya Poria
- TLDR: We propose two methods to make unsupervised domain adaptation (UDA) more parameter efficient using adapters – small bottleneck layers interspersed with every layer of the large-scale pre-trained language model (PLM).
- Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
- Biao Zhang, Barry Haddow, Rico Sennrich
- TLDR: We propose coarse labeling for CTC regularization, which can compress the label space aggressively to 256 and even further, gaining training efficiency and improving quality without sacrificing accuracy.
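  A minimal sketch of the coarsening idea, assuming a simple modulo assignment of fine-grained labels to coarse ones; the paper's actual coarsening scheme may differ.

  ```python
  # Collapse a large subword vocabulary onto a small label space before
  # applying the CTC regularizer, shrinking the CTC softmax from
  # `vocab_size` to `num_coarse` classes.

  def coarsen(token_ids, vocab_size, num_coarse=256):
      """Map fine-grained token ids onto `num_coarse` coarse CTC labels."""
      assert num_coarse <= vocab_size
      return [tid % num_coarse for tid in token_ids]

  # Distinct fine labels may collide in the coarse space by design:
  print(coarsen([5, 261, 517], vocab_size=32000))  # -> [5, 5, 5]
  ```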
- Exploring Category Structure with Contextual Language Models and Lexical Semantic Networks
- Joseph Renner, Pascal Denis, Remi Gilleron, Angèle Brunellière
- TLDR: We show that word embeddings can be probed for typicality prediction using different types of probes, and that accounting for polysemy improves the performance of BERT-based methods.
- An Empirical Study of Clinical Note Generation from Doctor-Patient Encounters
- Asma Ben Abacha, Wen-Wai Yim, Yadan Fan, Thomas Lin
- TLDR: We introduce new resources and empirical investigations for the automatic summarization of doctor-patient conversations in a clinical setting.
- Instruction Clarification Requests in Multimodal Collaborative Dialogue Games: Tasks, and an Analysis of the CoDraw Dataset
- Brielen Madureira, David Schlangen
- TLDR: We show that Instruction Clarification Requests in CoDraw are self-motivated and learnable, and provide a dataset for learning to recognise them.
- Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French
- Nicolas Hiebel, Olivier Ferret, Karen Fort, Aurélie Névéol
- TLDR: Automatic text generation for clinical NER.
- IRMA: the 335-million-word Italian coRpus for studying MisinformAtion
- Fabio Carrella, Alessandro Miani, Stephan Lewandowsky
- TLDR: We present IRMA, a large and robust corpus of untrustworthy news articles in Italian, which can be used to develop algorithms for automatic detection of misinformation.
- Parameter-Efficient Korean Character-Level Language Modeling
- Marco Cognetta, Sangwhan Moon, Lawrence Wolf-Sonkin, Naoaki Okazaki
- TLDR: We exploit the decomposability of Korean characters to model at the syllable level but using only jamo-level representations.
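  For concreteness, Hangul syllables decompose into jamo by standard Unicode arithmetic, which is the kind of factorization that jamo-level representations rely on; a minimal sketch follows.

  ```python
  # Decompose a precomposed Hangul syllable into (initial, medial, final)
  # jamo indices using the standard Unicode factorization:
  # 19 initial consonants x 21 medial vowels x 28 final consonants.

  CHOSEONG, JUNGSEONG, JONGSEONG = 19, 21, 28
  S_BASE = 0xAC00  # code point of the first precomposed syllable

  def decompose(syllable: str):
      """Return (initial, medial, final) jamo indices of one syllable."""
      idx = ord(syllable) - S_BASE
      assert 0 <= idx < CHOSEONG * JUNGSEONG * JONGSEONG
      initial, rest = divmod(idx, JUNGSEONG * JONGSEONG)
      medial, final = divmod(rest, JONGSEONG)
      return initial, medial, final

  print(decompose("한"))  # -> (18, 0, 4): ㅎ + ㅏ + ㄴ
  ```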
- Opportunities and Challenges in Neural Dialog Tutoring
- Jakub Macina, Nico Daheim, Lingzhi Wang, Tanmay Sinha, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan
- TLDR: We rigorously analyze various generative language models on two dialog tutoring datasets for language learning using automatic and human evaluations to understand the new opportunities brought by these advances as well as the challenges we must overcome to build models that would be usable in real educational settings.
- Evaluating the Robustness of Discrete Prompts
- Yoichi Ishibashi, Danushka Bollegala, Katsuhito Sudoh, Satoshi Nakamura
- TLDR: We study the robustness of automatic discrete prompt learning and show that it is highly sensitive to perturbations to NLI inputs and generalizes poorly across different NLI datasets.
- Assessing Out-of-Domain Language Model Performance from Few Examples
- Prasann Singhal, Jarad Forristal, Xi Ye, Greg Durrett
- TLDR: We show that attribution-based factors can help rank relative model OOD performance.
- Mind the Labels: Describing Relations in Knowledge Graphs With Pretrained Models
- Zdeněk Kasner, Ioannis Konstas, Ondrej Dusek
- TLDR: We show that although PLMs for D2T generation expectedly fail on unclear cases, models trained with a large variety of relation labels are surprisingly robust in verbalizing novel, unseen relations.
- Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers
- William Held, Diyi Yang
- TLDR: We show that it is possible to reduce interference in multilingual transformer-based models by identifying and pruning language-specific attention heads.
- Why Don’t You Do It Right? Analysing Annotators’ Disagreement in Subjective Tasks
- Marta Sandri, Elisa Leonardelli, Sara Tonelli, Elisabetta Jezek
- TLDR: We propose a taxonomy of possible reasons leading to annotators’ disagreement in subjective tasks and investigate how tweets belonging to different categories of disagreement can be classified as offensive or not.
- Analyzing Challenges in Neural Machine Translation for Software Localization
- Sai Koneru, Matthias Huck, Miriam Exel, Jan Niehues
- TLDR: We present a novel multilingual UI corpus collection for neural machine translation and analyze the limitations of state-of-the-art methods on this challenging task.
- Bootstrapping Multilingual Semantic Parsers using Large Language Models
- Abhijeet Awasthi, Nitish Gupta, Bidisha Samanta, Shachi Dave, Sunita Sarawagi, Partha Talukdar
- TLDR: We present a new method for translating data using large language models for multilingual semantic parsing and demonstrate its effectiveness and flexibility.
- Modeling Complex Event Scenarios via Simple Entity-focused Questions
- Mahnaz Koupaee, Greg Durrett, Nathanael Chambers, Niranjan Balasubramanian
- TLDR: We propose a question-guided generation framework for event sequences that models events in complex scenarios as answers to questions about participants.
- Uncovering Implicit Inferences for Improved Relational Argument Mining
- Ameer Saadat-yazdi, Jeff Pan, Nadin Kokciyan
- TLDR: We present a generative neuro-symbolic approach to finding inference chains that connect the argument pairs by making use of the Commonsense Transformer.
- How people talk about each other: Modeling Generalized Intergroup Bias and Emotion
- Venkata Subrahmanyan Govindarajan, Katherine Atwell, Barea Sinno, Malihe Alikhani, David Beaver, Junyi Jessy Li
- TLDR: We propose a new way of predicting interpersonal group relationships in NLP using fine-grained interpersonal emotions as an anchor.
- Semantic Parsing for Conversational Question Answering over Knowledge Graphs
- Laura Perez-beltrachini, Parag Jain, Emilio Monti, Mirella Lapata
- TLDR: We present two different semantic parsers for conversational question answering over knowledge graphs and show how to deal with large vocabularies and generalise to new questions at test time.
- MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
- Oscar Mañas, Pau Rodriguez Lopez, Saba Ahmadi, Aida Nematzadeh, Yash Goyal, Aishwarya Agrawal
- TLDR: We propose MAPL, a simple and parameter-efficient method that reuses frozen pre-trained unimodal models and leverages their strong generalization capabilities in multimodal vision-language (VL) settings.
- ComSearch: Equation Searching with Combinatorial Strategy for Solving Math Word Problems with Weak Supervision
- Qianying Liu, Wenyu Guan, Jianhao Shen, Fei Cheng, Sadao Kurohashi
- TLDR: We propose ComSearch, a novel search algorithm with a combinatorial strategy that compresses the search space by excluding mathematically equivalent equations.
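As a toy illustration of pruning mathematically equivalent candidates, here is a two-operand sketch using sympy (not the paper's full combinatorial algorithm, which scales to longer equations):

```python
import sympy as sp
from itertools import permutations

def unique_expressions(symbols, ops=("+", "-", "*", "/")):
    """Enumerate two-operand candidate equations over problem quantities
    and keep one representative per mathematical-equivalence class."""
    seen, kept = set(), []
    for a, b in permutations(symbols, 2):
        for op in ops:
            expr = sp.sympify(f"({a}){op}({b})")
            key = sp.srepr(sp.simplify(expr))  # canonical form
            if key not in seen:
                seen.add(key)
                kept.append(expr)
    return kept

n1, n2 = sp.symbols("n1 n2")
print(unique_expressions([n1, n2]))  # n2+n1 and n2*n1 are pruned as duplicates
```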
- Towards preserving word order importance through Forced Invalidation
- Hadeel Al-negheimish, Pranava Madhyastha, Alessandra Russo
- TLDR: We propose a simple approach called Forced Invalidation (FI) to improve the sensitivity of pre-trained language models to word order.
- How Many and Which Training Points Would Need to be Removed to Flip this Prediction?
- Jinghan Yang, Sarthak Jain, Byron Wallace
- TLDR: We propose comparatively fast approximation methods to find a set of training data which, if removed, would flip a given model prediction.
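For intuition, here is the brute-force baseline such approximations speed up: greedily retrain without each candidate point and remove whichever removal most erodes the target prediction. This is a small-scale sklearn sketch assuming a simple linear classifier, not the paper's method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def greedy_flip_set(X, y, x_target, max_removals=20):
    """Greedily remove training points until the prediction on x_target
    flips, via leave-one-out retraining at every step."""
    X, y = np.asarray(X), np.asarray(y)
    keep = list(range(len(y)))
    label = LogisticRegression().fit(X, y).predict([x_target])[0]
    removed = []
    for _ in range(max_removals):
        best = None
        for i in keep:
            rest = [j for j in keep if j != i]
            clf = LogisticRegression().fit(X[rest], y[rest])
            p = clf.predict_proba([x_target])[0][list(clf.classes_).index(label)]
            if best is None or p < best[0]:
                best = (p, i)  # removal that most lowers confidence
        keep.remove(best[1])
        removed.append(best[1])
        clf = LogisticRegression().fit(X[keep], y[keep])
        if clf.predict([x_target])[0] != label:
            return removed  # these removals flip the prediction
    return None  # no flip found within the budget
```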
- Reinforced Sequence Training based Subjective Bias Correction
- Karthic Madanagopal, James Caverlee
- TLDR: We propose a novel reinforcement learning approach for robust subjective bias correction that is cross-trained over multiple sources of bias and is used to fine-tune a large pre-trained transformer model, yielding state-of-the-art performance on the subjective bias correction task.
- Detecting Lexical Borrowings from Dominant Languages in Multilingual Wordlists
- John Miller, Johann-mattis List
- TLDR: We present new methods for lexical borrowing detection in contact situations where dominant languages play an important role, applying two classical sequence comparison methods and one machine learning method to a sample of seven Latin American languages which have all borrowed extensively from Spanish.
- Towards Integration of Discriminability and Robustness for Document-Level Relation Extraction
- Jia Guo, Stanley Kok, Lidong Bing
- TLDR: We propose a novel negative label sampling strategy for document-level relation extraction and a novel entropy minimization and contrastive learning algorithm for multi-label and long-tailed learning.
- Penguins Don’t Fly: Reasoning about Generics through Instantiations and Exceptions
- Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen Mckeown, Doug Downey, Yejin Choi
- TLDR: We present a novel framework for generating exemplars (instantiations and exceptions) that can be used to identify instances where generics are not universally true.
- Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models
- Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
- TLDR: We propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising its utility.
- Multi2Claim: Generating Scientific Claims from Multi-Choice Questions for Scientific Fact-Checking
- Neset Tan, Trung Nguyen, Josh Bensemann, Alex Peng, Qiming Bao, Yang Chen, Mark Gahegan, Michael Witbrock
- TLDR: We propose a pipeline for automatically converting multiple-choice questions into fact-checking data for scientific fact-checking tasks.
- On Evaluation of Document Classifiers using RVL-CDIP
- Stefan Larson, Gordon Lim, Kevin Leach
- TLDR: We show that the RVL-CDIP benchmark is not ideal for benchmarking document classifiers, and propose a new document classification benchmark that is more accurate and diverse.
- Event Linking: Grounding Event Mentions to Wikipedia
- Xiaodong Yu, Wenpeng Yin, Nitish Gupta, Dan Roth
- TLDR: We propose a new natural language understanding task for Wikipedia articles that aims to link an event mention appearing in an article to the most appropriate Wikipedia page.
- SwitchPrompt: Learning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domains
- Koustava Goswami, Lukas Lange, Jun Araki, Heike Adel
- TLDR: We propose a novel and lightweight gated soft-prompting method that improves the performance of general-domain language models on classification in low-resource domains.
- Do dialogue representations align with perception? An empirical study
- Sarenne Wallbridge, Peter Bell, Catherine Lai
- TLDR: We quantify the correlation between language models and human language comprehension behaviour and show that the strongest correlation is found in the response selection task.
- Methods for Measuring, Updating, and Visualizing Factual Beliefs in Language Models
- Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer
- TLDR: We propose new metrics for evaluating model factual beliefs and new methods for updating incorrect beliefs in language models.
- Improving Sign Recognition with Phonology
- Lee Kezar, Jesse Thomason, Zed Sehyr
- TLDR: We use insights from research on American Sign Language (ASL) phonology to train models for isolated sign language recognition (ISLR), a step towards automatic sign language understanding.
- Parameter-efficient Modularised Bias Mitigation via AdapterFusion
- Deepak Kumar, Oleg Lesota, George Zerveas, Daniel Cohen, Carsten Eickhoff, Markus Schedl, Navid Rekabsaz
- TLDR: We propose a novel approach to develop stand-alone debiasing functionalities separate from the model, which can be integrated into the model on-demand, while keeping the core model untouched.
- LingMess: Linguistically Informed Multi Expert Scorers for Coreference Resolution
- Shon Otmazgin, Arie Cattan, Yoav Goldberg
- TLDR: We present LingMess, a linguistically motivated categorization of mention-pairs into 6 types of coreference decisions and learn a dedicated trainable scoring function for each category.
- Finding the Law: Enhancing Statutory Article Retrieval via Graph Neural Networks
- Antoine Louis, Gijs Van Dijck, Gerasimos Spanakis
- TLDR: Graph-augmented dense statute retriever model that incorporates structure of legislation via a graph neural network to improve dense retrieval performance.
- Behavior Cloned Transformers are Neurosymbolic Reasoners
- Ruoyao Wang, Peter Jansen, Marc-alexandre Cote, Prithviraj Ammanabrolu
- TLDR: We propose a novel method for augmenting interactive agents with information from symbolic modules to improve multi-step reasoning in text games.
- Bridging the Gap Between BabelNet and HowNet: Unsupervised Sense Alignment and Sememe Prediction
- Xiang Zhang, Ning Shi, Bradley Hauer, Grzegorz Kondrak
- TLDR: We propose to use sense alignment via a novel unsupervised and explainable method to connect BabelNet with HowNet.
- The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents
- Xing Han Lu, Siva Reddy, Harm De Vries
- TLDR: We present a new dataset for dialogue retrieval and response generation tasks, which can be directly used to help knowledge workers find relevant tables for live chat users.
- Question Generation Using Sequence-to-Sequence Model with Semantic Role Labels
- Alireza Naeiji, Aijun An, Heidar Davoudi, Marjan Delpisheh, Muath Alzghool
- TLDR: We propose a novel question generation method that combines the benefits of rule-based and neural sequence-to-sequence (Seq2Seq) models.
- StyLEx: Explaining Style Using Human Lexical Annotations
- Shirley Anugrah Hayati, Kyumin Park, Dheeraj Rajagopal, Lyle Ungar, Dongyeop Kang
- TLDR: StyLEx is a model that learns from human annotated explanations of stylistic features and jointly learns to perform the task and predict these features as model explanations.
- Comparing Intrinsic Gender Bias Evaluation Measures without using Human Annotated Examples
- Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki
- TLDR: We propose a method to compare intrinsic gender bias evaluation measures without relying on human-annotated examples.
- Faithfulness-Aware Decoding Strategies for Abstractive Summarization
- David Wan, Mengwen Liu, Kathleen Mckeown, Markus Dreyer, Mohit Bansal
- TLDR: We present a systematic study of the effect of generation techniques such as beam search and nucleus sampling on faithfulness in abstractive summarization.
- Dynamic Benchmarking of Masked Language Models on Temporal Concept Drift with Multiple Views
- Katerina Margatina, Shuai Wang, Yogarshi Vyas, Neha Anna John, Yassine Benajiba, Miguel Ballesteros
- TLDR: We provide a holistic framework for evaluating the effect of temporal concept drift in NLP models.
- Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow
- Anjana Arunkumar, Swaroop Mishra, Bhavdeep Singh Sachdeva, Chitta Baral, Chris Bryan
- TLDR: We propose VAIDA, a novel benchmark creation paradigm for NLP, that focuses on guiding crowdworkers, an under-explored facet of addressing benchmark idiosyncrasies.
- COMPS: Conceptual Minimal Pair Sentences for testing Robust Property Knowledge and its Inheritance in Pre-trained Language Models
- Kanishka Misra, Julia Rayz, Allyson Ettinger
- TLDR: We present a collection of minimal pair sentences that jointly tests pre-trained language models (PLMs) on their ability to attribute properties to concepts and their ability to demonstrate property inheritance behavior.
- Probabilistic Robustness for Data Filtering
- Yu Yu, Abdul Khan, Shahram Khadivi, Jia Xu
- TLDR: We introduce the probabilistic robustness rewarded data optimization (PRoDO) approach, a framework that enhances the model’s generalization power by selecting training data that optimizes our probabilistic robustness metrics.
- Unsupervised Improvement of Factual Knowledge in Language Models
- Nafis Sadeq, Byungkyu Kang, Prarit Lamba, Julian Mcauley
- TLDR: We propose a novel approach for influencing language model pretraining in a way that can improve language model performance on a variety of knowledge-intensive tasks.
- Learning to Ignore Adversarial Attacks
- Yiming Zhang, Yangqiaoyu Zhou, Samuel Carton, Chenhao Tan
- TLDR: We introduce rationale models that can explicitly learn to ignore attack tokens and show that this approach leads to robustness improvements over baseline models on three datasets for both BERT and RoBERTa.
- Should You Mask 15% in Masked Language Modeling?
- Alexander Wettig, Tianyu Gao, Zexuan Zhong, Danqi Chen
- TLDR: We show that masking 15% of tokens is not universally optimal, and that masked language models can benefit from higher masking rates (e.g., 40%).
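For reference, the masking rate is simply a knob in the MLM corruption step. A minimal sketch (omitting BERT's 80/10/10 mask/random/keep split for brevity) looks like:

```python
import random

def mask_tokens(tokens, mask_rate=0.40, mask_token="[MASK]"):
    """Corrupt a non-empty token sequence for masked language modeling.
    The masking rate is just a parameter; the paper argues the
    conventional 0.15 is not universally optimal."""
    n = max(1, round(len(tokens) * mask_rate))
    positions = set(random.sample(range(len(tokens)), n))
    corrupted = [mask_token if i in positions else t
                 for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(positions)}  # tokens to predict
    return corrupted, targets

print(mask_tokens("the cat sat on the mat".split()))
```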
- How do Words Contribute to Sentence Semantics? Revisiting Sentence Embeddings with a Perturbation Method
- Wenlin Yao, Lifeng Jin, Hongming Zhang, Xiaoman Pan, Kaiqiang Song, Dian Yu, Dong Yu, Jianshu Chen
- TLDR: We propose a new evaluation metric for unsupervised sentence embedding models and a new perturbation method for unsupervised semantic analysis.
- AutoTriggER: Label-Efficient and Robust Named Entity Recognition with Auxiliary Trigger Extraction
- Dong-ho Lee, Ravi Kiran Selvam, Sheikh Muhammad Sarwar, Bill Yuchen Lin, Fred Morstatter, Jay Pujara, Elizabeth Boschee, James Allan, Xiang Ren
- TLDR: We present a novel two-stage framework for named entity recognition that uses post-hoc explanations to generate rationales and strengthens a model’s prior knowledge.
- Incorporating Task-Specific Concept Knowledge into Script Learning
- Chenkai Sun, Tie Xu, Chengxiang Zhai, Heng Ji
- TLDR: Goal-Oriented Script Completion with Concept Prompting and Contrastive Learning.
- DeepMaven: Deep Question Answering on Long-Distance Movie/TV Show Videos with Multimedia Knowledge Extraction and Synthesis
- Yi Fung, Han Wang, Tong Wang, Ali Kebarighotbi, Mohit Bansal, Heng Ji, Prem Natarajan
- TLDR: We present a novel framework for deep movie/TV question answering that provides a new benchmark for long-distance movie QA and a new dataset for long video content understanding.
- Salient Span Masking for Temporal Understanding
- Jeremy Cole, Aditi Chaudhary, Bhuwan Dhingra, Partha Talukdar
- TLDR: We present a new intermediate training task for temporal tasks that improves the downstream performance on three temporal tasks by an avg. +5.8 points.
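Conceptually, salient span masking swaps random token masking for masking whole entity or date spans. A minimal sketch, assuming the span offsets come from an NER or date tagger you already have:

```python
import random

def salient_span_mask(tokens, spans, mask_token="[MASK]"):
    """Mask one whole salient span (e.g., an entity or date) rather than
    random tokens. `spans` is a list of (start, end) token indices,
    assumed to come from an external tagger."""
    start, end = random.choice(spans)
    corrupted = tokens[:start] + [mask_token] * (end - start) + tokens[end:]
    target = tokens[start:end]  # span the model must reconstruct
    return corrupted, target

tokens = "The Berlin Wall fell in November 1989 .".split()
print(salient_span_mask(tokens, spans=[(1, 3), (5, 7)]))
```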
- PECO: Examining Single Sentence Label Leakage in Natural Language Inference Datasets through Progressive Evaluation of Cluster Outliers
- Michael Saxon, Xinyi Wang, Wenda Xu, William Yang Wang
- TLDR: We show that single-sentence label leakage persists in modern NLI datasets, despite efforts to reduce it.
- Weakly-Supervised Questions for Zero-Shot Relation Extraction
- Saeed Najafi, Alona Fyshe
- TLDR: We propose a novel algorithm for zero-shot relation extraction that uses question answering to generate tail entities for unseen relations.
- DiffQG: Generating Questions to Summarize Factual Changes
- Jeremy Cole, Palak Jain, Julian Eisenschlos, Michael Zhang, Eunsol Choi, Bhuwan Dhingra
- TLDR: We propose a new way to represent factual changes between paired documents as question-answer pairs, where the answer to the same question differs between two versions.
- Contextual Dynamic Prompting for Response Generation in Task-oriented Dialog Systems
- Sandesh Swamy, Narges Tabari, Chacha Chen, Rashmi Gangadharaiah
- TLDR: We propose a new approach for contextual dynamic prompting in task-oriented dialog systems that improves response generation as well as the overall combined score of the system.
- Why Can’t Discourse Parsing Generalize? A Thorough Investigation of the Impact of Data Diversity
- Yang Janet Liu, Amir Zeldes
- TLDR: We investigate the impact of genre diversity in parsing for high-resource languages such as English and show that state-of-the-art architectures trained on the standard English newswire benchmark do not generalize well, even within the news domain.
- Enriching Biomedical Knowledge for Low-resource Language Through Large-scale Translation
- Long Phan, Tai Dang, Hieu Tran, Trieu Trinh, Vy Phan, Lam Chau, Minh-thang Luong
- TLDR: We use a state-of-the-art translation model in English-Vietnamese to translate and produce both pretrained and supervised data in the biomedical domains.
- Syntax-guided Neural Module Distillation to Probe Compositionality in Sentence Embeddings
- Rohan Pandey
- TLDR: Syntax-guided neural module networks provide a strong compositional approximation of sentence embedding models.
- Closed-book Question Generation via Contrastive Learning
- Xiangjue Dong, Jiaying Lu, Jianling Wang, James Caverlee
- TLDR: We propose a new QG model for closed-book question generation that is designed to better understand the semantics of long-form abstractive answers and store more information in its parameters through contrastive learning and an answer reconstruction module.
- A Hybrid Detection and Generation Framework with Separate Encoders for Event Extraction
- Ge Shi, Yunyue Su, Yongliang Ma, Ming Zhou
- TLDR: We propose to use independent encoders to model event detection and event argument extraction, respectively, and use the output of event detection to construct the input of event argument extraction.
- Path Spuriousness-aware Reinforcement Learning for Multi-Hop Knowledge Graph Reasoning
- Chunyang Jiang, Tianchen Zhu, Haoyi Zhou, Chang Liu, Ting Deng, Chunming Hu, Jianxin Li
- TLDR: We propose a metric to quantitatively estimate to what extent a path is spurious.
- Self-Adaptive Named Entity Recognition by Retrieving Unstructured Knowledge
- Kosuke Nishida, Naoki Yoshinaga, Kyosuke Nishida
- TLDR: We propose a novel two-stage model for named entity recognition that learns usages of entities that have not been learned well.
- When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization
- Faisal Ladhak, Esin Durmus, Mirac Suzgun, Tianyi Zhang, Dan Jurafsky, Kathleen Mckeown, Tatsunori Hashimoto
- TLDR: We trace the propagation of name-nationality biases in pre-trained language models to downstream summarization tasks and show that these biases manifest themselves as hallucinations in summarization, leading to factually incorrect summaries.
- BERT Shows Garden Path Effects
- Tovah Irwin, Kyra Wilson, Alec Marantz
- TLDR: We present a new method for evaluating the semantic roles assigned to arguments of verbs in garden path and control sentences, and show that BERT-based models are less accurate than humans on these sentences.
- Models Teaching Models: Improving Model Accuracy with Slingshot Learning
- Lachlan O’neill, Nandini Anantharama, Satya Borgohain, Simon Angus
- TLDR: We propose slingshot learning, a novel semi-supervised method that uses a high-quality teacher model to boost the performance of an intermediate student model in a cost-efficient manner.
- A Federated Approach for Hate Speech Detection
- Jay Gala, Deep Gandhi, Jash Mehta, Zeerak Talat
- TLDR: We show that federated machine learning can help address the privacy concerns inherent to hate speech detection while obtaining up to a 6.81% improvement in F1-score.
- Learning the Legibility of Visual Text Perturbations
- Dev Seth, Rickard Stureborg, Danish Pruthi, Bhuwan Dhingra
- TLDR: We propose a new dataset for quantifying the legibility of visually perturbed text and show that it is more effective than existing adversarial attacks in reducing model performance.
- DyLoRA: Parameter-Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
- Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi
- TLDR: We present a dynamic, search-free low-rank adaptation method that trains models faster than current low-rank adaptation methods by avoiding an exhaustive rank search.
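A minimal numpy sketch of the underlying low-rank adaptation idea, with a `rank` argument standing in for the ability to operate at multiple ranks without retraining (class and parameter names are illustrative, not the authors' code):

```python
import numpy as np

class LowRankAdapter:
    """Sketch of a LoRA-style update y = x @ (W + scale * B @ A) with W
    frozen; `rank` mimics truncating the adapter to any r <= max_rank
    after training, without a separate rank search."""

    def __init__(self, W, max_rank=8, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_in, d_out = W.shape
        self.W = W                                         # frozen pre-trained weights
        self.A = rng.normal(0.0, 0.02, (max_rank, d_out))  # trainable
        self.B = np.zeros((d_in, max_rank))                # zero init: starts as a no-op
        self.scale = scale

    def forward(self, x, rank=None):
        r = rank if rank is not None else self.B.shape[1]
        delta = self.B[:, :r] @ self.A[:r, :]              # rank-r weight update
        return x @ (self.W + self.scale * delta)

layer = LowRankAdapter(np.eye(4))
print(layer.forward(np.ones((1, 4)), rank=2))  # [[1. 1. 1. 1.]] since B is zero
```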
- Conversational Emotion-Cause Pair Extraction with Guided Mixture of Experts
- Dongjin Jeong, Jinyeong Bak
- TLDR: We propose a Pair-Relationship Guided Mixture-of-Experts model for the emotion-cause pair extraction task, which learns the relationship between utterances and advises a gating network to incorporate dialogue features.
- Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
- Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov
- TLDR: We present a survey of practical methods for addressing potential threats and societal harms from language generation models.
- TraVLR: Now You See It, Now You Don’t! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning
- Keng Ji Chow, Samson Tan, Min-yen Kan
- TLDR: We present a synthetic dataset for evaluating visio-linguistic representation learning and show that state-of-the-art models are only capable of cross-modal transfer and limited generalisation.
- Paraphrase Acquisition from Image Captions
- Marcel Gohsen, Matthias Hagen, Martin Potthast, Benno Stein
- TLDR: We propose to use image captions from the Web as a previously underutilized resource for paraphrases (i.e., texts with the same “message”) and to create and analyze a corresponding dataset.
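The core acquisition idea can be sketched in a few lines: captions anchored to the same Web image are paraphrase candidates. This shows the grouping step only; the paper's filtering and analysis steps are not shown:

```python
from collections import defaultdict

def caption_paraphrase_pairs(records):
    """Pair captions of the same image as paraphrase candidates.
    `records` is assumed to be (image_id, caption) tuples."""
    by_image = defaultdict(list)
    for image_id, caption in records:
        by_image[image_id].append(caption)
    pairs = []
    for captions in by_image.values():
        pairs.extend((a, b) for i, a in enumerate(captions)
                     for b in captions[i + 1:])
    return pairs

print(caption_paraphrase_pairs([
    ("img1", "A dog runs on the beach."),
    ("img1", "Dog running along the shore."),
    ("img2", "City skyline at night."),
]))
```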
- Generation-Based Data Augmentation for Offensive Language Detection: Is It Worth It?
- Camilla Casula, Sara Tonelli
- TLDR: We analyze the robustness of models trained on generated data in a variety of data augmentation setups and the potential injection of biases when using generated data to classify offensive language.
- Quantifying Context Mixing in Transformers
- Hosein Mohebbi, Willem Zuidema, Grzegorz Chrupała, Afra Alishahi
- TLDR: We propose Value Zeroing, a novel context mixing score customized for Transformers that provides us with a deeper understanding of how information is mixed at each encoder layer.
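A simplified single-head numpy rendering of the idea: zero one token's value vector, recompute the layer output, and score each token by how far its representation moves. The actual method operates on full Transformer encoder layers; this is only a sketch:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def value_zeroing_scores(Q, K, V):
    """scores[i, j] = cosine distance between token i's output with and
    without token j's value vector; larger means token j mixes more
    context into token i (single-head sketch of the idea)."""
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    base = attn @ V
    n = V.shape[0]
    scores = np.zeros((n, n))
    for j in range(n):
        Vz = V.copy()
        Vz[j] = 0.0                      # erase token j's value vector
        out = attn @ Vz
        cos = (base * out).sum(-1) / (
            np.linalg.norm(base, axis=-1) * np.linalg.norm(out, axis=-1) + 1e-9)
        scores[:, j] = 1.0 - cos
    return scores
```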
- KGVL-BART: Knowledge Graph Augmented Visual Language BART for Radiology Report Generation
- Kaveri Kale, Pushpak Bhattacharyya, Milind Gune, Aditya Shetty, Rustom Lawyer
- TLDR: We propose a knowledge graph Augmented Vision Language BART (KGVL-BART) model that takes as input two chest X-ray images (one frontal and the other lateral) along with tags which are diagnostic keywords, and outputs a report with the patient-specific findings.
- A simple but effective model for attachment in discourse parsing with multi-task learning for relation labeling
- Zineb Bennis, Julie Hunter, Nicholas Asher
- TLDR: We present a discourse parsing model for conversation trained on the STAC corpus that outperforms the state of the art in relation type prediction and attachment prediction.
- How Far Can It Go? On Intrinsic Gender Bias Mitigation for Text Classification
- Ewoenam Tokpo, Pieter Delobelle, Bettina Berendt, Toon Calders
- TLDR: We show that intrinsic gender bias mitigation strategies in contextualized language models can hide bias rather than remove it, such that significant gender information is retained in the embeddings.
- Multimodal Event Transformer for Image-guided Story Ending Generation
- Yucheng Zhou, Guodong Long
- TLDR: We propose a multimodal event transformer for image-guided story ending generation and a multimodal injector for event-based reasoning.
- Improving Cross-modal Alignment for Text-Guided Image Inpainting
- Yucheng Zhou, Guodong Long
- TLDR: We propose a novel model for text-guided image inpainting by improving cross-modal alignment (CMA) and use adversarial training to enhance the model to fill the missing region in complicated structures effectively.
- Semantic Specialization for Knowledge-based Word Sense Disambiguation
- Sakae Mizuki, Naoaki Okazaki
- TLDR: We propose a semantic specialization method for the knowledge-based Word Sense Disambiguation task, based on the similarity of contextualized embeddings computed by a pre-trained language model.
- Concept-based Persona Expansion for Improving Diversity of Persona-Grounded Dialogue
- Donghyun Kim, Youbin Ahn, Chanhee Lee, Wongyu Kim, Kyong-ho Lee, Donghoon Shin, Yeonsoo Lee
- TLDR: We propose a novel persona expansion framework for improving the diversity and richness of responses in persona-grounded dialogue.
- RPTCS: A Reinforced Persona-aware Topic-guiding Conversational System
- Zishan Ahmad, Kshitij Mishra, Asif Ekbal, Pushpak Bhattacharyya
- TLDR: We propose a novel conversational dataset creation mechanism that allows the conversational agent to drift to a set of target concepts depending on the persona of the speaker and the context of the conversation.
- What Did You Learn To Hate? A Topic-Oriented Analysis of Generalization in Hate Speech Detection
- Tom Bourgeade, Patricia Chiril, Farah Benamara, Véronique Moriceau
- TLDR: We propose a novel, simple yet effective approach to study generalization across popular hate speech datasets.
- End-to-end Case-Based Reasoning for Commonsense Knowledge Base Completion
- Zonglin Yang, Xinya Du, Erik Cambria, Claire Cardie
- TLDR: We present a case-based reasoning approach to commonsense knowledge base completion (CKBC) that provides retrieved passages containing relevant knowledge as additional input.
- Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text
- Marwa Gaser, Manuel Mager, Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu
- TLDR: Morphological segmentation for machine translation in code-switched Arabic-English.
- Identifying the limits of transformers when performing model-checking with natural language
- Tharindu Madusanka, Riza Batista-navarro, Ian Pratt-hartmann
- TLDR: We investigate the question of how the logical semantics of natural language affects transformers’ performance.
- Improving the Generalizability of Collaborative Dialogue Analysis With Multi-Feature Embeddings
- Ayesha Enayet, Gita Sukthankar
- TLDR: We propose a multi-feature embedding method that improves the generalizability of conflict prediction models trained on dialogue sequences.
- MetaQA: Combining Expert Agents for Multi-Skill Question Answering
- Haritz Puerto, Gözde Şahin, Iryna Gurevych
- TLDR: We propose MetaQA, a flexible and training-efficient architecture for combining expert agents that considers questions, answer predictions, and answer-prediction confidence scores to select the best answer among a list of answer predictions.
- BERT Is Not The Count: Learning to Match Mathematical Statements with Proofs
- Weixian Li, Yftah Ziser, Maximin Coavoux, Shay B. Cohen
- TLDR: We present a new mathematical article matching task that consists of matching a proof to a given mathematical statement.
- Lessons Learned from a Citizen Science Project for Natural Language Processing
- Jan-christoph Klie, Ji-ung Lee, Kevin Stowe, Gözde Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart De Castilho, Iryna Gurevych
- TLDR: We present a new approach to crowdsourcing for NLP that uses Citizen Science to generate high-quality annotations and attract motivated volunteers.
- Contrastive Learning with Keyword-based Data Augmentation for Code Search and Code Question Answering
- Shinwoo Park, Youngwook Kim, Yo-sub Han
- TLDR: We propose KeyDAC, a novel data augmentation approach for semantic code search and code question answering.
- Large Scale Multi-Lingual Multi-Modal Summarization Dataset
- Yash Verma, Anubhav Jangra, Raghvendra Verma, Sriparna Saha
- TLDR: We present the largest multi-lingual multi-modal summarization dataset, covering 20 languages, and propose a multi-lingual multi-document-image summarization task utilizing it.
- External Knowledge Acquisition for End-to-End Document-Oriented Dialog Systems
- Tuan Lai, Giuseppe Castellucci, Saar Kuzi, Heng Ji, Oleg Rokhlenko
- TLDR: We present EKo-Doc, an architecture for document-oriented conversations that acquires external knowledge to improve response generation.
- In-Depth Look at Word Filling Societal Bias Measures
- Matúš Pikuliak, Ivana Beňová, Viktor Bachratý
- TLDR: We analyze the validity of two such measures and propose a new gender bias dataset for Slovak.
- Retrieval-augmented Image Captioning
- Rita Ramos, Desmond Elliott, Bruno Martins
- TLDR: We present a new approach to image captioning that generates sentences given the input image and a set of captions retrieved from a datastore, as opposed to the image alone.
- Automatic Evaluation and Analysis of Idioms in Neural Machine Translation
- Christos Baziotis, Prashant Mathur, Eva Hasler
- TLDR: We propose a novel metric for quantifying the frequency of literal translation errors in neural machine translation and explore the role of idiom context in the translation of idiomatic expressions.
- Representation biases in sentence transformers
- Dmitry Nikolaev, Sebastian Padó
- TLDR: We show that sentence embeddings generated by SOTA sentence transformers are strongly influenced by the overlap of the set of noun participants in the input.
- AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature
- Melissa Roemmele, Kyle Shaffer, Katrina Olsen, Yiyi Wang, Steve Deneefe
- TLDR: We present a new dataset of abridged versions of English literature and characterize the linguistic relations between original and abridged texts.
- Self-training Reduces Flicker in Retranslation-based Simultaneous Translation
- Sukanta Sen, Rico Sennrich, Biao Zhang, Barry Haddow
- TLDR: We show that self-training improves the flicker-latency tradeoff in simultaneous translation by controlling monotonicity.
- Social Commonsense for Explanation and Cultural Bias Discovery
- Lisa Bauer, Hanna Tischer, Mohit Bansal
- TLDR: We identify influential social commonsense knowledge in data, show its influence on model behavior, and demonstrate how to identify and remove mislabeled examples.
- Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns
- Zhongbin Xie, Vid Kocijan, Thomas Lukasiewicz, Oana-maria Camburu
- TLDR: We propose Counter-GAP, a new dataset of counterfactual gendered ambiguous pronouns, together with a method that uses it to measure gender bias in language models.
- The NLP Task Effectiveness of Long-Range Transformers
- Guanghui Qin, Yukun Feng, Benjamin Van Durme
- TLDR: We study the long-range attention in Transformer models and show that it is not only beneficial on content selection and query-guided decoding, but also leads to problems in approximation error and poor attention to distant tokens.
- Creation and evaluation of timelines for longitudinal user posts
- Anthony Hills, Adam Tsakalidis, Federico Nanni, Ioannis Zachos, Maria Liakata
- TLDR: We propose a set of methods for segmenting longitudinal user posts into timelines likely to contain interesting moments of change in a user’s behaviour, based on their online posting activity.
- Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention
- Carl Edwards, Heng Ji
- TLDR: We present a novel approach to semi-supervised new event type induction using a masked contrastive loss, which learns similarities between event mentions by enforcing an attention mechanism over the data minibatch.
- Multilingual Content Moderation: A Case Study on Reddit
- Meng Ye, Karan Sikka, Katherine Atwell, Sabit Hassan, Ajay Divakaran, Malihe Alikhani
- TLDR: We present a multilingual dataset for content moderation and propose a new approach to tackle the challenges of content moderation.
- GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
- Archiki Prasad, Peter Hase, Xiang Zhou, Mohit Bansal
- TLDR: We present GrIPS, a gradient-free, edit-based search algorithm that improves the task instructions provided in prompts to large language models in a zero-shot setting.
- DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
- Wei Zhao, Michael Strube, Steffen Eger
- TLDR: We introduce DiscoScore, a parametrized discourse metric, which uses BERT to model discourse coherence from different perspectives, driven by Centering theory.
- Know your audience: specializing grounded language models with listener subtraction
- Aaditya Singh, David Ding, Andrew Saxe, Felix Hill, Andrew Lampinen
- TLDR: We show that training a speaker with two listeners that perceive differently, using our method, allows the speaker to adapt to the idiosyncrasies of the listeners.
- Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models
- Abteen Ebrahimi, Arya D. McCarthy, Arturo Oncevay, John Ortega, Luis Chiruzzo, Gustavo Giménez-lugo, Rolando Coto-solano, Katharina Kann
- TLDR: We present state-of-the-art word alignment methods for unseen languages, and evaluate their performance in terms of model adaptation and extrinsic evaluation.