Workshop on Human Evaluation of NLP Systems
Short Papers
- Trading Off Diversity and Quality in Natural Language Generation
  Hugh Zhang, Daniel Duckworth, Daphne Ippolito and Arvind Neelakantan
- Towards Objectively Evaluating the Quality of Generated Medical Summaries
  Francesco Moramarco, Damir Juric, Aleksandar Savkov and Ehud Reiter
- A Preliminary Study on Evaluating Consultation Notes With Post-Editing
  Francesco Moramarco, Alex Papadopoulos Korfiatis, Aleksandar Savkov and Ehud Reiter
- The Great Misalignment Problem in Human Evaluation of NLP Methods
  Mika Hämäläinen and Khalid Alnajjar
- Eliciting Explicit Knowledge From Domain Experts in Direct Intrinsic Evaluation of Word Embeddings for Specialized Domains
  Goya van Boven and Jelke Bloem
- Detecting Post-Edited References and Their Effect on Human Evaluation
  Věra Kloudová, Ondřej Bojar and Martin Popel
Long Papers
- It’s Commonsense, isn’t it? Demystifying Human Evaluations in Commonsense-Enhanced NLG systems
  Miruna-Adriana Clinciu, Dimitra Gkatzia and Saad Mahamood
- Estimating Subjective Crowd-Evaluations as an Additional Objective to Improve Natural Language Generation
  Jakob Nyberg, Maike Paetzel and Ramesh Manuvinakurike
- Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation
  Sheila Castilho
- Is This Translation Error Critical?: Classification-Based Human and Automatic Machine Translation Evaluation Focusing on Critical Errors
  Katsuhito Sudoh, Kosuke Takahashi and Satoshi Nakamura
- A View From The Crowd: Evaluation Challenges for Time-Offset Interaction Applications
  Alberto Chierici and Nizar Habash
- Reliability of Human Evaluation for Text Summarization: Lessons Learned and Challenges Ahead
  Neslihan Iskender, Tim Polzehl and Sebastian Möller
- On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs
  Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann and Tom Kocmi
- A Case Study of Efficacy and Challenges in Practical Human-in-Loop Evaluation of NLP Systems Using Checklist
  Shaily Bhatt, Rahul Jain, Sandipan Dandapat and Sunayana Sitaram
- Interrater Disagreement Resolution: A Systematic Procedure to Reach Consensus in Annotation Tasks
  Yvette Oortwijn, Thijs Ossenkoppele and Arianna Betti