Workshop on Human Evaluation of NLP Systems (HumEval)

ACL’22, Dublin, Ireland, 27 May 2022

Second Call for Papers

Website: https://humeval.github.io/

Previous edition (at EACL 2021): https://humeval.github.io/2021/

Workshop overview

The Second Workshop on Human Evaluation of NLP Systems (HumEval’22) invites the submission of long and short papers on substantial, original, and unpublished research on all aspects of human evaluation of NLP systems, with a focus on systems that produce language as output. We welcome work on any quality criteria relevant to NLP, on both intrinsic evaluation (which assesses systems and outputs directly) and extrinsic evaluation (which assesses systems and outputs indirectly, in terms of their impact on an external task or system), and on quantitative as well as qualitative methods, whether score-based (discrete or continuous scores) or annotation-based (marking, highlighting).

Invited speakers

Samira Shaikh, University of North Carolina at Charlotte / Ally
Markus Freitag, Google

Important dates

15 February 2022: Submission deadline for ARR*
28 February 2022: Submission deadline for standard submissions (via the OpenReview Workshop Portal)
22 March 2022: Commitment deadline for ARR submissions with reviews
30 March 2022: Notification of acceptance
15 April 2022: Camera-ready papers due
27 May 2022: Workshop at ACL

All deadlines are 23:59 UTC-12.

*We expect all papers submitted to ARR by 15 February 2022 to be reviewed by 22 March (see the ARR timeline); however, we cannot guarantee this, as the reviewing schedule depends on ARR and not on us.

Papers

We invite papers on topics including, but not limited to, the following:

  • Experimental design and methods for human evaluations
  • Reproducibility of human evaluations
  • Work on inter-evaluator and intra-evaluator agreement
  • Ethical considerations in human evaluation of computational systems
  • Quality assurance for human evaluation
  • Crowdsourcing for human evaluation
  • Issues in meta-evaluation of automatic metrics by correlation with human evaluations
  • Alternative forms of meta-evaluation and validation of human evaluations
  • Comparability of different human evaluations
  • Methods for assessing the quality and the reliability of human evaluations
  • Role of human evaluation in the context of Responsible and Accountable AI

We welcome work from any subfield of NLP (and ML/AI more generally), with a particular focus on evaluation of systems that produce language as output.

Submission

Long papers

Long papers must describe substantial, original, completed and unpublished work. Wherever appropriate, concrete evaluation and analysis should be included. Long papers may consist of up to eight (8) pages of content, plus unlimited pages of references. Final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers’ comments can be taken into account. Long papers will be presented orally or as posters as determined by the programme committee. Decisions as to which papers will be presented orally and which as posters will be based on the nature rather than the quality of the work. There will be no distinction in the proceedings between long papers presented orally and as posters.

Short papers

Short paper submissions must describe original and unpublished work. Short papers should have a point that can be made in a few pages. Examples of short papers are a focused contribution, a negative result, an opinion piece, an interesting application nugget, a small set of interesting results. Short papers may consist of up to four (4) pages of content, plus unlimited pages of references. Final versions of short papers will be given one additional page of content (up to 5 pages) so that reviewers’ comments can be taken into account. Short papers will be presented orally or as posters as determined by the programme committee. While short papers will be distinguished from long papers in the proceedings, there will be no distinction in the proceedings between short papers presented orally and as posters.

Multiple submission policy

HumEval’22 allows multiple submissions. However, if a submission has already been, or is planned to be, submitted to another event, this must be clearly stated in the submission form.

Submission procedure and templates

Submission is electronic, through the OpenReview portal for the workshop; the submission deadline is 28 February 2022.

Both long and short papers must be anonymised for double-blind reviewing, must follow the ACL Author Guidelines, and must use the ACL 2022 templates available on the ACL Rolling Review website. The submitting author must have an OpenReview profile. Please ensure profiles are complete before submission. This tutorial from the ACL Rolling Review might be helpful.

In addition to standard submissions, papers previously submitted to and reviewed by ARR can be committed to the workshop (together with their reviews) no later than 22 March 2022. We expect all papers submitted to ARR by 15 February 2022 to be reviewed by 22 March; however, we cannot guarantee this, as the reviewing schedule depends on ARR and not on us. The commitment link will be published soon.

Optional Supplementary Materials: Appendices, Software and Data

ARR encourages the submission of supplementary materials to improve the reproducibility of results and to enable authors to provide additional information that does not fit in the paper. Supplementary materials may include appendices, software, or data. For example, pre-processing decisions, model parameters, feature templates, lengthy proofs or derivations, pseudocode, sample system inputs/outputs, and other details necessary for the exact replication of the work described in the paper can be put into appendices. However, if pseudocode, derivations, or model specifications are an important part of the contribution, or are important for reviewers to assess the technical correctness of the work, they should be part of the main paper and not appear in appendices. Reviewers are not required to consider material in appendices. Appendices should come after the references in the submitted PDF and do not count towards the page limit. Software should be submitted as a single .tgz or .zip archive, and data as a separate single .tgz or .zip archive. Supplementary materials must be fully anonymised to preserve double-blind reviewing.

Organisers

Anya Belz, ADAPT Centre, Dublin City University, Ireland
Maja Popović, ADAPT Centre, Dublin City University, Ireland
Ehud Reiter, University of Aberdeen, UK
Anastasia Shimorina, Orange, Lannion, France

For questions and comments regarding the workshop please contact the organisers at humeval.ws@gmail.com.

Programme committee

Eleftherios Avramidis, DFKI, Germany
Sheila Castilho, ADAPT, Dublin City University, Ireland
Sandipan Dandapat, Microsoft, India
Ondřej Dušek, Charles University, Czechia
Markus Freitag, Google, USA
Albert Gatt, University of Malta, Malta
Behnam Hedayatnia, Amazon, USA
David Howcroft, Heriot-Watt University, UK
Tom Kocmi, Microsoft, Germany
Filip Klubička, ADAPT, Technological University of Dublin, Ireland
Samuel Läubli, University of Zurich, Switzerland
Chris van der Lee, Tilburg University, Netherlands
Saad Mahamood, Trivago, Germany
Nitika Mathur, University of Melbourne, Australia
Margot Mieskes, UAS Darmstadt, Germany
Emiel van Miltenburg, Tilburg University, Netherlands
Mathias Mueller, University of Zurich, Switzerland
Sergiu Nisioi, University of Bucharest, Romania
Juri Opitz, University of Heidelberg, Germany
Maike Paetzel-Prüsmann, University of Potsdam, Germany
Maxime Peyrard, EPFL, Switzerland
Tim Polzehl, TU Berlin, Germany
Martin Popel, UFAL, Czechia
Verena Rieser, Heriot-Watt University, UK
Samira Shaikh, UNC, USA
Joel Tetreault, Dataminr, USA
Wei Zhao, TU Darmstadt, Germany