BERT-Assisted Semantic Annotation Correction for Emotion-Related Questions

I recently had a paper, "BERT-Assisted Semantic Annotation Correction for Emotion-Related Questions," accepted at the ARDUOUS 2022 workshop (Annotation of useR Data for UbiquitOUs Systems) of the IEEE Pervasive Computing (PerCom) conference. The conference was going to be held in Pisa, Italy, but unfortunately, due to Covid, it is now virtual.

I wanted to make a short post to describe the paper and share some resources that can be used to replicate it.

Summary

First, to summarize: the paper is about using machine learning (ML) models, BERT in particular, to assist annotation by correcting incorrect annotations. There has been a lot of work on automating the annotation task, and this is one take on it. Other directions in this area include preannotation, where an ML model roughly annotates the data first and humans fix the problems, and online, incremental, and active learning, where a model learns while the humans annotate, chooses informative examples for humans to annotate, or both.
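To make the general idea concrete, here is a minimal sketch of one common way a classifier can surface likely annotation errors: flag items where the model's top prediction disagrees with the human label and the model is highly confident. This is an illustration under stated assumptions, not the paper's exact method. The function name `flag_suspect_labels`, the 0.9 threshold, and the integer label IDs are all hypothetical, and it assumes the probabilities come from a model (such as a fine-tuned BERT classifier) that was not trained on the item being checked, for example cross-validated predictions.

```python
from typing import List, Sequence, Tuple


def flag_suspect_labels(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.9,  # hypothetical confidence cutoff
) -> List[Tuple[int, int, int, float]]:
    """Return (index, given_label, predicted_label, confidence) for items
    where the model's top prediction disagrees with the human label and
    the model's confidence in that prediction is at least `threshold`.

    Assumes probs[i] are class probabilities for item i produced by a model
    that did not see item i during training (e.g. out-of-fold predictions).
    """
    suspects = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Index of the class the model considers most likely.
        pred = max(range(len(p)), key=lambda k: p[k])
        conf = p[pred]
        if pred != y and conf >= threshold:
            suspects.append((i, y, pred, conf))
    # Most confident disagreements first, so a human reviews those first.
    suspects.sort(key=lambda t: t[3], reverse=True)
    return suspects


if __name__ == "__main__":
    # Toy example with three hypothetical label classes (0, 1, 2).
    probs = [
        [0.95, 0.03, 0.02],
        [0.10, 0.85, 0.05],
        [0.02, 0.97, 0.01],
        [0.40, 0.35, 0.25],
    ]
    labels = [0, 1, 0, 2]
    print(flag_suspect_labels(probs, labels))  # [(2, 0, 1, 0.97)]
```

Only item 2 is flagged: the model is 97% confident in class 1 while the annotation says class 0. Item 3 also disagrees, but the model is unsure, so it is left alone. In practice a human would still review each flagged item rather than accepting the model's label automatically.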

Motivation

I was inspired to look at this because I realized that the BERT models performed so well that they were identifying errors in my earlier annotations. When I was a grad student, I did the annotation by myself, so I didn't have the opportunity to check inter-annotator agreement. I had already used funding to collect data on MTurk, and my system performed well, so I skipped measuring inter-annotator agreement, but it was always something I thought I should do at some point. The current paper doesn't report inter-annotator agreement, but it does address finding annotation errors, which is one of the main reasons to measure inter-annotator agreement in the first place.

Data and Previous Work

The data came from EMO20Q, a dialog system that plays an emotion-guessing game (20 questions, but limited to emotions). More information about the EMO20Q project and data can be found in the GitHub repo. The most relevant previous paper is here, and here are some older publications, as well as a newer poster by a student, Shanshan Kong.

Code

Thanks to the ease of sharing via GitHub and the convenient GPU notebook platform Colab, the work is easy to reproduce. If you are interested, please see this notebook for details.