Work in Progress: Automating Anonymous Processing of Peer Evaluation Comments

Abstract

De-identifying qualitative datasets is time-consuming and expensive, but it is a critical step in protecting the confidentiality of study participants. Peer-to-peer comments are an important supplement to peer evaluation ratings in team-based learning courses. These comments are valuable data for educational research, but they usually contain identifying information such as names. In this work in progress, we propose a pipeline tool that identifies all names appearing in CATME team peer evaluation comments and replaces them with pseudonyms such as Rater 1 and Rater 2. We explored several natural language processing techniques based on machine learning methods and then optimized them into the final algorithm. At its core, the algorithm combines the long short-term memory (LSTM) and conditional random field (CRF) approaches most common in named entity recognition. The current algorithm performs well, with a high recall of 0.8 and reasonable precision, resulting in a score of 76; we place the emphasis on recall. We also propose this tool as a way to make available for research a large amount of data that would otherwise remain sensitive because of personally identifiable information.
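
The replacement step of such a pipeline can be illustrated with a minimal sketch. It assumes the name-detection stage (the LSTM and CRF tagger, not reproduced here) has already returned a list of names for a comment; the function name `pseudonymize` and its arguments are hypothetical and are not taken from the authors' tool. Each distinct name is mapped to a stable pseudonym in the order given, so repeated mentions of the same person receive the same label.

```python
import re


def pseudonymize(comment, detected_names):
    """Replace each detected name with a stable pseudonym (Rater 1, Rater 2, ...).

    `detected_names` is assumed to come from an upstream name-detection step;
    this sketch covers only the replacement stage.
    """
    # Assign pseudonyms in the order names are supplied, ignoring case.
    pseudonyms = {}
    for name in detected_names:
        key = name.lower()
        if key not in pseudonyms:
            pseudonyms[key] = f"Rater {len(pseudonyms) + 1}"

    if not pseudonyms:
        return comment, pseudonyms

    # Try longer names first so a full name wins over a shorter name it contains.
    alternatives = sorted(pseudonyms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in alternatives) + r")\b",
        re.IGNORECASE,
    )

    # Word boundaries keep "Bob" from matching inside "Bobby".
    cleaned = pattern.sub(lambda m: pseudonyms[m.group(1).lower()], comment)
    return cleaned, pseudonyms


if __name__ == "__main__":
    text = "Alice helped Bob, and alice led the meeting."
    print(pseudonymize(text, ["Alice", "Bob"])[0])
```

Returning the mapping alongside the cleaned text is a design choice made for this sketch: it lets the same pseudonyms be reused across all comments from one team, which keeps references consistent within a dataset.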

Publication
2020 ASEE Virtual Annual Conference Content Access