CS Capstone: Toxicity Mitigation with LLMs

Motivation:

The peer review stage is a vital part of the scientific research process. Authors need feedback from peers and experts in their field to improve their papers for publication, and the community as a whole benefits from the oversight and review. Feedback delivered in a constructive and respectful manner is highly valued. Unfortunately, feedback isn't always constructive. Reviewers sometimes express themselves in ways that aren't conducive to a productive review environment, and their comments can be needlessly discouraging, with an unnecessarily negative emotional impact on paper authors.

Our project seeks to mitigate unnecessary negative impact on paper authors by using the natural language understanding of LLMs to both identify and rephrase "toxic" sentences in paper reviews. As part of our research, we define a sentence-level annotation framework that supports a more intuitive manual annotation process, semantic analysis, and prediction of overall review sentiment. We also define a set of whole-review guidelines, informed by the sentence-level classifications, that describe our overall review categorization scheme.
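
To illustrate how sentence-level classifications might inform a whole-review category, here is a minimal Python sketch. The label names ("constructive", "neutral", "toxic"), the review categories, the 0.2 threshold, and the function name categorize_review are all hypothetical placeholders chosen for illustration. They are not the project's actual annotation scheme or guidelines.

```python
from collections import Counter

# Hypothetical sentence-level labels (assumption: the real label set differs).
SENTENCE_LABELS = {"constructive", "neutral", "toxic"}


def categorize_review(sentence_labels, toxic_threshold=0.2):
    """Map a list of sentence-level labels to a review-level category.

    toxic_threshold is an illustrative cutoff, not a value from the project.
    """
    unknown = set(sentence_labels) - SENTENCE_LABELS
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    if not sentence_labels:
        return "empty"

    counts = Counter(sentence_labels)
    toxic_share = counts["toxic"] / len(sentence_labels)

    # A review with a large share of toxic sentences is flagged as a whole.
    if toxic_share >= toxic_threshold:
        return "toxic"
    # A few toxic sentences in an otherwise fine review.
    if counts["toxic"] > 0:
        return "mixed"
    return "acceptable"


if __name__ == "__main__":
    labels = ["neutral", "toxic", "constructive", "neutral"]
    print(categorize_review(labels))
```

A simple share-based rule like this is only one possible aggregation. A learned classifier over the sentence-level labels would be another option.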

Strategy and Accomplishments:

As part of this project, I've exercised my technical skills in programming, machine learning, Python, and data analysis, as well as my communication, teamwork, and leadership skills. This section is currently unfinished and will be updated soon.

  • Guidelines
  • Annotations, data collection
  • Scripts: Extraction, automation
  • ML: BART/DeBERTa, k-means, simple RNN, random forest
  • Data Analysis

Outcome:

Through our research, we've determined that our framework is relatively informative in a "manually-featured" semantic space. We've also found thus far that base-model LLM performance varies substantially, and that context length is a significant limitation of smaller open-source LLMs such as Mistral-7B and Zephyr-3B and -7B. That said, these models are well suited to the task of rephrasing sentences in more constructive and encouraging ways while preserving the reviewer's critique.

Next Steps:

We have submitted our research to EMNLP 2024, and the review is pending.