September 23, 2023: Embarking on Model Editing Evaluation

Research Leadership

This research is spearheaded by Domenic Rosati, who is driving the project to completion and bringing the team up to speed. Collaborating closely with him on this initiative are Yahya (me), Melis, and Deepika.

Research Objective

The primary aim of this project is to establish a framework for assessing paragraph-length generations from edited large language models. As the capabilities of large language models grow, so does the importance of understanding their behavior, especially after they have been edited.

Collaborative Effort

This research is a collaborative initiative involving:

Relevant Resources:

Weekly Meeting Time

Our team has set a weekly meeting every Wednesday from 10am to 11am AST to discuss progress, issues, and next steps.

Initial Steps

Our first task was to design a 10-passage survey aimed at discerning the properties and values relevant to model editing. The design specifics and outcomes of this survey will be detailed in the next update.

Resources & Readings

As we dive into this project, Domenic has suggested several readings and resources to build a baseline understanding of the subject matter:

October 7, 2023: Week 1 Milestones and Tasks

This Week’s Milestone: Introduction

Goals

Slides

To-Do for Next Week

Pick Motivation Paper to Look At

We have identified several papers that can serve as motivation for our work:

All of Us Read

October 14, 2023: Week 2 Milestones and Tasks

Last Week Recap

Questions and Feedback

Goals

Slides

To-Do for Next Week

Annotation Project

Annotation Session and Feedback

Session Overview

Issues and Feedback

Action Items for Next Annotation Session

Annotation Instructions

Task Description

In this task, you will read pairs of sentences and label them as consistent, inconsistent, or neutral.

Guidelines

Note

The focus is not on the factual accuracy of the sentences but on their consistency or inconsistency with each other.
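To make the label set concrete, here is a small illustrative sketch of how an annotated pair can be represented. The sentences below are invented for illustration and are not taken from the project's passages.

```python
# Illustrative only: a minimal representation of the three-way label set.
# The sentence pairs below are invented examples, not project data.
LABELS = ("consistent", "inconsistent", "neutral")

example_annotations = [
    {
        "sentence_a": "The bridge opened to traffic in 1932.",
        "sentence_b": "Cars have been crossing the bridge since 1932.",
        "label": "consistent",      # sentence_b agrees with sentence_a
    },
    {
        "sentence_a": "The bridge opened to traffic in 1932.",
        "sentence_b": "The bridge has never carried any traffic.",
        "label": "inconsistent",    # sentence_b contradicts sentence_a
    },
    {
        "sentence_a": "The bridge opened to traffic in 1932.",
        "sentence_b": "The bridge is painted grey.",
        "label": "neutral",         # neither supported nor contradicted
    },
]

for pair in example_annotations:
    print(f"{pair['label']:>12}: {pair['sentence_b']}")
```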

October 21, 2023: Week 3 Milestones and Tasks

Last Week Recap

This Week’s Milestone: Methods for Data Collection and Annotation Guidelines

Goals

Slides

Updated Annotation Guidelines

Changes

To-Do for Next Week

Round Table and Annotations Discussion

October 28, 2023: Week 4 Milestones and Tasks

Last Week Recap

This Week’s Milestone: Achieve Higher Agreement

Goals

Slides

Round Table and Disagreement Discussion

To-Do for Next Week

Pretest and Annotation Strategies

Handling Missing Context

Other Tasks

Glossary and Context

Inter-Rater Reliability (IRR)

Krippendorff’s Alpha
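Since Krippendorff's alpha comes up repeatedly in our agreement discussions, here is a minimal sketch of the nominal version, alpha = 1 - D_o / D_e (observed over expected disagreement). This is a generic reference implementation, not the exact script used in the project; an off-the-shelf library implementation can be used in practice.

```python
# Generic sketch of nominal Krippendorff's alpha, alpha = 1 - D_o / D_e,
# computed from a coincidence matrix. Reference implementation only.
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """`units` is a list of items; each item is the list of labels the
    annotators gave it. Items with fewer than two labels are skipped."""
    coincidence = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Each ordered pair of values within a unit contributes 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)

    marginals = Counter()
    for (a, _b), weight in coincidence.items():
        marginals[a] += weight
    n = sum(marginals.values())  # total number of pairable values

    # Observed disagreement: off-diagonal mass of the coincidence matrix.
    d_o = sum(w for (a, b), w in coincidence.items() if a != b)
    # Expected disagreement under chance pairing of the label marginals.
    d_e = sum(marginals[a] * marginals[b]
              for a in marginals for b in marginals if a != b) / (n - 1)
    return 1.0 if d_e == 0 else 1.0 - d_o / d_e

# Example: three annotators labelling two sentence pairs.
print(krippendorff_alpha_nominal([
    ["consistent", "consistent", "consistent"],
    ["neutral", "inconsistent", "inconsistent"],
]))
```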

The Need for Higher Agreement

Overview of the Project’s Technical Progress

In the final stages of our annotation accuracy project, we made significant technical contributions to the reliability of our data analysis methods. A pivotal part of this work was training and fine-tuning a DeBERTa V3 model, which played a crucial role in improving our understanding of annotator agreement and classification accuracy.
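For reference, here is a minimal, generic sketch of what such a fine-tuning setup can look like using the Hugging Face transformers library; the checkpoint name, toy data, and hyperparameters are illustrative assumptions, not the project's exact configuration. The data preparation details follow in the next subsection.

```python
# Minimal sketch: fine-tuning DeBERTa V3 as a three-way sentence-pair
# classifier with Hugging Face transformers. Checkpoint, toy data and
# hyperparameters are assumptions for illustration only.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABEL2ID = {"consistent": 0, "inconsistent": 1, "neutral": 2}
CHECKPOINT = "microsoft/deberta-v3-base"   # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT, num_labels=len(LABEL2ID))

# Toy annotated pairs; in the project these come from the annotation rounds.
raw = {
    "sentence_a": ["The bridge opened in 1932.", "The bridge opened in 1932."],
    "sentence_b": ["It has carried traffic since 1932.", "It never opened."],
    "label": [LABEL2ID["consistent"], LABEL2ID["inconsistent"]],
}

def tokenize(batch):
    # Encode both sentences of each pair jointly.
    return tokenizer(batch["sentence_a"], batch["sentence_b"],
                     truncation=True, padding="max_length", max_length=128)

train_dataset = Dataset.from_dict(raw).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="deberta-consistency",
                           num_train_epochs=3,
                           per_device_train_batch_size=16,
                           learning_rate=2e-5),
    train_dataset=train_dataset,
)
trainer.train()
```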

Deep Dive into DeBERTa Model Training

I refined our annotation model using DeBERTa (Decoding-enhanced BERT with Disentangled Attention). The model was trained on a carefully prepared dataset in which class proportions were balanced based on prior analysis:

This distribution ensured that the model learned from a balanced set of examples, reducing bias and improving generalizability.
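The exact proportions are not reproduced here; as a sketch of the balancing step, the helper below resamples a labelled dataset to target proportions. The `target_proportions` values are placeholders, not the ones from our prior analysis.

```python
# Sketch of resampling annotated examples to target class proportions.
# The proportions below are placeholders, not the project's actual values.
import random

def balance_classes(examples, target_proportions, total, seed=0):
    """Sample roughly `total` examples so labels follow `target_proportions`."""
    rng = random.Random(seed)
    by_label = {}
    for example in examples:
        by_label.setdefault(example["label"], []).append(example)

    balanced = []
    for label, proportion in target_proportions.items():
        pool = by_label.get(label, [])
        k = min(len(pool), round(total * proportion))
        balanced.extend(rng.sample(pool, k))
    rng.shuffle(balanced)
    return balanced

# Placeholder proportions for illustration only.
target_proportions = {"consistent": 1 / 3, "inconsistent": 1 / 3, "neutral": 1 / 3}
```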

Model Performance and Enhancements

After training, the DeBERTa model achieved an inter-rater reliability (IRR) score well above our initial benchmarks, a significant improvement over earlier models that set a new standard for the project's annotation accuracy.
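As a usage note, one way to obtain such a score is to treat the fine-tuned model as an additional rater and compute Krippendorff's alpha over human and model labels together, reusing the alpha sketch from the glossary above. The labels below are invented for illustration.

```python
# Sketch: score the fine-tuned model as one more "rater" against the human
# annotators, using the krippendorff_alpha_nominal function sketched above.
# The labels below are invented for illustration.
human_labels = [
    ["consistent", "consistent", "consistent"],
    ["neutral", "inconsistent", "inconsistent"],
]
model_labels = ["consistent", "inconsistent"]

units = [humans + [prediction]
         for humans, prediction in zip(human_labels, model_labels)]
score = krippendorff_alpha_nominal(units)
print(f"Krippendorff's alpha (humans + model): {score:.3f}")
```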

Publication and Recognition

The results of this research and model training were submitted to NAACL 2024. I am pleased to report that our paper was accepted for publication, which marks a significant achievement for the team and underscores the quality and importance of our work.

The published paper can be accessed here: Long-form evaluation of model editing