September 23, 2023: Embarking on Model Editing Evaluation
Research Leadership
This research is spearheaded by Domenic Rosati, who is driving the project to completion and bringing the team up to speed. Collaborating closely with him on this initiative are Yahya (me), Melis, and Deepika.
Research Objective
The primary aim of this project is to establish a framework for assessing paragraph-length generations from edited large language models. As the capabilities of large language models grow, so does the importance of understanding their behavior, especially when edited.
Collaborative Effort
This research is a collaborative initiative involving:
- Domenic, Yahya, Deepika, Melis.
- Supervisors: Lizbeth Escobedo, Hassan Sajjad, Frank Rudzicz
- External Collaborators: Jason (from the Counterfact+ dataset), Andrew
Weekly Meeting Time
Our team has set a weekly meeting every Wednesday from 10am to 11am AST to discuss progress, issues, and next steps.
Initial Steps
Our first task was to design a 10-passage survey aimed at discerning various properties and values related to model editing. The design specifics and outcomes of this survey will be detailed in the next update.
Resources & Readings
As we dive into this project, Domenic has suggested several readings and resources to build a foundational understanding of the subject matter:
- Basics: An understanding of neural networks, transformers, tokens, and more.
- Papers: From understanding the risks posed by language models to evaluating the ripple effects of knowledge editing.
October 7, 2023: Week 1 Milestones and Tasks
This Week’s Milestone: Introduction
Goals
- Understand the overall project and its objectives.
- Familiarize ourselves with the background necessary for writing the introduction section.
- Frame an introduction for our paper.
- Pick a motivation paper to read for next week.
- Read and understand the editing survey we have designed.
Slides
- Slides for this week will focus on the above-mentioned goals.
To-Do for Next Week
Pick Motivation Paper to Look At
We have identified several papers that can serve as motivation for our work:
- Petroni, F., et al. (2019). “Language models as knowledge bases?”.
- Jiang, Z., et al. (2020). “How can we know what language models know?”.
- Elazar, Y., et al. (2021). “Measuring and improving consistency in pretrained language models”.
- Lin, S., et al. (2021). “TruthfulQA: Measuring how models mimic human falsehoods”.
- Gehman, S., et al. (2020). “RealToxicityPrompts: Evaluating neural toxic degeneration in language models”.
- Bender, E. M., et al. (2021). “On the dangers of stochastic parrots: Can language models be too big?”.
- Weidinger, L., et al. (2022). “Taxonomy of Risks posed by Language Models”.
All of Us Read
- Yao, Y., et al. (2023). “Editing Large Language Models: Problems, Methods, and Opportunities”.
- Visit the ROME (Rank-One Model Editing) project webpage.
- Read Appendix J and the Counterfactual AI Writing Study in the arXiv paper.
October 14, 2023: Week 2 Milestones and Tasks
Last Week Recap
Questions and Feedback
- Addressed questions about model editing.
- Discussed pre-test feedback, including issues with paragraph length and clarity of terminologies used for annotation.
This Week’s Milestone: Related Works and Survey Deployment
Goals
- Delve into related works, focusing on model editing and evaluation of model editing.
- Deploy the survey designed in Week 1.
Slides
- Slides for this week will center on the above goals.
To-Do for Next Week
Annotation Project
Reading (All - Optional):
- Papers on annotation best practices and quality management.
Brainstorm by Freeform Annotating:
- Group 1 (Dom)
- Group 2 (Deepika)
- Group 3 (Yahya)
- Group 4 (Melis)
Other Tasks:
- Develop a draft of Annotation Guidelines (Dom)
- Select a Platform for Annotations (Dom)
- Identify a Dataset for Annotation (Dom)
Annotation Session and Feedback
Session Overview
- All team members participated in an annotation session.
- We followed the annotation guidelines to label pairs of sentences as consistent, inconsistent, or neutral.
Issues and Feedback
- The paragraphs for annotation were found to be too long, making the task cumbersome.
- Some terminologies used in the annotation guidelines were confusing.
Action Items for Next Annotation Session
- Shorten the paragraphs to be annotated.
- Clarify terminologies and guidelines for annotation.
Annotation Instructions
Task Description
In this task, you will read pairs of sentences and label them as consistent, inconsistent, or neutral.
Guidelines
- Consistent Sentence Pair: A pair of sentences that agree with each other.
  - Example: The Eiffel Tower is in Rome. / Rome is home to the world-famous Eiffel Tower.
- Inconsistent Sentence Pair: A pair of sentences that contradict each other.
  - Example: The Eiffel Tower is in Rome. / Paris is home to the world-famous Eiffel Tower.
- Neutral Sentence Pair: A pair of sentences that neither agree nor contradict each other.
  - Example: The Eiffel Tower is in Rome. / Paris is a wonderful city to visit.
Note
The focus is not on the factual accuracy of the sentences but on their consistency or inconsistency with each other.
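This sentence-pair task closely mirrors natural language inference (NLI): consistent, inconsistent, and neutral correspond to entailment, contradiction, and neutral. As a point of reference, here is a minimal sketch of how an off-the-shelf NLI model could pre-label the same pairs. The checkpoint choice (`roberta-large-mnli`) is an assumption for illustration, not part of our pipeline:

```python
# Sketch: pre-labeling sentence pairs with an off-the-shelf NLI model.
# Mapping: entailment ~ consistent, contradiction ~ inconsistent, neutral ~ neutral.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # assumption: any MNLI-trained checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def label_pair(sentence_a: str, sentence_b: str) -> str:
    """Return the NLI label for a sentence pair."""
    inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[logits.argmax(dim=-1).item()]

print(label_pair("The Eiffel Tower is in Rome.",
                 "Paris is home to the world-famous Eiffel Tower."))  # CONTRADICTION
```

Like our guidelines, NLI models judge only the relationship between the two sentences, not their factual accuracy.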
October 21, 2023: Week 3 Milestones and Tasks
Last Week Recap
- Deployed the survey designed in earlier weeks.
- Discussed related works.
- Conducted a pre-study on annotations.
This Week’s Milestone: Methods for Data Collection and Annotation Guidelines
Goals
- Finalize methods for data collection.
- Update Annotation Guidelines based on feedback and discussions.
- Pilot the new annotation guidelines.
Slides
- Slides for this week will encapsulate the above goals.
Updated Annotation Guidelines
Changes
- Video Tutorial: A video was made to explain the new guidelines.
- Instructions: The task now involves reading a passage of text and labeling a claim about that passage.
- Classification: The task has two parts (a sketch of the resulting record follows below):
  - Classifying the passage as supporting, contradicting, or neutral towards the claim.
  - Highlighting sentences that support or contradict the claim (if applicable).
- Examples: New examples were provided to illustrate what supporting, contradicting, and neutral passages look like.
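To make the two-part task concrete, here is a minimal sketch of what a single annotation record might look like. The field names are hypothetical illustrations, not our annotation platform's actual schema:

```python
# Sketch of one annotation record under the updated guidelines.
# Field names are hypothetical, not the platform's schema.
from dataclasses import dataclass, field
from typing import List, Literal

@dataclass
class PassageAnnotation:
    passage: str   # the paragraph shown to the annotator
    claim: str     # the claim being judged against it
    label: Literal["supports", "contradicts", "neutral"]
    # Indices of highlighted sentences (empty for neutral passages).
    evidence_sentences: List[int] = field(default_factory=list)

example = PassageAnnotation(
    passage="The Eiffel Tower dominates the Roman skyline. Tourists flock to it year-round.",
    claim="The Eiffel Tower is in Rome.",
    label="supports",
    evidence_sentences=[0],
)
```

Keeping the highlighted evidence alongside the label should make it easier to adjudicate disagreements later.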
To-Do for Next Week
- Show HyperMatrix Annotation Platform.
- Send reminders to those who haven’t completed the survey.
- Provide feedback on the annotation proposal.
- Discuss the analysis plan for the survey.
Round Table and Annotations Discussion
- A round table will be held to discuss the new annotation guidelines and the pilot study.
October 28, 2023: Week 4 Milestones and Tasks
Last Week Recap
- Conducted an annotation pretest.
- Krippendorff’s Alpha was measured to be 0.54.
This Week’s Milestone: Achieve Higher Agreement
Goals
- Aim for higher agreement among annotators.
- Discuss discrepancies and plan for improvement.
Slides
- Slides will focus on achieving higher inter-rater reliability and discussing disagreements.
Round Table and Disagreement Discussion
- A round table discussion will be held to openly discuss disagreements among annotators.
- Annotations will be reviewed on LightTag.
- Annotation guidelines will be revised and can be found here.
To-Do for Next Week
Pretest and Annotation Strategies
- Conduct another pretest.
- Based on the pretest results (a sketch of this decision rule follows below):
  - If Inter-Rater Reliability (IRR) is less than α = 0.7, annotations will be split 2 ways.
  - If IRR is high, annotations will be split 4 ways.
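A minimal sketch of how this rule could be applied in code, using the open-source `krippendorff` package; the ratings matrix below is hypothetical:

```python
# Sketch: compute Krippendorff's alpha on a coders-x-items matrix and
# apply the 2-way/4-way split rule. The ratings below are hypothetical.
import numpy as np
import krippendorff  # pip install krippendorff

# Rows = annotators, columns = items; labels coded 0/1/2, np.nan = not annotated.
ratings = np.array([
    [0, 1, 2, 0, 1, np.nan],
    [0, 1, 2, 1, 1, 2],
    [0, 2, 2, 0, 1, 2],
    [0, 1, 2, 0, np.nan, 2],
])

alpha = krippendorff.alpha(reliability_data=ratings, level_of_measurement="nominal")
n_ways = 2 if alpha < 0.7 else 4
print(f"alpha = {alpha:.2f} -> split annotations {n_ways} ways")
```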
Handling Missing Context
- Missing context in annotations was identified as a challenge.
- Two paths for addressing this:
- Provide the whole paragraph along with the sentence to be measured.
- Keep it at sentence pairs with a rule: if you don’t have enough information, label it as neutral.
Other Tasks
- Add an option for open-text feedback.
- Complete 300 annotations.
- I (Yahya) will look into resources for training DeBERTa V3 on annotations, starting with this course.
Glossary and Context
Inter-Rater Reliability (IRR)
What it is: Inter-Rater Reliability (IRR) is a metric used to measure the degree of agreement among multiple raters or annotators.
What it’s for: In the context of this project, IRR is important for ensuring that the annotations being made by different team members are consistent and reliable. This helps in the validation of the dataset being created.
Krippendorff’s Alpha
What it is: Krippendorff’s Alpha is a statistical measure used to determine the reliability of agreement among multiple coders. It generalizes several common reliability coefficients and can be applied to any number of coders providing nominal, ordinal, interval, or ratio data.
What it’s for: In this project, Krippendorff’s Alpha is the specific statistic we use to quantify IRR. It measures how much the observed agreement among annotators exceeds the agreement that would be expected by chance.
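Concretely, Krippendorff’s Alpha is defined in terms of observed and expected disagreement:

$$\alpha = 1 - \frac{D_o}{D_e}$$

where $D_o$ is the disagreement actually observed among coders and $D_e$ is the disagreement expected by chance. $\alpha = 1$ indicates perfect agreement, and $\alpha = 0$ indicates agreement no better than chance.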
The Need for Higher Agreement
- Why it’s important: Achieving higher agreement among annotators is crucial for the validity and reliability of the data being generated. It ensures that the interpretations of the text are consistent, thereby making the conclusions drawn from the data more robust.
Overview of the Project’s Technical Progress
In the final stages of our annotation accuracy project, significant technical contributions were made to enhance the reliability of our data analysis methods. A pivotal aspect of this project involved the training and fine-tuning of the DeBERTa V3 model, which played a crucial role in improving our understanding of annotator agreement and classification accuracy.
Deep Dive into DeBERTa Model Training
Utilizing DeBERTa (Decoding-enhanced BERT with Disentangled Attention), I refined our annotation model. The model was trained on a carefully prepared dataset whose class proportions, measured in our prior analysis, were:
- Supports: 54.5752%
- Contradicts: 24.4009%
- Neutral: 21.0240%
Accounting for this class distribution during training, rather than assuming a balanced one, helped reduce bias toward the majority class and improved the generalizability of the model.
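A minimal sketch of the fine-tuning setup, assuming the HuggingFace `transformers` Trainer API and the `microsoft/deberta-v3-base` checkpoint; the hyperparameters shown are illustrative, not the values we used:

```python
# Sketch: fine-tuning DeBERTa V3 for 3-way passage/claim classification.
# Checkpoint and hyperparameters are illustrative assumptions.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "microsoft/deberta-v3-base"
LABELS = ["supports", "contradicts", "neutral"]

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL,
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
)

def preprocess(example):
    # Encode the passage and the claim as a single sequence pair.
    return tokenizer(example["passage"], example["claim"], truncation=True)

args = TrainingArguments(
    output_dir="deberta-editing-eval",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
)

# train_dataset / eval_dataset would be built from our annotations and
# mapped through preprocess() (construction omitted in this sketch):
# trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```

Class imbalance could additionally be handled with a class-weighted loss or stratified sampling; those details are omitted here.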
Model Performance and Enhancements
After training, the DeBERTa model agreed with the human annotators at an Inter-Rater Reliability (IRR) level well above our initial benchmarks. This was a significant improvement over earlier models and set a new standard for our project’s annotation accuracy.
Publication and Recognition
The results of this research and model training were submitted to NAACL 2024. I am pleased to report that our paper was accepted for publication, which marks a significant achievement for the team and underscores the quality and importance of our work.
The published paper can be accessed here: Long-form evaluation of model editing