Annotating Data for Assessing and Defining Natural Language Processing

Friday, March 29 | 11:00 AM - 12:00 PM

Henry A. and Elvira H. Jubel Hall, 121

Denis Peskoff
Computing Innovation Fellow
Princeton University

Natural Language Processing has been built on straightforward tasks that fail to capture the complexity of human language. While models have improved on these artificial benchmarks, our assessments show that important shortcomings remain, even in the state of the art. Using experts in ten different domains, we show that large language models achieve surface credibility in answering domain-specific questions but fall short of true expertise. Using human evaluation, we show that the latest neural topic models are indistinguishable from older probabilistic ones.

To meaningfully push forward the state of the art in NLP, we need challenging tasks that document the limits of current technology, such as the study of dissent and deception. We study these complex ideas in diverse settings such as organizational transcripts and board games. We apply the latest approaches to these tasks (GPT prompting for dissent and fine-tuned LSTMs for detecting deception), revealing that the cutting edge is still imperfect relative to a human baseline. Creating annotations for this kind of task that are both accurate and scalable is difficult; grounding the creation of annotations in domain expertise, where applicable, is a step in the right direction.

Denis Peskoff is a Computing Innovation Fellow at Princeton University working with Professor Brandon Stewart. He completed his PhD in Computer Science at the University of Maryland with Professor Jordan Boyd-Graber, during which he spent a year at LMU Munich under a DAAD grant. He completed his Bachelor's in Science, Technology, and International Affairs at the Georgetown School of Foreign Service. His research focuses on using domain experts to assess and define the field of natural language processing.

