Naomi Baes

NLP & Computational Social Science Researcher | Diachronic Semantic Change

Words like trauma, mental illness, and schizophrenia do not stay still. Concept creep describes one way these shifts can happen: harm-related concepts have expanded in scope and softened in severity since the late twentieth century, coming to describe a broader and milder range of experiences than they once did. These changes matter for how people understand, label, and seek help for distress.

I build computational tools to measure this kind of change precisely, and to study semantic change more broadly. My frameworks, SIBling and LSC-Eval, are designed to detect and evaluate specific kinds of semantic change across domains: whether a word is broadening in scope, gaining emotional intensity, or shifting in evaluative tone. My PhD applies these tools to mental health concepts, tracing how terms like schizophrenia have changed in meaning across decades of historical text, and what those shifts reveal about how psychological categories are culturally constructed and contested. The methods are computational, but the questions are substantive.

Most recently, I collaborated with Change is Key! (Riksbankens Jubileumsfond) to develop SenseRel, a benchmark for evaluating how well language models capture meaning relations at the sense level. I am a PhD researcher at the University of Melbourne working at the intersection of natural language processing and social psychology, supported by the Australian Government Research Training Program Scholarship and ARC-funded Concept Creep research funding.

CV
Interests
  • Lexical Semantic Change
  • Mental Health Concepts
  • Concept Creep
  • Computational Social Science
  • NLP Evaluation
Education
  • PhD, Social Psychology & Natural Language Processing

    University of Melbourne (2023-26)

  • Graduate Diploma in Psychology (Advanced) with Honours

    University of Melbourne

Highlights
  • Association for Computational Linguistics (ACL) 2026 (Main Conference): Joint first author on SenseRel — a benchmark testing whether AI language models understand the denotational and connotational layers of meaning between word senses. Developed during my Change is Key! research internship.

  • International Conference on Web and Social Media (ICWSM) 2026: Lead author of a computational study examining how women are dehumanized compared to men in incel discourse.

  • ACL 2025 Findings (Computational Social Science & Cultural Analytics): Lead author of LSC-Eval, a framework for evaluating whether computational methods are sensitive to specific dimensions of semantic change, using the mental health domain as a case study.

  • ACL 2024 Main (Computational Social Science & Cultural Analytics): Lead author of SIBling — a framework for studying lexical semantic change across multiple dimensions (Sentiment, Intensity, Breadth), with applications to the social sciences.

  • Invited talks: Presented work on computational methods for tracking meaning change in the mental health domain — including SIBling, LSC-Eval, and applied case studies in psychology — at Utrecht University, the University of Gothenburg, and the National Research Council Canada.

  • Awards and research support: Australian Government Research Training Program Scholarship; selected for Change is Key! international research internship (Riksbankens Jubileumsfond); Concept Creep research funding for conference travel and dissemination.

Research Overview

I study how word meanings change as concepts move across scientific writing, news media, and general language. My work focuses especially on language with social, cultural, or psychological significance, including mental health concepts and contested social terms. I ask how these terms extend into new contexts, shift in emotional intensity, and acquire more positive or negative associations. To examine these questions, I combine NLP methods with theory from social psychology, computational social science, and corpus linguistics, using historical text corpora, contextual embeddings, lexical resources, large language models, and statistical modeling.

Research streams

My work clusters around three overlapping streams: (1) diachronic lexical semantic change and evaluation, (2) conceptual change in psychology, and (3) computational social science projects on socially contested language.

  • Diachronic lexical semantic change and evaluation
    I develop computational methods and evaluation resources for tracing different kinds of diachronic lexical semantic change in historical text corpora, with applications in psychology, computational social science, linguistics, and NLP method evaluation. Contributions include:

    • SIBling, a multidimensional framework for modeling whether words become broader, more emotionally intense, or acquire more positive or negative connotations over time.
    • LSC-Eval, an evaluation framework that uses LLM-generated historical datasets in experimental settings to test whether semantic change methods are sensitive to specific kinds of change.
    • A threshold-calibrated sense tracking pipeline for estimating the prevalence of a word’s senses in historical corpora. It bears on the interpretability of lexical semantic change (LSC) scores by showing that they may not reflect sense change.
    • SenseRel, a sense-level benchmark for modeling semantic relations between word senses, connecting denotational semantic change types (e.g., metaphor, metonymy) with connotational dimensions of meaning (e.g., valence and arousal). We evaluate how well LLMs and fine-tuned models capture these relations.
  • Conceptual change in psychology and mental health language

    I study how psychological and mental health concepts change in meaning, salience, severity, and use across academic psychology, news media, books, and general language corpora. This stream includes substantive case studies of concepts such as trauma, mental illness, schizophrenia, and introversion, as well as generic and emotion-related mental health terminology. Together, these studies examine when concepts become more culturally prominent, more widely used, more emotionally intense, or more evaluatively charged, interpreting the results through psychological theory, concept creep research, and corpus evidence.

  • Social meaning in contested language
    I also contribute to collaborative projects that use NLP and computational social science methods to study socially important language, including dehumanization of women in incel discourse, mental health stigma detection in online communication, identity/person-first language for mental health conditions, and lay understandings of the common good. These projects extend my broader interest in how language reflects, organizes, and reshapes social and psychological categories.

Selected Presentations

Invited Talks
  • Utrecht University
  • University of Gothenburg
  • National Research Council Canada
Research Ethos
Much of my work begins with a measurement problem: existing methods for detecting semantic change often conflate distinct kinds of shift — broadening is not the same as softening or acquiring negative associations, yet most approaches treat these as one phenomenon. I believe getting measurement to approximate the construct as closely as possible is not a technical detail but a substantive one; imprecise tools produce imprecise claims about how language and culture change. SIBling, LSC-Eval, and SenseRel all emerged from this concern — frameworks designed to make semantic change research more interpretable, more evaluable, and more honest about what is and isn’t being measured.