Scalable Computational Cognitive Models of the Bilingual Lexicon

 


This is one of two projects researching the semantics and syntax of different languages. The University of Melbourne is the home institution for this project. View the HUJI-based partner project.

This project aims to:

  1. Contribute novel insights into the validity of different SLA models by exposing them to diverse and naturalistic data and testing them on a larger scale.
  2. Use the findings to inform cross-lingual transfer of NLP models, i.e., the automatic adaptation of a model trained on one language to a different one.

The details

Learning a second language (L2) is a major cognitive effort, yet humans reliably acquire languages beyond their native language (L1), and decades of research have revealed intricate shifts in conceptual and linguistic representations caused by second language acquisition (SLA). This project will leverage machine learning (ML) and natural language processing (NLP) methods, together with large-scale naturalistic data sets of learner language, to investigate the structure and development of the bilingual lexicon. We will expose established models of SLA to large corpora of native and learner language. The project has a dual nature: first, it will contribute novel insights into the validity of different SLA models by exposing them to diverse and naturalistic data and testing them on a larger scale; second, we will use our findings to inform the cross-lingual transfer of NLP models, i.e., the automatic adaptation of a model trained on one language to a different one.

Scalable Models of Lexical and Conceptual Representations in SLA. We will draw on recent developments in distributional and contextual language modelling, and incorporate mechanisms of lexical structure and development in SLA derived from established psycholinguistic models of bilingualism. We will evaluate our models on a broader scale than has been done previously, with the aim of drawing more robust conclusions and separating universal phenomena from language-pair-specific ones. We will test our models both on predicting controlled behavioural data and on predicting naturalistic learner data (observed in L2 essays), as sketched below.
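To illustrate the kind of evaluation this involves, here is a minimal sketch, not the project's actual pipeline: the toy vocabulary, placeholder vectors, and human ratings are hypothetical, and the point is simply to score how well similarities in a bilingual distributional space predict behavioural similarity judgements.

```python
# Minimal sketch (hypothetical data): score a bilingual embedding space by how
# well cosine similarities between cross-lingual word pairs predict human
# similarity ratings.
import numpy as np
from scipy.stats import spearmanr


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def evaluate_lexicon(embeddings, rated_pairs):
    """
    embeddings: dict mapping (language, word) -> vector
    rated_pairs: list of ((lang1, word1), (lang2, word2), human_rating)
    Returns the Spearman correlation between model similarities and ratings.
    """
    model_scores, human_scores = [], []
    for key1, key2, rating in rated_pairs:
        model_scores.append(cosine(embeddings[key1], embeddings[key2]))
        human_scores.append(rating)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder vectors standing in for learned distributional representations.
    vocab = [("en", "dog"), ("de", "Hund"), ("en", "cat"), ("de", "Katze")]
    embeddings = {w: rng.normal(size=50) for w in vocab}
    rated_pairs = [                                   # hypothetical ratings
        (("en", "dog"), ("de", "Hund"), 9.5),
        (("en", "cat"), ("de", "Hund"), 4.0),
        (("en", "cat"), ("de", "Katze"), 9.3),
    ]
    print("Spearman rho:", evaluate_lexicon(embeddings, rated_pairs))
```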

Informed Priors for Cross-lingual Model Transfer. We will incorporate our insights on global and language-pair-specific shifts in lexical representations as priors in cross-lingual model transfer, hypothesizing that they will enable more effective transfer models. Cross-lingual domain adaptation, where a model trained on one (typically data-rich) language is transferred to a different (typically data-poorer) language, is an important, yet open, research problem in NLP. We will experiment with ways of incorporating priors in order to constrain and guide the adaptation process in an informed way.
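One simple way a prior can constrain adaptation is as a regulariser. The sketch below is an illustrative assumption, not the project's method: the model architecture, the penalty strength, and the prior embeddings (standing in for expected lexical shifts) are all hypothetical, and the prior is applied as an L2 penalty pulling target-language embeddings towards reference values during adaptation.

```python
# Minimal sketch (illustrative, in PyTorch): encode prior expectations about
# lexical representations as reference embeddings and regularise adaptation
# to the target language towards them.
import torch
import torch.nn as nn


class TransferModel(nn.Module):
    def __init__(self, vocab_size, dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, token_ids):
        # Mean-pool token embeddings, then classify.
        return self.classifier(self.embed(token_ids).mean(dim=1))


def prior_penalty(model, prior_embeddings, strength=0.1):
    """L2 distance between current embeddings and the informed prior."""
    return strength * (model.embed.weight - prior_embeddings).pow(2).sum()


vocab_size, dim, num_classes = 1000, 64, 3
model = TransferModel(vocab_size, dim, num_classes)

# Hypothetical prior: source-language embeddings plus an expected shift.
prior_embeddings = torch.randn(vocab_size, dim)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy target-language batch standing in for real learner / L2 data.
token_ids = torch.randint(0, vocab_size, (8, 12))
labels = torch.randint(0, num_classes, (8,))

logits = model(token_ids)
loss = loss_fn(logits, labels) + prior_penalty(model, prior_embeddings)
loss.backward()
optimizer.step()
print("loss:", loss.item())
```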

Supervision team

University of Melbourne supervisor:
Dr Lea Frermann

Hebrew University of Jerusalem supervisor:
Dr Omri Abend

First published on 31 August 2022.

