
Scalable Computational Cognitive Models of the Bilingual Lexicon
A research project exploring the cognitive science behind how people acquire additional languages.
This is one of two projects researching the semantics and syntax of different languages. The Hebrew University of Jerusalem is the home institution for this project; the partner project is based in Melbourne.
The goals of this project are to:
The technological and theoretical importance of cross-linguistic applicability in semantic and syntactic representation has long been recognized, but achieving it has proved extremely difficult. The project will work towards defining a semantic and syntactic scheme that can be applied consistently across languages, building on two major bodies of work:
Studying Cross-linguistic Alignment and Divergence Patterns through Parallel Corpora. The development of the Universal Dependencies (UD) and UCCA annotation schemes provides a basis for in-depth statistical studies of cross-linguistic syntactic divergences based on data from parallel corpora. This is an improvement over traditional feature-based studies, which treat languages as vectors of categorical features (as languages are represented, e.g., in databases such as WALS or AutoTyp). However, existing studies are mostly based on summary statistics over parallel corpora, such as the relative frequencies of different word-order patterns, and do not reflect the fine-grained cross-linguistic mappings that matter both for linguistic typology and for practical NLP applications. For example, this methodology cannot directly detect that English nominal compounds and nominal-modification constructions are often translated with Russian adjectival-modification constructions, or that English adjectival-modification and nominal-modification constructions routinely give rise to Korean relative clauses.
Preliminary work in Dr Abend’s lab has manually word-aligned a subset of the Parallel Universal Dependencies corpus collection and conducted a quantitative and qualitative study based on it. The proposed project will extend the analysis to additional language pairs and to the use of UCCA categories, and will also refine the representation with finer-grained distinctions drawn from other sentence-level schemes, such as AMR [24]. Moreover, the project will extend the analysis to differences in the lexical semantics of the two languages, using an induced mapping between their distributional spaces.
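The contrast between per-language summary statistics and fine-grained cross-linguistic mappings can be illustrated with a minimal sketch. It assumes that word alignment has already produced pairs of UD relation labels (source word, aligned target word). The pairs below are hand-made toy values, not corpus data, and the function names are hypothetical; this is not the project's actual pipeline.

```python
from collections import Counter

# Toy, hand-made alignments (NOT real corpus data). Each pair holds the UD
# relation of a source-language word and of its aligned target-language word.
aligned_relations = [
    ("compound", "amod"),
    ("compound", "amod"),
    ("compound", "nmod"),
    ("nmod", "amod"),
    ("amod", "amod"),
    ("nsubj", "nsubj"),
]


def summary_frequencies(pairs):
    """Per-language relation counts: the kind of summary statistic that
    ignores which source construction maps to which target construction."""
    source_counts = Counter(src for src, _ in pairs)
    target_counts = Counter(tgt for _, tgt in pairs)
    return source_counts, target_counts


def relation_mapping_counts(pairs):
    """Joint counts over aligned pairs: how often each source relation is
    rendered by each target relation."""
    return Counter(pairs)


src_freq, tgt_freq = summary_frequencies(aligned_relations)
mapping = relation_mapping_counts(aligned_relations)

# Print the mappings from most to least frequent.
for (src, tgt), count in mapping.most_common():
    print(f"{src} -> {tgt}: {count}")
```

The per-language counts show only that one side has many `compound` relations and the other many `amod` relations. Only the joint counts show that the former tend to be rendered by the latter.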
Richer Mappings of Distributional Spaces across Languages. Aligning the vector space representations induced from monolingual data in each language complements the study of cross-lingual semantic mappings through aligned parallel corpora. We will go beyond current approaches, which attempt to find a global mapping between distributional spaces, mostly in the form of orthogonal linear transformations. Instead, we will adopt a non-linear approach based on topological data science theory.
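For reference, the global orthogonal mapping that current approaches rely on can be sketched as the standard orthogonal Procrustes solution. This is the linear baseline the project intends to move beyond, not the proposed non-linear method. The sketch assumes NumPy and uses random toy vectors in place of real embeddings; the function name `orthogonal_map` and all data are hypothetical.

```python
import numpy as np


def orthogonal_map(X, Y):
    """Return the orthogonal matrix W minimizing ||X @ W - Y||_F, where the
    rows of X and Y are vectors for corresponding (e.g. translation-pair)
    words in the two languages."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Toy setup: "target" vectors are an exact rotation/reflection of the
# "source" vectors, so a single global orthogonal map fits perfectly.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                   # toy source-language vectors
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # hidden orthogonal transform
Y = X @ Q                                      # toy target-language vectors

W = orthogonal_map(X, Y)
residual = np.linalg.norm(X @ W - Y)
print(f"alignment residual: {residual:.2e}")
```

In this toy case the residual is close to zero because the data were built from a single orthogonal transform. Real embedding spaces need not be related this way, which is one motivation for considering non-linear mappings.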
The project also studies the relation between syntactic and lexical differences between languages, with the goal of understanding how both types of differences shape the geometry and topology of the embedding spaces of different languages.
Hebrew University of Jerusalem supervisor:
Dr Omri Abend
University of Melbourne supervisor:
Dr Lea Frermann