Natural Language Toolkit (NLTK)

Mark up and analyse large corpora with NLTK, and manage your data

Researchers are often confronted with natural language datasets so large that not even teams of researchers could analyse them by hand. Just reading the 100-million-word British National Corpus, for example, would take around 10,000 hours.

Python's Natural Language Toolkit provides powerful means of investigating large bodies of text for patterns that may otherwise be difficult to find.
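
As a rough illustration of the kind of pattern-finding described above, the sketch below counts word frequencies and pulls out each occurrence of a word in context. The sample sentence and variable names are made up for this example, and a real project would load a corpus from files rather than a string.

```python
from nltk import FreqDist, Text
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text; a real corpus would be read from files.
sample = (
    "The cat sat on the mat. The dog sat on the log. "
    "A cat and a dog can share the mat."
)

# wordpunct_tokenize is regex-based, so it needs no extra NLTK data downloads.
# Lowercase everything and drop punctuation tokens.
tokens = [t.lower() for t in wordpunct_tokenize(sample) if t.isalpha()]

# Count how often each word occurs.
freq = FreqDist(tokens)
top_words = freq.most_common(3)

# Collect every occurrence of "sat" together with its surrounding words.
text = Text(tokens)
sat_lines = text.concordance_list("sat", width=40)

print(top_words)
for line in sat_lines:
    print(line.line)
```

On a corpus of millions of words the same calls apply; only the loading and tokenising steps would need more care.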

Participants will be introduced to Python and NLTK. Key concepts and practices in corpus linguistics/distant reading will be explained and used to investigate example data sets. Strategies for using NLTK in your own research will also be discussed.

This course is FREE for researchers (PhDs, Postgrads, PostDocs & Profs)

Duration: 10 hours

Format: 4 sessions of 2.5 hours

Frequency: Quarterly

Notes: Bring your own computer