This hands-on workshop walks through the common “preprocessing recipe” that serves as the foundation for many other text analysis applications, along with some basic natural language processing techniques. Topics include: a) digitization (UTF-8), b) removal of stopwords, numbers, and punctuation, c) tokenization, d) calculation of word frequencies and proportions, e) part-of-speech tagging, and f) concordances.
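As a rough preview of the recipe, here is a minimal sketch that uses only the Python standard library rather than NLTK. The sample sentence, the tiny stopword list, and the function names are all hypothetical and chosen for illustration. Part-of-speech tagging is left out because it needs a trained tagger.

```python
import re
from collections import Counter

# Hypothetical sample input, stored as UTF-8 bytes (an assumption for illustration).
RAW = "The cat sat on the mat. The dog sat on the log in 2019!".encode("utf-8")

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "on", "in", "a", "an", "of", "and"}


def tokenize(raw_bytes):
    """Decode UTF-8, lowercase, and split into alphabetic tokens."""
    text = raw_bytes.decode("utf-8").lower()
    # Matching runs of letters only drops numbers and punctuation in one step.
    return re.findall(r"[^\W\d_]+", text)


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def proportions(tokens):
    """Map each word to its share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def concordance(tokens, keyword, width=2):
    """Return each occurrence of keyword with `width` tokens of context per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            lines.append(" ".join(tokens[max(0, i - width): i + width + 1]))
    return lines


all_tokens = tokenize(RAW)
content_tokens = remove_stopwords(all_tokens)
frequencies = Counter(content_tokens)
shares = proportions(content_tokens)
contexts = concordance(all_tokens, "sat")

print(content_tokens)
print(frequencies.most_common(1))
print(contexts)
```

The concordance here runs on the unfiltered tokens so that each keyword appears in its original context, while the frequencies and proportions are computed after stopword removal.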
Prior knowledge: We will be using the NLTK Python package, so basic familiarity with Python is required if you wish to follow along with the tutorial. Completion of D-Lab’s Python FUN!damentals workshop series will be sufficient.
This workshop is one part of a four-part series that prepares participants to move forward with text analysis research, with a special focus on humanities and social science applications. Please register for each workshop separately.