How can digital humanists examine the ever-growing quantity of digital texts now available? Big data techniques and textual analysis are hugely popular and influential in business and the media, but what lies beneath the colourful visualizations?
This module will explore the theory and practice of corpus creation and cleaning, tool selection, analysis, and visualization. By the end of the module, students will be able to create their own corpus and analyse it using a broad range of tools and techniques. Students will learn how to construct effective research questions, consider which tools are most appropriate, and present their findings.
The first part of the module will introduce students to a suite of ‘out-of-the-box’ digital tools which can be used to analyse the data in digitally encoded text files. Students will familiarize themselves with common techniques used to extract and visualize information from large volumes of unstructured text (e.g. topic modelling).
The second part of the module will provide an introduction to textual analysis and word vector analysis in R (a free, open-source programming language and software environment). Students will be introduced to a range of techniques using R and will learn how to customize their analyses and create visualizations.