Doing text analytics for Digital Humanities and Social Sciences with CLARIN

Tutorial Website

The tutorial is co-organized by CLARIN and DARIAH-Ireland.

Background and motivation

Text is a basic material, a primary data layer, in many areas of the humanities and social sciences. If we want to advance the agenda that digital humanities and computational social science are setting, it is vital to bring together the technical fields that deal with automated text processing and scholars in the humanities and social sciences. Much progress has been made in the last two decades in text analytics, a field that draws on recent advances in computational linguistics, information retrieval and machine learning. By now we know what to expect from basic tools, such as named entity recognition. To foster new areas of research, it is necessary to understand not only what is available in terms of proven technologies and infrastructures such as CLARIN, but also how developers of text analytics and researchers in the humanities and social sciences can work together to better understand the challenges in each other’s fields. What research questions do researchers ask of their texts? Can computational models help answer these questions in a non-reductionist way?


In two lectures devoted to text analytics applied to the humanities and the social sciences, Dong Nguyen (Alan Turing Institute, UK) and Antal van den Bosch (Meertens Institute and Radboud University, the Netherlands) introduce current challenges and present working solutions. Folgert Karsdorp (Meertens Institute, the Netherlands) then offers an afternoon introductory course on using Python for the humanities and social sciences (bring your own laptop). The tutorial program concludes with an expert session in which the three lecturers answer attendees’ specific questions about the most suitable resources, technologies and methodology for their research. We will gather these questions beforehand, so that we have an idea of the number of interested people and the issues to be discussed, and have time to prepare our answers. If you wish to participate in the expert session, please send a brief description of your questions (optionally with links to papers with background ideas) before June 2, 2017.


The tutorial is primarily intended for PhD students, postdocs and younger researchers working in the fields of Digital Humanities and Social Sciences. No programming knowledge is required, but basic experience in working with digital text collections is a plus. For the hands-on session, please bring your own laptop.


Antal van den Bosch, Meertens Institute and Radboud University, the Netherlands
Folgert Karsdorp, Meertens Institute, the Netherlands
Dong Nguyen, Alan Turing Institute, UK