Portuguese Literature Corpus for Distant Reading


  • Project identification: Portuguese Literature Corpus for Distant Reading
  • Group: Lexicology, Lexicography and Terminology
  • Principal Investigator: Raquel Amaro
  • Duration: Dec. 2018 – June 2020
  • Funding entity: co-funded by COST Action CA 16204, with the support of the Biblioteca Nacional de Portugal (National Library of Portugal)
  • Keywords: Corpus linguistics; Literature; Distant Reading


The project “Portuguese Literature Corpus for Distant Reading” aims to build a literary corpus of novels by Portuguese authors for inclusion in the European corpus ELTeC – European Literary Text Collection, within COST Action CA 16204 – Distant Reading for European Literary History (http://www.cost.eu/COST_Actions/ca/CA16204). The project secures the Portuguese contribution both to the development of good practices and computational methods of textual analysis adapted to European literary traditions, and to the study of fundamental concepts of Portuguese and European literary theory and history.
Including the Portuguese language and culture in this European corpus will enable the joint development of new methods and new ways of conceiving European literary history through international cooperation projects. It will also support the creation of new theoretical and practical frameworks and the contrastive analysis of languages and cultures using data-based computational methods.


The corpora of ELTeC – European Literary Text Collection are open. Statistics and human-readable versions of each text, including those in the Portuguese Literature Corpus for Distant Reading, are available at https://distantreading.github.io/ELTeC/.

The original source files are stored in a GitHub repository and can be downloaded freely at https://github.com/COST-ELTeC.


Raquel Amaro (principal investigator)
Paulo Pereira (Portuguese Literature Centre, FLUC)
Isabel Araújo Branco (CHAM – Humanities Centre, NOVA FCSH)
Adeliana Silva (CLUNL, research grant holder)
Diana Santos (University of Oslo)