Corpora


CAL2 – L2 Acquisition Corpus

Description: CAL2 compiles the spontaneous production data (written and oral) collected under the Morphology and Syntax in L2 Acquisition project.
Link: http://cal2.clunl.fcsh.unl.pt


CIPM – Digital Corpus of Medieval Portuguese

Description: CIPM consists of texts dating from the 12th to the 16th centuries. It includes prose texts, both literary (hagiographic, historical and travel narratives, doctrinal prose, philosophical treatises, texts of a moralistic and religious nature) and non-literary (private notarial documents, royal documents, wills, charters, i.e., primarily legal documents).
Link: http://cipm.fcsh.unl.pt


CORPORART – PT/IT specialized comparable corpora of Public Art

Description: CORPORART – PT/IT is a bilingual comparable corpus of the Public Art domain. It comprises subcorpora for contemporary European Portuguese and Italian, from 2000 to 2018, covering text types and subdomains representative of the production of specialized texts in this highly interdisciplinary domain.
Link: https://clunl.fcsh.unl.pt/en/online-resources/corpora/corporart-corpus-comparavel-pt-it-de-especialidade-no-dominio-da-arte-publica/


Portuguese Literature Corpus for Distant Reading 

Description: The Portuguese Literature Corpus for Distant Reading is a literary corpus of non-canonical novels by Portuguese authors, from the period 1840-1920.
Link: https://github.com/COST-ELTeC/ELTeC-por


G&T.Comenta

Description: The G&T.Comenta corpus was created within the G&T.Comenta project for the study and categorization of commentary as a language activity and textual practice. The corpus results from a collection of texts circulating in different media and from different sources.
Link: https://projetos.dhlab.fcsh.unl.pt/s/GTComenta/item


MIGRANTE.PT

Description: Resulting from the EXPRIMI project, MIGRANTE.PT is a European Portuguese corpus for specific purposes with around 1.5 million tokens. It consists of institutional texts concerning the integration of migrants in Portugal and addressed to these migrants, collected from sites and materials freely available online.
Link: https://clunl.fcsh.unl.pt/en/online-resources/corpora/migrante-pt/


Parallel sense-annotated corpus ELEXIS-WSD 1.0

Description: ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.0 contains sentences for 10 languages: Bulgarian, Danish, English, Spanish, Estonian, Hungarian, Italian, Dutch, Portuguese, and Slovene.
Link: http://hdl.handle.net/11356/1674