Corpora
CAL2 – L2 Acquisition Corpus
Description: CAL2 compiles spontaneous production data (written and oral) collected under the Morphology and Syntax in L2 Acquisition project.
Link: http://cal2.clunl.fcsh.unl.pt
CIPM – Digital Corpus of Medieval Portuguese
Description: CIPM consists of prose texts dating from the 12th to the 16th centuries. It includes both literary texts (hagiographic, historical and travel narratives, doctrinal prose, philosophical treatises, and texts of a moralistic and religious nature) and non-literary texts (private notarial documents, royal documents, wills, and charters, i.e., primarily legal documents).
Link: http://cipm.fcsh.unl.pt
CORPORART – PT/IT specialized comparable corpora of Public Art
Description: CORPORART – PT/IT is a bilingual comparable corpus in the domain of Public Art. It comprises subcorpora of contemporary European Portuguese and Italian texts from 2000 to 2018, covering the text types and subdomains representative of specialized text production in this highly interdisciplinary domain.
Link: https://clunl.fcsh.unl.pt/en/online-resources/corpora/corporart-corpus-comparavel-pt-it-de-especialidade-no-dominio-da-arte-publica/
Corpus of Written Narratives PIPALE
Description: The Corpus of Written Narratives is a corpus of texts produced by primary school children (2nd and 3rd grades), collected within the PIPALE project.
Link: https://pipale.fcsh.unl.pt/corpus-de-narrativas-escritas/
Portuguese Literature Corpus for Distant Reading
Description: The Portuguese Literature Corpus for Distant Reading is a literary corpus of non-canonical novels by Portuguese authors from the period 1840–1920.
Link: https://github.com/COST-ELTeC/ELTeC-por
G&T.Comenta
Description: The G&T.Comenta corpus was created within the G&T.Comenta project to study and categorize commentary as a language activity and textual practice. It consists of texts collected from different sources and circulating in different media.
Link: https://projetos.dhlab.fcsh.unl.pt/s/GTComenta/item
HEREDITermCorpus_en (V0.1)
Description: The HEREDITermCorpus_en_V0.1 compiles a curated selection of texts dedicated to the microbiota-gut-brain axis (MGBA) and its emerging role in neurodegenerative disorders. The dataset comprises 1,060 documents, 234,215 sentences, 4,132,486 words and 6,029,603 tokens.
Link: https://doi.org/10.5281/zenodo.16968962
HEREDITermCorpus_pt (V0.1)
Description: The HEREDITermCorpus_pt_V0.1 compiles a curated selection of texts dedicated to the microbiota-gut-brain axis (MGBA) and its emerging role in neurodegenerative disorders. The dataset comprises 126 documents, 100,610 sentences, 1,999,301 words and 2,665,436 tokens.
Link: https://doi.org/10.5281/zenodo.16969241
MIGRANTE.PT
Description: MIGRANTE.PT, developed within the EXPRIMI project, is a European Portuguese corpus for specific purposes of around 1.5 million tokens. It consists of institutional texts on the integration of migrants in Portugal, addressed to those migrants and collected from websites and materials freely available online.
Link: https://clunl.fcsh.unl.pt/en/online-resources/corpora/migrante-pt/
Parallel sense-annotated corpus ELEXIS-WSD 1.0
Description: ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.0 contains sentences for 10 languages: Bulgarian, Danish, English, Spanish, Estonian, Hungarian, Italian, Dutch, Portuguese, and Slovene.
Link: http://hdl.handle.net/11356/1674
Parallel sense-annotated corpus ELEXIS-WSD 1.2
Description: ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.0 contains sentences for 10 languages: Bulgarian, Danish, English, Spanish, Estonian, Hungarian, Italian, Dutch, Portuguese, and Slovene.
Link: http://hdl.handle.net/11356/2022
Parallel sense-annotated corpus ELEXIS-WSD 1.3
Description: ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.0 contains sentences for 10 languages: Bulgarian, Danish, English, Spanish, Estonian, Hungarian, Italian, Dutch, Portuguese, and Slovene.
Link: http://hdl.handle.net/11356/2029