CORPORART – PT/IT Specialized Comparable Corpora of Public Art


CORPORART – PT/IT is a bilingual comparable corpus of the Public Art domain. It comprises subcorpora for contemporary European Portuguese and Italian, spanning 2000 to 2018, and covers the text types and subdomains representative of specialized text production in this highly interdisciplinary domain.

Selection criteria (text type determination, subdomain specification and selection) and the resulting representativeness were validated by domain experts, yielding a balanced and representative specialized bilingual corpus.

More on the corpus:
Barbero, C. (2019). CORPORART – um corpus de arte pública para a extração de léxico: representatividade e comparabilidade em corpora de especialidade. Revista da Associação Portuguesa de Linguística, n. 5, 09/2019, pp. 43-57.

ISLRN: 848-863-329-623-9


Corpus files and their documentation (metadata files) are available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).
Corpus files are in compressed .txt format; metadata and wordlists are in .xlsx format.
To access CORPORART files, please fill in and sign the following statement and send it to:

Access metadata for:

Access wordlists for:


Chiara Barbero
Professor Raquel Amaro (coordination)
Professor Rita Ochoa (specialized consulting), University of Beira Interior

CORPORART was compiled by Chiara Barbero, within the scope of her PhD project, for the extraction of specialized lexicon. The work was funded by FCT (Fundação para a Ciência e a Tecnologia), KRUse PhD grant PD/BD/128131/2016.