
Corpus de Referência do Português Contemporâneo

Online queries to CRPC subcorpora

The following subcorpora of CRPC are available for online queries. They can be searched as a single corpus or as partial subcorpora according to text type. 
 

Corpora of European Portuguese

1) ELAN Corpus: 2.840.552 words

ELAN Corpus (ELAN - European Language Activity Network)

 

corpus_ELAN                                    Number of words
newspaper (jornal_ELAN)                        1.878.156
technical and scientific book (livrotec_ELAN)  510.562
periodical (revista_ELAN)                      262.465
miscellaneous (varia_ELAN)                     189.356
Total                                          2.840.552

 

2) RL Corpus: 8.670.438 words

Non-annotated RL Corpus (Language Resources for Portuguese: a corpus and tools for query and analysis)

 

corpus                                         Number of words
spoken corpus (corpus_oral_RL)                 105.964
written corpus (corpus_escrito_RL)             8.564.474
  newspaper (jornal RL)                        4.097.868
  fiction book (livrolit RL)                   1.792.590
  technical and scientific book (livrotec RL)  1.440.625
  periodical (revista RL)                      420.792
  miscellaneous (varia RL)                     812.599
Total (spoken + written)                       8.670.438
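
The word counts on this page use a dot as the thousands separator. As a minimal sketch of how the RL totals fit together, the following Python snippet converts the figures from the table above to integers and checks that the written subcorpora add up to the written total, and that spoken plus written gives the overall total. The helper names parse_count and format_count are illustrative assumptions and are not part of any CRPC tool.

```python
def parse_count(text: str) -> int:
    """Convert a count with dots as thousands separators (e.g. "8.564.474") to an int."""
    return int(text.replace(".", ""))


def format_count(n: int) -> str:
    """Format an int back into the dot-separated style used on this page."""
    return f"{n:,}".replace(",", ".")


# Written subcorpora of the RL Corpus, copied from the table above.
written_parts = {
    "jornal RL": "4.097.868",
    "livrolit RL": "1.792.590",
    "livrotec RL": "1.440.625",
    "revista RL": "420.792",
    "varia RL": "812.599",
}

written_total = sum(parse_count(v) for v in written_parts.values())
spoken_total = parse_count("105.964")
grand_total = written_total + spoken_total

print(format_count(written_total))  # 8.564.474
print(format_count(grand_total))    # 8.670.438
```

Both printed values match the "written corpus" and "Total (spoken + written)" rows of the table.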

 

3) ELAN + RL Corpora: 11.405.026 words

ELAN Corpus (ELAN - European Language Activity Network) + Non-annotated RL Corpus (Language Resources for Portuguese: a corpus and tools for query and analysis)

corpus_RL_ELAN                                    Number of words
newspaper (jornal_RL_ELAN)                        5.976.024
technical and scientific book (livrotec_RL_ELAN)  1.951.187
periodical (revista_RL_ELAN)                      683.257
miscellaneous (varia_RL_ELAN)                     1.001.955

Tagged Corpus of European Portuguese

4) RL tagged Corpus: 501.042 words

Tagged RL Corpus (Language Resources for Portuguese: a corpus and tools for query and analysis)

Tagged corpus (corpus_anotado_RL)  Number of words
newspaper (jornal_anotado_RL)      336.151
periodical (revista_anotado_RL)    25.908
book (livro_anotado_RL)            125.434
miscellaneous (varia_anotado_RL)   13.549
Total                              501.042

Annotation manual

Files that were tagged automatically with no manual revision (e.g. jornal_anot_auto_RL) and files that were manually revised (e.g. jornal_anot_rev_man_RL) can also be queried separately:

newspaper (jornal_anot_auto_RL)        184.418
book (livro_anot_auto_RL)              60.344
periodical (revista_anot_auto_RL)      18.914
miscellaneous (varia_anot_auto_RL)     8.273

newspaper (jornal_anot_rev_man_RL)     184.131
book (livro_anot_rev_man_RL)           63.264
periodical (revista_anot_rev_man_RL)   15.328
miscellaneous (varia_anot_rev_man_RL)  8.319

To query a word in the tagged corpus, search either for its lemma or for its word form together with its tag.

Corpus of African Portuguese Varieties

5) AFRICA Corpus: 3.000.000 words

AFRICA Corpus (Linguistic Resources for the Study of African Varieties of Portuguese)

 

Countries              Spoken corpus  Written corpus
Angola                 27.363         613.495
Cape Verde             25.413         612.120
Guinea-Bissau          25.016         615.404
Mozambique             26.166         615.297
Sao Tome and Principe  25.287         614.563
Total                  129.245        3.070.879
Total of both corpora                 3.070.879