The following subcorpora of the CRPC are available for online queries. They can be searched together as a single corpus or individually as subcorpora defined by text type.
Corpora of European Portuguese
1) ELAN Corpus: 2.840.552 words
ELAN Corpus (ELAN - European Language Activity Network)

| corpus_ELAN | Number of words |
|---|---|
| newspaper (jornal_ELAN) | 1.878.156 |
| technical and scientific book (livrotec_ELAN) | 510.562 |
| periodical (revista_ELAN) | 262.465 |
| miscellaneous (varia_ELAN) | 189.356 |
| Total | 2.840.552 |

2) RL Corpus: 8.670.438 words
Non-annotated RL Corpus (Language Resources for Portuguese: a corpus and tools for query and analysis)

| Corpus | Number of words |
|---|---|
| spoken corpus (corpus_oral_RL) | 105.964 |
| written corpus (corpus_escrito_RL) | 8.564.474 |
| newspaper (jornal RL) | 4.097.868 |
| fiction book (livrolit RL) | 1.792.590 |
| technical and scientific book (livrotec RL) | 1.440.625 |
| periodical (revista RL) | 420.792 |
| miscellaneous (varia RL) | 812.599 |
| Total (spoken + written) | 8.670.438 |

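
The RL figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming the counts use "." as a thousands separator; the helper names `parse_count` and `format_count` are illustrative and not part of any CRPC tool.

```python
def parse_count(text: str) -> int:
    """Convert a count written with '.' as thousands separator to an int."""
    return int(text.replace(".", ""))


def format_count(n: int) -> str:
    """Format an int with '.' as thousands separator, as in the tables."""
    return f"{n:,}".replace(",", ".")


# Written RL subcorpora, copied from the table above.
written_parts = {
    "newspaper": "4.097.868",
    "fiction book": "1.792.590",
    "technical and scientific book": "1.440.625",
    "periodical": "420.792",
    "miscellaneous": "812.599",
}

spoken = parse_count("105.964")
written = sum(parse_count(value) for value in written_parts.values())

print(format_count(written))           # 8.564.474 (written corpus)
print(format_count(spoken + written))  # 8.670.438 (spoken + written)
```

Both results match the written-corpus and overall totals reported for the RL Corpus.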
3) ELAN + RL Corpora: 11.405.026 words
ELAN Corpus (ELAN - European Language Activity Network) + Non-annotated RL Corpus (Language Resources for Portuguese: a corpus and tools for query and analysis)

| corpus_RL_ELAN | Number of words |
|---|---|
| newspaper (jornal_RL_ELAN) | 5.976.024 |
| technical and scientific book (livrotec_RL_ELAN) | 1.951.187 |
| periodical (revista_RL_ELAN) | 683.257 |
| miscellaneous (varia_RL_ELAN) | 1.001.955 |

Tagged Corpus of European Portuguese
4) RL tagged Corpus: 501.042 words
Tagged RL Corpus (Language Resources for Portuguese: a corpus and tools for query and analysis)

| Tagged corpus (corpus_anotado_RL) | Number of words |
|---|---|
| newspaper (jornal_anotado_RL) | 336.151 |
| periodical (revista_anotado_RL) | 25.908 |
| book (livro_anotado_RL) | 125.434 |
| miscellaneous (varia_anotado_RL) | 13.549 |
| Total | 501.042 |

It is also possible to query separately the files that were tagged automatically with no manual revision (e.g. jornal_anot_auto_RL) and the files that were manually revised (e.g. jornal_anot_rev_man_RL):

| Automatically tagged files | Number of words |
|---|---|
| newspaper (jornal_anot_auto_RL) | 184.418 |
| book (livro_anot_auto_RL) | 60.344 |
| periodical (revista_anot_auto_RL) | 18.914 |
| miscellaneous (varia_anot_auto_RL) | 8.273 |

| Manually revised files | Number of words |
|---|---|
| newspaper (jornal_anot_rev_man_RL) | 184.131 |
| book (livro_anot_rev_man_RL) | 63.264 |
| periodical (revista_anot_rev_man_RL) | 15.328 |
| miscellaneous (varia_anot_rev_man_RL) | 8.319 |

To query a word in the tagged corpus, search either for its lemma or for its word form together with its tag.
Corpus of African Portuguese Varieties
5) AFRICA Corpus: 3.000.000 words
AFRICA Corpus (Linguistic Resources for the Study of African Varieties of Portuguese)

| Countries | Spoken corpus | Written corpus |
|---|---|---|
| Angola | 27.363 | 613.495 |
| Cape Verde | 25.413 | 612.120 |
| Guinea-Bissau | 25.016 | 615.404 |
| Mozambique | 26.166 | 615.297 |
| Sao Tome and Principe | 25.287 | 614.563 |
| Total | 129.245 | 3.070.879 |
| Total of both corpora | | 3.200.124 |