The following subcorpora of CRPC are available for online queries. They can be searched as a single corpus or as partial subcorpora according to text type.

Corpora of European Portuguese

1) ELAN Corpus: 2.840.552 words

ELAN Corpus (ELAN - European Language Activity Network)

corpus_ELAN                                      Number of words
newspaper (jornal_ELAN)                                1.878.156
technical and scientific book (livrotec_ELAN)            510.562
periodical (revista_ELAN)                                262.465
miscellaneous (varia_ELAN)                               189.356
Total                                                  2.840.552

2) RL Corpus: 8.670.438 words

Non-annotated RL Corpus (Language Resources for Portuguese: a corpus and tools for query and analysis)

corpus                                           Number of words
spoken corpus (corpus_oral_RL)                           105.964
written corpus (corpus_escrito_RL)                     8.564.474
  newspaper (jornal RL)                                4.097.868
  fiction book (livrolit RL)                           1.792.590
  technical and scientific book (livrotec RL)          1.440.625
  periodical (revista RL)                                420.792
  miscellaneous (varia RL)                               812.599
Total (spoken + written)                               8.670.438

3) ELAN + RL Corpora: 11.405.026 words

ELAN Corpus (ELAN - European Language Activity Network) + Non-annotated RL Corpus (Language Resources for Portuguese: a corpus and tools for query and analysis)

Tagged Corpus of European Portuguese

corpus_RL_ELAN                                   Number of words
newspaper (jornal_RL_ELAN)                             5.976.024
technical and scientific book (livrotec_RL_ELAN)       1.951.187
periodical (revista_RL_ELAN)                             683.257
miscellaneous (varia_RL_ELAN)                          1.001.955
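Each figure listed for the combined corpus is the sum of the matching ELAN and RL text-type counts. The sketch below, in Python, reproduces that arithmetic from the tables above. The counts use a dot as thousands separator, so they are parsed before adding. The helper name `parse_count` and the short dictionary keys are illustrative choices, not identifiers used by the corpus interface.

```python
def parse_count(text: str) -> int:
    """Convert a count such as '1.878.156' (dot as thousands separator) to an int."""
    return int(text.replace(".", ""))

# Word counts per text type, copied from the ELAN and RL tables above.
elan = {"jornal": "1.878.156", "livrotec": "510.562",
        "revista": "262.465", "varia": "189.356"}
rl = {"jornal": "4.097.868", "livrotec": "1.440.625",
      "revista": "420.792", "varia": "812.599"}

# Counts listed for the combined corpus (corpus_RL_ELAN).
listed = {"jornal": "5.976.024", "livrotec": "1.951.187",
          "revista": "683.257", "varia": "1.001.955"}

# Add the two source corpora text type by text type.
combined = {key: parse_count(elan[key]) + parse_count(rl[key]) for key in elan}

for key, total in combined.items():
    print(f"{key}_RL_ELAN: computed {total}, listed {parse_count(listed[key])}")
```

Only the four text types listed for the combined corpus are included here.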

4) RL tagged Corpus: 501.042 words

Tagged Corpus RL (Language Resources for Portuguese: a corpus and tools for query and analysis)

Corpus of African Portuguese Varieties

Tagged corpus (corpus_anotado_RL)                Number of words
newspaper (jornal_anotado_RL)                            336.151
periodical (revista_anotado_RL)                           25.908
book (livro_anotado_RL)                                  125.434
miscellaneous (varia_anotado_RL)                          13.549
Total                                                    501.042



Annotation manual

Files that were tagged automatically, with no manual revision (e.g. jornal_anot_auto_RL), can also be queried separately from files that were manually revised (e.g. jornal_anot_rev_man_RL):

automatically tagged files                       Number of words
newspaper (jornal_anot_auto_RL)                          184.418
book (livro_anot_auto_RL)                                 60.344
periodical (revista_anot_auto_RL)                         18.914
miscellaneous (varia_anot_auto_RL)                         8.273

manually revised files                           Number of words
newspaper (jornal_anot_rev_man_RL)                       184.131
book (livro_anot_rev_man_RL)                              63.264
periodical (revista_anot_rev_man_RL)                      15.328
miscellaneous (varia_anot_rev_man_RL)                      8.319

To query a word in the tagged corpus, search either for its lemma or for its word form together with its tag.
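The two query styles can be illustrated with a small sketch in Python. It assumes a CQP-style syntax in which each token is described by attribute-value pairs in square brackets. The text above does not state which query language the interface uses, so the syntax, the attribute names `lemma`, `word` and `pos`, the example word and the tag value `N` are all assumptions made for illustration.

```python
def lemma_query(lemma: str) -> str:
    """Build a query matching every inflected form of a lemma (assumed CQP-style syntax)."""
    return f'[lemma="{lemma}"]'

def form_tag_query(form: str, tag: str) -> str:
    """Build a query matching one word form carrying a given tag (assumed CQP-style syntax)."""
    return f'[word="{form}" & pos="{tag}"]'

# Hypothetical examples: the word and the tag value are not taken from the corpus tagset.
print(lemma_query("casa"))
print(form_tag_query("casas", "N"))
```

The lemma query is the broader of the two, since it does not depend on a particular inflected form. The form-plus-tag query is useful when a form is ambiguous between word classes. The actual attribute names and tag values should be checked against the annotation manual.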

 

5) AFRICA Corpus: 3.000.000 words 

AFRICA Corpus (Linguistic Resources for the Study of African Varieties of Portuguese)

Countries                  Spoken corpus    Written corpus
Angola                            27.363           613.495
Cape Verde                        25.413           612.120
Guinea-Bissau                     25.016           615.404
Mozambique                        26.166           615.297
Sao Tome and Principe             25.287           614.563
Total                            129.245         3.070.879

Total of both corpora (spoken + written): 3.200.124
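The table gives spoken and written counts separately. The size of each national variety follows by adding the two columns. A short sketch in Python, with the figures copied from the table and the helper name `dotted` chosen for illustration:

```python
# (spoken, written) word counts per country, copied from the table above.
africa = {
    "Angola": (27_363, 613_495),
    "Cape Verde": (25_413, 612_120),
    "Guinea-Bissau": (25_016, 615_404),
    "Mozambique": (26_166, 615_297),
    "Sao Tome and Principe": (25_287, 614_563),
}

def dotted(n: int) -> str:
    """Format an integer with '.' as the thousands separator, as in the tables."""
    return f"{n:,}".replace(",", ".")

# Spoken + written words for each country.
per_country = {country: s + w for country, (s, w) in africa.items()}

# Column totals, for comparison with the Total row of the table.
spoken_total = sum(s for s, _ in africa.values())
written_total = sum(w for _, w in africa.values())

for country, total in per_country.items():
    print(f"{country}: {dotted(total)}")
print(f"spoken: {dotted(spoken_total)}, written: {dotted(written_total)}")
```

Each variety comes out at roughly 640.000 words, which reflects the balanced design across the five countries.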