Corpus SANTOS - European Portuguese
Corpus of child and child-directed speech
SANTOS - European Portuguese is a corpus of child and child-directed speech, transcribed according to the conventions of CHILDES (Child Language Data Exchange System) using the CLAN software (MacWhinney, 2000). It comprises around 52 hours of child-adult interaction, with 27,595 child utterances and 70,736 adult utterances. The corpus is part of AcEP (for a full description, see Santos, 2006, and Santos et al., 2014) and is available in the CHILDES Database. It is annotated using a tagger developed at CLUL (Généreux, Hendrickx & Mendes, 2012); the list of POS tags used is provided separately. The corpus is registered under ISLRN 532-620-702-768-3.
The corpus includes data from three children, as described in the table below:
Child | Age (years;months.days) | MLUw (mean length of utterance in words) | Number of files | Number of child utterances |
---|---|---|---|---|
INI | 1;6.6 - 3;11.12 | 1.530 - 3.827 | 21 | 6,591 |
TOM | 1;6.18 - 3;10.16 | 1.286 - 3.089 | 30 | 15,548 |
INM | 1;5.9 - 2;9.3 | 1.345 - 2.834 | 16 | 5,456 |
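The ages in the table use the CHILDES notation years;months.days, so 1;6.6 means 1 year, 6 months and 6 days. As a minimal sketch of how these figures might be handled programmatically, the Python snippet below parses that notation and checks that the per-child utterance counts add up to the 27,595 child utterances reported for the whole corpus. The names `CHILDREN`, `parse_age` and `age_in_months` are hypothetical helpers introduced here for illustration; they are not part of CLAN or of the corpus distribution.

```python
import re
from typing import Tuple

# Figures copied from the table above (illustrative structure, not a corpus file format).
CHILDREN = {
    "INI": {"ages": ("1;6.6", "3;11.12"), "files": 21, "utterances": 6591},
    "TOM": {"ages": ("1;6.18", "3;10.16"), "files": 30, "utterances": 15548},
    "INM": {"ages": ("1;5.9", "2;9.3"), "files": 16, "utterances": 5456},
}

# CHILDES age notation as used in the table: years;months.days
AGE_PATTERN = re.compile(r"^(\d+);(\d+)\.(\d+)$")


def parse_age(age: str) -> Tuple[int, int, int]:
    """Split an age string such as '1;6.6' into (years, months, days)."""
    match = AGE_PATTERN.match(age)
    if match is None:
        raise ValueError(f"not a years;months.days age: {age!r}")
    years, months, days = (int(part) for part in match.groups())
    return years, months, days


def age_in_months(age: str) -> int:
    """Return the age in completed months, ignoring the days component."""
    years, months, _days = parse_age(age)
    return years * 12 + months


total_utterances = sum(child["utterances"] for child in CHILDREN.values())
total_files = sum(child["files"] for child in CHILDREN.values())

if __name__ == "__main__":
    for name, child in CHILDREN.items():
        first, last = child["ages"]
        print(name, age_in_months(first), "to", age_in_months(last), "months")
    print("child utterances:", total_utterances)
    print("files:", total_files)
```

Dropping the days component keeps the conversion exact; converting to days would require an assumption about month length.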
Any work that uses this corpus as a source of information should cite:
Santos, A. L. (2006). Minimal Answers. Ellipsis, Syntax and Discourse in the Acquisition of European Portuguese. Ph.D. Dissertation. Universidade de Lisboa. (Published 2009, Amsterdam / Philadelphia: John Benjamins).
Santos, A. L., M. Généreux, A. Cardoso, C. Agostinho & S. Abalada (2014). A corpus of European Portuguese child and child-directed speech. In Proceedings of the 9th Conference on Language Resources and Evaluation – LREC 2014. European Language Resources Association (ELRA).
This corpus (or earlier versions of it) has served as the basis for the following databases:
Santos, Ana Lúcia, Maria João Freitas & Aida Cardoso (2014). CEPLEXicon - A Lexicon of Child European Portuguese. Lisboa: Anagrama (CLUL, FLUL). ISLRN: 408-817-203-152-3, ELRA ID: ELRA-L0094.
CDS_EP - A lexicon of child directed speech for European Portuguese from the FrePOP database