Grammar & Resources

The group's work centers on modeling linguistic knowledge, integrating the interfaces between different areas of grammar with knowledge about how language is put to use. Joint work in formal phonology, the lexicon, syntax and semantics makes it possible to build an integrated model of grammar that considers both how grammar is represented in the human mind and how it can be modeled computationally; research on L1 and L2 acquisition is at the core of this effort. Models of language representation and models of language use are integrated through the study of corpora.

The production of corpora and resources is motivated by the goal of documenting and describing contemporary European Portuguese, as well as understudied contact languages and varieties (Portuguese-based creoles and national varieties of Portuguese in Africa and Asia). The group also produces resources for the study of L1 and L2 acquisition in different settings. The group is a member of CLARIN LP.

Research on L1 and L2 acquisition contributes to CLUL’s general purpose of effectively linking fundamental and applied research, particularly in the areas of Educational Linguistics and Clinical Linguistics.

General goals:

- To produce new resources for the study of Portuguese and Portuguese-based creoles;

- To pursue basic research on natural language modeling, integrating knowledge on interfaces between language modules;

- To continue the documentation and description of understudied creoles and new varieties of Portuguese that emerged in a context of language contact;

- To develop the study of language acquisition, with an emphasis on language contact situations (see the new international Heritage Language Consortium) and on the comparison between typical and atypical development;

- To explore the potential of comparative linguistics in the production of resources for translation and to promote connections with the translation industry.

 

| Resource | Type |
| --- | --- |
| A Lexicon of Child European Portuguese - CEPLEXicon | Lexicon |
| A Portuguese Native Language Identification Dataset - NLI-PT | Database |
| Acquisition of European Portuguese Databank - AcEP | Database |
| Child-Adult Interaction Corpus - CAI | Corpus |
| Child-Adult interaction European Portuguese | Database |
| Consonantic Sequences Oral and Written Production Tasks - PORESC | Tool |
| Controlled Portuguese - CLG | Database |
| Corpora of PLE | Corpus |
| Corpus Almeida - European Portuguese / French | Corpus |
| Corpus Angolar | Corpus |
| Corpus C-ORAL-ROM | Corpus |
| Corpus CCF | Corpus |
| Corpus CINTIL | Corpus |
| Corpus Fadambo | Corpus |
| Corpus Leiria (1991) | Corpus |
| Corpus of Cape Verdean Portuguese | Corpus |
| Corpus of Sri Lanka Portuguese | Corpus |
| Corpus of the Diaries of the Portuguese Parliament annotated with PoS - PTPARL | Corpus |
| Corpus PESTRA | Corpus |
| Corpus Português Fundamental - Corpus PF | Corpus |
| Corpus Principense | Corpus |
| Corpus REDIP | Corpus |
| Corpus Santome | Corpus |
| Corpus SANTOS - European Portuguese | Corpus |
| Crosslinguistic Child Phonology Project - Português Europeu - CLCP-PE | Tool |
| Dados Orais de Cabo Verde - CV Words | Database |
| Demo de Subespecificação e Desambiguação de Escopo | Tool |
| Dictionary of Hindi-Portuguese-Hindi | Database |
| Diu Indo-Portuguese Data Set | Database |
| Learner Corpus of Portuguese L2 - COPLE2 | Corpus |
| LT Corpus (Literary Corpus) - LT Corpus | Corpus |
| Modality Lexicon - MODAL-LEX-PT | Lexicon |
| Multifunctional Computational Lexicon of Contemporary Portuguese | Lexicon |
| Named Entity Recognizer - CRPC-NER | Tool |
| Nominal Multiword Lexical Units in European Portuguese | Lexicon |
| NPChunks: Corpus of 1000 sentences annotated with PoS and nominal chunks - NPChunks | Corpus |
| Online Corpus of Writing and Speech of Children in the Early Years of Schooling - EFFE-On | Corpus |
| Online Dictionary Portuguese-Slovak/Slovak-Portuguese | Database |
| Pereira&Freitas - EP | Corpus |
| Person-Machine Interaction in Natural Language - INQUER | Database |
| PhonoDis | Corpus |
| Phonological Awareness Tasks for First Grade School Children - TCFC | Tool |
| Portuguese Biographies - Bio-PT | Database |
| Portuguese Corpus Annotated for Modality - MODAL | Corpus |
| Portuguese Lexicon of Discourse Markers - LDM-PT | Lexicon |
| Portuguese Technical Lexica - LEXTEC | Lexicon |
| Portuguese Discourse Bank - CRPC-DB | Corpus |
| Quotations database - CRPC-quotations | Database |
| Ramalho - EP | Corpus |
| Reference Corpus of Contemporary Portuguese - CRPC | Corpus |
| Santome Structure Dataset | Database |
| Spoken Corpus Mozambique 1986-87 - SCM | Corpus |
| Spoken Portuguese - Geographical and Social Varieties | Corpus |
| Vocatives in European Portuguese | Corpus |
| Word Combination in European Portuguese - LEX-MWE-PT | Lexicon |
| WordNet.PT | Lexicon |

Articles in Proceedings
Mendes, S., Necsulescu, S., & Bel, N. (2012). Synonym extraction using a language graph model. In Workshop on Semantic Relations II – Enhancing Resources and Applications at the 8th International Conference on Language Resources and Evaluation – LREC 2012 (pp. 1-9). Istanbul, Turkey.
Marrafa, P., Amaro, R., & Mendes, S. (2011). WordNet.PTglobal – Extending WordNet.PT to Portuguese varieties. In DIALECTS’11 – the 1st workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties at the 2011 Conference on Empirical Methods in Natural Language Processing – EMNLP 2011 (pp. 70-74). Edinburgh, Scotland.
Mendes, S., & Amaro, R. (2009). Modeling adjectives in GL: accounting for all adjective classes. In 5th International Conference on Generative Approaches to the Lexicon (pp. 176-183). Pisa, Italy.
Mendes, S. (2009). Modeling the impact of adjective position in the construction of NP meaning. In 5th International Conference on Generative Approaches to the Lexicon (pp. 201-208). Pisa, Italy.
Marrafa, P., & Mendes, S. (2006). Modeling Adjectives in Computational Relational Lexica. In 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics – COLING/ACL 2006 (pp. 555-562). Sydney, Australia.
Mendes, S. (2006). Adjectives in WordNet.PT. In 3rd Global WordNet Association Conference (pp. 225-230). Jeju Island, Republic of Korea.
Marrafa, P., Amaro, R., Chaves, R. P., Lourosa, S., Martins, C., & Mendes, S. (2006). WordNet.PT new directions. In 3rd Global WordNet Association Conference (pp. 319-320). Jeju Island, Republic of Korea.
Mendes, S. (2005). Event modifying adjectives in Portuguese. In 3rd International Conference on Generative Approaches to the Lexicon (pp. 159-166). Geneva, Switzerland.
Mendes, S., & Moriceau, V. (2004). L’analyse des questions: intérêts pour la génération des réponses. In Workshop Question-Réponse à la Conférence TALN 2004 (Traitement Automatique du Langage Naturel) (pp. 413-422). Fez, Morocco.
Mendes, S., & Chaves, R. P. (2001). Enriching WordNet with Qualia Information. In Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations at the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics – NAACL 2001 (pp. 108-112). Carnegie Mellon University, Pittsburgh, PA, USA.
Hagemeijer, T. (2019). Pontes entre os crioulos portugueses de África e a história do português: um caso de estudo de /b/ e /v/. In Ernestina Carrilho, Ana Maria Martins, Sandra Pereira e João Paulo Silvestre (orgs.), Estudos linguísticos e filológicos oferecidos a Ivo Castro, 766-779 (Lisboa: Centro de Linguística da Universidade de Lisboa).
Tiny, A., Amaro, H., Hendrickx, I., & Hagemeijer, T. (2012). O forro: A construção de um corpus. In Ana Cristina Roque, Gerhard Seibert e Vítor Rosado Marques (coord.). Livro de Atas - Colóquio Internacional: São Tomé e Príncipe numa perspectiva interdisciplinar, diacrónica e sincrónica. Lisboa: ISCTE-IUL; IICT, 597-609.
Hagemeijer, T., Hendrickx, I., Amaro, H., & Tiny, A. (2012). A Corpus of Santome. In Proceedings of the SALTMIL-AfLaT workshop, Istanbul, Turkey, 2012. European Language Resources Association (ELRA), 61-66.
Hagemeijer, T., & Santos, A. L. (2004). Elementos polares na periferia direita: negação aparentemente descontínua, afirmação enfática e tags. In Tiago Freitas & Amália Mendes (orgs.), Actas do XIX encontro nacional da Associação Portuguesa de Linguística, 465-476. Lisboa: APL.
Hagemeijer, T. (2000). Verbos e gramaticalização em são-tomense. In Ernesto d’Andrade, Maria Antónia Mota, Dulce Pereira (eds.), Actas do workshop sobre crioulos de base lexical portuguesa, 111-126. Lisboa: Colibri.
Andrade, A., & Rodrigues, C. (2005). Fusão de sibilantes: um processo de mudança/standardização? In Actas do XX Encontro Nacional da Associação Portuguesa de Linguística (pp. 363-371). Lisboa.
Andrade, A., & Rodrigues, C. (2004). Um exemplo de sandhi consonântico variável em português. In Actas do XIX Encontro Nacional da Associação Portuguesa de Linguística (pp. 257-268). Lisboa.
Mateus, M. H., & Rodrigues, C. (2004). A vibrante em coda no português europeu. In Actas do XIX Encontro Nacional da Associação Portuguesa de Linguística (pp. 289-299). Lisboa.
Rodrigues, C. (2002). Questões de espraiamento em PE. In XVII Encontro Nacional da Associação Portuguesa de Linguística (pp. 419-432). Lisboa: APL.
Rodrigues, C. (2002). Variação linguística em Porto. In Actas do Encontro Comemorativo dos 25 anos do CLUP (pp. 119-130). Universidade do Porto.
Rodrigues, C. (2000). Novos dados acerca de /#øS$C/. In Actas do XV Encontro Nacional da Associação Portuguesa de Linguística (pp. 287-299). Faro: APL.
Rodrigues, C., & Martins, F. (2000). Espaço acústico das vogais acentuadas de Braga. In Actas do XV Encontro Nacional da Associação Portuguesa de Linguística (pp. 301-315). Faro: APL.
Andrade, E. d., & Rodrigues, C. (1999). Das escolas e das culturas: história de uma sequência consonântica. In Actas do XIV Encontro Nacional da Associação Portuguesa de Linguística (Volume II, pp. 117-133). Aveiro: APL.
Rodrigues, C., & Andrade, E. d. (1999). CPE VAR (Corpus de Português Europeu - Variação). In Poster in Actas do XIV Encontro Nacional da Associação Portuguesa de Linguística (Vol. II, pp. 627-629). Aveiro: APL.
Rodrigues, C. (1994). Nova proposta de datação de três manuscritos medievais. In Actas do IX Encontro Nacional da Associação Portuguesa de Linguística (pp. 363-376). Coimbra: APL - Colibri.
Branco, A., Mendes, A., Quaresma, P., Gomes, L., Silva, J., & Teixeira, A. (2020). Infrastructure for the Science and Technology of Language PORTULAN CLARIN. In LREC 2020 Workshop IWLTP 2020 – 1st International Workshop on Language Technology Platforms (pp. 1-7). ELRA.
Freitas, M. J. (2004). The vowel [ɨ] in the acquisition of European Portuguese. In J. van Kampen & Baauw, S. (Eds.), GALA 2003 (pp. 163-174). Utrecht: LOT.
Matos, G. (2005). Parataxe como coordenação e justaposição – evidência a partir de um caso de elipse. In Actas do XX Encontro Nacional da Associação Portuguesa de Linguística (Duarte, I.; Leiria, I., pp. 687-699). Lisboa: Associação Portuguesa de Linguística. Retrieved from https://apl.pt/wp-content/uploads/2017/12/2004-55.pdf
Matos, G., & Prada, E. (2005). Construções contrastivas de focalização: adversativas vs. concessivas. In Actas do XX Encontro Nacional da Associação Portuguesa de Linguística (Duarte, I.; Leiria, I.). Retrieved from https://apl.pt/wp-content/uploads/2017/12/2004-56.pdf
Matos, G. (2004). Coordenação Frásica vs. Subordinação Adverbial. In Actas do XIX Encontro Nacional da Associação Portuguesa de Linguística (Freitas, T.; Mendes, A., pp. 555-567). Lisboa: Associação Portuguesa de Linguística. Retrieved from https://apl.pt/wp-content/uploads/2017/12/2003-45.pdf