Grammar & Resources

The group focuses on modeling linguistic knowledge, integrating the interfaces between different areas of grammar with knowledge about how language is put to use. Joint work in formal phonology, lexicon, syntax and semantics makes it possible to build an integrated model of grammar that considers both how grammar is represented in the human mind and how it can be modeled computationally; research on L1 and L2 acquisition is at the core of this effort. Models of language representation and models of language use are integrated through the study of corpora.

Corpora and resources are produced to document and describe contemporary European Portuguese, as well as understudied contact languages and varieties (Portuguese-based creoles, national varieties of Portuguese in Africa and Asia). The group also produces resources for the study of L1 and L2 acquisition in different settings. The group is part of CLARIN LP.

Research on L1 and L2 acquisition contributes to CLUL’s general purpose of effectively linking fundamental and applied research, particularly in the areas of Educational Linguistics and Clinical Linguistics.

General goals:

- To produce new resources for the study of Portuguese and Portuguese-based creoles;

- To pursue basic research on natural language modeling, integrating knowledge of the interfaces between language modules;

- To continue the documentation and description of understudied creoles and new varieties of Portuguese that emerged in a context of language contact;

- To develop the study of language acquisition, with an emphasis on language contact situations (see the new international Heritage Language Consortium) and on the comparison between typical and atypical development;

- To explore the potential of comparative linguistics in the production of resources for translation and to promote connections with the translation industry.

| Resource | Type |
| --- | --- |
| A Lexicon of Child European Portuguese - CEPLEXicon | Lexicon |
| A Portuguese Native Language Identification Dataset - NLI-PT | Database |
| Acquisition of European Portuguese Databank - AcEP | Database |
| Child-Adult Interaction Corpus - CAI | Corpus |
| Child-Adult interaction European Portuguese | Database |
| Consonantic Sequences Oral and Written Production Tasks - PORESC | Tool |
| Controlled Portuguese - CLG | Database |
| Corpora of PLE | Corpus |
| Corpus Almeida - European Portuguese / French | Corpus |
| Corpus Angolar | Corpus |
| Corpus C-ORAL-ROM | Corpus |
| Corpus CCF | Corpus |
| Corpus CINTIL | Corpus |
| Corpus Fadambo | Corpus |
| Corpus Leiria (1991) | Corpus |
| Corpus of Cape Verdean Portuguese | Corpus |
| Corpus of Sri Lanka Portuguese | Corpus |
| Corpus of the Diaries of the Portuguese Parliament annotated with PoS - PTPARL | Corpus |
| Corpus PESTRA | Corpus |
| Corpus Português Fundamental - Corpus PF | Corpus |
| Corpus Principense | Corpus |
| Corpus REDIP | Corpus |
| Corpus Santome | Corpus |
| Corpus SANTOS - European Portuguese | Corpus |
| Crosslinguistic Child Phonology Project - Português Europeu - CLCP-PE | Tool |
| Dados Orais de Cabo Verde - CV Words | Database |
| Demo de Subespecificação e Desambiguação de Escopo | Tool |
| Dictionary of Hindi-Portuguese-Hindi | Database |
| Diu Indo-Portuguese Data Set | Database |
| Learner Corpus of Portuguese L2 - COPLE2 | Corpus |
| LT Corpus (Literary Corpus) - LT Corpus | Corpus |
| Modality Lexicon - MODAL-LEX-PT | Lexicon |
| Multifunctional Computational Lexicon of Contemporary Portuguese | Lexicon |
| Named Entity Recognizer - CRPC-NER | Tool |
| Nominal Multiword Lexical Units in European Portuguese | Lexicon |
| NPChunks: Corpus of 1000 sentences annotated with PoS and nominal chunks - NPChunks | Corpus |
| Online Corpus of Writing and Speech of Children in the Early Years of Schooling - EFFE-On | Corpus |
| Online Dictionary Portuguese-Slovak/Slovak-Portuguese | Database |
| Pereira&Freitas - EP | Corpus |
| Person-Machine Interaction in Natural Language - INQUER | Database |
| PhonoDis | Corpus |
| Phonological Awareness Tasks for First Grade School Children - TCFC | Tool |
| Portuguese Biographies - Bio-PT | Database |
| Portuguese Corpus Annotated for Modality - MODAL | Corpus |
| Portuguese Lexicon of Discourse Markers - LDM-PT | Lexicon |
| Portuguese Technical Lexica - LEXTEC | Lexicon |
| Portuguese Discourse Bank - CRPC-DB | Corpus |
| Quotations database - CRPC-quotations | Database |
| Ramalho – EP | Corpus |
| Reference Corpus of Contemporary Portuguese - CRPC | Corpus |
| Santome Structure Dataset | Database |
| Spoken Corpus Mozambique 1986-87 - SCM | Corpus |
| Spoken Portuguese - Geographical and Social Varieties | Corpus |
| Vocatives in European Portuguese | Corpus |
| Word Combination in European Portuguese - LEX-MWE-PT | Lexicon |
| WordNet.PT | Lexicon |

Articles in Proceedings
Rodrigues, C., Martins, F., & Brissos, F. (2016). Investigação interdisciplinar em fonética forense: estudo de caso de identidade e disfarce de voz. In Actas da II Conferência do Instituto Medicina Legal e Ciências Forenses, 29-30 de Setembro 2015, Coimbra (Vol. 2015, pp. 60-61).
Santos, A. L., Freitas, M. J., & Cardoso, A. (2016). CEPLEXicon – A Lexicon of Child European Portuguese. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 23-28, Portorož, Slovenia.
Sequeira, J., Gonçalves, T., Quaresma, P., Mendes, A., & Hendrickx, I. (2016). Using syntactic and semantic features for classifying modal values in the Portuguese language. In Proceedings of CICLing-16, 17th International Conference on Intelligent Text Processing and Computational Linguistics. Lecture Notes in Computer Science. Springer.
Comparin, L., & Mendes, S. (2017). Using error annotation to evaluate machine translation and human post-editing in a business environment. In 20th Annual Conference of the European Association for Machine Translation – EAMT 2017 (pp. 68-73). Prague, Czech Republic. Retrieved from https://pdfs.semanticscholar.org/c9d2/8db57b3cedfd75a2fe694dcc59ba8caf7029.pdf
Comparin, L., & Mendes, S. (2017). Error detection and error correction for improving quality in machine translation and human post-editing. In 20th International Conference on Intelligent Text Processing and Computational Linguistics – CICLing 2017, reprinted in International Journal of Computer Applications. Retrieved from https://repositorio.ul.pt/bitstream/10451/33007/1/error%20detection_Comparin%26Mendes2017.pdf
Mendes, A., Antunes, S., & Quaresma, P. (2017). The Annotation Coreference Task at IberEval’2017: The experience of CLUL/UE. In Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval’2017), co-located with 33rd Conference of the Spanish Society for Natural Language Processing (SEPLN 2017). Murcia, Spain.
Quaresma, P., Mendes, A., Hendrickx, I., & Gonçalves, T. (2014). Tagging and Labeling Portuguese Modal Verbs. In J. Baptista & N. Mamede (Eds.), PROPOR 2014 - LNCS 3960. Springer-Verlag.
Antunes, S., & Mendes, A. (2014). An Evaluation of the Role of Statistical Measures and Frequency for MWE Identification. In Proceedings of the Ninth International Conference on Language Resources and Evaluation – LREC’14, May 26-31, Reykjavik, Iceland (pp. 4046-4051).
Hagemeijer, T., Généreux, M., Hendrickx, I., Mendes, A., Tiny, A., & Zamora, A. (2014). The Gulf of Guinea Creole Corpora. In Proceedings of the Ninth International Conference on Language Resources and Evaluation – LREC’14, May 26-31, Reykjavik, Iceland (pp. 523-529).
Quaresma, P., Mendes, A., Hendrickx, I., & Gonçalves, T. (2014). Automatic tagging of modality: identifying triggers and modal values. In H. Bunt (Ed.), Proceedings 10th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation (pp. 95-101).
Généreux, M., Mendes, A., & Hamon, T. (2013). Experiments in synonymy: weakly supervised term matching to concepts. In Proceedings of the 10th International Conference on Terminology and Artificial Intelligence (pp. 181-184).
Antunes, S., & Mendes, A. (2013). MWE in Portuguese: proposal for a typology for annotation in running text. In The 9th Workshop on Multiword Expressions (MWE 2013), Workshop at NAACL 2013.
Mendes, A., Hendrickx, I., Salgueiro, A., & Ávila, L. (2013). Annotating the Interaction between Modality and Focus: the case of exclusive particles. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse (LAW VII), Association for Computational Linguistics, August 8-9 2013, Sofia, Bulgaria (pp. 228-237).
Mendes, A., Généreux, M., Hendrickx, I., Pereira, L., Bacelar do Nascimento, M. F., & Antunes, S. (2012). CQPWeb: Uma nova plataforma de pesquisa para o CRPC. In A. L. Costa, C. Flores, & N. Alexandre (Eds.), XXVII Encontro Nacional da Associação Portuguesa de Linguística. Textos Seleccionados 2011. Lisboa: APL.
Hendrickx, I., Mendes, A., & Mencarelli, S. (2012). Modality in Text: a proposal for corpus annotation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation - LREC 2012, May 21-27 2012, Istanbul (pp. 1805-1812).
Généreux, M., Hendrickx, I., & Mendes, A. (2012). Introducing the Reference Corpus of Contemporary Portuguese On-Line. In Proceedings of the Eighth International Conference on Language Resources and Evaluation - LREC 2012, May 21-27 2012, Istanbul (pp. 2237-2244).
Généreux, M., Hendrickx, I., & Mendes, A. (2012). A Large Portuguese Corpus On-Line: Cleaning and Preprocessing. In H. Caseli et al. (Eds.), Computational Processing of the Portuguese Language. Proceedings of the 10th International Conference PROPOR 2012 (pp. 113-120). Berlin, Heidelberg: Springer-Verlag.
Miguel, M., Mendes, A., & Mota, M. A. (2012). Fenómenos de concordância em variedades do português: construções com verbos copulativos e com verbos transitivos predicativos. In A. M. Cestero Mancera, I. M. Martos, & F. P. Garcia (Eds.), La lengua, lugar de encuentro, Actas del XVI Congresso Internacional de la ALFAL.
Hendrickx, I., Mendes, A., Pereira, S., Gonçalves, A., & Duarte, I. (2010). Complex Predicates annotation in a corpus of Portuguese. In Proceedings of the fourth Linguistic Annotation Workshop (LAW IV), Association for Computational Linguistics, Uppsala, Sweden (pp. 100-108).
Hendrickx, I., Mendes, A., & Antunes, S. (2010). Proposal for Multi-word Expression annotation in running text. In Proceedings of the fourth Linguistic Annotation Workshop (LAW IV), Association for Computational Linguistics, Uppsala, Sweden (pp. 152-156).
Généreux, M., Mendes, A., Bacelar do Nascimento, M. F., & Pereira, L. (2010). Lexical analysis of pre and post revolution discourse in Portugal. In Proceedings of the Third Workshop on Building Comparable Corpora, 7th International Conference on Language Resources and Evaluation (LREC 2010), Malta.
Gonçalves, A., Oliveira, F., Miguel, M., Mendes, A., Cunha, L. F., Silvano, P., et al. (2010). Propriedades Predicativas dos Verbos Leves Dar, Ter e Fazer: Estrutura Argumental e Eventiva. In P. C. López, S. C. Ansoar, B. D. Quiroga, I. F. López, & L. Z. Varela (Eds.), Actas del XXXIX Simpósio de la Sociedad Española de Lingüística. Santiago de Compostela: Unidixital (CD-Rom).
Mendes, A., & Pereira, S. (2010). Anotação de predicados complexos num corpus de português. In P. C. López, S. C. Ansoar, B. D. Quiroga, I. F. López, & L. Z. Varela (Eds.), Actas del XXXIX Simpósio de la Sociedad Española de Lingüística. Santiago de Compostela: Unidixital (CD-Rom).
Duarte, I., Colaço, M., Gonçalves, A., Mendes, A., & Miguel, M. (2009). Predicados complexos do tipo "verbo leve-nome derivado": uma análise baseada em corpora. In D. da Hora (Ed.), Anais do VI Congresso Internacional da Abralin (pp. 1858-1867). Idéia.
Mendes, A., Bacelar do Nascimento, M. F., Estrela, A., & Pereira, L. (2008). Corpus annotation and lexical analysis of African varieties of Portuguese. In V. Lyding (Ed.), Proceedings of LULCL II - Lesser Used Languages and Computer Linguistics (pp. 43-57). Bolzano: Institute for Specialised Communication and Multilingualism.
Bacelar do Nascimento, M. F., Estrela, A., Mendes, A., & Pereira, L. (2008). On the use of comparable corpora of African varieties of Portuguese for linguistic description and teaching/learning applications. In P. Zweigenbaum et al. (Eds.), Proceedings of the Workshop on Building and Using Comparable Corpora. VI Language Resources and Evaluation Conference - LREC2008 (pp. 39-46). Marrakech.
Barreto, F., Branco, A., Ferreira, E., Mendes, A., Bacelar do Nascimento, M. F., Nunes, F., & Silva, J. R. (2006). Linguistic Resources and Software for Shallow Processing. In F. Oliveira, J. Barbosa, & F. Oliveira (Eds.), Actas do XXI Encontro Nacional de Linguística (pp. 203-217). Lisboa: Associação Portuguesa de Linguística.
Antunes, S., Bacelar do Nascimento, M. F., Casteleiro, J. M., Mendes, A., Pereira, L., & Sá, T. (2006). A Lexical Database of Portuguese Multiword Expressions. In R. Vieira (Ed.), PROPOR 2006 - LNCS 3960 (pp. 238-243). Berlin: Springer-Verlag.
Barreto, F., Branco, A., Ferreira, E., Mendes, A., Bacelar do Nascimento, M. F., Nunes, F., & Silva, J. R. (2006). Open Resources and Tools for the Shallow Processing of Portuguese: the TagShare project. In Proceedings of the V International Conference on Language Resources and Evaluation - LREC2006, May 22-28 2006, Genoa.
Mendes, A., Antunes, S., Bacelar do Nascimento, M. F., Casteleiro, J. M., Pereira, L., & Sá, T. (2006). COMBINA-PT: a Large Corpus-extracted and Hand-checked Lexical Database of Portuguese Multiword Expressions. In Proceedings of the V International Conference on Language Resources and Evaluation - LREC2006, May 22-28 2006, Genoa.