Portuguese Lexicon of Discourse Markers
The Lexicon of Discourse Markers (LDM-PT) provides a set of Portuguese lexical items that structure discourse and ensure textual cohesion and coherence at the intra-sentential and inter-sentential levels. Each connective is associated with the set of its rhetorical senses, following the typology of the Penn Discourse Treebank (PDTB). The lexicon was created within the scope of CLUL's participation in the COST Action TextLink - Structuring Discourse in Multilingual Europe.
Discourse markers are taken as a broad category that includes cohesive devices as well as pragmatic markers with interactional and modal meanings. For now, however, the lexicon includes only discourse connectives. We consider that discourse connectives do not inflect, express a two-place semantic relation, take propositional arguments and are not integrated in the predicative structure. This definition covers conjunctions, adverbs and phrases, and also prepositions, which we include in our list of connectives. The lexicon also includes Alternative Lexicalizations (AltLex), i.e., alternative expressions that denote a cohesive relation and so make it redundant to supply an implicit connective in the context. For instance, the cohesive relation of contrast is frequently denoted by the following AltLex: acontece que ‘it happens that’, diga-se que ‘let it be said that’, dito isso / posto isso ‘this being said’, não deixa de ser verdade que ‘it is nevertheless true that’.
The lexicon is structured as pairs of discourse connectives and rhetorical senses, so that a polysemous connective is covered by one pair per sense. It currently includes 270 such pairs (using the three-level sense structure of PDTB3), each with the following information:
- Properties of the connective: type (primary, secondary, alternative lexicalization); category (POS); continuous/discontinuous; single/phrasal;
- Mood selection (indicative or subjunctive);
- Modifiers of the connective;
- Equivalent English connective;
- Corpus example.
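The pair-based structure described above can be sketched as a small data model. This is only an illustration: the field names, the `ConnectiveSense` class and the two helper functions are assumptions, not the actual LDM-PT schema. The two sample connectives and their English glosses come from the AltLex examples above; labelling them with the PDTB3 path Comparison.Contrast is an assumption based on the statement that they denote contrast.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ConnectiveSense:
    """One connective/sense pair (hypothetical field names, not the LDM-PT schema)."""
    connective: str
    sense: Tuple[str, ...]            # PDTB3 path, up to three levels
    conn_type: str                    # "primary", "secondary" or "altlex"
    category: Optional[str] = None    # part of speech, if known
    continuous: bool = True           # False for discontinuous connectives
    phrasal: bool = False             # False for single-word connectives
    mood: Optional[str] = None        # "indicative" or "subjunctive"
    modifiers: Tuple[str, ...] = ()
    english: Optional[str] = None     # equivalent English connective
    example: Optional[str] = None     # corpus example


def senses_by_connective(pairs: Iterable[ConnectiveSense]) -> Dict[str, List[str]]:
    """Group senses under each connective; a polysemous one gets several senses."""
    index: Dict[str, List[str]] = defaultdict(list)
    for pair in pairs:
        index[pair.connective].append(".".join(pair.sense))
    return dict(index)


def connectives_by_sense(pairs: Iterable[ConnectiveSense]) -> Dict[str, List[str]]:
    """Group connectives under each sense, in input order."""
    index: Dict[str, List[str]] = defaultdict(list)
    for pair in pairs:
        index[".".join(pair.sense)].append(pair.connective)
    return dict(index)


# Sample pairs built from the AltLex examples in the text; the sense path is assumed.
PAIRS = [
    ConnectiveSense("acontece que", ("Comparison", "Contrast"), "altlex",
                    phrasal=True, english="it happens that"),
    ConnectiveSense("dito isso", ("Comparison", "Contrast"), "altlex",
                    phrasal=True, english="this being said"),
]

print(senses_by_connective(PAIRS))
print(connectives_by_sense(PAIRS))
```

Storing one record per pair, rather than one record per connective, keeps properties such as mood selection attached to a specific sense; a polysemous connective simply appears in several records.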
The contents and structure of the lexicon are inspired by both LEXCONN and DiMLex.
The lexicon is maintained as an Excel file that is automatically exported to a structured XML document. It follows the main components of DiMLex: orthographical information, syntactic information (which here also includes semantic information) and examples. However, the contents of each component differ because of the specificities of each project.
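As a rough illustration of this kind of export, the sketch below turns spreadsheet-like rows into XML with the three components mentioned above. The element and attribute names (`lexicon`, `entry`, `orths`, `syn`, `sem`, `examples`) and the row keys are assumptions chosen for the example; they are not the actual LDM-PT or DiMLex schema, and reading the Excel file itself is left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def row_to_entry(row: Dict) -> ET.Element:
    """Build one <entry> from a row; all tag names here are hypothetical."""
    entry = ET.Element("entry", {"word": row["connective"], "type": row["type"]})

    # Orthographical component: the written form of the connective.
    orths = ET.SubElement(entry, "orths")
    ET.SubElement(orths, "orth").text = row["connective"]

    # Syntactic component, which also carries the semantic (sense) information.
    syn = ET.SubElement(entry, "syn")
    if row.get("category"):
        ET.SubElement(syn, "cat").text = row["category"]
    ET.SubElement(syn, "sem", {"sense": row["sense"]})

    # Examples component: zero or more corpus examples.
    examples = ET.SubElement(entry, "examples")
    for text in row.get("examples", []):
        ET.SubElement(examples, "example").text = text
    return entry


def rows_to_xml(rows: Iterable[Dict]) -> str:
    """Serialise all rows under a single root element."""
    root = ET.Element("lexicon")
    for row in rows:
        root.append(row_to_entry(row))
    return ET.tostring(root, encoding="unicode")


# One sample row; the sense label is an assumption, and no corpus example is invented.
ROWS = [
    {"connective": "acontece que", "type": "altlex", "sense": "Comparison.Contrast"},
]

print(rows_to_xml(ROWS))
```

Because each row holds one connective/sense pair, a polysemous connective would yield several `entry` elements in this sketch; an export that merges them into a single entry with several sense elements would be an equally plausible design.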
LDM-PT has recently been integrated into an online multilingual platform that combines lexicons of French, German, Italian and Portuguese discourse connectives: Connective-Lex.info.
The lexicon is viewed as an open list that integrates the results of both the contrastive analysis of English and Portuguese discourse connectives and our corpus annotation following the PDTB model. We believe this resource will prove useful for applications dealing with tasks such as parsing, text processing and summarization of Portuguese.
When using the lexicon, please cite:
Mendes, Amália and Pierre Lejeune (2016) LDM-PT. A Portuguese Lexicon of Discourse Markers. In: Degand, Liesbeth, Csilla Dér, Péter Furkó and Bonnie Webber (eds.) Conference Handbook of TextLink – Structuring Discourse in Multilingual Europe Second Action Conference, Budapest, 11-14 April 2016, 89-92.
Prasad, Rashmi, Aravind Joshi and Bonnie Webber (2010) Realization of Discourse Relations by Other Means: Alternative Lexicalizations. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, Beijing, 1023–1031. http://www.aclweb.org/anthology/C10-2118.
Roze, Charlotte, Laurence Danlos and Philippe Muller (2012) LEXCONN: a French lexicon of discourse connectives. Revue Discours. http://discours.revues.org/8645.
Stede, Manfred (2002) DiMLex: A Lexical Approach to Discourse Markers. In: Lenci, A. and V. Di Tomaso (eds.) Exploring the Lexicon - Theory and Computation. Alessandria (Italy): Edizioni dell'Orso.