MODAL-LEX-PT is a modality lexicon for Portuguese, automatically extracted from the Portuguese modality corpus (Hendrickx et al., 2012).
The corpus contains 3,079 triggers belonging to different parts of speech (POS) and was used as training data in modality annotation experiments.
We extracted the modal components of the annotated corpus. The triggers were annotated for POS using the tagger and lemmatizer described in Généreux et al. (2012), with some post-processing adaptations.
For each trigger, the lexicon provides information on POS, lemma, modal value, polarity, ambiguity, and frequencies, together with representative examples selected from the corpus. The lexicon contains 331 entries in XML format.
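As a rough illustration of how entries with these fields could be read, the sketch below parses a small XML fragment with Python's standard library. The element names (`entry`, `lemma`, `pos`, `modal_value`, `polarity`, `ambiguous`, `frequency`, `example`), the sample values, and the helper `load_entries` are all assumptions made for this example; they are not taken from the published MODAL-LEX-PT schema, which may be organised differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry layout: element names and values are invented for
# illustration and do not reproduce the actual MODAL-LEX-PT schema or data.
SAMPLE = """
<lexicon>
  <entry>
    <lemma>dever</lemma>
    <pos>V</pos>
    <modal_value>obligation</modal_value>
    <polarity>positive</polarity>
    <ambiguous>true</ambiguous>
    <frequency>12</frequency>
    <example>Os alunos devem entregar o trabalho.</example>
  </entry>
</lexicon>
"""


def load_entries(xml_text):
    """Return one dict per <entry> element found in the XML string."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.iter("entry"):
        entries.append({
            "lemma": node.findtext("lemma"),
            "pos": node.findtext("pos"),
            "modal_value": node.findtext("modal_value"),
            "polarity": node.findtext("polarity"),
            # Stored as the text "true"/"false" in this assumed layout.
            "ambiguous": node.findtext("ambiguous") == "true",
            "frequency": int(node.findtext("frequency", default="0")),
            # An entry may carry several representative examples.
            "examples": [e.text for e in node.findall("example")],
        })
    return entries


entries = load_entries(SAMPLE)
```

Under these assumptions, each entry becomes a plain dictionary keyed by field name, which is one convenient shape for looking up a lemma and POS pair during tagging.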
We plan to use the lexicon for the automatic tagging of modality, as a complement to the annotated corpus. As few resources are available for Portuguese on semantics and subjectivity, the lexicon provides additional data for applications in opinion mining, fact checking and sentiment analysis.
References
Généreux, M., I. Hendrickx & A. Mendes (2012). Introducing the Reference Corpus of Contemporary Portuguese on-line. In Nicoletta Calzolari, et al., editors, LREC’2012 – Eighth International Conference on Language Resources and Evaluation, pages 2237–2244, Istanbul, Turkey, May. European Language Resources Association (ELRA).
Hendrickx, I., A. Mendes & S. Mencarelli (2012). Modality in text: a proposal for corpus annotation. In Nicoletta Calzolari, et al., editors, LREC’2012 – Eighth International Conference on Language Resources and Evaluation, pages 1805–1812, Istanbul, Turkey, May. European Language Resources Association (ELRA).