This lexicon is the result of a study in Natural Language Processing whose main goal was to develop a semantic taxonomy for classifying nominal Multiword Lexical Units (MLU) in European Portuguese (EP). Although they are built from single words, MLU do not have a compositional meaning and are subject to morphosyntactic restrictions. These units play such an important role in any text that identifying and classifying them is essential for information extraction and retrieval in Natural Language Processing.

We adapted a semantic taxonomy based on the Lancaster semantic lexicon¹ and applied it to a list of MLU extracted from CETEMPúblico².

The automatic extraction of MLU from CETEMPúblico was performed with Unitex³. The resulting list was then manually annotated in order to exclude non-nominal MLU, named entities and repetitions. The final list contains 5068 nominal MLU.
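The exclusion step described above can be illustrated with a minimal sketch. It assumes each extracted candidate already carries the annotator's judgements; the `Candidate` class, its field names, the `filter_candidates` helper and the case-insensitive treatment of repetitions are hypothetical, and the sample units are invented for illustration rather than taken from the resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One automatically extracted unit plus manual judgements (hypothetical layout)."""
    text: str
    is_nominal: bool
    is_named_entity: bool


def filter_candidates(candidates):
    """Keep nominal units that are not named entities, dropping repetitions."""
    seen = set()
    kept = []
    for cand in candidates:
        # Assumption: repetitions are detected case-insensitively.
        key = cand.text.casefold()
        if not cand.is_nominal or cand.is_named_entity or key in seen:
            continue
        seen.add(key)
        kept.append(cand.text)
    return kept


# Invented sample data, for illustration only.
candidates = [
    Candidate("fim de semana", True, False),
    Candidate("Fim de semana", True, False),   # repetition
    Candidate("Nações Unidas", True, True),    # named entity
    Candidate("de acordo com", False, False),  # non-nominal
    Candidate("caminho de ferro", True, False),
]
nominal_mlu = filter_candidates(candidates)
print(nominal_mlu)
```

In the study itself the exclusion was done by hand; the sketch only shows how the three exclusion criteria combine once the judgements are recorded.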

This resource therefore includes two lists: (i) List of Nominal MLU in EP, which contains the nominal MLU; and (ii) List of Nominal MLU in EP Semantically Classified, which contains the nominal MLU with their semantic classification. The classified list results from applying our semantic taxonomy, adapted from the Lancaster semantic lexicon, to the list of nominal MLU.

 

1 Piao, Scott et al. (2005). "A Large Semantic Lexicon for Corpus Annotation". In Proceedings from The Corpus Linguistics Conference Series, Corpus Linguistics 2005. Birmingham.
2 http://www.linguateca.pt/cetempublico/.
3 http://www-igm.univ-mlv.fr/~unitex/.

Abalada, S., Cardoso, A., & Cabarrão, V. (2010). Proposta de Classificação Semântica de Unidades Lexicais Multipalavra Nominais. In XXV Encontro Nacional da Associação Portuguesa de Linguística. Textos Seleccionados (Ana Maria Brito, Fátima Silva, João Veloso & Alexandra Fiéis, pp. 81-94). Porto: Edições Colibri/APL.