Amália Mendes
José Aires

ParlaMint II extends the ParlaMint project. Its goals are to add new corpora of parliamentary data for languages not covered in the first phase of the project, such as Portuguese, and to improve the existing corpora.

Coordinators of ParlaMint II: Maciej Ogrodniczuk (Institute of Computer Science, Polish Academy of Sciences) and Petya Osenova (Institute of Information and Communication Technologies, Bulgaria)

One of the most important characteristics of parliamentary data is that it directly reflects the most recent events. These include events with a global impact on human health, social life and the economy, such as the current COVID-19 pandemic. By comparing the data synchronically and diachronically in a cross-lingual context, scientific and civil communities from various disciplines can track the pan-European discussion.

The project provides data for focused observation of trends, opinions and decisions on lockdowns and other restrictive measures. It also covers their consequences for health, medical care systems, employment and other areas in times of emergency. For the ParlaMint project, the emergency in question is the COVID-19 pandemic. However, the methodology can also be applied to other events, such as economic crises or environmental issues.

The first ParlaMint project produced uniformly sampled, annotated and encoded comparable parliamentary corpora for 17 European countries. The corpora comprise reference and COVID-19 sections and contain rich metadata about the mandates, sessions and speakers, including the speakers' political party affiliations. They are linguistically annotated with named entities (NER) and with Universal Dependencies morphological features and syntax. They are also encoded according to a common and very strict schema, so their format is not merely interchangeable but also interoperable. The corpora have been released under the CC BY licence in the CLARIN.SI B-centre repository in three versions: 1.0 with the initial 4 languages, 2.0 with 16 languages, and 2.1 with corrected errata and 17 languages. They are available not only in their source TEI XML format but also in a number of derived, immediately useful formats. Going beyond what was planned in the project proposal, the ParlaMint corpora were also used in the DHH Hackathon. This increased their visibility and provided useful feedback on the structure of the final version 2.1 of the corpora. The project has thus produced a novel and highly valuable resource for a broad range of comparative trans-national SSH studies. This resource is openly available and has already proved itself in practice.

Because the pandemic has been prolonged, the COVID-19 sections of the existing corpora need to be extended with new data. In addition, parliamentary corpora for new countries and languages should be added and the current ones extended. CLUL will participate in the ParlaMint II project by producing a corpus of the Portuguese parliamentary sessions (Diário da Assembleia da República) covering the period from 2015-01-01 to 2022-02-01.

