DUPLEX - Doubles and Expletives in European Portuguese Dialect Syntax

Concluded

Date

01 January 2008 - 30 September 2011

Reference

PTDC/LIN/71559/2006

Funding institution

FCT – Fundação para a Ciência e a Tecnologia

Project PI

Ernestina Carrilho

Group

Dialectology and Diachrony

Description

The project DUPLEX aimed at promoting the study of European Portuguese dialect syntax by means of a twofold approach:

(i) implementation of an online linguistic resource feeding the empirical demands of dialect syntax;

(ii) theoretically-oriented investigation of concerted topics in Portuguese dialect syntax, focused on doubling and expletive constructions within a Principles and Parameters perspective.

This project contributed to the research developed within the project Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN): firstly, as an enhancement of the compiled dialectal corpus (CORDIAL-SIN 2007, 2010), which is now provided with sentence-based annotation for syntactic structure, thus becoming a more efficient resource for the purpose of studying syntax; secondly, as an in-depth study of a selection of topics in the domain of Portuguese dialect syntax.

The research developed within DUPLEX shapes the team's collaboration at international fora on dialect syntax, namely the European network of (dialect) syntacticians established by the Edisyn project (Meertens Institute, Amsterdam) and the Wedisyn network (Dialect Syntax in Westmost Europe).

Project description:

DUPLEX is a three-year project (2008-2010) aimed at promoting the study of European Portuguese dialect syntax by means of a twofold approach: (i) implementation of an online linguistic resource feeding the empirical demands of dialect syntax; (ii) theoretically-oriented investigation of concerted topics in Portuguese dialect syntax, focused on doubling and expletive constructions within a Principles and Parameters perspective.

The project extends the research developed within the projects CORDIAL-SIN (PRAXIS XXI/P/PLP/113046/1998), CORDIAL-SIN-2 (POSI/1999/PLP/33275) and Dialectal Syntax (POCTI/LIN/46980/2002): firstly, as an enhancement of the compiled dialectal corpus (CORDIAL‑SIN), which is now provided with sentence-based annotation for syntactic structure, thus becoming a more efficient resource for the purpose of studying syntax; secondly, as an in-depth study of a selection of topics in the domain of Portuguese dialect syntax.

The syntactic annotation of CORDIAL-SIN

The compilation of the Syntax-Oriented Corpus of Portuguese Dialects (CORDIAL-SIN) was completed in 2007. This corpus is based on a geographically representative body of selected excerpts of spontaneous and semi-directed speech, drawn from the rich recorded speech collection gathered by the ATLAS research team of the group Dialectology and Diachrony at CLUL. Since 2007, the corpus (600,000 words) has been available online for download, by geographical location and in three different formats: verbatim transcripts, normalized transcripts, and part-of-speech tagged files. One of the aims of the project DUPLEX is to make CORDIAL-SIN available as a parsed corpus.

The syntactic annotation is implemented over the CORDIAL-SIN part-of-speech tagged texts. The annotation system, in the Penn Treebank format, was set up in close collaboration with other research groups engaged in building syntactically annotated corpora, namely the Tycho Brahe Corpus and the Penn Parsed Corpora of Historical English. The annotation results in a tree representation in the form of labeled brackets, marking constituent boundaries, phrase and clause dependencies, sentence types, grammatical relations, null categories and some transformational relations (see CORDIAL-SIN Syntactic Annotation Manual). The CORDIAL-SIN annotation system also includes a set of labels and annotation conventions for pragmatic units. The syntactically annotated corpus allows automatic searches for syntactic structure through the search program CorpusSearch2, written by Beth Randall (open source software, downloadable from Sourceforge), which is compatible with syntactic annotations in the Penn Treebank format.
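As an informal illustration of what a labeled-bracket tree looks like and how it can be queried, the following minimal Python sketch reads one bracketed sentence into a nested structure and retrieves the constituents carrying a given label. It is not part of the CORDIAL-SIN tools and does not replace CorpusSearch2. The sample sentence and its labels (IP-MAT, NP-SBJ, PRO, VB, ADVP, ADV) are assumptions chosen for illustration in the general style of Penn-type annotation; the actual CORDIAL-SIN label set is the one defined in the Syntactic Annotation Manual. The names Node, tokenize, parse, find and words are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One constituent: a label plus child constituents and/or words."""
    label: str
    children: list = field(default_factory=list)


def tokenize(text):
    # Split a bracketed string into "(", ")" and bare tokens.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens, pos=0):
    # Parse one "(LABEL child ...)" constituent starting at tokens[pos].
    # Assumes every opening bracket is followed directly by a label.
    if tokens[pos] != "(":
        raise ValueError("expected '(' at token %d" % pos)
    node = Node(tokens[pos + 1])
    pos += 2
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            node.children.append(child)
        else:
            node.children.append(tokens[pos])  # a word (terminal)
            pos += 1
    return node, pos + 1  # skip the closing bracket


def find(node, label):
    # Yield every constituent whose label equals `label`, depth first.
    if node.label == label:
        yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from find(child, label)


def words(node):
    # Collect the words dominated by a constituent, left to right.
    out = []
    for child in node.children:
        out.extend(words(child) if isinstance(child, Node) else [child])
    return out


# Hypothetical sentence and labels, not taken from the corpus.
SAMPLE = "(IP-MAT (NP-SBJ (PRO ele)) (VB chove) (ADVP (ADV muito)))"

tree, _ = parse(tokenize(SAMPLE))
subjects = [" ".join(words(n)) for n in find(tree, "NP-SBJ")]
print(subjects)
```

The sketch only matches exact labels inside a single tree. Searches over structural relations between constituents, across a whole corpus, are what a dedicated search program such as CorpusSearch2 is designed for.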

CORDIAL-SIN is searchable online through the Edisyn Search Engine.

The syntactic annotation of CORDIAL-SIN is being released in phases. The present release provides access to the annotated texts of the thirty-three locations marked on the map below.

Download zip-file – annotated data in labeled bracketing format.

The annotated texts are automatically searchable with CorpusSearch, a search engine for parsed corpora developed by Beth Randall (open source software, downloadable from Sourceforge).

Documentation: CORDIAL-SIN Syntactic Annotation System Manual

1.	VPA	Vila Praia de Âncora (Viana do Castelo)
2.	CTL	Castro Laboreiro (Viana do Castelo)
3.	PFT	Perafita (Vila Real)
4.	AAL	Cast. Vide, Porto da Esp., S. Salv. Aramenha, Sapeira, Alpalhão, Nisa (Portalegre)
5.	PAL	Porches, Alte (Faro)
6.	CLC	Câmara de Lobos, Caniçal (Funchal)
7.	PST	Camacha, Tanque (Funchal)
8.	MST	Monsanto (Castelo Branco)
9.	FLF	Fajãzinha (Horta)
10.	MIG	Ponta Garça (Ponta Delgada)
11.	OUT	Outeiro (Bragança)
12.	CBV	Cabeço de Vide (Portalegre)
13.	MIN	Arcos de Valdevez, Bade, S. Lourenço da Montaria (Viana do Castelo)
14.	FIG	Figueiró da Serra (Guarda)
15.	ALV	Alvor (Faro)
16.	SRP	Serpa (Beja)
17.	LVR	Lavre (Évora)
18.	ALC	Alcochete (Setúbal)
19.	COV	Covo (Aveiro)
21.	PVC	Porto de Vacas (Coimbra)
22.	EXB	Enxara do Bispo (Lisboa)
24.	MTM	Moita do Martinho (Leiria)
25.	LAR	Larinha (Bragança)
27.	FIS	Fiscal (Braga)
28.	GIA	Gião (Porto)
30.	UNS	Unhais da Serra (Castelo Branco)
31.	VPC	Vila Pouca do Campo (Coimbra)
34.	GRC	Graciosa (Angra do Heroísmo)
36.	STA	Santo André (Vila Real)
38.	CLH	Calheta (Angra do Heroísmo)
39.	CPT	Carrapatelo (Évora)
41.	STE	Santo Espírito (Ponta Delgada)
42.	CDR	Cedros (Horta)