EP-Poland : building a bilingual parallel corpus for interpreting research
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Penerbit Universiti Kebangsaan Malaysia, 2022 |
Online Access: | http://journalarticle.ukm.my/18572/1/46647-178358-1-PB.pdf http://journalarticle.ukm.my/18572/ https://ejournal.ukm.my/gema/issue/view/1467 |
Summary: | This paper reports on the process of building the EP-Poland corpus and on the first empirical
applications thereof. This extensive bidirectional English-Polish corpus of original
parliamentary contributions paired with professional simultaneous interpretations includes 11
European Parliament debates held between January 2016 and February 2020. The main topic
of these debates is the rule of law crisis triggered by the Law and Justice government in Poland.
The corpus contains over 157,000 tokens and about 20 h 45 min of recordings, counting both
source and target texts. The two interpreting directions (English-Polish and Polish-English) are
represented almost evenly. The annotation completed so far covers mark-up
information, POS tagging, disfluency phenomena, and all forms of explicitating
shifts. Manual annotation for personal deixis is in progress. An additional feature is
speaker identification performed with the x-vector method, which allowed us to
identify 36 interpreters. We begin with an overview of existing interpreting corpora. We
then explain the design features of EP-Poland and report on two completed
empirical studies analysing idiosyncratic interpreting behaviour. We conclude by outlining
future development pathways and offering some remarks on the significance of the corpus and its
limitations. |
---|