Simply complex! A multimodal and interdisciplinary approach to examining linguistic complexity within Leichte Sprache
The research group “Einfach komplex – Leichte Sprache” investigates the psycholinguistic and neurolinguistic correlates of Leichte Sprache, a German variety of plain language. Using an interdisciplinary and innovative approach, the program is intended to advance the framework of Leichte Sprache and its rules. By applying methods such as eye-tracking (ET), electroencephalography (EEG) and functional magnetic resonance imaging (fMRI), a neurobiologically plausible model of Leichte Sprache is being developed. As empirical research on the reception of Leichte Sprache has been very limited, the project examines the impact of complexity reduction on several linguistic levels, including syntax, morphology, lexicon and semantics.
Linked to the research group are scholarships aimed at outstanding graduates of linguistics, translation studies, the life sciences or related fields. For further information, see https://leichtesprache.uni-mainz.de/.
Tra&Co — Translation and Cognition
The project Tra&Co (Translation & Cognition), funded by the GFK (Gutenberg Forschungskolleg), aims to investigate translation processes via a multimethod approach. Of particular importance for this project are the processes taking place in the translators' and interpreters' “black box” and the cognitive challenges they face. In empirical studies, data from translation products, corpora, questionnaires, screen recordings, keylogging, eye-tracking, and EEG experiments are analyzed and triangulated in order to test and support theoretical approaches in Translation Studies.
In these studies, innovative methods are used, and the research questions generally cover trends and developments in translation practice, e.g., new translation technologies, post-editing, easy-to-read language, media translation or the revision of translation products. The research is thus situated at the interface between Translation Studies and didactics, the cognitive sciences and neurosciences, as well as corpus and computational linguistics. Cross-disciplinary projects and participation in PhD summer schools, symposia and conferences foster interdisciplinary exchange.
LES is more — comprehensibility of plain language in administrative contexts and easy-to-read language
Comprehensible language plays an important role in the digital age. Information should be provided in a way that is barrier-free, accessible and comprehensible for everyone. However, information can be presented in many different ways. The spectrum ranges from technical communication with complex word order and convoluted syntactic structures to plain language in administrative contexts, easy-to-read language, and highly reduced presentations, e.g. pictograms.
Accordingly, the cognitive effort required to process the information varies widely. It can be determined by analyzing eye movements (eye-tracking) and via comprehensibility tests adapted to the target groups. In a reception study conducted at the department of English Linguistics and Translation Studies (Prof. Dr. Silvia Hansen-Schirra) of Johannes Gutenberg University Mainz, websites of the Ministry of Social Affairs, Labor, Health and Demography of Rhineland-Palatinate are presented to various target groups at three levels of difficulty (technical language, plain language in administrative contexts, and easy-to-read language). The study addresses the question of which version is best suited for people with cognitive impairments, non-native speakers with limited knowledge of German, and elderly people with little internet experience. By means of eye movement data (e.g., long and frequent fixations), problems and areas with potential for optimization can be identified. The study focuses on difficult linguistic structures but also on administrative jargon and aspects of usability.
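To illustrate how such eye movement data can be aggregated, here is a minimal Python sketch; the fixation records, the areas of interest and the simple effort heuristic are invented for the example and do not come from the study itself.

```python
from statistics import mean

# Hypothetical fixation records: (area_of_interest, duration_ms).
# In a real study these would come from the eye-tracker's data export.
fixations = [
    ("heading", 180), ("jargon_term", 420), ("jargon_term", 390),
    ("body_text", 210), ("jargon_term", 510), ("body_text", 240),
]

def aoi_metrics(fixations):
    """Aggregate fixation count and mean duration per area of interest (AOI)."""
    by_aoi = {}
    for aoi, dur in fixations:
        by_aoi.setdefault(aoi, []).append(dur)
    return {aoi: {"count": len(durs), "mean_ms": mean(durs)}
            for aoi, durs in by_aoi.items()}

metrics = aoi_metrics(fixations)
# Long and frequent fixations flag potentially problematic regions:
most_demanding = max(metrics, key=lambda a: metrics[a]["count"] * metrics[a]["mean_ms"])
```

On this toy data the region with the highest combined fixation count and duration is the invented `jargon_term` area, matching the intuition that administrative jargon attracts long and frequent fixations.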
CompAsS - Computer-Assisted Subtitling
With growing research interest and advances in automatic speech recognition (ASR) and neural machine translation (NMT), and with their increasing application particularly in the captioning of massive open online courses, these technologies are now also being considered for TV subtitling. The CompAsS project aims to research and optimize the multilingual subtitling process for offline public TV programmes by developing a multimodal subtitling platform that leverages state-of-the-art ASR, NMT and cutting-edge translation management tools. Driven by scientific interest and professional experience, the outcome will reduce the resources required to re-purpose high-quality creative content for new languages, allowing subtitling companies and content producers to be more competitive in the international market.

Human and machine input will be combined to make the process of creating interlingual subtitles as efficient and fit for purpose as possible, from the upload of the original video to the burning-in of the final subtitles. Post-editing of written texts is standard in the translation industry but is typically not used for subtitles. By post-editing subtitles, the project hopes to make significant gains in productivity while maintaining acceptable quality standards. The planned pipeline uses ASR as a first step for automatic film transcript extraction, followed by human input that converts the ASR output into monolingual subtitles. These subtitles are then translated via NMT into English and the other target languages (e.g., German, Swedish, French) and finally post-edited. Each subprocess will be evaluated with research methodologies such as eye-tracking and keylogging, providing the basis for further development.
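The staged pipeline described above can be sketched as follows. Every function body here is a placeholder standing in for the real ASR and NMT components and the human work steps, so the code only illustrates the flow of data from video to post-edited target-language subtitles.

```python
# Minimal sketch of the planned CompAsS-style pipeline; all bodies are stubs.

def asr_transcribe(video):
    """Step 1 (placeholder): automatic film transcript extraction via ASR."""
    return f"raw transcript of {video}"

def segment_into_subtitles(transcript):
    """Step 2 (placeholder): human input converts the ASR output
    into monolingual subtitle cues."""
    return [f"cue from: {transcript}"]

def nmt_translate(cues, target_lang):
    """Step 3 (placeholder): machine translation of the subtitle cues."""
    return [f"[{target_lang}] {cue}" for cue in cues]

def post_edit(cues):
    """Step 4 (placeholder): human post-editing of the translated cues."""
    return [cue + " (post-edited)" for cue in cues]

def subtitle_pipeline(video, target_langs):
    """Chain the four steps, producing subtitles per target language."""
    cues = segment_into_subtitles(asr_transcribe(video))
    return {lang: post_edit(nmt_translate(cues, lang)) for lang in target_langs}

subs = subtitle_pipeline("film.mp4", ["de", "sv"])
```

Structuring each subprocess as a separate function mirrors the project's plan to evaluate and optimize each step (ASR, segmentation, NMT, post-editing) independently.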
The project's outcome will be an innovative interface that combines video navigation with language and translation management tools and allows several subtitlers to work simultaneously on the same film, providing an end-to-end solution for releasing audiovisual material quickly in several languages, saving time and money while maintaining an acceptable level of quality.
TICQ — Translation and Interpreting Competence Questionnaire
Various empirical research fields look at translators and interpreters as key populations to test a wide range of hypotheses. However, measures of translation competence in relevant experiments are either absent or based on informal, ad hoc instruments. This casts doubt on the ensuing findings and hinders comparability across studies. To address this problem, we introduce the Translation and Interpreting Competence Questionnaire (TICQ), a customizable online tool to collect quantitative and qualitative data on multiple aspects of translation and interpreting skills.
The TICQ includes three parts. Section A taps into the participants’ demographics, language acquisition history, and bi- (or multi-) lingual competence. Section B assesses translation competence and Section C focuses on interpreting competence. The latter two sections comprise self-rating scales on modality-specific skills, questions about professional practice and knowledge, and items addressing procedural aspects of key processes. Studies on translators would use sections A and B, while those targeting interpreters would use sections A and C. The TICQ is freely available via LimeSurvey. It can be customized, run online with logs saved in the cloud or locally, or downloaded for pen-and-paper administration. The ability of the TICQ to discriminate between participants with more or less experience has been tested with a large group of participants from the relevant populations. In brief, the TICQ offers comprehensive, fine-grained subject-level information for empirical research on translation and interpreting, while revealing the extent to which samples are comparable within and across studies.
CRITT TPR DB — Translation Process Research Database
We host and analyze the CRITT Translation Process Research Database (TPR-DB), a publicly available database of recorded translation sessions for Translation Process Research (TPR). It contains user activity data (UAD) of translators' behavior collected in approximately 30 translation (and text production) studies with Translog-II and with the CASMACAT workbench. This data acquisition software logs keystrokes and gaze data during text perception and text production. The data currently amounts to more than 500 hours of text production gathered in more than 1,400 sessions. In addition to the raw logging data, a post-processed version of the database is made available which compiles this data into a set of tab-separated tables that can be more easily processed by various visualization and analysis tools. We are responsible for the English-German part of the database and for the extensions towards revision and spoken language.
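As a minimal illustration of working with such tab-separated tables, the following sketch reads a tiny invented table with Python's standard csv module; the column names and values are made up for the example and do not reproduce the actual TPR-DB schema.

```python
import csv
import io

# Invented example table: one row per recorded session, with a duration
# and keystroke counts. Real TPR-DB tables use their own column names.
tsv = io.StringIO(
    "SessionId\tDur\tInsertions\tDeletions\n"
    "P01_T1\t1200\t35\t4\n"
    "P01_T2\t950\t28\t2\n"
    "P02_T1\t1430\t41\t9\n"
)

# DictReader handles the header row; tab-separated needs delimiter="\t".
rows = list(csv.DictReader(tsv, delimiter="\t"))
total_dur = sum(int(r["Dur"]) for r in rows)  # 1200 + 950 + 1430 = 3580
```

In practice one would pass a file path instead of the `io.StringIO` buffer; the point is only that plain tab-separated tables are directly consumable by standard tooling.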
DigiLing — Trans-European E-Learning Hub for Digital Linguistics
DigiLing is a project funded by Erasmus+ that aims to develop an innovative, internationally accepted curriculum for the training of digital linguists; a trans-European e-learning hub containing modules on essential issues of digital linguistics; and an interdisciplinary research area on competence and knowledge acquisition in the field of understanding and processing digital linguistic content, all in order to fill gaps in the European job market. The online courses include, among other things, linguistic data processing, corpus analysis, post-editing of machine translation and programming for linguists. Representatives of five European partner universities and one industrial partner each develop one or two courses, which will be localized into all languages involved in the project.
PLATO — Positive Learning in the Age of Information
Every day a vast quantity of information is spread via social and mass media, reaching billions of people around the world. But much of the information circulating online is vague, unverified or simply false. It often also conflicts with fundamental democratic and humanistic values or poses ethical and moral problems. Internet users who generate knowledge on the basis of such information engage in “negative learning”. Current scientific models do not offer a nuanced explanation of the phenomenon of negative learning in the digital age. However, the spread of blurred knowledge not based on facts is a risk to human and social development. Our subproject deals with the following research question: to what extent do linguistic complexity and multilingualism influence learning behavior?
The Translation Project Simulator
This teaching project, funded by the GLK (Gutenberg Lehrkolleg), is a pilot project using workshops for the complementary transfer of knowledge to advanced BA students of FB06. With the help of the e-learning platform Moodle, BA students can work on semi-authentic translation projects remotely. In doing so, they gain insights into the complex translation task, which in professional practice requires know-how from the domains of project management, terminology, technology and revision. Up to now, these skills have been insufficiently imparted.
In this project, however, knowledge is transferred in accompanying workshops by experts in their respective fields at FB06. The transfer takes place partly in a direct way: participants of an accompanying BA seminar are trained to supervise the project as project managers; and partly in an indirect way: the participants, working as peers, process and pass on the information from the workshops to groups of translators. MA students accompany the project and offer support and input for the improvement of the workshops, which will be included in the range of modules available to students.
MULTILEX — Multilingual Lexicon Extraction from Comparable Corpora
Given large collections of parallel (i.e. translated) texts, it is a well-established technique to link corresponding words across languages and thus extract bilingual dictionaries from parallel corpora. This is done by successively applying a sentence-alignment and a word-alignment step. However, parallel texts are a scarce resource for most language pairs involving lesser-used languages. On the other hand, human second language acquisition does not seem to require the reception of large amounts of translated text, which indicates that there must be another way of crossing the language barrier. Human capabilities appear to be based on comparable resources, i.e. texts or speech on related topics in different languages which, however, are not translations of each other.
Comparable (written or spoken) corpora are far more common than parallel corpora and thus offer a chance to overcome the data acquisition bottleneck. Despite this cognitive motivation, the MULTILEX project (funded within the EU Marie Curie programme) does not attempt to simulate the complexities of human second language acquisition, but tries to show that it is possible, by purely technical means, to automatically extract information on word and multiword translations from comparable corpora. The aim is to push the boundaries of current approaches, which typically exploit correlations between co-occurrence patterns across languages, in several ways:
1. Eliminating the need for initial lexicons by using a bootstrapping approach which only requires a few seed translations.
2. Implementing a new methodology which first establishes alignments between comparable documents across languages, and then computes cross-lingual alignments between words and multiword-units.
3. Improving the quality of computed word translations by applying an interlingua approach, which, by relying on several pivot languages, allows a highly effective multi-dimensional cross-check.
4. Showing that, by looking at foreign citations, language translations can even be derived from a single monolingual text corpus.
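A toy version of the co-occurrence approach mentioned above can be sketched in a few lines of Python. The corpora and the seed lexicon below are invented; real systems work with large corpora, window-based co-occurrence counts and stronger association measures than raw counts, but the principle is the same: a seed lexicon links the dimensions of the two co-occurrence spaces, so that translation candidates can be ranked by profile similarity.

```python
import math
from collections import Counter

def cooc_vector(word, sentences, context_words):
    """Count sentence-level co-occurrences of `word` with each context word."""
    counts = Counter()
    for sent in sentences:
        toks = sent.split()
        if word in toks:
            for c in context_words:
                if c in toks:
                    counts[c] += 1
    return [counts[c] for c in context_words]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_translation(word, src_corpus, tgt_corpus, seed):
    """Rank target words by similarity of their co-occurrence profiles;
    the seed lexicon aligns the two vector spaces dimension by dimension."""
    src_ctx = list(seed)                      # seed words, source side
    tgt_ctx = [seed[w] for w in src_ctx]      # their known translations
    src_vec = cooc_vector(word, src_corpus, src_ctx)
    candidates = {t for s in tgt_corpus for t in s.split()} - set(tgt_ctx)
    return max(candidates,
               key=lambda t: cosine(src_vec, cooc_vector(t, tgt_corpus, tgt_ctx)))

# Tiny comparable (non-parallel) corpora and a handful of seed translations:
de = ["der hund bellt laut", "die katze jagt den hund",
      "der hund bellt die katze an"]
en = ["the dog barks loudly", "the cat chases the dog",
      "the dog barks at the cat", "the cat sleeps"]
seed = {"bellt": "barks", "laut": "loudly", "katze": "cat", "jagt": "chases"}
translation = best_translation("hund", de, en, seed)  # → "dog"
```

On this toy data, “hund” and “dog” share the same profile over the seed dimensions (barking, loudly, cat, chasing), so “dog” comes out on top without any sentence or word alignment between the two corpora.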
TREC — Translation, Research, Empiricism, Cognition
TREC is a network of Translation scholars and Research groups united by their joint interest in Empiricism and the rigorous investigation of the human translation process, especially with respect to Cognition. The aim of TREC is to facilitate research and enhance comparability across studies within empirical and experimental translation research with a focus on cognition. Translating is a highly complex and demanding cognitive task and as such has been the object of empirical and experimental research since the 1980s. A variety of approaches have been tested, but empirical and experimental translation research with a focus on the process/product interface still faces methodological and theoretical challenges. There is a need for validated data collection instruments and established data analysis models, which calls for the exchange of theoretical models, research tools and available technology among researchers.
The thematic network of empirical and experimental research in translation creates a link between several research groups, and its main aim is to promote the exchange of data and information in order to improve the development of this type of research. The thematic network goals are:
1. To promote the exchange and transfer of knowledge about empirical and experimental research on the translation process, translation competence and translation competence acquisition.
2. To foster the cooperation among researchers working on empirical and experimental research in translation.
3. To optimize the use of methodological resources as well as technological tools to collect data for research in translation.
4. To work together towards the building of the foundations of future empirical and experimental research projects in translation.
Modelling Parameters of Cognitive Effort in Translation Production (Memento)
Over the last few decades, Translation Process Research (TPR) has been prolific in the generation of hypotheses and models (e.g., De Groot, 1993; Halverson, 2003; Tirkkonen-Condit, 2005; Schaeffer & Carl, 2013; He & Li, 2015), which, among other issues, are all concerned with three interconnected questions: How are translations represented in the mind, and how can the cognitive architecture be modelled? What kinds of typical translation phenomena are present in the translation product? How are translations produced? Existing translation models are either purely theoretical or address only two of the above-mentioned research questions empirically.
Addressing these questions in an internationally coordinated and collaborative context, the project seeks to integrate evidence from translation product research and translation process research, and to develop an integrated cognitive model of the translating mind based on empirical evidence, which can be validated across a broad range of language combinations. Besides corpus-based research, keylogging, eye-tracking and fNIRS technologies will be used to systematically investigate the translation process.