Within the context of the SignON project, funded by the European Horizon 2020 programme, the Centre for Computational Linguistics (CCL), part of the ComForT research unit at KU Leuven, seeks to hire a PhD student to carry out research on representation learning for sign language translation.
The SignON project, which unites 17 European partners, aims to facilitate the exchange of information among deaf, hard of hearing, and hearing individuals across Europe by developing automatic sign language translation tools. Automatic sign language translation (the task of automatically translating a visual-gestural sign language utterance to an oral language utterance and vice versa) is an application that has the potential to reduce communicative barriers for millions of people. The World Health Organisation reports that there are about 466 million people in the world today with disabling hearing loss, and according to the World Federation of the Deaf, over 70 million people communicate primarily via a sign language.
Sign languages are, just like verbal languages, highly structured systems governed by a set of linguistic rules. There are, however, also linguistic characteristics of sign languages that are modality-specific. As a consequence, sign language translation cannot be considered a one-to-one mapping from signs to spoken language words. Recent machine learning methods have greatly improved the state of the art in natural language processing applications, including the multi-modal problem of sign language translation. However, due to the inherent complexity of the task, most approaches do not translate end-to-end (i.e., directly from sign to text), but first transform the signs into an intermediate, gloss-based transcription (sign to gloss), and in a second step translate this intermediate representation into verbal language (gloss to text). Using glosses as an interface for sign language translation is fairly successful, but also poses a number of problems. Gloss annotations are an imprecise and often impoverished representation of sign language that does not do justice to its complex multi-channel production.
The PhD candidate will focus on the intermediate representation that functions as an interface between sign language and verbal language in the context of sign language translation. Research will be carried out along two tracks:
- Firstly, the project will consider the development of a multi-faceted interlingual representation for sign language translation that can function as a sufficiently rich interface between sign language and verbal language, and is tailored towards machine learning methods. Crucially, the representation needs to be rich enough to capture the intricacies of elaborate, multi-channel sign language, but at the same time lenient enough to be incorporated into the classification-based optimization objectives that are inherent to machine learning approaches. This task will be carried out in close cooperation with linguistically trained sign language experts; the representation will be developed using Flemish Sign Language as a test bed, but the resulting representational framework should be generally applicable. Additionally, the representational framework will be augmented with various knowledge-based resources (such as WordNet and FrameNet) as well as machine-learning-based optimizations (e.g., informed by word and sentence embeddings).
- Secondly, the project will examine how the resulting representations can be exploited as soft constraints to improve the output predictions of a neural machine translation architecture for sign language. Specifically, the linguistic knowledge that is encoded within the representation can be used to constrain the neural network’s output probability distribution. Learning-based approaches suffer from a lack of resources: large-scale annotated sign language corpora are few and far between. As a consequence, the resulting output predictions are potentially syntactically unsound, semantically improbable, or otherwise linguistically incongruous. By augmenting the network output with representation-based constraints, modeled as prior distributions over the neural network’s output distribution, such discrepancies can be mitigated. Additionally, the knowledge encoded in the representational framework can be used to rerank the various candidates yielded by the neural network architecture.
- You hold a Master's degree in linguistics or computer science, or an equivalent qualification.
- You have solid programming skills.
- You exhibit excellent proficiency in English and good communication skills.
- Working knowledge of Dutch is recommended; candidates without knowledge of Dutch are welcome to apply if they are willing to learn Dutch upon arrival.
- Experience with neural networks (deep learning) for natural language processing is a plus.
- Knowledge of a sign language, particularly Flemish Sign Language, is a valuable asset.
- Candidates who are deaf or hard of hearing are particularly encouraged to apply.
- We offer a full-time PhD position for 1 year, extendable to 4 years after an initial positive evaluation.
- You will be able to conduct scientific research within a high-level research environment, leading to a doctoral degree.
- You will work in a larger project, in cooperation with other Flemish and European research groups.
- You will have the opportunity to participate in international conferences, and benefit from academic training and workshops.
To apply, please submit a motivation letter, a CV, and the contact details of two references. For more information, please contact Tim Van de Cruys, mail: firstname.lastname@example.org.
You can apply for this job no later than February 4, 2021 via the online application tool.
KU Leuven seeks to foster an environment where all talents can flourish, regardless of gender, age, cultural background, nationality or impairments. If you have any questions relating to accessibility or support, please contact us at diversiteit.HR@kuleuven.be.