This page offers an overview of our previous research endeavours.

Completed Third-Party Funded Projects

The DFG-funded project "Development of a Model Repository and Automatic Font Recognition for OCR-D" aims to improve the recognition rates of OCR procedures for historical prints. Existing models have usually been trained either on modern corpora or on unfiltered historical corpora with a large variety of fonts, so the extent to which they are suited for this task is limited. By training font-specific OCR models, the project seeks to improve the reliability of text recognition in digitised images of historical prints.

More information on the project can be found in the corresponding GEPRIS entry. The project is part of our former research focus "OCR & Layout Recognition".


Previous Research Foci


OCR and Layout Recognition, i.e. the automated transformation of scans of physical text documents into machine-readable, digital documents, plays a crucial role in the Digital Humanities, especially when it comes to computational research into historical sources.


In this area, we use computational approaches to digitise and analyse symbolic music (sheet music).


In quantitative drama analysis, we use different methods from the fields of NLP and text mining to facilitate a distant reading of stage plays. A particular focus of our work in this area is sentiment analysis.


