LOAR
Find our open datasets containing raw text from monographs, newspapers and audio recordings in our Library Open Access Repository (LOAR).

Det Kgl. Bibliotek continually takes new initiatives to support data science.
Our Library Open Access Repository (LOAR) includes the following material:
- datasets based on books printed up to 1881 (due to the 140-year copyright rule)
- datasets with the Freedom of the Press Writings
- a large collection of OCR (optical character recognition) text based on digitised newspapers from 1660 to 1877
- the Ruben collection, which contains Denmark's first sound recordings (1889-1895)
The datasets can be used for natural language processing and for text and data mining in research and teaching.
Contact us at kb@kb.dk if you have questions about the metadata or how the datasets can be used.