Programme and abstracts

On November 11 and 12, 2024, Royal Danish Library convenes the seminar Digital Stories. The focal point for the two days is how the digital transformation has affected the cultural heritage sector.

Photo setup for digitising photographs

Photo: Det Kgl. Bibliotek

During the seminar, we will examine this on both a methodological and strategic level and through a wide selection of examples from the Nordic countries.

Below you will find the programme, abstracts and the original call for papers.
 

Monday November 11, 2024 - morning

9.30-10.00 Arrival, registration and coffee
10.00-10.15 Welcome
Søren Bitsch Christensen, Deputy Director (Royal Danish Library)
10.15-11.15 Keynote on digital strategies: Mass digitisation in Norwegian: A look at the digitisation project for the Norwegian National Library
Henrik Grue Bastiansen, professor (Volda University College)
11.15-11.30 Break
11.30-12.00 Like grafting old varieties onto a new apple tree – the work of bringing legacy toponym data into play in modern research and dissemination?
Peder Gammeltoft (University Library, University of Bergen)
12.00-12.30 Digital Stories – The story of the retro-digitisation at Royal Danish Library
Stig Svenningsen, Peter Thiesen, Ulla Bøgvad Kejser and Ditte Laursen (Royal Danish Library)
12.30-13.30 Lunch

Monday November 11, 2024 - afternoon

13.30-14.00 Artificial intelligence can do (most of) the work - designing specialised AI software for cataloguing
Michael Monefeldt (University Library of Southern Denmark)
14.00-14.30 Serendipitous contextuality: Everything beyond that (retro) digitisation gives us
Muhamed Fajkovic (Royal Danish Library)
14.30-15.00 Break
15.00-15.30 A digital representation of a city's history
Søren Bitsch Christensen (Royal Danish Library)
15.30-16.00 Store it, Don't Show It: Building Sustainable Infrastructures for Digital Scholarly Publishing
Katrine F. Baunvig, Krista SG Rasmussen, Kirsten Vad and Jon Tafdrup (Aarhus University)
16.00-16.30 Perspectives from the day: summing up and further discussion
16.30-17.30 Reception

Tuesday November 12, 2024 - morning

9.00-10.00 Keynote on digital methods: Digital methods developed in Link-Lives
Barbara Revuelta-Eugercios, archivist and research lecturer (National Archives) and Anne Løkke, professor (University of Copenhagen)
10.00-10.30 The use of cultural heritage data in interdisciplinary historical health research
Mads Villefrance Perner (Roskilde University)
10.30-11.00 Break
11.00-11.30 How the archives with AI and text recognition can contribute to personalised medicine and new knowledge
Jeppe Klok Due (National Archives)
11.30-12.00 Past contexts and latent spaces of meaning: The digitisation of absolutism newspapers
Johan Heinsen and Camilla Bøgeskov (Aalborg University)
12.00-12.50 Lunch

Tuesday November 12, 2024 - afternoon

12.50-13.20 Open access to the web archive: Digital text analysis of news from the web
Jon Tønnesen (National Library)
13.20-13.50 The concept of community singing in Danish newspapers
Anne Agersnap (Aarhus University)
13.50-14.20 Look at my dress. Possibilities and challenges in using "computer vision" on Royal Danish Library's photographs from 1870-1950
Laura Søvsø Thomasen (Royal Danish Library), Mette Kia Krabbe Meyer (Royal Danish Library), Henrik Kragh Sørensen (University of Copenhagen)
14.20-14.40 Break
14.40-15.10 Commenting in a digital age. Exploration and automation of editorial comments in digital, text-critical publications
Kirsten Vad and Katrine Frøkjær Baunvig (Aarhus University)
15.10-15.50 Perspectives from the day and the further process

Practical information

The seminar takes place in the Blixen hall at Royal Danish Library in Copenhagen on November 11 to 12, 2024.

It is free to participate, but there is a limited number of places. You can register by writing to digitalehistorier@kb.dk no later than November 1, 2024.

It is possible to participate for one or two days. Registration is binding.

Abstracts

Keynote 1 on digital strategies:
Mass digitisation in Norwegian: A look at the digitisation project for the Norwegian National Library

Henrik Grue Bastiansen, professor (Volda University College)

One of the world's largest mass digitisation projects has been under way in Norway since 2006. For almost 20 years, the National Library has carried out a digitisation project that is unusual even in an international context. The goal has been to digitise everything ever published in Norway, in all types of media and from all periods.
In this lecture, Professor Henrik G. Bastiansen presents the emergence and development of this project. He asks what digitisation has done to the historical sources and points to the opportunities and challenges that researchers face now that an entire country's cultural heritage has become digitally available. The lecture is based on Bastiansen's book "When the past becomes digital: Media, sources and history in the age of digitisation", which was published by the Norwegian University Press in 2023.

Keynote 2 on digital methods:
Digital methods developed in Link-Lives

Barbara Revuelta-Eugercios, archivist and research lecturer (National Archives) and Anne Løkke, professor (University of Copenhagen)

Link-Lives is a research project that reconstructs simple life courses for (almost) all people who lived in Denmark between 1787 and 1968. We do this by combining domain expertise with machine learning, so that we can connect historical information about the same person from censuses, church registers and Copenhagen burial records. The algorithms do the heavy lifting of sifting through millions of personal records to find trustworthy matches. But before we get that far, we need reliable training and test data. That is why we have developed specialised software, ALA (Assisted Linkage Application), which our team of historians, who have specialist knowledge of our sources and their context, use to create links manually. ALA logs the linking process and makes it possible to compare links made by different linkers. In this way, we create domain-expert links whose quality we can describe, which in turn makes it possible to measure the quality of the computer-generated links.
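
As a rough illustration of what measuring computer-generated links against domain-expert links can look like, the following minimal sketch computes precision and recall over two sets of links. It is not the Link-Lives evaluation code; the function name and the record identifiers are hypothetical and invented for this example.

```python
def link_quality(expert_links, machine_links):
    """Compare computer-generated links with domain-expert links.

    Each link is a pair of record identifiers; order within a pair is ignored.
    Returns (precision, recall) of the machine links against the expert links.
    """
    expert = {frozenset(pair) for pair in expert_links}
    machine = {frozenset(pair) for pair in machine_links}
    true_positives = len(expert & machine)
    precision = true_positives / len(machine) if machine else 0.0
    recall = true_positives / len(expert) if expert else 0.0
    return precision, recall


# Hypothetical record identifiers, for illustration only.
expert_links = [
    ("census1845:17", "parish:203"),
    ("census1845:18", "burial:9"),
    ("census1850:4", "parish:77"),
]
machine_links = [
    ("parish:203", "census1845:17"),   # same link as the expert's, reversed order
    ("census1845:18", "burial:10"),    # differs from the expert's link
    ("census1850:4", "parish:77"),
    ("census1850:5", "parish:80"),     # not among the expert links
]

precision, recall = link_quality(expert_links, machine_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The same comparison could be run between two human linkers to describe how consistent the expert links themselves are.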

Link-Lives will end in the summer of 2025, but the National Archives has decided to develop the competences, methods and tools from Link-Lives further. The long-term vision is to establish an ever-growing research infrastructure, the Historical Person Register (HisPeR), which can successively integrate new historical data sets, both from the National Archives' own transcription projects and from other institutions. In the presentation, we explain how we create the life courses and disseminate our data, and we give examples of new research questions that this infrastructure can shed light on.

Like grafting old varieties onto a new apple tree – the work of bringing legacy toponym data into play in modern research and dissemination?

Peder Gammeltoft (University Library, University of Bergen)

Norway was very early in digitising central works and sources. First up was the Registration Center for Historical Data at the University of Tromsø, but digitisation efforts gained momentum in the latter half of the 1990s, when the Documentation Project (University of Oslo) and the Digital Archive (Arkivverket) began mass digitisation. In addition, universities and county municipal archives have also digitised regional collections.

These digitisation efforts are of course commendable, but one thing has been missing in the work: a common digitisation practice. This has meant that subject-specific digitisations have been created in different ways and with vastly different structures. For toponym digitisations, the result is a mix of material types: some are coordinated, some include sound and images, and others are copies of source works, et cetera.
In toponym research circles, there has long been a desire for a common point of entry to digital toponym sources with a common name display, but digitisation experts have consistently claimed that it was impossible to coordinate data in such a way. With name-theoretical and methodological insight, however, this is entirely possible. The language collections at the University Library in Bergen have been working on this for the last few years, and now the result is here: Stadnamnportalen (the Place Name Portal), where millions of source forms can be accessed, both as individual sources and as part of place name listings, for the benefit of research and communication. This presentation shows the way to the 'impossible', which in principle resembles grafting fruit trees.

Digital Stories – The story of the retro-digitisation at Royal Danish Library

Stig Svenningsen, Peter Thiesen, Ulla Bøgvad Kejser and Ditte Laursen (Royal Danish Library)

The start of the digital transformation of the library sector can be traced back to the 1970s, and in the 1980s retro-digitisation of library catalogues began. In the 1990s, retro-digitisation of the physical collections followed, and later the collection of born-digital materials.

Whereas the digitisation of catalogues facilitated the retrieval of materials and administrative processes, retro-digitisation marked the start of a more fundamental transformation in the use of collections. The retro-digitisation of the physical collections has provided completely new opportunities for research and dissemination, and in many cases use no longer depends on access to the physical reading rooms. Retro-digitisation removes many of the limitations of the physical world, allowing the collections to be used across geographical and material boundaries. However, digitisation comes at a price. Very large parts of the physical collections have not been digitised, and digitisation therefore risks marginalising the physical collections, as their use is resource-intensive.

The choices made when selecting materials for digitisation thus have a great impact on which materials are available. However, as a user of Royal Danish Library's collections, it is difficult to form an overview of the selections and non-selections, as well as the technical and legal options, that underlie the selection of materials. Users rightly demand transparent criteria for prioritising works and collections for digitisation, and thus also reasons for deselection. Much of this information exists only internally within the organisation and is not well documented.

With this presentation, we want to investigate how retro-digitisation at Royal Danish Library as an expression of the digital transformation of the library sector has developed over time and how changing technical and administrative conditions have affected the retro-digitised collections that are available today on the digital library shelves.

Artificial intelligence can do (most of) the work - designing specialised AI software for cataloguing

Michael Monefeldt (University Library of Southern Denmark)

I have developed an AI-powered tool, Urania. It will help to catalogue the many unregistered special collections at the library of the University of Southern Denmark (SDU), with the aim of improving access to the physical materials for researchers, students and other stakeholders.

With my new digital solution, the library staff simply have to take a picture of a title page with their mobile phone. The program then runs OCR on the images and categorises the elements in the digitised text (title, author's name, et cetera) so that everything ends up in the right fields in the system. Tests tell us that the tool can save the library more than 90% of the workload.

It is important to be critical of computer vision and AI-generated material, as neither technology delivers error-free results yet. Urania was developed precisely on the principle that we cannot have blind trust. The artificial intelligence is one element in a larger and more intelligent design, which takes into account that everything must be inspected.

It is my belief that the solution at SDU can reduce bias in metadata creation. It would require an unimaginable amount of resources to register all the collections manually, and therefore we resort to the most natural alternative: the collection managers select and register what they consider to be of greatest importance. However, each choice is also a deselection: physical originals that could be of value to researchers remain invisible, and a great deal of research potential is lost. With Urania, it becomes much more realistic to achieve a complete registration of the collections.

Serendipitous contextuality: Everything beyond that (retro) digitisation gives us

Muhamed Fajkovic (Royal Danish Library)

Searching in digital archives often involves “luck”: we find useful results in ways that may appear “random”. It was not our goal to find these particular results, and we did not imagine they existed (1,2). This can be due to anything from our search actions being highly idiosyncratic to serendipity being a calculated mechanism in these archives (3).

In this presentation, I would like to define and introduce a special subspecies of this type of situation and especially its effect: serendipitous contextualisation.

It happens, especially in retro-digitised archives, that we find the desired objects but discover that they come accompanied by all kinds of "paratextual" elements. These “paratextuals” are the serendipitous ingredient here; they are often by-products of the digitisation process and stand on the threshold of the main object, weighed down by their ambiguity: outside the object, but still an integral and inseparable part of it. Examples include library stamps, unrelated advertisements, a project employee's finger that was also scanned during digitisation, and many others. They can be funny, educational, or even very valuable; by definition, they are unpredictable and ghostly.

I will reflect on several examples of this that I have come across while working on my most recent publications (4,5). I will argue that these unexpected "gifts" can help us establish historical, socio-cultural or literary-historical context; however, my main thesis is that any attempt to work systematically with them would not only be futile, but would also mean denying the fundamentals of their being.

A digital representation of a city's history

Søren Bitsch Christensen (Royal Danish Library)

The archival task is changing for public GLAM institutions. One source of change is the theoretical and practical development of the archivist profession under the influence of the archival turn's focus on representation and decentralised archival and memory practices. Another is the expectations of the outside world, with target groups that can be both broader and more specific than before. A third is the public sector's shift towards economisation, self-service and co-creation. The fourth source unites all three and adds a turbocharge of its own: digitisation, both of the archive material and of data processing, communication and cataloguing.

The presentation describes the work to create a digital strategy for a local preservation institution, Aarhus City Archives. The strategy tied together the archive system, communication channels, accessions, public IT infrastructure and the involvement of the public and citizen scientists, ultimately with the aim of developing and improving the local community's political structures, decision-making capacity, level of information, sense of community and participation.

Store it, Don't Show It: Building Sustainable Infrastructures for Digital Scholarly Publishing

Katrine F. Baunvig, Krista SG Rasmussen, Kirsten Vad and Jon Tafdrup (Aarhus University)

The scope of digital scholarly editions is comprehensive. But the projects are characterised by an (over)focus on short-term display on various websites rather than long-term preservation of the cultural heritage material they process. Despite standardised production processes, there is thus a lack of sustainable solutions for unified data management strategies. This leads to isolated solo projects whose future availability is uncertain.

Against this background, we encourage stakeholders from universities, GLAM institutions, foundations and politicians to collaborate on a Danish infrastructure that ensures a clear division of labor and responsibility for data production, short-term material use and long-term storage.
By aligning ourselves with FAIR principles, we aim to protect cultural heritage and encourage foundations to prioritise sustainable storage solutions as a prerequisite for support.

The use of cultural heritage data in interdisciplinary historical health research

Mads Villefrance Perner (Roskilde University)

During the hectic first wave of the COVID-19 pandemic, it became clear that history has great value for health researchers and policy makers. The new pathogen was compared directly to the historically large flu outbreaks of 1918-20, 1957-58 and 2009-10, and media coverage was filled with tales of plague, cholera and other past health crises.

At the new research center PandemiX - Center for the Interdisciplinary Study of Pandemic Signature Features, the study of historical epidemiology is a central component of the work to build a knowledge base that can prepare us for the next pandemic.

My presentation is about one of the centre's projects, which aims to make us more aware of the diseases of the past, through a social and spatial mapping of the large drop in mortality - the so-called epidemiological transition - in Copenhagen in the years approximately 1860 to 1940. For this purpose, the censuses of the National Archives and the burial records of the Copenhagen City Archives are primarily used, but also a wide selection of digitised archives, maps and printed publications, which can be used to enrich the basic data in various ways. The presentation illustrates both the opportunities and challenges of using cultural heritage data in a health science context, and at the same time emphasises how cultural heritage data can come into play in ways you might not have imagined.

How the archives with AI and text recognition can contribute to personalised medicine and new knowledge

Jeppe Klok Due (National Archives)

The digital age was kicked off in 1968 with the establishment of a unique individual ID in the CPR register. Since then, it has been possible to combine information about health, illness and social conditions for all Danes. This has made Denmark a leader in registry research, because researchers can study correlations, for example between vaccines and side effects, that cannot be studied anywhere else in the world. But biological and historical trends and causalities go back further than 1968. For example, if researchers want to study correlations between the fetal environment represented by birth weight and the development of metabolic diseases, then they can only study it for living individuals over the age of fifty, who have only just reached the age of risk for metabolic diseases.

If you want to understand trends that go back further than the 1970s, you have to go to the National Archives, which has information on all Danes from cradle to grave. The problem is that this information is on paper and therefore relatively inaccessible. The National Archives uses AI methods for image analysis and text recognition to establish historical records about the millions of individuals stored in the archives. When the information on all individuals is linked to the CPR register, it gains a completely new application potential, as researchers no longer need to look for information on individuals with particular characteristics or diagnoses, but can study all individuals. It will, for example, give researchers the opportunity to analyse the connection between birth weight and the development of metabolic diseases later in life for all individuals who have had a metabolic disease. If such a correlation exists, individuals will be able to obtain information about their own birth weight, which can then be taken into account when choosing a personal course of treatment or prevention.

Past contexts and latent spaces of meaning: The digitisation of absolutism newspapers

Johan Heinsen and Camilla Bøgeskov (Aalborg University)

Over the past two years, historians at Aalborg University (AAU) have worked to re-digitise the newspapers of the absolutist era. As it stands, the corpus covers most major newspapers up to approximately 1830. Newspapers from Christiania and Bergen are also included until 1814. The total text material is around 380,000 pages, which have been digitised using various machine learning tools, including Transkribus for layout and text recognition and a combination of word2vec and random forest for segmenting the recognised text. The text recognition is highly accurate: c. 97% at word level. Compared to the original OCR in Mediestream (c. 50% accuracy), this is an improvement that opens up many new possibilities. The project is a work in progress.

Our paper will present the work with the newspapers as well as the perspectives involved in training so-called word and paragraph embeddings on the material. Fundamentally, digitisation projects like ours run the risk of creating entry points to collections that, by relying on keyword searches, isolate small bits of text and produce a form of contextual blindness. We hope to be able to create alternative routes through the material that open up readings informed by overlapping contexts. Embedding techniques are useful in this context because they place many elements into an abstract, compressed meaning space of numerical vectors that can be used computationally. The paper will present some examples of how this can open up context exploration.
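
To make the idea of a "meaning space of numerical vectors" concrete, here is a minimal sketch of how related words can be found by comparing vectors with cosine similarity. The three-dimensional vectors and the word list are invented for illustration; they are not taken from the project, whose embeddings would be trained on the corpus and have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Toy vectors, assumed for this example only (skib = ship, havn = harbour).
embeddings = {
    "skib": [0.9, 0.1, 0.0],
    "havn": [0.8, 0.3, 0.1],
    "auktion": [0.1, 0.9, 0.2],
}


def nearest(word, k=1):
    """Return the k words whose vectors lie closest to the given word's vector."""
    query = embeddings[word]
    scored = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


print(nearest("skib"))
```

Unlike a keyword search, a lookup of this kind returns material that is close in meaning even when the query word itself does not occur in it.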

Open access to the web archive: Digital text analysis of news from the web

Jon Tønnesen (National Library)

Since the 1990s, the National Library (NL) has archived enormous amounts of content from the internet. The collection has great potential value for research and knowledge production, but access has long been limited due to copyright and privacy concerns. A central question is how the collection can be offered to more people, while at the same time meeting ethical and legal obligations.

The presentation will show the work to provide open access to a corpus of more than 1.5 million texts from online newspapers. By offering online newspaper text as data through NL's Laboratory for Digital Humanities (DH-lab), we enable distant reading on a large scale, in line with FAIR principles, while at the same time taking into account copyright and privacy.

I will first go through how the text content is retrieved from the "archive original" and transformed into a unique text object, through:
a) extraction from Web ARChive files (WARC);
b) scoping and filtering of the corpus;
c) tokenisation of text for databases.
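
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not NL's actual pipeline: step a) is represented by a hypothetical list of already extracted records (in practice a WARC reader library would produce them), and the field names, the `in_scope` rule and the 20-character threshold are invented for the example.

```python
import re

# Step a) stand-in: hypothetical records as they might look after
# extraction from WARC files. URLs and texts are made up.
records = [
    {"url": "https://avis.example/nyheter/1", "mime": "text/html",
     "text": "Stortinget vedtok budsjettet i dag."},
    {"url": "https://avis.example/bilde.jpg", "mime": "image/jpeg",
     "text": ""},
    {"url": "https://avis.example/nyheter/2", "mime": "text/html",
     "text": "Kort."},
]


def in_scope(record, min_chars=20):
    """Step b): keep HTML pages with enough text to count as an article."""
    return record["mime"] == "text/html" and len(record["text"]) >= min_chars


def tokenise(text):
    """Step c): split text into lowercase word tokens."""
    # The word-character class also matches letters such as æ, ø and å.
    return re.findall(r"\w+", text.lower())


corpus = [
    {"url": record["url"], "tokens": tokenise(record["text"])}
    for record in records
    if in_scope(record)
]

for document in corpus:
    print(document["url"], document["tokens"])
```

Storing tokens rather than running text is one way a corpus can support counting and searching at scale without republishing the articles themselves.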

Next, I will demonstrate how the user can tailor the corpus for their own use and analyse texts on a large scale – both with user-friendly web apps and programmatically with notebooks against the API. The demonstration highlights some of the limitations of the approach, but also the great opportunities that open up for digital text analysis of content from the web.

In conclusion, I will discuss how collections as data provide wider access and new perspectives on online archives: open access means that news text can be used in new contexts, such as teaching at universities. User-friendly web apps also lower the threshold for distant reading of large volumes of text, so that even non-technical users can use tools for the analysis of large collections of born-digital material.

The concept of community singing in Danish newspapers

Anne Agersnap (Aarhus University)

Royal Danish Library's newspaper archive Mediestream is an absolutely crucial resource in current research into Danish community singing culture. At the Unit for Singing Research at Aarhus University, we use the archive to investigate the cultural-historical development of the concept of community singing from 1788 to 2001. Previous studies of the use and history of community singing have often focused on song book publications or dealt with concrete communities that have actively used community singing to mobilise and maintain communities. In other words, they have often drawn on empirical material that is itself centred on community singing as a phenomenon.

In my ongoing research, digitised newspapers allow me to observe the concept's function and development in a genre that was not created to write about and interpret community singing. They provide the opportunity to observe how the term "community singing" has developed over time in public discourse and which semantic fields it moves into and out of. In my paper I will present the work of sampling, reading and analysing newspaper articles containing the words "community singing". I will highlight the opportunities and challenges of working with scanned newspapers, and I will present preliminary findings regarding the representation and development of the concept of community singing in Danish newspapers.

Look at my dress. Possibilities and challenges in using "computer vision" on Royal Danish Library's photographs from 1870-1950

Laura Søvsø Thomasen (Royal Danish Library), Mette Kia Krabbe Meyer (Royal Danish Library), Henrik Kragh Sørensen (University of Copenhagen)

Royal Danish Library's extensive digitised image archives provide the opportunity to train various models on photographs from the period 1870-1950. Although image recognition and "object detection" with large models are easily accessible today, analysing older photographs turns out to be a bigger challenge: a 1910s car may quite simply not be recognised as a car, and the same applies to a large extent to clothing et cetera, which looked completely different 100 years ago.

By training an object-detection model on images from Royal Danish Library's Elfelt collection (approximately 180,000 images) and carte-de-visite collection (approximately 14,000 images), we now have the opportunity to create a "vintage detector" trained specifically on older photographs. It will be a very useful tool not only in fashion research but in historical research in general, for example in the research project "Queer women 1880-2020", where Mette Kia Krabbe Meyer investigates how women's emancipation was linked to changes in clothing. It will also have an impact on other areas: a dating made on the basis of clothing, for example, can be used in research in general.
The detector will use the collections, but eventually also printed material, department store catalogues, et cetera. It will be a revolutionary tool for metadata creation and searching in general. Here it can form the basis for a human-in-the-loop system, where information can be fed in and used.

In this presentation, we will show some of the possibilities and pitfalls of a vintage detector. Furthermore, we will discuss how new ways of generating and thinking about metadata also affect how interesting research questions can be asked of large amounts of data. In the presentation, we illustrate the process of integrating questions, technique and data in a specific case about dresses. This leads us to rethink how metadata is produced and used.

Commenting in a digital age. Exploration and automation of editorial comments in digital, text-critical publications

Kirsten Vad and Katrine Frøkjær Baunvig (Aarhus University)

The text-critical point commentary connects the context of the text's creation with the reader's present and demonstrates the relevance and communicative power of a scholarly edition. This paper examines how artificial intelligence (AI) can be used to sharpen and improve point commentary in digital scholarly editions (DSE).

It is relevant to discuss the role of point commentary in digital editions and examine how computational methods can be used in publishing work – can AI, for example, be used to generate point comments automatically? The focus will be on the publishing project Grundtvigs Værker (GV), supplemented by experiences from other Nordic projects.

GV has published 56% of the total Grundtvig corpus (N=1073). In publishing the writings of N.F.S. Grundtvig (1783-1872), we reuse several forms of data, such as information about people, places and mythological entities, stored in databases and used across texts. Automation tools are used, for example, to identify and mark up entities. All verbal comments (currently 143,316) are annotated manually, often with repetitions of words that require explanation in their historical context.

We want to outline how AI models can identify and explain words that require comment, in order to streamline the publishing process. The use of AI in the production, display and exploration of cultural heritage data has already shed new light on Grundtvig's oeuvre (Baunvig 2023; Baunvig and Nielbo 2022). We see great development potential within edition philology and text-critical publishing, and with this presentation we want to exemplify how both discriminative and generative AI can be implemented in a text-critical publishing process.
 

Call for papers

NB: Abstracts can no longer be submitted.