Research Strand: Open Data

The Living Archives project is structured through two parallel and complementary research strands: Performing Memory and Open Data. The Open Data strand takes a broad approach to open data in the context of archives. It explores openness of archival content and metadata from three angles: an analytical perspective, participatory processes, and the necessary technical architecture. In this view, data does not just flow from the content and data owners to the public; the public also participates actively in the creation of content and metadata.

A sketch of the field: Open Data

According to the Open Data Handbook, open data is

data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.

Government data is currently being made available by many national, regional and local governments, as well as the United Nations and the World Bank. Example projects related to open data include Europeana Linked Open Data, with open metadata on 2.4 million texts, images, videos and sounds, and Open Archives, which focuses on interoperability standards for digital resources. Common uses of open data are dedicated applications, as well as visualisations and mashups. Examples include tools for exploring public spending (e.g., openspending.org) and mashups that visualise crime statistics with the intention of dispelling myths about crime. With more relevance to Living Archives' participatory and performative approach to data, Oomen and Aroyo (2011) describe different ways of using crowdsourcing in the cultural heritage domain: correction and transcription tasks, contextualisation, complementing collections, classification, co-curation, and crowdfunding. One example of this is Waisda? (slang for the Dutch equivalent of “What's that?”), a multi-player video labelling game developed by the Dutch national broadcasting archive (Gligorov et al. 2010).

While there are many government mandates to open data to the public, there are not enough ways for the average citizen to engage with or use the data, and agencies often do not encourage it. There is also a need to account for both explicit and implicit crowdsourcing: an example of explicit crowdsourcing is when someone asks us to collect and share film in a certain location on a specific date (e.g. onedayonearth.org); an example of implicit crowdsourcing is the way automated “check-ins” can be the basic building blocks of a world-wide archive (e.g. Foursquare entertainment guide). Project member Marie Gustafsson Friberger is currently part of an initiative aiming at providing open government data in the region of Skåne. Further, she has experience with Semantic Web technologies (e.g., Falkman et al. 2008), which are a set of advanced methods for making open data available, as well as with user motivation in knowledge sharing (Gustafsson 2008). She is currently experimenting with how open data can be used as game content (Gustafsson Friberger and Togelius, 2012).

METADATA ARE ESSENTIAL TO RETRIEVE MEANING – Items in archives are usually accompanied by metadata, such as a location, a date or the person depicted. Such metadata are essential for retrieving items and determining their context. Traditionally, professional annotators have created this metadata, often using controlled vocabularies. This has drawbacks: the number of annotators is limited, and the vocabularies used are often not aligned with the terms the general public would use. It is also difficult or impossible to add new metadata widely as interests in a particular archive change. Furthermore, this process only captures the perspective of one individual or institution. Much work has been carried out on machine-generated metadata, and while such metadata is often added, it is inherently limited in that it cannot capture the meaning or personal experience that an audience member or citizen attaches to a piece of material.

OPENING UP THE PROCESS OF METADATA CREATION – Within this project, we examine how to open up the process of metadata creation by building a platform for collaborative metadata generation. One aspect of this is the process of generating the metadata, where we want to explore how to combine the participation of the public (for example through crowdsourcing or gamification approaches) with keeping the expert in the loop and making use of algorithmic methods for, e.g., image recognition. Another aspect is how to represent the metadata, which involves mapping the vocabulary used by the public (probably using a folksonomy approach) to the vocabulary currently used by expert annotators, as well as the technical infrastructure for this. One final aspect that cannot be neglected is that of ownership and control – particularly where new intellectual property is generated or derived.
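To make the vocabulary-mapping idea more concrete, the following is a minimal sketch in Python of how free-form public tags might be matched against an expert vocabulary. It is an illustration only, not the project's platform: the vocabulary entries, the function names (`normalise`, `build_lookup`, `map_tags`) and the simple exact-match strategy are all assumptions made for this example. Tags that cannot be mapped are collected separately, so that an expert can stay in the loop and decide whether they should become new terms.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical controlled vocabulary: preferred expert term -> known public variants.
CONTROLLED_VOCABULARY = {
    "harbour": {"harbor", "port", "docks"},
    "bridge": {"bridges", "footbridge"},
    "market square": {"market", "marketplace"},
}


def normalise(tag: str) -> str:
    """Lower-case a tag and collapse leading, trailing and repeated whitespace."""
    return " ".join(tag.lower().split())


def build_lookup(vocabulary: dict[str, set[str]]) -> dict[str, str]:
    """Invert the vocabulary so that every variant points at its preferred term."""
    lookup: dict[str, str] = {}
    for preferred, variants in vocabulary.items():
        lookup[normalise(preferred)] = preferred
        for variant in variants:
            lookup[normalise(variant)] = preferred
    return lookup


def map_tags(public_tags: list[str], vocabulary: dict[str, set[str]]) -> tuple[Counter, Counter]:
    """Count public tags per expert term; keep unmatched tags for expert review."""
    lookup = build_lookup(vocabulary)
    mapped: Counter = Counter()
    unmapped: Counter = Counter()
    for tag in public_tags:
        key = normalise(tag)
        if key in lookup:
            mapped[lookup[key]] += 1
        else:
            unmapped[key] += 1
    return mapped, unmapped
```

Calling `map_tags(["Harbor", " port ", "ferry"], CONTROLLED_VOCABULARY)` would count two tags under the expert term "harbour" and leave "ferry" in the unmapped set. A real system would likely need fuzzier matching and a richer vocabulary model than this exact-match lookup.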

LINK ARCHIVAL CONTENT TO OPEN DATA – When it comes to opening the content of archives, one aspect will be to work closely with the owners of archives holding material related to the city of Malmö, to identify how the material can be made available and how to provide appropriate licensing. Closely related to opening up archival content is linking it to relevant open data. We believe that open data, both current and historical, can be a part of making archives come alive. To do this, there is a need for relevant data as well as for technical and conceptual means of linking the data with the archival contents.
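As one possible technical means of such linking, the sketch below joins archive items to open data records that share a place and a year. This is a simplified illustration under stated assumptions: the `ArchiveItem` and `OpenDataRecord` types, their fields, and the `link_items` function are hypothetical and do not describe any existing archive or dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveItem:
    """Hypothetical archive item described by a place name and a year."""
    item_id: str
    place: str
    year: int


@dataclass(frozen=True)
class OpenDataRecord:
    """Hypothetical open data observation for a place and a year."""
    dataset: str
    place: str
    year: int
    value: float


def link_items(items, records, year_tolerance=0):
    """Return, for each item id, the records for the same place within the year tolerance."""
    # Index records by case-folded place name so lookups ignore capitalisation.
    by_place = defaultdict(list)
    for record in records:
        by_place[record.place.casefold()].append(record)

    links = {}
    for item in items:
        candidates = by_place.get(item.place.casefold(), [])
        links[item.item_id] = [
            record
            for record in candidates
            if abs(record.year - item.year) <= year_tolerance
        ]
    return links
```

With `year_tolerance=0` only records from the same year are linked; a larger tolerance widens the match to neighbouring years. Matching on place names is fragile in practice, so a fuller approach would probably rely on shared identifiers or coordinates rather than strings.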

By opening up archival contents, generating metadata relevant to the public, and providing linkages to open government and historical data, we can explore how to use these as building blocks for using archives in new and innovative ways. This can involve tools to help people with varying expertise in technology and the archive material itself to, for example, find, combine, annotate, and curate material that is relevant to them. It can also relate to performative aspects. Through these activities and prototypes, we will also be investigating what makes the public engage with data and archival content. Building functioning crowdsourcing systems is not trivial. Through the open data strand, we will examine these issues for the domains of open data and archives.

A final and important component of opening archives is to explore how to open up the process of contributing to existing archives or facilitating new groups in building their own archives, for example by creating prototypes for how this can be done. This is consistent with the grassroots approach and crowdsourcing methods taken by the project as a whole. There is potential for the open data strand to have journalistic and activist outcomes, which we may also explore.

Read more about the other research strand: Performing Memory.
