Speech technology and colorization for audiovisual archives

[This post describes Rudy Marsman’s MSc thesis and is partly based on a Dutch blog post by him]

The Netherlands Institute for Sound and Vision (NISV) archives Dutch broadcast TV and makes it available to researchers, professionals and the general public. One subset consists of the Polygoonjournaals (Public News broadcasts), which are published under open licenses as part of the OpenImages platform. NISV is also interested in exploring new ways and technologies to make interaction with the material easier and to increase exposure to its archives. In this context, Rudy explored two options.

Two stills from the film ‘Steegjes‘, with the right frame colorized. Source: Polygoon-Profilti (producent) / Nederlands Instituut voor Beeld en Geluid  / colorized by Rudy Marsman, CC BY-SA

One part of the research was the automatic colorization of old black-and-white video footage using neural networks. Rudy used a pre-trained network (Zhang et al. 2016) that is able to colorize black-and-white images. He developed a program that splits a video into frames, colorizes the individual frames using the network and then ‘stitches’ them back together into a colorized video. The stunning results were very well received by NISV employees. Examples are shown below.
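The split, colorize and stitch steps can be sketched as follows. This is a minimal illustration and not Rudy's actual program: it assumes the ffmpeg command-line tool is used for the splitting and stitching, the frame rate of 25 is a guess, the function names are hypothetical, and `identity_colorizer` is a placeholder for the real call into the pre-trained network. Audio is not carried over in this sketch.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def extract_command(video: str, frame_dir: str) -> List[str]:
    """ffmpeg call that writes every frame of `video` as a numbered PNG."""
    return ["ffmpeg", "-i", video, str(Path(frame_dir) / "%06d.png")]


def stitch_command(frame_dir: str, output: str, fps: int = 25) -> List[str]:
    """ffmpeg call that encodes the numbered PNGs back into one video."""
    return [
        "ffmpeg", "-framerate", str(fps),
        "-i", str(Path(frame_dir) / "%06d.png"),
        "-c:v", "libx264", "-pix_fmt", "yuv420p", output,
    ]


def colorize_frames(src_dir: str, dst_dir: str,
                    colorize: Callable[[Path, Path], None]) -> int:
    """Apply `colorize(src, dst)` to every PNG in order; return the frame count."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    frames = sorted(Path(src_dir).glob("*.png"))
    for frame in frames:
        colorize(frame, out / frame.name)
    return len(frames)


def identity_colorizer(src: Path, dst: Path) -> None:
    """Placeholder that copies the frame unchanged; swap in the model call here."""
    shutil.copyfile(src, dst)
```

The two command lists can be passed to `subprocess.run(..., check=True)` before and after `colorize_frames`. Because every frame is colorized independently, the per-frame step is also easy to parallelize.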

Tour de France 1954 (colorized by Rudy Marsman in 2016), Polygoon-Profilti (producent) / Nederlands Instituut voor Beeld en Geluid (beheerder), CC-BY SA

Results from the comparison of the different variants of the method on different corpora

In the other part of his research, Rudy investigated to what extent the existing news broadcast corpus, with voice-overs by the famous Philip Bloemendal, can be used to develop a modern text-to-speech engine with his voice. To do so, he mainly focused on natural language processing and on determining to what extent the language used by Bloemendal in the 1970s is still comparable to contemporary Dutch.

Rudy used precompiled automatic speech recognition (ASR) results to match words to sounds and developed a slot-and-filler text-to-speech system based on this. To increase the limited vocabulary, he implemented a number of strategies, including term expansion through the use of Open Dutch Wordnet and smart decompounding (this mostly works for Dutch, mapping ‘sinterklaasoptocht’ to ‘sinterklaas’ and ‘optocht’). The different strategies were compared to a baseline. Rudy found that a combination of the two resulted in the best performance (see figure).
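To make the slot-and-filler idea concrete, here is a small sketch under stated assumptions; it is not the thesis implementation. The ASR output is assumed to be a list of (word, audio file, start, end) tuples, the `synonyms` dictionary is a hypothetical stand-in for an Open Dutch Wordnet lookup, and trying synonyms before decompounding is an arbitrary choice made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

# (audio file, start in seconds, end in seconds) for one spoken word
Clip = Tuple[str, float, float]


def build_index(asr_words: List[Tuple[str, str, float, float]]) -> Dict[str, Clip]:
    """Keep the first clip seen for each word in the ASR output."""
    index: Dict[str, Clip] = {}
    for word, audio, start, end in asr_words:
        index.setdefault(word.lower(), (audio, start, end))
    return index


def decompound(word: str, vocab: Set[str], min_len: int = 3) -> Optional[List[str]]:
    """Split `word` into known parts, preferring the longest head; None if impossible."""
    if word in vocab:
        return [word]
    for cut in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:cut], word[cut:]
        if head in vocab:
            rest = decompound(tail, vocab, min_len)
            if rest is not None:
                return [head] + rest
    return None


def plan(sentence: str, index: Dict[str, Clip],
         synonyms: Optional[Dict[str, List[str]]] = None) -> List[Optional[Clip]]:
    """One clip per word or word part; None marks a word that cannot be filled."""
    vocab = set(index)
    clips: List[Optional[Clip]] = []
    for word in sentence.lower().split():
        # Strategy 1: the word itself, or a synonym that was spoken in the corpus.
        candidates = [word] + (synonyms or {}).get(word, [])
        known = next((c for c in candidates if c in index), None)
        if known:
            clips.append(index[known])
            continue
        # Strategy 2: split a compound into parts that were spoken.
        parts = decompound(word, vocab)
        if parts:
            clips.extend(index[p] for p in parts)
        else:
            clips.append(None)
    return clips
```

The resulting list of clips is what a concatenation step would then cut from the source audio and join; that step is left out here.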


ArchiMediaL proposal granted by Volkswagen Stiftung

I received a good-news letter from the Volkswagen Stiftung, which has decided to award us a research grant for a 3-year Digital Humanities project named “ArchiMediaL” around architectural history. This project will be a collaboration between architecture historians from TU Delft, computer scientists from TU Delft and VU-Web and Media. A number of German scholars will also be involved as domain experts. The project will combine image analysis software with crowdsourcing and semantic linking to create networks of visual resources which will foster understanding of understudied areas in architectural history.
From the proposal: In the mind of the expert or everyday user, the project detaches the digital image from its existence as a single artifact and includes it in a global network of visual sources, without disconnecting it from its provenance. The project expands the framework of hermeneutic analysis through a quantitative reference system, in which discipline-specific canons and limitations are questioned. For the dialogue between the history of architecture and urban form this means a careful balancing of qualitative and quantitative information and of negotiating new methodological approaches for future investigation.


The Role of Narratives in DIVE

[This post is based on Maartje Kruijt‘s Media Studies Bachelor thesis: “Supporting exploratory search with features, visualizations, and interface design: a theoretical framework“.]

In today’s network society there is a growing need to share, integrate and search in collections of various libraries, archives and museums. For researchers interpreting these interconnected media collections, tools need to be developed. In the exploratory phase of research the media researcher has no clear focus and is uncertain what to look for in an integrated collection. Data visualization technology can be used to support the strategies and tactics that are of interest in doing exploratory research.

DIVE screenshot

The DIVE tool is an event-based linked media browser that allows researchers to explore interconnected events, media objects, people, places and concepts (see screenshot). Maartje Kruijt’s research project involved investigating to what extent and in what way the construction of narratives can be made possible in DIVE, in such a way that it contributes to the interpretation process of researchers. Such narratives can be either automatically generated on the basis of existing event-event relationships, or be constructed manually by researchers.

The research proposes an extension of the DIVE tool where selections made during the exploratory phase can be presented in narrative form. This allows researchers to publish a narrative, but also to share narratives or reuse other people’s narratives. The interactive presentation of a narrative is complementary to its presentation in a text, and it can serve as a starting point for further exploration by other researchers who make use of the DIVE browser.

Within DIVE and Clariah, we are currently extending the user interface based on the recommendations made in the context of this thesis. You can read more about it in Maartje Kruijt’s thesis (Dutch). The user stories that describe the needs of media researchers are written in English and can be found in Appendix I.


CLARIAH Linked Data Workshop

[This blog post is co-written with Marieke van Erp and Rinke Hoekstra and is cross-posted from the Clariah website]

Linked Data, RDF and Semantic Web are popular buzzwords in tech-land and within CLARIAH. But they may not be familiar to everyone within CLARIAH. On 12 September, CLARIAH therefore organized a workshop at the Vrije Universiteit Amsterdam to discuss the use of Linked Data as a technology for connecting data across the different CLARIAH work packages (WP3 linguistics, WP4 structured data and WP5 multimedia).

Great turnout at Clariah LOD workshop

The goal of the workshop was twofold. First of all, to give an overview from the ‘tech’ side of these concepts and show how they are currently employed in the different work packages. At the same time we wanted to hear from Arts and Humanities researchers how these technologies would best suit their research and how CLARIAH can support them in familiarising themselves with Semantic Web tools and data.

The workshop
Monday afternoon, at 13:00 sharp, around 40 people showed up for the workshop at the Boelelaan in Amsterdam. The workshop included plenary presentations that laid the groundwork for discussions in smaller groups centred around the different types of data from the different WPs (raw collective notes can be found on this piratepad).

Rinke Hoekstra presented an introduction to Linked Data: what is it, how does it compare to other technologies and what is its potential for CLARIAH? [Slides]
In the discussion that followed, some concerns were raised about the potential for Linked Data to deal with data provenance and data quality.
After this, humanities researchers from each of the three work packages discussed experiences, opportunities, and challenges around Linked Data. Our “Linked Data Champions” of this day were:

  • WP3: Piek Vossen (Vrije Universiteit Amsterdam) [Slides]
  • WP4: Richard Zijdeman (International Institute of Social History)
  • WP5: Kaspar Beelen and Liliana Melgar (University of Amsterdam) [Slides]

Marieke van Erp, Rinke Hoekstra and Victor de Boer then discussed how Linked Data is currently being produced in the different work packages and showed an example of how these could be integrated (see image). [Slides] If you want to try these out yourself, here are some example SPARQL queries to play with.

HISCO integrated data example

Break out sessions
Finally, in the break out sessions, the implications and challenges for the individual work packages were further discussed.

  • For WP3, the discussion focused on formats. There are many natural language annotation formats in use, some with a long history, and these formats are often very closely connected to text analysis software. One of the reasons it may not be useful for WP3 to convert all tools and data to RDF is that performance cannot be guaranteed, and in some cases has already been shown not to be preserved when doing certain text analysis tasks in RDF. However, converting certain annotations, i.e. the end results of processing, to RDF could be useful here. We further talked about different types of use cases for WP3 that include LOD.
  • The WP4 break-out session consisted of about a dozen researchers, representing all work packages. The discussion focused on the expectations for the tools and data that were demonstrated throughout the day. Several participants were interested in applying QBer, the tool that allows one to turn CSV files into Linked Data. The really exciting bit is that this interest was shared by people outside WP4, that is, by people who usually work with text or audio-video sources. This signals not just an interest in interdisciplinary research, but also an interest in research based on various data types. A second issue discussed was the need for vocabularies ((hierarchical) lists of standard terms). For various research fields such vocabularies do not yet exist. While some vocabularies can be derived relatively easily from existing standards that experts use, this will prove more difficult for a large range of variables. The final issue discussed was the quality of datasets: should tools be able to handle ‘messy’ data? The audience agreed that data cleaning is the responsibility of the researcher, but that tools should be accompanied by guidelines on the expected format of the data file.
  • In the WP5 discussion, issues around data privacy and copyright were discussed, as well as how memory institutions and individual researchers can be persuaded to make their data available as LOD (see image).

wp5 result
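For readers unfamiliar with what turning a CSV file into Linked Data means in practice (the topic of the WP4 discussion above), the following sketch converts rows into N-Triples. It is not how QBer works internally, which is not described here: the `example.org` namespaces, the function names and the `uri_columns` parameter are all assumptions made for illustration. The `uri_columns` mapping shows where a shared vocabulary comes in, since values in such a column become URIs that other datasets can link to.

```python
import csv
import io
from typing import Dict, List, Optional
from urllib.parse import quote


def literal(value: str) -> str:
    """Escape a string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(text: str, base: str, key: str,
                    uri_columns: Optional[Dict[str, str]] = None) -> List[str]:
    """Turn each CSV row into triples; `key` names the column that identifies a row.

    `uri_columns` maps a column name to a vocabulary namespace, so that its
    values are emitted as URIs instead of plain strings.
    """
    uri_columns = uri_columns or {}
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{base}row/{quote(row[key])}>"
        for column, value in row.items():
            # Skip the key itself, empty cells and surplus unnamed cells.
            if column is None or column == key or not value:
                continue
            predicate = f"<{base}prop/{quote(column)}>"
            if column in uri_columns:
                obj = f"<{uri_columns[column]}{quote(value)}>"
            else:
                obj = literal(value)
            triples.append(f"{subject} {predicate} {obj} .")
    return triples
```

Note that the sketch does no cleaning at all: a misspelled or inconsistently coded cell simply becomes a different literal or URI, which is one reason guidelines on the expected file format matter.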

The day ended with some final considerations and some well-deserved drinks.


Crowd- and nichesourcing for film and media scholars

[This post describes Aschwin Stacia‘s MSc. project and is based on his thesis]

There are many online and private film collections that lack structured annotations to facilitate retrieval. In his Master project work, Aschwin Stacia explored the effectiveness of a crowd- and nichesourced film tagging platform, built around a subset of the Eye Open Beelden film collection.

Specifically, the project aimed at soliciting annotations appropriate for various types of media scholars who each have their own information needs. Based on previous research and interviews, a framework categorizing these needs was developed. Based on this framework a data model was developed that matches the needs for provenance and trust of user-provided metadata.

Screenshot of the FilmTagging tool, showing how users can annotate a video

A crowdsourcing and retrieval platform (FilmTagging) was developed based on this framework and data model. The frontend of the platform allows users to self-declare knowledge levels in different aspects of film and also annotate (describe) films. They can also use the provided tags and provenance information for retrieval and extract this data from the platform.
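As an illustration of how self-declared knowledge levels can travel with each annotation as provenance, here is a hypothetical sketch. It is not the FilmTagging data model (see the thesis and the code repository for that); the class names, fields and the topic name used in the usage note are invented for this example, and only the 5-point scale is taken from the experiment described in this post.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Annotator:
    name: str
    # topic -> self-declared knowledge level, from 1 (low) to 5 (high)
    knowledge: Dict[str, int]


@dataclass
class Annotation:
    film_id: str
    start: float  # seconds into the film
    end: float
    tag: str
    topic: str    # which aspect of film the tag is about
    annotator: Annotator


def search(annotations: List[Annotation], tag: str,
           min_level: int = 1) -> List[Annotation]:
    """Find fragments with `tag`, keeping only sufficiently knowledgeable annotators."""
    wanted = tag.lower()
    return [
        a for a in annotations
        if a.tag.lower() == wanted
        and a.annotator.knowledge.get(a.topic, 0) >= min_level
    ]
```

A scholar who only trusts expert tags on, say, a hypothetical "cinematography" topic could then call `search(annotations, "close-up", min_level=4)`, while a more exploratory search would leave `min_level` at its default.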

To test the effectiveness of the platform, Aschwin conducted an experiment in which 37 participants used the platform to make annotations (in total, 319 such annotations were made). The figure below shows the average self-reported knowledge levels.

Average self-reported knowledge levels on a 5-point scale. The topics are defined by the framework, based on previous research and interviews.

The annotations and the platform were then positively evaluated by media scholars as it could provide them with annotations that directly lead to film fragments that are useful for their research activities.

Nevertheless, capturing every scholar’s specific information needs is hard since the needs vary heavily depending on the research questions these scholars have.

  • Read more details in Aschwin’s thesis [pdf].
  • Have a look at the software at https://github.com/Aschwinx/Filmtagging , and maybe start your own Filmtagging instance
  • Test the annotation platform yourself at http://astacia.eculture.labs.vu.nl/ or watch the screencast below


Connecting collections across national borders

Items from two collections shown side-by-side

As audiovisual archives are digitizing their collections and making them available online, the need arises to also establish connections between different collections and to allow for cross-collection search and browsing. Structured vocabularies can be used as connecting points by aligning thesauri from different institutions. The project “Gemeenschappelijke Thesaurus voor Uniforme Ontsluiting” was funded by the Taalunie, a cross-national organization focusing on the Dutch language, and executed by the Netherlands Institute for Sound and Vision and the Flemish VIAA archive. It involved a case study where partial collections of the two archives were connected by aligning their thesauri. This involved the conversion of the VRT thesaurus to the SKOS format and linking it to Sound and Vision’s GTAA thesaurus.

CultuurLINK screenshot

The interactive alignment tool CultuurLINK, made by Dutch company Spinque, was used to align the two thesauri (see the screenshot above).
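To give an idea of what aligning two thesauri produces, the sketch below implements only the simplest possible strategy: linking concepts whose labels match exactly after normalization. This is an illustration under assumptions, not what CultuurLINK does; that tool is interactive and lets a person build and review the alignment. The input format (concept URI mapped to its labels) and the function names are hypothetical. The only fixed element is the standard SKOS `exactMatch` property.

```python
from typing import Dict, List, Tuple

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different labels compare equal."""
    return " ".join(label.lower().split())


def align(source: Dict[str, List[str]],
          target: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Link source concepts to target concepts that share a label.

    Both arguments map a concept URI to its preferred and alternative labels.
    Only unambiguous matches are linked; the rest is left for manual review.
    """
    by_label: Dict[str, List[str]] = {}
    for uri, labels in target.items():
        for label in labels:
            by_label.setdefault(normalize(label), []).append(uri)

    links = []
    for uri, labels in source.items():
        matches = {t for label in labels for t in by_label.get(normalize(label), [])}
        if len(matches) == 1:
            links.append((uri, SKOS_EXACT_MATCH, matches.pop()))
    return links
```

Each returned tuple is a triple that can be stored alongside the two thesauri, which is what makes cross-collection search possible: a query for a term in one thesaurus can follow the link to the matching term in the other.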


The links between the collections can be explored using a cross-collection browser, also built by Spinque. This allows users to search and explore connections between the two collections. Unfortunately, the collections are not publicly available so the demonstrator is password-protected, but a publicly accessible screencast (below) shows the functionalities.

The full report can be accessed through the VIAA site. There, you can also find a blog post in Dutch.

Update: a paper about this has been accepted for publication:

  • Victor de Boer, Matthias Priem, Michiel Hildebrand, Nico Verplancke, Arjen de Vries and Johan Oomen. Exploring Audiovisual Archives through Aligned Thesauri. To appear in Proceedings of 10th Metadata and Semantics Research Conference. [Draft PDF]


IetsNieuws: Are you a great newscaster?

Are you as good a newscaster as the legendary Philip Bloemendal?

In the context of the Observe project and Lukas Hulsbergen’s thesis, we developed the interactive game/web toy “IetsNieuws”. In the game, participants are asked to do voice-overs for Sound and Vision’s OpenImages videos. One player takes on the role of a newscaster, while the other player remixes news footage. Based on their performance, the players are presented with an achievement screen.

Because of the limited game explanation, players created their own style of play, leading to “emergent gameplay”. An experiment was done to examine whether players who play the game in the presence of an audience experience their relationship with each other as competitive or cooperative. The results of the observations during the experiment and feedback through a questionnaire show that the subjects saw the other player as a team player and not as an opponent.

Play the game at http://tinyurl.com/ietsnieuwsgame

For more information, read Lukas’ thesis Iets Nieuws – Lukas Hulsbergen (in Dutch) or have a look at the code on github. Watch players play the game in the experimental setting at https://youtu.be/64xi63d9iCc



Clarin video showcases Dutch Ships and Sailors project

The CLARIN framework commissioned the production of dissemination videos showcasing the outcomes of the individual CLARIN projects. One of these projects was the Dutch Ships and Sailors project, a collaboration between VU Computer Science, VU humanities and the Huygens Institute for National History. In this project, we developed a heterogeneous linked data cloud connecting many different maritime databases. This data cloud allows for new types of integrated browsing and new historical research questions. In the video, we (Victor de Boer together with historians Jur Leinenga and Rik Hoekstra) explain how the data cloud was formed and how it can be used by maritime historians.

CLARIN Dutch Ships & Sailors from CLARIN-NL (Dutch, with Dutch or English subtitles). See also other DSS-related posts on this website.



Paper about automatic labeling in IJDL

Our paper “Evaluating Unsupervised Thesaurus-based Labeling of Audiovisual Content in an Archive Production Environment” was accepted for publication in the International Journal on Digital Libraries (IJDL). This paper, co-authored with Roeland Ordelman and Josefien Schuurman, reports on a series of information extraction experiments carried out at the Netherlands Institute for Sound and Vision (NISV). Specifically, in the paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using subtitles. We look at how such an approach can provide acceptable results given requirements with respect to archival quality, authority and service levels to external users.


For this, we developed a text extraction pipeline (TESS), pictured here, which extracts key terms and matches them to the NISV thesaurus, the GTAA. This journal paper is an extended version of the paper previously accepted at the TPDL conference. Here we provide an analysis of the term extraction after it was taken into production, where we focus on performance variation with respect to term types and television programs. Having implemented the procedure in our production workflow allows us to gradually develop the system further and to also assess the effect of the transformation from manual to automatic annotation from an end-user perspective.
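The general idea of thesaurus-based labeling from subtitles can be sketched in a few lines. This is a deliberately naive illustration and not the TESS pipeline, whose extraction and ranking steps are described in the paper: here a term counts as a label when one of its thesaurus labels occurs in the subtitle text at least `min_count` times, and both that threshold and the input format (lowercase label mapped to a term identifier) are assumptions.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def label_program(subtitles: str, thesaurus: Dict[str, str],
                  min_count: int = 2) -> List[Tuple[str, int]]:
    """Return (term id, frequency) pairs for thesaurus terms found in the subtitles.

    `thesaurus` maps a lowercase label, possibly of several words, to a term id.
    """
    tokens = re.findall(r"\w+", subtitles.lower())
    longest = max((len(label.split()) for label in thesaurus), default=1)
    counts: Counter = Counter()
    # Slide a window of 1..longest tokens over the text and look up each phrase.
    for n in range(1, longest + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in thesaurus:
                counts[thesaurus[phrase]] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]
```

Raising `min_count` trades recall for precision, which is the kind of tuning decision that has to be weighed against archival quality requirements.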

The paper will appear on the Journal site shortly. A final draft version of the paper can be found here: deboer_ijdl2016evaluating_draft [PDF].




CultuurLINK Linking Award

Happy and surprised to find the first (and so far only) CultuurLink Linking Award in my mail box yesterday! I checked with the nice people over at Spinque.com and it turns out it was a token of appreciation for being a prolific Cultuurlink user 🙂

I think the vocabulary alignment tool is great and easy to work with, so I can recommend it to anyone with a SKOS vocabulary who wants to match it with any of the major cultural thesauri in the ‘Hub’. Thanks to the people at Spinque for the great tool and the nice gesture!

