Speech technology and colorization for audiovisual archives

[This post describes and is based on Rudy Marsman‘s MSc thesis and is partly based on a Dutch blog post by him]

The Netherlands Institute for Sound and Vision (NISV) archives Dutch broadcast TV and makes it available to researchers, professionals and the general public. One subset are the Polygoonjournaals (Public News broadcasts) that are published under open licenses as part of the OpenImages platform. NISV is also interested in exploring new ways and technologies to make interaction with the material easier and to increase exposure to their archives. In this context, Rudy explored two options.

Two stills from the film ‘Steegjes‘, with the right frame colorized. Source: Polygoon-Profilti (producent) / Nederlands Instituut voor Beeld en Geluid  / colorized by Rudy Marsman, CC BY-SA

One part of the research was the autonomous colorization of old black-and-white video footage using Neural Networks. Rudy used a pre-trained NN (Zhang et al 2016) that is able to colorize black and white images. Rudy developed a program to split videos into frames, colorize the individual frames using the NN and then ‘stitch’ them back together into colorized videos. The stunning results were very well received by NISV employees. Examples are shown below.


Tour de France 1954 (colorized by Rudy Marsman in 2016), Polygoon-Profilti (producent) / Nederlands Instituut voor Beeld en Geluid (beheerder), CC-BY SA

Results from the comparison of the different variants of the method on different corpora
Results from the comparison of the different variants of the method on different corpora

In the other part of his research, Rudy investigated to what extent the existing news broadcast corpus, with a voice-overs from the famous Philip Bloemendal  can be used to develop a modern text-to-speech engine with his voice. To do so he have mainly focused on natural language processing and the determination to what extent the language used by Bloemendal in the 1970s is still comparable enough to contemporary Dutch.

Rudy used precompiled automatic speech recognition (ASR) results to match words to sounds and developed a slot-and-filler text-to-speech system based on this. To increase the limited vocabulary, he implemented a number of strategies, including term-expansion through the use of Open Dutch Wordnet and smart decompounding (this mostly works for Dutch, mapping ‘sinterklaasoptocht’ to ‘sinterklaas’ and ‘optocht’. The different strategies were compared to a baseline. Rudy found that a combination of the two resulted in the best performance (see figure). For more information:

Share This:

Connecting collections across national borders

Items from two collections shown side-by-sideAs audiovisual archives are digitizing their collections and making these collections available online, the need arises to also establish connections between different collections and to allow for cross-collection search and browsing. Structured vocabularies can be used as connecting points by aligning thesauri from different institutions. The project “Gemeenschappelijke Thesaurus voor Uniforme Ontsluiting” was funded by the Taalunie -a cross-national organization focusing on the Dutch language- and executed by the Netherlands Institute for Sound and Vision and the Flemish VIAA archive. It involved a case study where partial collections of the two archives were connected by aligning their thesauri. This involved the conversion of the VRT thesaurus to the SKOS format and linking it to Sound and Vision’s GTAA thesaurus.cultuurlink screenshotThe interactive alignment tool CultuurLINK, made by Dutch company Spinque was used to align the two thesauri (see the screenshot above).

 

The links between the collections can be explored using a cross-collection browser, also built by Spinque. This allows users to search and explore connections between the two collections. Unfortunately, the collections are not publicly available so the demonstrator is password-protected, but a publicly accessible screencast (below) shows the functionalities.

The full report can be accessed through the VIAA site. There, you can also find a blog post in Dutch.

Update: a paper about this has been accepted for publication:

  • Victor de Boer, Matthias Priem, Michiel Hildebrand, Nico Verplancke, Arjen de Vries and Johan Oomen. Exploring Audiovisual Archives through Aligned Thesauri. To appear in Proceedings of 10th Metadata and Semantics Research Conference. [Draft PDF]

Share This:

Paper about automatic labeling in IJDL

mompeltOur paper  “Evaluating Unsupervised Thesaurus-based Labeling of Audiovisual Content in an Archive Production Environment” was accepted for publication in the International Journal on Digital Libraries (IJDL). This paper, co-authored with Roeland Ordelman and Josefien Schuurman reports on a series of information extraction experiments carried out at the Netherlands Institute for Sound and Vision (NISV). Specifically, in the paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using subtitles. We look at how such an approach can provide acceptable results given requirements with respect to archival quality, authority and service levels to external users.

tess_alg

For this, we developed a text extraction pipeline (TESS), pictured here which extracts key terms and matches them to the NISV thesaurus, the GTAA. This journal paper is an extended version of the paper previously accepted at the TPDL conference and here provide an analysis of the term extraction after being taken into production, where we focus on performance variation with respect to term types and television programs. Having implemented the procedure in our production work-flow allows us to gradually develop the system further and to also assess the effect of the transformation from manual to automatic annotation from an end-user perspective.

The paper will appear on the Journal site shortly. A final draft version of the paper can be found here: deboer_ijdl2016evaluating_draft [PDF].

 

 

Share This:

CultuurLINK Linking Award

Happy and suprised to find the first (and so far only) CultuurLink Linking Award in my mail box yesterday! I checked with the nice people over at Spinque.com and it turns out it was a token of appreciation for being a prolific Cultuurlink user 🙂

I think the vocabulary alignment tool is great and easy to work with, so I can recommend it to anyone with a SKOS vocabulary who wants to match it with any of the major cultural thesauri in the ‘Hub’. Thanks to the people at Spinque for the great tool and the nice gesture!

spinqeprijs

Share This:

Two TPDL papers accepted!

Today, the TPDL (International Conference on Theory and Practice of Digital Libraries) results came in and both papers on which I am a co-author got accepted. Today is a good day 🙂 tess_algThe first paper, we present work done during my stay at Netherlands Institute for Sound and Vision on automatic term extraction from subtitles. The interesting thing about this paper was that it was mainly how these algorithms were functioning in a ‘real’ context, that is within a larger media ecosystem. The paper was co-authored with Roeland Ordelman and Josefien Schuurman.

Screenshot of the QHP toolOn the second paper, I am one of the co-authors. In the paper “Supporting Exploration of Historical Perspectives across Collections”, we present an exploratory search application that highlights different perspectives on World War II across collections (including Verrijkt Koninkrijk). The project is funded by the Amsterdam Data Science seed project with Daan Odijk, research assistants Cristina Gârbacea and Thomas Schoegje, VU/CWI-colleagues Laura Hollink and Jacco van Ossenbruggen and  historian Kees Ribbens (NIOD). You can read more about it on Daan’s blog.

Share This:

Victor at Sound and Vision

A fresh start for me! As of July 1st, I work as a researcher at the Netherlands Institute for Sound and Vision (Beeld en Geluid). They have an awesome building, with awesome people and an awesome audiovisual collection. The latter could do with some Semantic Web technology, so that is what I will be working on.

Beeld en geluid building

I will keep this space for updates on past, present and future projects.

Share This: