Speech technology and colorization for audiovisual archives

[This post describes and is based on Rudy Marsman‘s MSc thesis and is partly based on a Dutch blog post by him]

The Netherlands Institute for Sound and Vision (NISV) archives Dutch broadcast TV and makes it available to researchers, professionals and the general public. One subset are the Polygoonjournaals (Public News broadcasts) that are published under open licenses as part of the OpenImages platform. NISV is also interested in exploring new ways and technologies to make interaction with the material easier and to increase exposure to their archives. In this context, Rudy explored two options.

Two stills from the film ‘Steegjes‘, with the right frame colorized. Source: Polygoon-Profilti (producent) / Nederlands Instituut voor Beeld en Geluid  / colorized by Rudy Marsman, CC BY-SA

One part of the research was the autonomous colorization of old black-and-white video footage using Neural Networks. Rudy used a pre-trained NN (Zhang et al 2016) that is able to colorize black and white images. Rudy developed a program to split videos into frames, colorize the individual frames using the NN and then ‘stitch’ them back together into colorized videos. The stunning results were very well received by NISV employees. Examples are shown below.

Tour de France 1954 (colorized by Rudy Marsman in 2016), Polygoon-Profilti (producent) / Nederlands Instituut voor Beeld en Geluid (beheerder), CC-BY SA

Results from the comparison of the different variants of the method on different corpora
Results from the comparison of the different variants of the method on different corpora

In the other part of his research, Rudy investigated to what extent the existing news broadcast corpus, with a voice-overs from the famous Philip Bloemendal  can be used to develop a modern text-to-speech engine with his voice. To do so he have mainly focused on natural language processing and the determination to what extent the language used by Bloemendal in the 1970s is still comparable enough to contemporary Dutch.

Rudy used precompiled automatic speech recognition (ASR) results to match words to sounds and developed a slot-and-filler text-to-speech system based on this. To increase the limited vocabulary, he implemented a number of strategies, including term-expansion through the use of Open Dutch Wordnet and smart decompounding (this mostly works for Dutch, mapping ‘sinterklaasoptocht’ to ‘sinterklaas’ and ‘optocht’. The different strategies were compared to a baseline. Rudy found that a combination of the two resulted in the best performance (see figure). For more information:

Share This:

Multitasking Behaviour and Gaze-Following Technology for Workplace Video-Conferencing.

[This post was written by Eveline van Everdingen and describes her M.Sc. project]

Working with multiple monitors is very common at the workplace nowadays. A second monitor can increase work efficiency, structure and a better overview in a job. Even in business video-conferencing, dual monitors are used. Although the purpose of dual monitor use might be clear to the multitasker, this behaviour is not always perceived as positive by their video-conferencing partners.

Gaze direction of the multitasker with the focus on the primary monitor (left), on the dual monitor (middle) or in between two monitors when switching (right).

Results show that multitasking on a dual screen or mobile device is indicated as less polite and acceptable than doing something else on the same screen. Although the multitasker might be involved with the meeting, he or she seems less engaged with the meeting, resulting in negative perceptions.

Effect of technology on politeness of multitasking

Improving the sense of eye-contact might result in a better video-conferencing experience with the multitasker, therefore a gaze-following tool with two webcams is designed (code available at https://github.com/een450/MasterProject ). When the multitasker switches to the dual screen, another webcam will catch the frontal view of the multitasker. Indeed, participants indicate the multitasking behaviour as being more polite and acceptable with the dynamic view of the multitasker. The sense of eye-contact is not significantly more positive rated with this experimental design.

These results show that gaze-following webcam technology can be successful to improve collaboration in dual-monitor multitasking.

For more information, read Eveline’s thesis [pdf] or visit the project’s figshare page.

Example of a video presented to the experiment participants.

Share This: