Speech technology and colorization for audiovisual archives

[This post describes and is based on Rudy Marsman’s MSc thesis and is partly based on a Dutch blog post by him]

The Netherlands Institute for Sound and Vision (NISV) archives Dutch broadcast TV and makes it available to researchers, professionals and the general public. One subset is the Polygoonjournaals (Public News broadcasts), which are published under open licenses as part of the OpenImages platform. NISV is also interested in exploring new ways and technologies to make interaction with the material easier and to increase exposure to its archives. In this context, Rudy explored two options.

Two stills from the film ‘Steegjes’, with the right frame colorized. Source: Polygoon-Profilti (producent) / Nederlands Instituut voor Beeld en Geluid / colorized by Rudy Marsman, CC BY-SA

One part of the research was the autonomous colorization of old black-and-white video footage using Neural Networks. Rudy used a pre-trained NN (Zhang et al. 2016) that is able to colorize black-and-white images. He developed a program that splits videos into frames, colorizes the individual frames using the NN and then ‘stitches’ them back together into colorized videos. The stunning results were very well received by NISV employees. Examples are shown below.
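The split, colorize and stitch idea can be sketched as follows. This is a minimal illustration, not Rudy’s actual program: frames are modelled as nested lists of grayscale values, and `fake_colorize` is a hypothetical stand-in for the pre-trained network. In practice the frames would first be decoded from the video file and encoded again afterwards.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Gray = List[List[int]]                   # one grayscale frame: rows of 0-255 values
RGB = List[List[Tuple[int, int, int]]]   # one color frame: rows of (r, g, b) pixels


def colorize_video(frames: Iterable[Gray],
                   colorize_frame: Callable[[Gray], RGB]) -> Iterator[RGB]:
    """Colorize frames one by one, preserving their order."""
    for frame in frames:
        yield colorize_frame(frame)


def fake_colorize(frame: Gray) -> RGB:
    """Hypothetical stand-in for the neural network: applies a warm tint."""
    return [[(v, v * 9 // 10, v * 7 // 10) for v in row] for row in frame]


# Two tiny 1x2 "frames" standing in for the decoded video.
video = [[[0, 100]], [[200, 255]]]
colorized = list(colorize_video(video, fake_colorize))
```

Because each frame is handled independently here, nothing in such a pipeline enforces consistent colors between consecutive frames.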

Tour de France 1954 (colorized by Rudy Marsman in 2016), Polygoon-Profilti (producent) / Nederlands Instituut voor Beeld en Geluid (beheerder), CC-BY SA

Results from the comparison of the different variants of the method on different corpora

In the other part of his research, Rudy investigated to what extent the existing news broadcast corpus, with voice-overs by the famous Philip Bloemendal, can be used to develop a modern text-to-speech engine with his voice. To do so, he focused mainly on natural language processing and on determining to what extent the language used by Bloemendal in the 1970s is still comparable to contemporary Dutch.

Rudy used precompiled automatic speech recognition (ASR) results to match words to sounds and developed a slot-and-filler text-to-speech system based on this. To increase the limited vocabulary, he implemented a number of strategies, including term expansion through the use of Open Dutch Wordnet and smart decompounding (this mostly works for Dutch, mapping ‘sinterklaasoptocht’ to ‘sinterklaas’ and ‘optocht’). The different strategies were compared to a baseline. Rudy found that a combination of the two resulted in the best performance (see figure).
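To illustrate the decompounding idea, here is a minimal dictionary-based sketch. It is an assumption about how such a strategy could look, not the method from the thesis: the function name, the `min_len` parameter and the small vocabulary are all hypothetical.

```python
from typing import List, Optional, Set


def decompound(word: str, vocabulary: Set[str], min_len: int = 3) -> Optional[List[str]]:
    """Split `word` into known vocabulary words, preferring longer first parts.

    Returns None when no complete split exists.
    """
    if word in vocabulary:
        return [word]
    # Try the longest prefix first, backtracking if the remainder cannot be split.
    for cut in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:cut], word[cut:]
        if head in vocabulary:
            rest = decompound(tail, vocabulary, min_len)
            if rest is not None:
                return [head] + rest
    return None


# Hypothetical vocabulary of words for which recorded audio exists.
known_words = {"sinterklaas", "optocht", "klaas", "tocht"}

parts = decompound("sinterklaasoptocht", known_words)
print(parts)
```

A word that cannot be fully covered by the vocabulary yields `None`, in which case another strategy, such as term expansion, would have to take over.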


MSc project: Low-Bandwidth Semantic Web

[This post is based on the Information Sciences MSc. thesis by Onno Valkering]

To make widespread knowledge sharing possible in rural areas in developing countries, the notion of the Web has to be downscaled based on the specific low-resource infrastructure in place. In this paper, we introduce SPARQL over SMS, a solution for exchanging RDF data in which HTTP is substituted by SMS to enable Web-like exchange of data over cellular networks.

SPARQL over SMS architecture

The solution uses converters that take outgoing SPARQL queries sent over HTTP and convert them into SMS messages sent to phone numbers (see architecture image). On the receiver-side, the messages are converted back to standard SPARQL requests.

The converters use various data compression strategies to ensure optimal use of the SMS bandwidth. These include both zip-based compression and the removal of redundant data through the use of common background vocabularies. The thesis presents the design and implementation of the solution, along with evaluations of the different data compression methods.
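The following minimal sketch shows how the two kinds of compression could be combined before a query is split into SMS-sized parts. It is not the thesis implementation: the prefix table, the `~r:` style tokens and the base64 text encoding are assumptions made for illustration, and the sketch ignores the headers that real multi-part SMS messages need.

```python
import base64
import zlib
from typing import Dict, List

# Hypothetical shared vocabulary table that both converters are assumed to know.
SHARED_PREFIXES: Dict[str, str] = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "~r:",
    "http://www.w3.org/2000/01/rdf-schema#": "~s:",
}
SMS_LENGTH = 160  # characters in a single 7-bit SMS


def encode(query: str) -> List[str]:
    """Shorten known IRIs, compress, and split into SMS-sized text parts."""
    for iri, short in SHARED_PREFIXES.items():
        query = query.replace(iri, short)
    packed = base64.b64encode(zlib.compress(query.encode("utf-8"), 9)).decode("ascii")
    return [packed[i:i + SMS_LENGTH] for i in range(0, len(packed), SMS_LENGTH)]


def decode(parts: List[str]) -> str:
    """Reverse of encode(): join parts, decompress, and restore the full IRIs."""
    query = zlib.decompress(base64.b64decode("".join(parts))).decode("utf-8")
    for iri, short in SHARED_PREFIXES.items():
        query = query.replace(short, iri)
    return query


sparql = (
    "SELECT ?s ?label WHERE { "
    "?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?t . "
    "?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }"
)
messages = encode(sparql)
assert decode(messages) == sparql
```

The round trip only works if the short tokens never occur in the original query text, so a real converter would need a safer escaping scheme.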

Test setup with two Kasadakas

The application is validated in two real-world ICT for Development (ICT4D) cases that both use the Kasadaka platform: 1) An extension of the DigiVet application allows sending information related to veterinary symptoms and diagnoses across different distributed systems. 2) An extension of the RadioMarche application involves retrieving and adding current offerings in the market information system, including the phone numbers of the advertisers.

For more information:

  • Download Onno’s Thesis. A version of the thesis is currently under review.
  • The slides for Onno’s presentation are also available: Onno Valkering
  • View the application code at https://github.com/onnovalkering/sparql-over-sms



Kasadaka 1.0

[This post is based on Andre Baart’s B.Sc. thesis. The text is mostly written by him]

Generic overview of Kasadaka

In developing (rural) communities, the adoption of mobile phones is widespread. This allows information to be offered to these communities through voice-based services. This research explores the possibilities of creating a flexible framework (Kasadaka) for hosting voice services in rural communities. The context of the developing world poses special requirements, which have been taken into account in this research. The framework creates a voice service that incorporates dynamic data from a data store. The framework allows for a low-effort adaptation to new and changing use cases. The service is hosted on cheap, low-powered hardware and is connected to the local GSM network through a dongle. We validated the working and flexibility of the framework by adapting it to a new use case. Setting up this new voice server was possible in less than one hour, showing that it is suitable for rapid prototyping. This framework enables further research into the effects and possibilities of hosting voice-based information services in the developing world. The image below shows the different components and the dataflow between these components when a call is made. Read more in Andre Baart’s thesis (pdf).

All information on how to get started with Kasadaka can be found on the project’s GitHub page: https://github.com/abaart/KasaDaka 


The different components and dataflow (see below)

In steps 1 and 2, the first line applies only when the call is being set up; the second line applies on every later pass through the cycle.

  1. Call setup: Asterisk receives the call from the GSM dongle, answers the call, and connects it to VXI.
    Afterwards: Asterisk receives the user’s input and forwards it to VXI.
  2. Call setup: VXI requests the configured VoiceXML document from Apache.
    Afterwards: VXI requests the configured VoiceXML document from Apache. Together with the request, it sends the user input.
  3. Apache runs the Python program (based on Flask), in which data from the triple store has to be read or written. Python sends the SPARQL query to ClioPatria.
  4. ClioPatria runs the query on the data present, and sends the result of the query back to the Python program.
  5. Python renders the VoiceXML template. The dynamic data is now inserted in the VoiceXML document, and it is sent back to VXI.
  6. VXI starts interpreting the VoiceXML document. In the document there are references to audio files. It sends requests to Apache for the referenced files.
  7. Apache sends a request for the file to the file system.
  8. The file is read from the file system.
  9. Apache responds with the requested audio files.
  10. VXI puts all the audio files in the correct order and plays them back sequentially, sending the audio to the GSM dongle.

This cycle repeats until the call is terminated.
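Steps 3 to 5 of the cycle above can be sketched in a few lines. This is a simplified illustration, not the Kasadaka code: the triple store query is replaced by a stub function, and the product names and audio paths are hypothetical. In the real framework, the Flask-based Python program would send a SPARQL query to ClioPatria at this point.

```python
from typing import Dict, List
from xml.sax.saxutils import quoteattr


def query_offerings() -> List[Dict[str, str]]:
    """Stand-in for steps 3 and 4: would send a SPARQL query to the triple store."""
    return [
        {"product": "maize", "audio": "/audio/maize.wav"},
        {"product": "millet", "audio": "/audio/millet.wav"},
    ]


def render_voicexml(rows: List[Dict[str, str]]) -> str:
    """Step 5: insert the dynamic data into a VoiceXML document."""
    prompts = "\n".join(
        f"        <audio src={quoteattr(row['audio'])}/>" for row in rows
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        "    <block>\n"
        "      <prompt>\n"
        f"{prompts}\n"
        "      </prompt>\n"
        "    </block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


document = render_voicexml(query_offerings())
print(document)
```

The resulting document only references audio files, which matches steps 6 to 10: the VoiceXML interpreter fetches the referenced files and plays them back in order.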



IetsNieuws: Are you a great newscaster?

Are you as good a newscaster as the legendary Philip Bloemendal?

In the context of the Observe project and Lukas Hulsbergen’s thesis, we developed the interactive game/web toy “IetsNieuws”. In the game, participants are asked to do voice-overs for Sound and Vision’s OpenImages videos. One player takes on the role of a newscaster, while the other player remixes news footage. Based on the players’ performance, they are presented with an achievement screen.

Because of the limited game explanation, players created their own style of play, leading to “emergent gameplay”. An experiment was done to examine whether players, when playing the game in the presence of an audience, experience their relationship with each other as competitive or cooperative. The results of the observations during the experiment and feedback through a questionnaire show that the subjects saw the other player as a team player and not as an opponent.

Play the game at http://tinyurl.com/ietsnieuwsgame

For more information, read Lukas’ Thesis Iets Nieuws – Lukas Hulsbergen (in Dutch) or have a look at the code on GitHub. Watch players play the game in the experimental setting at https://youtu.be/64xi63d9iCc



2nd TMT Workshop in Bamako

Kasadaka as presented by AOPP

From 7-9 May 2016, the second TMT-AOPP workshop was held in Bamako, Mali. This workshop was held in the context of the Tailor Made Training project that VU Amsterdam participates in together with the Malian farmer organization Association des Organisations Professionnelles Paysannes (AOPP).

During the workshop, which was attended by around 25 AOPP members from all over Mali, we followed up on the results of a previous workshop in 2015, where we co-developed a number of use cases around improving the lives of rural farmers in Mali. Specifically, we developed two prototype services accessible using simple mobile phones:

  1. An online marketplace for seeds. Farmers can call in to the system to place offerings of seeds or browse current offers of seeds of various quality levels in a specific region.
  2. A chicken vaccination service. For this service, an extension worker can register newly born chickens in the system. The system keeps an administration of when farmers need to vaccinate their chickens against specific diseases. The system then calls the farmer and plays a reminder message in his/her language.

These services were developed on Kasadaka, the cheap and low-resource rapid-prototyping platform for knowledge-rich and voice-accessible services. During the workshop we were able to further test the Kasadaka in the field. A field trip to local farmers and a milk cooperative in nearby Ouelessebougou gave us further context and information on how these services can support locals (see also the video embedded below). Chris van Aart from 2coolmonkeys demonstrated his progress on the Senepedia wiki and two Android applications that allow farmers and organizers to use geo-services to count cows, trees or other objects in the field.

Chris van Aart shows his apps

In addition to these two services, we also presented seven services on the Kasadaka, developed by students of the VUA ICT4D M.Sc. course. These included a weather information service, two veterinary services, general-purpose knowledge sharing platforms, farmer alert services and a milk market. These services were all very well received and allowed the workshop participants to see the full potential of voice-enabled information services.

The presentation below shows more information, my personal highlights from the workshop (hence the title) as well as feedback received on the seven student projects.





MSc Project: The Implications of Using Linked Data when Connecting Heterogeneous User Information

[This post describes Karl Lundfall’s MSc Thesis research and is adapted from his thesis]

In the realm of database technologies, the reign of SQL is slowly coming to an end with the advent of many NoSQL (Not Only SQL) alternatives. Linked Data in the form of RDF is one of these, and is regarded to be highly effective when connecting datasets. In this thesis, we looked into how the choice of database can affect the development, maintenance, and quality of a product by revising a solution for the social enterprise Text to Change Mobile (TTC).

TTC is a non-governmental organization equipping customers in developing countries with high-quality information and important knowledge they could not acquire for themselves. TTC offers mobile-based solutions such as SMS and call services and focuses on projects that bring about social change in line with the values shared by the company.

We revised a real-world system for linking datasets, which was based on a much more mainstream NoSQL technology, by altering the approach to use Linked Data instead. The result (see the figure on the left) was a more modular system living up to many of the promises of RDF.

Overview of the Linked Data-enabled tool to connect multiple heterogeneous databases developed in the context of this Msc Project.

On the other hand, we also found that there are some obstacles to adopting Linked Data for this use case. We saw indicators that more momentum needs to build up before RDF matures enough to be easily applied to use cases like this. The implementation we present demonstrates a different flavor of Linked Data than the common scenario of publishing data for public reuse, and by applying the technology in business contexts we might be able to expand the possibilities of Linked Data.

As a by-product of the research, a Node.js module for Prolog communication with ClioPatria was developed and made available at https://www.npmjs.com/package/prolog-db . This module might illustrate that new applications using RDF could contribute to a snowball effect, in which improved quality of RDF-powered applications attracts even more practitioners.

Read more in Karl’s MSc. Thesis 


MSc. Project: The search for credibility in news articles and tweets

[This post was written by Marc Jacobs and describes his MSc Thesis research]

Nowadays the world no longer relies only on traditional news sources like newspapers, television and radio. Social media, such as Twitter, are claiming a key position here, thanks to their fast publishing speed and large number of items. As one may suspect, the credibility of this unrated news becomes questionable. My Master’s thesis focuses on determining measurable features (such as retweets, likes or number of Wikipedia entities) in newsworthy tweets and online news articles.

Credibility framework pyramid

The gathering of the credibility features consisted of two parts: a theoretical and a practical part. First, a theoretical credibility framework was built using recent studies about credibility on the Web. Next, Ubuntu was booted, Python was started, and news articles and tweets, including metadata, were mined. The news items were analysed and, based on the credibility framework, features were extracted. Additional information retrieval techniques (website scraping, regular expressions, NLTK, IR APIs) were used to extract further features, extending the coverage of the credibility framework.

The data processing and experimentation pipeline

The last step in this research was to present the features to the crowd in an experimental design, using the crowdsourcing platform Crowdflower. The correlation between a specific feature and the credibility of the tweet or news article has been calculated. The results have been compared to find the differences and similarities between tweets and articles.
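As an illustration of the correlation step, the sketch below computes a Pearson correlation coefficient between one feature and crowd ratings. The choice of Pearson correlation is an assumption, since the post does not say which measure was used, and the numbers are made up for the example rather than taken from the thesis.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustration data, not results from the thesis.
retweets = [0, 3, 10, 25, 40]
crowd_credibility = [2.0, 2.5, 3.0, 4.0, 4.5]  # mean crowd rating per tweet

r = pearson(retweets, crowd_credibility)
```

A value of `r` close to 1 or -1 would mark the feature as strongly related to perceived credibility, while a value near 0 would suggest it carries little signal. The function is undefined for a feature that has the same value for every item.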

The highly correlated credibility features (which include the number of matches with Wikipedia entries) may be used in the future to construct credibility algorithms that automatically assess the credibility of newsworthy tweets or news articles and, hopefully, help to filter reliable news from the impenetrable pile of data on the Internet.

Read all the details in Marc’s thesis


MSc. Project Roy Hoeymans: Effective Recommendation in Knowledge Portals – the SKYbrary case study

[This post was written by Roy Hoeymans. It describes his MSc. project ]

In this master project, which I have done externally at DNV-GL, I have built a recommender system for knowledge portals. Recommender systems are pieces of software that provide suggestions for related items to a user. My research focuses on the application of a recommender system in knowledge portals. A knowledge portal is an online single point of access to information or knowledge on a specific subject. Examples of knowledge portals are SKYbrary (www.skybrary.aero) or Navipedia (www.navipedia.org).

Part of this project was a case study on SKYbrary, a knowledge portal on the subject of aviation safety. In this project I looked at the types of data that are typically available to knowledge portals. I used user navigation pattern data, which I retrieved via the Google Analytics API, and the text of the articles to create a user-navigation based and a content-based algorithm. The user-navigation based algorithm uses an item association formula and the content-based algorithm uses a tf-idf weighting scheme to calculate content similarity between articles. Because both types of algorithm have their own disadvantages, I also developed a hybrid algorithm that combines the two.
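The content-based part can be illustrated with a small tf-idf and cosine similarity sketch. It is not the code from the project: the article snippets are made up, and the `hybrid` function shows only one possible way to combine two scores (a weighted average with a hypothetical `alpha` parameter), since the post does not describe the actual combination or the item association formula.

```python
import math
from collections import Counter
from typing import Dict


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Build a tf-idf weighted term vector for every document."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    n_docs = len(docs)
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        vectors[name] = {
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def hybrid(content_score: float, navigation_score: float, alpha: float = 0.5) -> float:
    """Hypothetical hybrid: a weighted average of the two scores."""
    return alpha * content_score + (1 - alpha) * navigation_score


# Made-up article snippets for illustration.
articles = {
    "runway_excursion": "aircraft runway excursion landing wet runway",
    "runway_incursion": "aircraft runway incursion taxi clearance",
    "bird_strike": "aircraft bird strike engine damage",
}
vectors = tfidf_vectors(articles)
sim_runway = cosine(vectors["runway_excursion"], vectors["runway_incursion"])
sim_bird = cosine(vectors["runway_excursion"], vectors["bird_strike"])
```

In this toy corpus the word "aircraft" occurs in every article, so its tf-idf weight is zero and it does not contribute to similarity. The two runway articles therefore score higher with each other than with the bird strike article.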

Screenshot of the demo application

To see which type of algorithm was the most effective, I conducted a survey among the content editors of SKYbrary, who are domain experts on the subject. Each question in the survey showed an article and then recommendations for that article. The respondent was then asked to rate each recommended article on a scale from 1 (completely irrelevant) to 5 (very relevant). The results of the survey showed that the hybrid algorithm is better than the user-navigation based algorithm, with a statistically significant difference. However, no difference was found between the hybrid algorithm and the content-based algorithm. Future work might include a more extensive or different type of evaluation.

In addition to the research on the algorithms, I also developed a demo application that the content editors of SKYbrary can use to show recommendations for a selected article and algorithm.

For more information, view Roy Hoeymans’ Thesis Presentation [pdf] or read the thesis [Academia].
