Elevator Annotator: Local Crowdsourcing on Audio Annotation

[This post is based on Anggarda Prameswari’s Information Sciences MSc. Thesis]

For her MSc project, conducted at the Netherlands Institute for Sound and Vision (NISV), Information Sciences student Anggarda Prameswari investigated a local crowdsourcing application that allows NISV to gather crowd annotations for archival audio content. Crowdsourcing and other human computation techniques have proven their use for collecting large numbers of annotations, including in the domain of cultural heritage. Most of the time, crowdsourcing campaigns are run through online tools. Local crowdsourcing is a variant in which annotation activities are tied to specific locations related to the task.

The two variants of the Elevator Annotator box as deployed during the experiment.

Anggarda, in collaboration with NISV’s Themistoklis Karavellas, developed a platform called “Elevator Annotator”, to be used on-site. The platform is designed as a standalone Raspberry Pi-powered box which can be placed, for example, in an on-site elevator. It features speech recognition software and a button-based UI to communicate with participants (see video below).

The effectiveness of the platform was evaluated in two different locations (at NISV and at Vrije Universiteit) and with two different modes of interaction (voice input and button-based input) through a local crowdsourcing experiment. Elevator travellers were invited to participate; agreeing participants were then played a short sound clip from the collection to be annotated and asked to identify a musical instrument.
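The button-based interaction for a single annotation round can be sketched as follows; this is a minimal illustration, and the instrument options and function names are our own assumptions, not the actual implementation:

```python
# Hypothetical sketch of one button-based annotation round; the option
# list and function names are ours, not the actual Elevator Annotator code.
INSTRUMENTS = ["piano", "violin", "trumpet", "guitar"]

def annotation_round(clip_id, get_button_press):
    """Play a clip (on the real box), offer the instrument options via
    audio prompts, and record which button the participant presses."""
    choice = get_button_press(INSTRUMENTS)  # blocks until a button is hit
    return {"clip": clip_id, "instrument": choice}

# Simulated participant who presses the third button.
result = annotation_round("clip-042", lambda options: options[2])
print(result)  # {'clip': 'clip-042', 'instrument': 'trumpet'}
```

On the real box, the callback would read the physical buttons via the Raspberry Pi’s GPIO pins.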

The results show that this approach can produce annotations with reasonable accuracy, at a rate of up to 4 annotations per hour. Given that these results were acquired from a single elevator, this new form of crowdsourcing is a promising method for eliciting annotations from on-site participants.

Furthermore, a significant difference was found between participants at the two locations. This indicates that it indeed makes sense to think about localized versions of on-site crowdsourcing.

More information:


Speech technology and colorization for audiovisual archives

[This post describes and is based on Rudy Marsman‘s MSc thesis and is partly based on a Dutch blog post by him]

The Netherlands Institute for Sound and Vision (NISV) archives Dutch broadcast TV and makes it available to researchers, professionals and the general public. One subset is the Polygoonjournaals (public news broadcasts), which are published under open licenses as part of the OpenImages platform. NISV is also interested in exploring new ways and technologies to make interaction with the material easier and to increase exposure to its archives. In this context, Rudy explored two options.

Two stills from the film ‘Steegjes’, with the right frame colorized. Source: Polygoon-Profilti (producer) / Nederlands Instituut voor Beeld en Geluid / colorized by Rudy Marsman, CC BY-SA

One part of the research was the automatic colorization of old black-and-white video footage using neural networks. Rudy used a pre-trained network (Zhang et al., 2016) that is able to colorize black-and-white images. He developed a program to split videos into frames, colorize the individual frames using the network, and then ‘stitch’ them back together into colorized videos. The stunning results were very well received by NISV employees. Examples are shown below.


Tour de France 1954 (colorized by Rudy Marsman in 2016), Polygoon-Profilti (producer) / Nederlands Instituut voor Beeld en Geluid (administrator), CC BY-SA

Results from the comparison of the different variants of the method on different corpora

In the other part of his research, Rudy investigated to what extent the existing news broadcast corpus, with voice-overs by the famous Philip Bloemendal, can be used to develop a modern text-to-speech engine with his voice. To do so, he mainly focused on natural language processing and on determining to what extent the language used by Bloemendal in the 1970s is still comparable to contemporary Dutch.

Rudy used precompiled automatic speech recognition (ASR) results to match words to sounds, and developed a slot-and-filler text-to-speech system based on this. To increase the limited vocabulary, he implemented a number of strategies, including term expansion through the use of Open Dutch Wordnet and smart decompounding (this mostly works for Dutch, mapping ‘sinterklaasoptocht’ to ‘sinterklaas’ and ‘optocht’). The different strategies were compared to a baseline; Rudy found that a combination of the two resulted in the best performance (see figure). For more information:
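The smart decompounding step can be sketched as a greedy, vocabulary-based splitter; this is a hypothetical illustration of the idea, not Rudy’s actual code:

```python
def decompound(word, vocab):
    """Greedily split a Dutch compound into known vocabulary words,
    trying the longest known prefix first. Returns None when the word
    cannot be fully decomposed into vocabulary entries."""
    if word in vocab:
        return [word]
    for i in range(len(word) - 1, 0, -1):
        head, tail = word[:i], word[i:]
        if head in vocab:
            rest = decompound(tail, vocab)
            if rest:
                return [head] + rest
    return None

vocab = {"sinterklaas", "optocht"}
print(decompound("sinterklaasoptocht", vocab))  # ['sinterklaas', 'optocht']
```

Real decompounders also handle linking morphemes such as ‘-s-’ and ‘-en-’, which this sketch omits.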


The Role of Narratives in DIVE

[This post is based on Maartje Kruijt‘s Media Studies Bachelor thesis: “Supporting exploratory search with features, visualizations, and interface design: a theoretical framework“.]

In today’s network society there is a growing need to share, integrate and search the collections of various libraries, archives and museums. Tools need to be developed to support researchers in interpreting these interconnected media collections. In the exploratory phase of research, the media researcher has no clear focus yet and is uncertain what to look for in an integrated collection. Data visualization technology can be used to support the strategies and tactics of interest in doing exploratory research.

The DIVE tool is an event-based linked media browser that allows researchers to explore interconnected events, media objects, people, places and concepts (see screenshot). Maartje Kruijt’s research project investigated to what extent, and in what way, the construction of narratives can be made possible in DIVE, such that it contributes to the interpretation process of researchers. Such narratives can either be generated automatically on the basis of existing event-event relationships, or be constructed manually by researchers.

The research proposes an extension of the DIVE tool in which selections made during the exploratory phase can be presented in narrative form. This allows researchers to publish narratives, but also to share them or reuse other people’s narratives. The interactive presentation of a narrative is complementary to a textual presentation, and it can serve as a starting point for further exploration by other researchers who use the DIVE browser.
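A narrative in such an extension could be as simple as an ordered, shareable selection of linked events; the structure below is a hypothetical sketch, with invented event identifiers, not the actual DIVE data model:

```python
# Hypothetical sketch of a DIVE-style narrative: an ordered, shareable
# selection of linked events saved during exploratory browsing.
narrative = {
    "title": "Dutch newsreel events, 1954",
    "author": "researcher-1",
    "events": [
        {"id": "ev:tour-de-france-1954", "note": "starting point"},
        {"id": "ev:polygoon-broadcast-512", "note": "related broadcast"},
    ],
}

def starting_points(narrative):
    """A shared narrative doubles as an entry point for new exploration:
    its events are the seeds for further browsing by other researchers."""
    return [event["id"] for event in narrative["events"]]

print(starting_points(narrative))  # ['ev:tour-de-france-1954', 'ev:polygoon-broadcast-512']
```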

Within DIVE and CLARIAH, we are currently extending the user interface based on the recommendations made in the context of this thesis. You can read more about it in Maartje Kruijt’s thesis (in Dutch). The user stories that describe the needs of media researchers are written in English and can be found in Appendix I.


Crowd- and nichesourcing for film and media scholars

[This post describes Aschwin Stacia‘s MSc. project and is based on his thesis]

There are many online and private film collections that lack the structured annotations needed to facilitate retrieval. In his Master’s project, Aschwin Stacia explored the effectiveness of a crowd- and nichesourced film tagging platform, built around a subset of the Eye Open Beelden film collection.

Specifically, the project aimed at soliciting annotations appropriate for various types of media scholars, who each have their own information needs. Based on previous research and interviews, a framework categorizing these needs was developed. Based on this framework, a data model was developed that matches the scholars’ needs for provenance and trust of user-provided metadata.

Screenshot of the FilmTagging tool, showing how users can annotate a video

A crowdsourcing and retrieval platform (FilmTagging) was developed based on this framework and data model. The frontend of the platform allows users to self-declare their knowledge levels in different aspects of film and to annotate (describe) films. They can also use the provided tags and provenance information for retrieval, and extract this data from the platform.
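The core idea of the data model, attaching provenance and a self-declared knowledge level to every tag so that scholars can filter by trust, can be sketched as follows; the field and function names are our own assumptions, not the actual FilmTagging schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TagAnnotation:
    """Hypothetical record mirroring FilmTagging's provenance idea: every
    tag keeps who made it and their self-declared knowledge level (1-5)."""
    film_id: str
    tag: str
    annotator: str
    knowledge_level: int  # self-declared, 1 (novice) to 5 (expert)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def trusted(tags, min_level=4):
    """Keep only tags from annotators who rate themselves at least min_level."""
    return [t for t in tags if t.knowledge_level >= min_level]

tags = [TagAnnotation("eye-001", "silent film", "alice", 5),
        TagAnnotation("eye-001", "drama", "bob", 2)]
print([t.tag for t in trusted(tags)])  # ['silent film']
```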

To test the effectiveness of the platform, Aschwin conducted an experiment in which 37 participants used the platform to make annotations (in total, 319 annotations were made). The figure below shows the average self-reported knowledge levels.

Average self-reported knowledge levels on a 5-point scale. The topics are defined by the framework, based on previous research and interviews.

The annotations and the platform were then positively evaluated by media scholars, as the platform could provide them with annotations that lead directly to film fragments useful for their research activities.

Nevertheless, capturing every scholar’s specific information needs is hard, since those needs vary heavily depending on the research questions the scholars have.

  • Read more details in Aschwin’s thesis [pdf].
  • Have a look at the software at https://github.com/Aschwinx/Filmtagging, and maybe start your own Filmtagging instance
  • Test the annotation platform yourself at http://astacia.eculture.labs.vu.nl/ or watch the screencast below


A Mobile App for Sign Language Learning in Sub-Saharan Africa

[This post by Julia Salomons describes her Computer Science Master project]

‘Communication is key’ is a phrase known worldwide: communication is how people exchange ideas, knowledge, feelings, thoughts and much more. It comes in many different forms: verbal, visual or electronic, to name a few. Many people can choose which form of communication they wish to use. However, someone who suffers from hearing loss tends to lose that choice.

Two starting screens of the final application

Depending on where you are in the world, the support and care available to those who are deaf can vary greatly. In developing regions such as Sub-Saharan Africa (SSA), support and care varies within the region from acceptance to rejection. On one end of the spectrum, acceptance, individuals are allowed to express themselves how they want; on the other end, individuals are trapped in their environment and in some cases even fear for their lives.

Our research uncovered a lack of communication between hearing and deaf individuals. Deaf individuals who are lucky enough to attend school or to gain support from the government or organisations learn how to communicate through sign language. However, even then, communication is often limited to other deaf individuals, which widens the gap between deaf and hearing people. This project focused on decreasing that gap by creating an educational mobile application, Learn to Sign, which helps hearing individuals learn sign language.

To get a good look at the application, watch a screencast of it on YouTube or visit the project site at https://learn2signsite.wordpress.com/, where you can download the application. You can also download the thesis itself here.


MSc project: Low-Bandwidth Semantic Web

[This post is based on the Information Sciences MSc. thesis by Onno Valkering]

To make widespread knowledge sharing possible in rural areas in developing countries, the notion of the Web has to be downscaled to the specific low-resource infrastructure in place. In his thesis, Onno introduces SPARQL over SMS, a solution for exchanging RDF data in which HTTP is substituted by SMS, enabling Web-like exchange of data over cellular networks.

SPARQL over SMS architecture

The solution uses converters that take outgoing SPARQL queries sent over HTTP and convert them into SMS messages sent to phone numbers (see architecture image). On the receiving side, the messages are converted back into standard SPARQL requests.

The converters use various data compression strategies to ensure optimal use of the SMS bandwidth. These include both zip-based compression and the removal of redundant data through the use of common background vocabularies. The thesis presents the design and implementation of the solution, along with evaluations of the different data compression methods.
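The zip-based strategy can be sketched as follows; this is a minimal illustration in which the function names and the 153-character segment size are our assumptions, and the vocabulary-based redundancy removal is not shown:

```python
import base64
import zlib

SEGMENT = 153  # assumed payload size per concatenated GSM-7 SMS segment

def query_to_sms(query):
    """Sender side: compress a SPARQL query, make it SMS-safe with
    base64, and split it into SMS-sized chunks."""
    payload = base64.b64encode(zlib.compress(query.encode("utf-8"))).decode("ascii")
    return [payload[i:i + SEGMENT] for i in range(0, len(payload), SEGMENT)]

def sms_to_query(segments):
    """Receiver side: reassemble the chunks and decompress the query."""
    return zlib.decompress(base64.b64decode("".join(segments))).decode("utf-8")

query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
segments = query_to_sms(query)
assert sms_to_query(segments) == query  # lossless round trip
```

Note that zlib only pays off on larger queries; for short ones, vocabulary-based substitution of common URIs is the more effective strategy.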

Test setup with two Kasadakas

The application is validated in two real-world ICT for Development (ICT4D) cases that both use the Kasadaka platform: 1) an extension of the DigiVet application allows sending information related to veterinary symptoms and diagnoses across different distributed systems; 2) an extension of the RadioMarche application involves retrieving and adding current offerings in the market information system, including the phone numbers of the advertisers.

For more information:

  • Download Onno’s Thesis. A version of the thesis is currently under review.
  • The slides for Onno’s presentation are also available: Onno Valkering
  • View the application code at https://github.com/onnovalkering/sparql-over-sms

 


Kasadaka 1.0

[This post is based on Andre Baart’s B.Sc. thesis. The text is mostly written by him]

Generic overview of Kasadaka

In developing (rural) communities, the adoption of mobile phones is widespread. This allows information to be offered to these communities through voice-based services. This research explores the possibilities of creating a flexible framework (Kasadaka) for hosting voice services in rural communities. The context of the developing world poses special requirements, which have been taken into account in this research. The framework creates a voice service that incorporates dynamic data from a data store, and it allows for low-effort adaptation to new and changing use cases. The service is hosted on cheap, low-powered hardware and is connected to the local GSM network through a dongle.

We validated the working and flexibility of the framework by adapting it to a new use case. Setting up this new voice service was possible in less than one hour, showing that the framework is suitable for rapid prototyping. This enables further research into the effects and possibilities of hosting voice-based information services in the developing world. The image below shows the different components and the dataflow between them when a call is made. Read more in Andre Baart’s thesis (pdf).

All information on how to get started with Kasadaka can be found on the project’s GitHub page: https://github.com/abaart/KasaDaka 

 

The different components and dataflow (see below)

The steps in italics only take place when the call is being set up.

  1. Asterisk receives the call from the GSM dongle, answers the call, and connects it to VXI.
    Asterisk receives the user’s input and forwards it to VXI.
  2. VXI requests the configured VoiceXML document from Apache.
    VXI requests the configured VoiceXML document from Apache. Together with the request, it sends the user input.
  3. Apache runs the Python program (based on Flask), in which data from the triple store has to be read or written. Python sends the SPARQL query to ClioPatria.
  4. ClioPatria runs the query on the data present, and sends the result of the query back to the Python program.
  5. Python renders the VoiceXML template. The dynamic data is now inserted in the VoiceXML document, and it is sent back to VXI.
  6. VXI starts interpreting the VoiceXML document. In the document there are references to audio files. It sends requests to Apache for the referenced files.
  7. Apache sends a request for the file to the file system.
  8. The file is read from the file system.
  9. Apache responds with the requested audio files.
  10. VXI puts all the audio files in the correct order and plays them back sequentially, sending the audio to the GSM dongle.

This cycle repeats until the call is terminated.
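Step 5 above, rendering the dynamic data into a VoiceXML document, can be sketched as follows; the template and field names are hypothetical, and the real framework uses Flask templates rather than the standard library’s `string.Template`:

```python
from string import Template

# Hypothetical VoiceXML template; the real Kasadaka uses Flask/Jinja.
VXML = Template("""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">
  <form>
    <block>
      <audio src="$greeting_audio"/>
      <audio src="$content_audio"/>
    </block>
  </form>
</vxml>""")

# Pretend this row came back from the SPARQL query against ClioPatria (step 4).
sparql_result = {"greeting_audio": "welcome.wav", "content_audio": "market_prices.wav"}

# Step 5: insert the dynamic data into the VoiceXML document for VXI.
document = VXML.substitute(sparql_result)
assert "market_prices.wav" in document
```

VXI then interprets the resulting document and requests the referenced audio files from Apache (steps 6-9).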

 


IetsNieuws: Are you a great newscaster?

Are you as good a newscaster as the legendary Philip Bloemendal?

In the context of the Observe project and Lukas Hulsbergen’s thesis, we developed the interactive game/web toy “IetsNieuws”. In the game, participants do voice-overs for Sound and Vision’s OpenImages videos: one player takes on the role of a newscaster, while the other player remixes news footage. Based on the players’ performance, they are presented with an achievement screen.

Because of the limited game explanation, players created their own style of play, leading to “emergent gameplay”. An experiment was done to examine whether players experience their relationship with the other player as competitive or cooperative when playing the game in the presence of an audience. Observations during the experiment and feedback from a questionnaire show that the subjects saw the other player as a team-mate rather than as an opponent.

Play the game at http://tinyurl.com/ietsnieuwsgame

For more information, read Lukas’ thesis Iets Nieuws – Lukas Hulsbergen (in Dutch) or have a look at the code on GitHub. Watch players play the game in the experimental setting at https://youtu.be/64xi63d9iCc

 


Multitasking Behaviour and Gaze-Following Technology for Workplace Video-Conferencing

[This post was written by Eveline van Everdingen and describes her M.Sc. project]

Working with multiple monitors is very common at the workplace nowadays. A second monitor can increase work efficiency and provide better structure and overview in a job. Dual monitors are even used in business video-conferencing. Although the purpose of dual-monitor use might be clear to the multitasker, this behaviour is not always perceived as positive by their video-conferencing partners.

Gaze direction of the multitasker with the focus on the primary monitor (left), on the dual monitor (middle) or in between two monitors when switching (right).

Results show that multitasking on a dual screen or mobile device is rated as less polite and acceptable than doing something else on the same screen. Although the multitasker might be involved with the meeting, he or she seems less engaged with it, resulting in negative perceptions.

Effect of technology on politeness of multitasking

Improving the sense of eye contact might result in a better video-conferencing experience with the multitasker; therefore, a gaze-following tool using two webcams was designed (code available at https://github.com/een450/MasterProject). When the multitasker switches to the dual screen, a second webcam catches their frontal view. Indeed, participants rated the multitasking behaviour as more polite and acceptable with this dynamic view of the multitasker. The sense of eye contact, however, was not rated significantly more positively in this experimental design.

These results show that gaze-following webcam technology can be successful in improving collaboration in dual-monitor multitasking.
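The switching logic of such a gaze-following tool can be sketched as a small selector over per-camera face-detection scores; this is a hypothetical illustration (the hysteresis margin is our own addition to avoid flicker, not necessarily part of the actual tool):

```python
def select_feed(scores, current, margin=0.15):
    """Pick the webcam whose frontal-face detection score is highest,
    but only switch away from the current feed when the best camera
    beats it by a clear margin (hysteresis avoids flicker mid-switch)."""
    best = max(scores, key=scores.get)
    if best != current and scores[best] - scores[current] < margin:
        return current
    return best

# The multitasker turns to the dual monitor: its camera sees the frontal face.
print(select_feed({"primary_cam": 0.2, "dual_cam": 0.9}, current="primary_cam"))  # dual_cam
```

In practice, the scores would come from running a frontal-face detector (e.g. OpenCV) on each webcam frame.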

For more information, read Eveline’s thesis [pdf] or visit the project’s figshare page.

Example of a video presented to the experiment participants.


MSc Project: The Implications of Using Linked Data when Connecting Heterogeneous User Information

[This post describes Karl Lundfall‘s MSc Thesis research and is adapted from his thesis]

In the realm of database technologies, the reign of SQL is slowly coming to an end with the advent of many NoSQL (Not Only SQL) alternatives. Linked Data in the form of RDF is one of these, and is regarded as highly effective for connecting datasets. In this thesis, we looked into how the choice of database can affect the development, maintenance and quality of a product, by revising a solution for the social enterprise Text to Change Mobile (TTC).

TTC is a non-governmental organization equipping customers in developing countries with high-quality information and important knowledge they could not otherwise acquire. TTC offers mobile-based solutions such as SMS and call services, and focuses on projects aiming at social change consistent with the values shared by the company.

We revised a real-world system for linking datasets that was based on a much more mainstream NoSQL technology, altering the approach to instead use Linked Data. The result (see the figure below) was a more modular system living up to many of the promises of RDF.

Overview of the Linked Data-enabled tool to connect multiple heterogeneous databases developed in the context of this Msc Project.

On the other hand, we also found that, for this use case, there are some obstacles to adopting Linked Data. We saw indications that more momentum needs to build up for RDF to mature enough to be easily applied to use cases like this. The implementation we present demonstrates a different flavor of Linked Data than the common scenario of publishing data for public reuse; by applying the technology in business contexts, we might be able to expand the possibilities of Linked Data.
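The core idea of connecting heterogeneous user records with Linked Data can be sketched in a few lines; the prefixes, triples and function names below are invented for illustration and are not TTC’s actual data:

```python
# Hypothetical sketch: two heterogeneous user records become RDF-style
# triples, and an owl:sameAs link lets one lookup span both sources.
sms_db    = [("ttc:user42", "foaf:phone", "tel:+256700000001")]
survey_db = [("survey:respondent7", "ex:literacyLevel", "basic")]
links     = [("ttc:user42", "owl:sameAs", "survey:respondent7")]

def describe(subject, graphs, links):
    """Collect all triples about a subject, following owl:sameAs links."""
    aliases = {subject}
    for s, p, o in links:
        if p == "owl:sameAs":
            if s == subject:
                aliases.add(o)
            elif o == subject:
                aliases.add(s)
    return [t for g in graphs for t in g if t[0] in aliases]

print(describe("ttc:user42", [sms_db, survey_db], links))
```

This is exactly the kind of cross-dataset join that requires explicit schema mapping in a mainstream NoSQL store, but falls out of the data model with RDF.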

As a by-product of the research, a Node.js module for Prolog communication with ClioPatria was developed and made available at https://www.npmjs.com/package/prolog-db. This module illustrates how new applications using RDF could contribute to a snowball effect: improved quality in RDF-powered applications attracting even more practitioners.

Read more in Karl’s MSc. Thesis 
