Who uses DBPedia anyway?

[This post is based on Frank Walraven’s Master thesis]

Who uses DBPedia anyway? This question, which came up during one of the meetings of the Dutch DBPedia chapter, of which VUA is a member, started a research project for Frank Walraven. If usage and users are better understood, this can lead to better servicing of those users, for example by prioritizing the enrichment or improvement of specific sections of DBPedia. Characterizing use(r)s of a Linked Open Data set is an inherently challenging task: in an open Web world, it is difficult to know who is accessing your digital resources. For his MSc project research, which he conducted at the Dutch National Library supervised by Enno Meijers, Frank used a hybrid approach combining a data-driven method based on user log analysis with a short survey of known users of the dataset. As a scope, Frank selected just the Dutch DBPedia dataset.

For the data-driven part of the method, Frank used a complete user log of HTTP requests on the Dutch DBPedia. This log file (see link below) consisted of over 4.5 million entries and logged both URI lookups and SPARQL endpoint requests. For this research, only a subset of the URI lookups was considered.

As a first analysis step, the origin IP addresses of the requests were categorized. Five classes can be identified (A-E), with the vast majority of IP addresses being in class “A”: very large networks and bots. Most of the IP addresses in these lists could be traced back to search engine indexing bots such as those from Yahoo or Google. In classes B-F, Frank manually traced the top 30 most encountered IP addresses, concluding that even there 60% of the requests came from bots, 10% definitely did not come from bots, and 30% remained unclear.
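The counting step behind such an analysis can be sketched in a few lines of Python. This is only an illustration, not Frank's actual script: the post does not specify the log format, so the sketch assumes the common "combined" access-log layout; the sample lines are made up, and the list of bot markers is a hypothetical heuristic (the thesis traced IP addresses manually).

```python
import re
from collections import Counter

# Assumed "combined" access-log layout; the real log format is not given in the post.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical substrings that suggest a crawler's user agent.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

# Made-up sample lines, using IP ranges reserved for documentation.
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jan/2018:00:00:01 +0100] "GET /resource/Amsterdam HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.1 - - [01/Jan/2018:00:00:02 +0100] "GET /resource/Utrecht HTTP/1.1" '
    '200 498 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '198.51.100.7 - - [01/Jan/2018:00:00:03 +0100] "GET /resource/Android_TV HTTP/1.1" '
    '200 730 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/60.0"',
]


def parse_line(line):
    """Return (ip, user_agent) for a log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("ip"), match.group("agent")


def top_ips(lines, n=30):
    """Count requests per IP address and return the n most frequent ones."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts.most_common(n)


def looks_like_bot(user_agent):
    """Rough heuristic: does the user agent contain a known crawler marker?"""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


if __name__ == "__main__":
    for ip, count in top_ips(SAMPLE_LOG):
        print(ip, count)
```

A user-agent heuristic like this only gives a first impression; many crawlers do not identify themselves, which is one reason a manual trace of the most frequent addresses remains useful.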

The second analysis step in the data-driven method consisted of identifying what types of pages were most requested. To cluster the thousands of DBPedia URI requests, Frank retrieved the ‘categories’ of the pages. These categories are extracted from Wikipedia category links. An example is the “Android_TV” resource, which has two categories: “Google” and “Android_(operating_system)”. Following skos:broader links, a ‘level 2 category’ could also be found to aggregate to an even higher level of abstraction. As not all resources have such categories, this does not give a complete picture, but it does provide some idea of the most popular categories of items requested. After normalizing for categories with large numbers of incoming links, for example the category “non-endangered animal”, the most popular categories were 1. Domestic & International movies, 2. Music, 3. Sports, 4. Dutch & International municipality information and 5. Books.
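The aggregation described above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: the category data is held in plain dictionaries rather than fetched from DBPedia, only the “Android_TV” entry comes from the post, and the broader categories, the second resource and the category sizes are hypothetical. Dividing by category size is one possible reading of the normalization step, not necessarily the one used in the thesis.

```python
from collections import Counter

# Resource -> categories. "Android_TV" is the example from the post;
# "Example_Film" is a made-up resource for illustration.
RESOURCE_CATEGORIES = {
    "Android_TV": ["Google", "Android_(operating_system)"],
    "Example_Film": ["Example_films"],
}

# Category -> broader categories (skos:broader). All parent names are hypothetical.
BROADER = {
    "Google": ["Technology_companies"],
    "Android_(operating_system)": ["Operating_systems"],
    "Example_films": ["Cinema"],
}

# Hypothetical number of resources linked to each level-2 category.
CATEGORY_SIZE = {
    "Technology_companies": 4,
    "Operating_systems": 2,
    "Cinema": 1,
}


def level2_categories(resource):
    """Follow category links and then one skos:broader step for a resource."""
    result = set()
    for category in RESOURCE_CATEGORIES.get(resource, []):
        result.update(BROADER.get(category, []))
    return result


def category_popularity(requested_resources):
    """Count requests per level-2 category, normalized by category size."""
    counts = Counter()
    for resource in requested_resources:
        for category in level2_categories(resource):
            counts[category] += 1
    # Large categories attract more requests simply because they contain more
    # resources, so each count is divided by the (assumed) category size.
    return {cat: counts[cat] / CATEGORY_SIZE.get(cat, 1) for cat in counts}


if __name__ == "__main__":
    requests = ["Android_TV", "Android_TV", "Example_Film"]
    for category, score in sorted(category_popularity(requests).items()):
        print(category, score)
```

Resources without any category simply contribute nothing to the counts, which mirrors the caveat that this method does not give a complete picture.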

Frank also set up a user survey to corroborate this evidence. The survey contained questions about the how and why of the respondents’ Dutch DBPedia use, including the categories they were most interested in. The survey was distributed through the Dutch DBPedia website and via Twitter, but attracted only 5 respondents. This illustrates the difficulty of the problem: users of the DBPedia resource are not necessarily easy to reach through communication channels. The five respondents were all quite closely related to the chapter, but the results were interesting nonetheless. Most of the users used the DBPedia SPARQL endpoint. The full results of the survey can be found through Frank’s thesis, but in terms of corroboration the survey revealed that four out of the five categories found in the data-driven method were also identified in the top five resulting from the survey. The fifth one identified in the survey was ‘geography’, which could be matched to the fifth from the data-driven method.

Frank’s research shows that although it remains a challenging problem, using a combination of data-driven and user-driven methods, it is indeed possible to get an indication of the most-used categories on DBPedia. Within the Dutch DBPedia Chapter, we are currently considering follow-up research questions based on Frank’s research.


The Benefits of Linking Metadata for Internal and External users of an Audiovisual Archive

[This post describes the Master Project work of Information Science students Tim de Bruyn and John Brooks and is based on their theses]

Audiovisual archives adopt structured vocabularies for their metadata management. With Semantic Web and Linked Data becoming increasingly stable and commonplace technologies, organizations are now looking at linking these vocabularies to external sources, for example those of Wikidata, DBPedia or GeoNames.

However, the benefits of such endeavors to the organizations are generally underexplored. For their master project research, done in the form of an internship at the Netherlands Institute for Sound and Vision (NISV), Tim de Bruyn and John Brooks conducted a case study into the benefits of linking the “Common Thesaurus for Audiovisual Archives” (or GTAA) and the general-purpose dataset Wikidata. In their approach, they identified various use cases for user groups that are internal (Tim) as well as external (John) to the organization. Not only were use cases identified and matched to a partial alignment of GTAA and Wikidata, but several proof-of-concept prototypes that address these use cases were also developed.


For the internal users, three cases were elaborated, including a calendar service where personnel receive notifications when the author of a work passed away 70 years ago, thereby changing the copyright status of the work. This information is retrieved from the Wikidata page of the author, aligned with the GTAA entry (see fig 1 above).
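The date logic behind such a notification service can be sketched as follows. This is a hedged illustration rather than Tim's prototype: the author records are made-up placeholders (in the prototype, dates of death come from Wikidata, where this is property P570), and the sketch assumes the common European rule that protection runs for 70 years counted from 1 January of the year after the author's death. The post itself only mentions the 70-year period.

```python
from datetime import date

# Made-up placeholder records; the real service reads the date of death
# from the Wikidata entry aligned with the GTAA term.
AUTHORS = [
    {"name": "Example Author A", "date_of_death": date(1948, 5, 10)},
    {"name": "Example Author B", "date_of_death": date(1990, 11, 2)},
]

PROTECTION_YEARS = 70


def public_domain_date(date_of_death):
    """First day on which the author's works are no longer protected.

    Assumption: the term is counted from 1 January of the year following
    the year of death, so protection ends 70 full calendar years later.
    """
    return date(date_of_death.year + PROTECTION_YEARS + 1, 1, 1)


def authors_to_notify(authors, today):
    """Names of authors whose works are in the public domain on `today`."""
    return [
        author["name"]
        for author in authors
        if public_domain_date(author["date_of_death"]) <= today
    ]


if __name__ == "__main__":
    for name in authors_to_notify(AUTHORS, date.today()):
        print("Copyright status changed for works by", name)
```

Under this assumption, an author who died on 10 May 1948 yields a notification date of 1 January 2019.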

A second internal case involves the new ‘story platform’ of NISV. Here Tim implemented a prototype end-user application to find stories related to the one currently shown to the user, based on persons occurring in that story (fig 2).

The external cases centered around the users of the CLARIAH Media Suite. For this extension, several humanities researchers were interviewed to identify worthwhile extensions with Wikidata information. Based on the outcomes of these interviews, John Brooks developed the Wikidata retrieval service (fig 3).

The research presented in the two theses is a good example of User-Centric Data Science, where affordances provided by data linkages are aligned with various user needs. The various tools were evaluated with end users to ensure they match their actual needs. The research was reported in a research paper which will be presented at the MTSR2018 conference: (Victor de Boer, Tim de Bruyn, John Brooks, Jesse de Vos. The Benefits of Linking Metadata for Internal and External users of an Audiovisual Archive. To appear in Proceedings of MTSR 2018 [Draft PDF])


Developing a Sustainable Weather Information System in Rural Burkina Faso

[This post describes the Information Sciences Master Project of Hameedat Omoine and is based on her thesis.] 

In the quest to improve the lives of farmers and improve agricultural productivity in rural Burkina Faso, meteorological data has been identified as one of the key information needs for local farmers. Various online weather information services are available, but many are not tailored specifically to this target user group. In a research case study, Hameedat Omoine designed a weather information system that collects not only weather but also related agricultural information and provides the farmers with this information to allow them to improve agricultural productivity and the livelihood of the people of rural Burkina Faso.

The research and design of the system were conducted at and in collaboration with 2CoolMonkeys, a Utrecht-based Open data and App-development company with expertise in ICT for Development (ICT4D).

Following the design science research methodology, Hameedat investigated the requirements for a weather information system, and the possible options for ensuring the sustainability of the system. Using a structured approach, she developed the application and evaluated it in the field with potential Burkinabe end users. The mobile interface of the application featured weather information and crop advice (seen in the images above). A demonstration video is shown below.

Hameedat developed multiple alternative models to investigate the sustainability of the application. For this she used the e3value approach and language. The image below shows a model for the case where a local radio station is involved.


Testimonials Digital Humanities minor at DHBenelux2018

At the DHBenelux 2018 conference, students from the VU minor “Digital Humanities and Social Analytics” presented their final DH in Practice work. In this video, the students talk about their experience in the minor and the internship projects. We also meet other participants of the conference talking about the need for interdisciplinary research.



An Augmented Reality App to Annotate Art

[This post is based on the Bachelor project by Jurjen Braam and reuses content from his thesis]

The value of Augmented Reality applications has been shown for a number of different tasks. Most of these show that AR applications add to the immersiveness of an experience. For his Bachelor Project, VU student Jurjen Braam researched to what extent AR technology makes sense for the task of annotating artworks.

To this end, Jurjen built a mobile application which allows experts or laypeople to add textual annotations to artworks in three different modes. The first mode doesn’t show the artwork, but allows for textual input. The second mode shows the work in an image and allows for localised annotations. The last mode is the AR mode, which projects the artwork in the physical space, using the device camera and screen.

Three modes of the Application (Text, 2D, AR)

Jurjen evaluated the three modes through a small user study, which showed that immersion and enjoyment were highest in the AR mode but that this mode was the least efficient. Also, participants indicated that for annotation tasks, larger screens would be preferable.

User evaluation in action

This research was a unique endeavour combining a proven technology (AR) and well-known task (Annotation) which identified interesting possibilities for follow-up research.


A Voice Service Development Kit for the Kasadaka platform

[This post is written by André Baart and describes his MSc thesis]

While internet usage in the developing world is still low, the adoption of simple mobile phones is widespread. A way to offer the advantages of the internet to these populations is voice-based information systems. The KasaDaka voice-services platform is aimed at providing voice-services in the context of ICT for Development (ICT4D). The platform is based on a Raspberry Pi and a GSM modem, which enables affordable voice-service hosting, using the locally available GSM network. The platform takes into account the special requirements of the ICT4D context, such as limited internet connectivity and low literacy rates.

This research focuses on lowering the barrier to entry of voice-service development, by reducing the skill set needed to do so. A Voice Service Development Kit (VSDK) is developed that allows the development of voice-services by deploying and customizing provided building blocks. These building blocks each represent a type of interaction that is often found in voice-services (for example a menu, user voice input or the playback of a message). The researcher argues that the simplification of voice-service development is an essential step towards sustainable voice-services in the ICT4D context, as this increases the potential number of local voice-service developers, removing the dependency on foreign (and thus expensive) developers and engineers. This simplification should ideally be achieved by providing a graphical interface to voice-service development.

The VSDK was evaluated during the ICT4D course at the Vrije Universiteit Amsterdam, where students built applications for various ICT4D use-cases using the VSDK. Afterwards a survey was conducted, which provided insight into the students’ experiences with voice-service development and the VSDK. From the results of the evaluation it is concluded that the building-block approach to voice-service development used in the VSDK is successful for the development of simple voice-services. It allows newcomers to (voice-service) development to quickly develop (simple) voice-services from a graphical interface, without requiring programming experience.

The VSDK combined with the existing KasaDaka platform provides a good solution to the hosting and development of voice-services in the ICT4D context.

More details can be found in the complete thesis. A slide deck is included below. You can find the VSDK code on André’s GitHub: http://github.com/abaart/KasaDaka-VSDK



Machine-to-machine communication in rural conditions: Realizing KasadakaNet

[This post describes research by Fahad Ali and is based on his Msc. thesis]

Contextual constraints (lack of infrastructure, low literacy etc.) play an important role in ICT for Development (ICT4D) projects. The Kasadaka project offers a technological platform for knowledge sharing applications in rural areas in Sub-Saharan Africa. However, the lack of stable internet connections restricts the exchange of data between distributed Kasadaka instances, which leads us to research alternative ways of machine-to-machine (m2m) communication.

Example of a KasadakaNet situation, with a wifi-donkey mounted on a bus, visiting a city and two remote villages, creating a so-called sneakernet

Fahad Ali’s research focuses on mobile elements and on using wifi sneakernets for this m2m communication to enable information sharing between geographically distributed devices. He developed a Raspberry Pi-based device called the Wifi-donkey that can be mounted on a vehicle and facilitates information exchange with nearby devices, using the built-in wifi card of the rPi 3. The solution is based on the Piratebox offline file-sharing and communications system, which is built with free software, and uses off-the-shelf Linux software components and configuration settings to allow it to discover and connect to nearby Kasadaka devices using wifi technologies.

Experimental setup: the wifi-donkey taped to an Amsterdam balcony to test range and bandwidth.

We evaluated the solution by simulating a low-resource setting and testing it by performing so-called “pass-bys” in an Amsterdam residential area. In these cases, SPARQL queries are exchanged between host and client devices and we measure the number of RDF triples transferred. This setup matches earlier case requirements as described in Onno Valkering’s work. Results show that the system works fairly reliably in the simulated setting. The machine-to-machine communication method can be used in various ICT4D projects that require some sort of data sharing functionality.


Dancing and Semantics

This post describes the MSc theses of Ana-Liza Tjon-a-Pauw and Josien Jansen. 

As a semantic web researcher, it is sometimes hard not to see ontologies and triples in aspects of my private life. In this case, through my contacts with dancers and choreographers, I have long been interested in exploring knowledge representation for dance. After a few failed attempts to get a research project funded, I decided to let enthusiastic MSc. students have a go at continuing this exploration. This year, two Information Sciences students, Josien Jansen and Ana-Liza Tjon-a-Pauw, were willing to take up this challenge, with great success. With their background as dancers, they not only had the necessary background knowledge but also access to dancers who could act as study and test subjects.

The questions of the two projects were therefore: 1) How can we model and represent dance in a sensible manner so that computers can make sense of choreographies and 2) How can we communicate those choreographies to the dancers?

Screenshot of the mobile choreography assistant prototype

Josien’s thesis addressed the first question, investigating to what extent choreographers can be supported by semi-automatic analysis of choreographies through the generation of new creative choreography elements. She conducted an online questionnaire among 54 choreographers. The results show that a significant subgroup is willing to use an automatic choreography assistant in their creative process. She further identified requirements for such an assistant, including the semantic levels at which it should operate and communicate with the end-users. The requirements were used for the design of a choreography assistant, “Dancepiration”, which was implemented as a mobile application. The tool allows choreographers to enter (parts of) a choreography and uses multiple strategies for generating creative variations in three dance styles. Josien evaluated the tool in a user study that tested a) random variations and b) variations based on semantic distance in a dance ontology. The results show that the latter variant is better received by participants. The study furthermore identified many differences between the dance styles in the extent to which the assistant supports creativity.
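To give an impression of what “variations based on semantic distance” could mean in practice, here is a small Python sketch. It is not the Dancepiration implementation: the post does not say which distance measure was used, so the sketch assumes the simplest option, the number of links on the shortest path between two concepts, and the miniature ontology of moves and move types is entirely hypothetical.

```python
from collections import deque

# Hypothetical mini-ontology: each pair links a concept to its parent class.
ONTOLOGY_EDGES = [
    ("plie", "bending_move"),
    ("grand_plie", "bending_move"),
    ("saute", "jumping_move"),
    ("jete", "jumping_move"),
    ("bending_move", "dance_move"),
    ("jumping_move", "dance_move"),
]


def build_graph(edges):
    """Turn a list of (child, parent) links into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def semantic_distance(graph, start, goal):
    """Shortest path length between two concepts, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:  # breadth-first search
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def rank_variations(graph, move, candidates):
    """Order candidate replacement moves from semantically closest to furthest."""
    def sort_key(candidate):
        dist = semantic_distance(graph, move, candidate)
        return (float("inf") if dist is None else dist, candidate)
    return sorted(candidates, key=sort_key)


if __name__ == "__main__":
    graph = build_graph(ONTOLOGY_EDGES)
    print(rank_variations(graph, "plie", ["saute", "grand_plie", "jete"]))
```

A random-variation strategy would pick any candidate with equal probability, whereas a distance-based strategy like this one prefers moves of the same type before moving on to more remote ones.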

Four participants during the 2nd user experiment. From left to right this shows variations presented through textual, 2D animation, 3D animation, and auditory instructions.

In her thesis, Ana-Liza dove deeper into the human-computer interaction side of the story. Where Josien had classical ballet and modern dance as background and focus, Ana-Liza looked at Dancehall and Hip-Hop dance styles. For her project, Ana-Liza developed four prototypes that could communicate pieces of computer-generated choreography to dancers through Textual Descriptions, 2-D Animations, 3-D Animations, and Audio Descriptions. Each of these presentation methods has its own advantages and disadvantages, so Ana-Liza conducted an extensive user survey with seven domain experts (dancers). Despite the relatively small group of users, there was a clear preference for the 3-D animations. Based on the results, Ana-Liza also designed an interactive choreography assistant (IDCAT).

The combined theses formed the basis of a scientific article on dance representation and communication that was accepted for publication in the renowned ACE entertainment conference, co-authored by us and co-supervisor Frank Nack.


ABC-Kb Network Institute project kickoff

The ABC-Kb team, clockwise from top-left: Dana Hakman, Cerise Muller, Victor de Boer, Petra Bos

VU’s Network Institute has a yearly Academy Assistant programme in which small interdisciplinary research projects are funded. Within these projects, Master students from different disciplines are given the opportunity to work under the supervision of VU staff members. As in previous years, I participate this year as a supervisor in one of these projects, in collaboration with Petra Bos from the Applied Linguistics department. Now that we have found two enthusiastic students, Dana Hakman from Information Science and Cerise Muller from Applied Linguistics, the project has just started.

Our project “ABC-Kb: A Knowledge base supporting the Assessment of language impairment in Bilingual Children” is aimed at supporting language therapists by (re-)structuring information about language development for bilingual children. Speech language therapists and clinical linguists face the challenge of diagnosing children as young as possible, also when their home language is not Dutch. Their achievements on standard (Dutch) language tests will not be reliable indicators for a language impairment. If diagnosticians had access to information on the language development in the Home Language of these children, this would be tremendously helpful in the diagnostic process.

This project aims to develop a knowledge base (KB) collecting relevant information on the specificities of 60 different home languages (normal and atypical language development), and on contrastive analyses of any of these languages with Dutch. To this end, we leverage an existing wiki: meertaligheidentaalstoornissenvu.wikispaces.com


Elevator Annotator: Local Crowdsourcing on Audio Annotation

[This post is based on Anggarda Prameswari’s Information Sciences MSc. Thesis]

For her M.Sc. Project, conducted at the Netherlands Institute for Sound and Vision (NISV), Information Sciences student Anggarda Prameswari (pictured right) investigated a local crowdsourcing application to allow NISV to gather crowd annotations for archival audio content. Crowdsourcing and other human computation techniques have proven their use for collecting large numbers of annotations, including in the domain of cultural heritage. Most of the time, crowdsourcing campaigns are done through online tools. Local crowdsourcing is a variant where annotation activities are based on specific locations related to the task.

The two variants of the Elevator Annotator box as deployed during the experiment.

Anggarda, in collaboration with NISV’s Themistoklis Karavellas, developed a platform called “Elevator Annotator”, to be used on-site. The platform is designed as a standalone Raspberry Pi-powered box which can be placed, for example, in an on-site elevator. It features speech recognition software and a button-based UI to communicate with participants (see video below).

The effectiveness of the platform was evaluated in two different locations (at NISV and at Vrije Universiteit) and with two different modes of interaction (voice input and button-based input) through a local crowdsourcing experiment. In this experiment, elevator travellers were asked to participate. Agreeing participants were then played a short sound clip from the collection to be annotated and asked to identify a musical instrument.

The results show that this approach is able to achieve annotations with reasonable accuracy, with up to 4 annotations per hour. Given that these results were acquired from one elevator, this new form of crowdsourcing can be a promising method of eliciting annotations from on-site participants.

Furthermore, a significant difference was found between participants from the two locations. This indicates that indeed, it makes sense to think about localized versions of on-site crowdsourcing.
