I am an assistant professor (UD) at the Web & Media group at the Computer Science department of the Vrije Universiteit Amsterdam (VU). I am also a senior research fellow at the Netherlands Institute for Sound and Vision. In my research, I combine (Semantic) Web technologies with Human-Computer Interaction, Knowledge Representation and Information Extraction to tackle research challenges in various domains. These include Cultural Heritage, Digital Humanities and ICT for Development (ICT4D). More information on these projects can be found on this site or through my CV.
[This post is based on the Information Sciences MSc. thesis by Onno Valkering]
To make widespread knowledge sharing possible in rural areas in developing countries, the notion of the Web has to be downscaled based on the specific low-resource infrastructure in place. In this paper, we introduce SPARQL over SMS, a solution for exchanging RDF data in which HTTP is substituted by SMS to enable Web-like exchange of data over cellular networks.
The solution uses converters that take outgoing SPARQL queries sent over HTTP and convert them into SMS messages sent to phone numbers (see architecture image). On the receiver-side, the messages are converted back to standard SPARQL requests.
The converters use various data compression strategies to ensure optimal use of the SMS bandwidth. These include both zip-based compression and the removal of redundant data through the use of common background vocabularies. The thesis presents the design and implementation of the solution, along with evaluations of the different data compression methods.
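To give a feel for what such a converter does, here is a minimal Python sketch of the sending and receiving sides. It is an illustration under assumptions, not the code from the thesis: the `SHARED_VOCAB` table, the one-character tokens, the base64 text encoding and the `encode`/`decode` function names are all hypothetical, and sequence numbering of multi-part messages is left out. It only shows the two ideas named above: replacing URIs from a shared background vocabulary with short tokens, and zip-style compression before cutting the payload into SMS-sized pieces.

```python
import base64
import zlib

# Hypothetical background vocabulary known to BOTH converters.
# Each well-known namespace URI is swapped for a one-character token
# that is assumed never to occur in a real query.
SHARED_VOCAB = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "\x01",
    "http://www.w3.org/2000/01/rdf-schema#": "\x02",
}

SMS_CHARS = 160  # capacity of a single SMS in the GSM 7-bit alphabet


def encode(query):
    """Turn a SPARQL query string into a list of SMS-sized text parts."""
    for uri, token in SHARED_VOCAB.items():
        query = query.replace(uri, token)
    packed = zlib.compress(query.encode("utf-8"), 9)
    # Base64 keeps the payload within characters that SMS can carry.
    text = base64.b64encode(packed).decode("ascii")
    return [text[i:i + SMS_CHARS] for i in range(0, len(text), SMS_CHARS)]


def decode(parts):
    """Rebuild the original query from the received parts (in order)."""
    packed = base64.b64decode("".join(parts))
    query = zlib.decompress(packed).decode("utf-8")
    for uri, token in SHARED_VOCAB.items():
        query = query.replace(token, uri)
    return query
```

On the receiving side, the string returned by `decode` could then be posted to a local SPARQL endpoint over HTTP as usual. Note that for very short queries the compression and base64 overhead can outweigh the savings, which is one reason to evaluate the strategies separately.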
The application is validated in two real-world ICT for Development (ICT4D) cases that both use the Kasadaka platform: 1) An extension of the DigiVet application allows sending information related to veterinary symptoms and diagnoses across different distributed systems. 2) An extension of the RadioMarche application involves the retrieval and adding of current offerings in the market information system, including the phone number of the advertisers.
For more information:
- Download Onno’s Thesis. A version of the thesis is currently under review.
- The slides for Onno’s presentation are also available: Onno Valkering
- View the application code at https://github.com/onnovalkering/sparql-over-sms
[This post is based on Andre Baart’s B.Sc. thesis. The text is mostly written by him]
In developing (rural) communities, the adoption of mobile phones is widespread. This allows information to be offered to these communities through voice-based services. This research explores the possibilities of creating a flexible framework (Kasadaka) for hosting voice services in rural communities. The context of the developing world poses special requirements, which have been taken into account in this research. The framework creates a voice service that incorporates dynamic data from a data store. The framework allows for a low-effort adaptation to new and changing use cases. The service is hosted on cheap, low-powered hardware and is connected to the local GSM network through a dongle. We validated the working and flexibility of the framework by adapting it to a new use case. Setting up this new voice server was possible in less than one hour, which indicates that it is suitable for rapid prototyping. This framework enables further research into the effects and possibilities of hosting voice-based information services in the developing world. The image below shows the different components and the dataflow between these components when a call is made. Read more in Andre Baart’s thesis (pdf).
All information on how to get started with Kasadaka can be found on the project’s GitHub page: https://github.com/abaart/KasaDaka
Steps marked “(call setup)” only take place when setting up the call; the “during the call” variant applies on every later cycle.
- Asterisk receives the call from the GSM dongle, answers the call, and connects it to VXI (call setup). During the call, Asterisk receives the user’s input and forwards it to VXI.
- VXI requests the configured VoiceXML document from Apache (call setup). During the call, VXI requests the configured VoiceXML document from Apache and sends the user input together with the request.
- Apache runs the Python program (based on Flask), in which data from the triple store has to be read or written. Python sends the SPARQL query to ClioPatria.
- ClioPatria runs the query on the data present, and sends the result of the query back to the Python program.
- Python renders the VoiceXML template. The dynamic data is now inserted in the VoiceXML document, and it is sent back to VXI.
- VXI starts interpreting the VoiceXML document. In the document there are references to audio files. It sends requests to Apache for the referenced files.
- Apache sends a request for the file to the file system.
- The file is read from the file system.
- Apache responds with the requested audio files.
- VXI puts all the audio files in the correct order and plays them back sequentially, sending the audio to the GSM dongle.
This cycle repeats until the call is terminated.
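The middle of this cycle (query the triple store, then render a VoiceXML document that references audio files) can be sketched in a few lines of Python. This is a simplified illustration and not the Kasadaka code itself: it uses only the standard library instead of Flask, `run_query` is a hypothetical stand-in that returns canned rows rather than contacting ClioPatria, and the template, audio paths and function names are made up for the example.

```python
from string import Template
from xml.sax.saxutils import quoteattr

# Hypothetical VoiceXML template; $prompts is filled with <audio> elements.
VXML_TEMPLATE = Template("""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
$prompts
    </block>
  </form>
</vxml>""")


def run_query(sparql):
    """Stand-in for the SPARQL request to the triple store.

    A real service would send `sparql` to the store over HTTP; here we
    return fixed rows so the sketch runs on its own.
    """
    return [
        {"audio": "/static/audio/intro.wav"},
        {"audio": "/static/audio/maize.wav"},
    ]


def render_vxml(rows):
    """Insert the dynamic data (audio file references) into the template."""
    prompts = "\n".join(
        "      <audio src=%s/>" % quoteattr(row["audio"]) for row in rows
    )
    return VXML_TEMPLATE.substitute(prompts=prompts)


if __name__ == "__main__":
    rows = run_query("SELECT ?audio WHERE { ?offer ?p ?audio }")
    print(render_vxml(rows))
```

The VoiceXML interpreter would then fetch each referenced audio file and play them in document order, which is why the order of the query results matters for how the message sounds to the caller.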
The Fourth International Workshop on Downscaling the (Semantic) Web (Downscale2016) will be co-located with the 4th International Conference on ICT for Sustainability (ICT4S). The workshop will be held on 29 August in Amsterdam, The Netherlands.
Downscale2016 follows the success of previous Downscale workshops and will mostly focus on appropriate infrastructures. Instead of using large-scale centralised approaches to data management, we look at breaking data-centric architectures into smaller components that consume less electricity, are cheaper to own, and are more flexible than a “big server” while still mimicking, as a swarm, the features such a big server would provide. As such, the workshop matches ICT for Development (ICT4D) goals with ICT for Sustainability (ICT4S), and we expect that the dialogue between ICT4S, Semantic Web and ICT4D researchers and practitioners will further each of the research fields.
We are currently inviting both short papers (6 pages) and abstracts (2 pages) describing current or late-breaking research in ICT4D. These papers will undergo a light review procedure. For more information, visit the workshop web page.
As audiovisual archives are digitizing their collections and making these collections available online, the need arises to also establish connections between different collections and to allow for cross-collection search and browsing. Structured vocabularies can be used as connecting points by aligning thesauri from different institutions. The project “Gemeenschappelijke Thesaurus voor Uniforme Ontsluiting” was funded by the Taalunie, a cross-national organization focusing on the Dutch language, and executed by the Netherlands Institute for Sound and Vision and the Flemish VIAA archive. It involved a case study where partial collections of the two archives were connected by aligning their thesauri. This involved the conversion of the VRT thesaurus to the SKOS format and linking it to Sound and Vision’s GTAA thesaurus. The interactive alignment tool CultuurLINK, made by Dutch company Spinque, was used to align the two thesauri (see the screenshot above).
The links between the collections can be explored using a cross-collection browser, also built by Spinque. This allows users to search and explore connections between the two collections. Unfortunately, the collections are not publicly available so the demonstrator is password-protected, but a publicly accessible screencast (below) shows the functionalities.
In the context of the Observe project and Lukas Hulsbergen’s thesis, we developed the interactive game/web toy “IetsNieuws”. In the game, participants are asked to do voiceovers for Sound and Vision’s OpenImages videos. One player takes on the role of a newscaster, while the other player remixes news footage. Based on their performance, the players are presented with an achievement screen.
Because of the limited game explanation, players created their own style of play leading to “emergent gameplay”. An experiment was done to examine whether players experience the relationship between each other when playing the game in the presence of an audience as competitive or cooperative. The results of the observations during the experiment and feedback through a questionnaire show that the subjects saw the other player as a team player and not as an opponent.
Play the game at http://tinyurl.com/ietsnieuwsgame
For more information, read Lukas’ Thesis Iets Nieuws – Lukas Hulsbergen (in Dutch) or have a look at the code on GitHub. Watch players play the game in the experimental setting: https://youtu.be/64xi63d9iCc
The CLARIN framework commissioned the production of dissemination videos showcasing the outcomes of the individual CLARIN projects. One of these projects was the Dutch Ships and Sailors project, a collaboration between VU Computer Science, VU humanities and the Huygens Institute for National History. In this project, we developed a heterogeneous linked data cloud connecting many different maritime databases. This data cloud allows for new types of integrated browsing and new historical research questions. In the video, we (Victor de Boer together with historians Jur Leinenga and Rik Hoekstra) explain how the data cloud was formed and how it can be used by maritime historians.
Our paper “Evaluating Unsupervised Thesaurus-based Labeling of Audiovisual Content in an Archive Production Environment” was accepted for publication in the International Journal on Digital Libraries (IJDL). This paper, co-authored with Roeland Ordelman and Josefien Schuurman, reports on a series of information extraction experiments carried out at the Netherlands Institute for Sound and Vision (NISV). Specifically, in the paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using subtitles. We look at how such an approach can provide acceptable results given requirements with respect to archival quality, authority and service levels to external users.
For this, we developed a text extraction pipeline (TESS), pictured here, which extracts key terms and matches them to the NISV thesaurus, the GTAA. This journal paper is an extended version of the paper previously accepted at the TPDL conference; here we provide an analysis of the term extraction after it was taken into production, where we focus on performance variation with respect to term types and television programs. Having implemented the procedure in our production workflow allows us to gradually develop the system further and to also assess the effect of the transformation from manual to automatic annotation from an end-user perspective.
The paper will appear on the Journal site shortly. A final draft version of the paper can be found here: deboer_ijdl2016evaluating_draft [PDF].
Around 40 students joined this year’s “bachelor’s for a day” for the VU IMM programme. As in previous years, I gave a 45-minute lecture and constructed a hands-on session around “The Social Web”. Each year I do a non-scientific survey of Social Web use among the (mostly) 17-year-old attendees. This year’s outcome:
- Everybody still uses Facebook (even though for the last couple of years, there have been some murmurs about abandoning it)
- Everybody uses WhatsApp. No surprise there
- More than half of the students use Snapchat.
- About 1/4 of students use LinkedIn.
- About 1/8 of students actively use Twitter (one post in the last 3 months)
- Most students have heard of Hyves, but no one ever used it
- Almost no one has heard of Second Life 🙂
- No one has heard of Schoolbank.nl
You can find my slides below. The hands-on session can be found here
From 7-9 May 2016, the second TMT-AOPP workshop was held in Bamako, Mali. This workshop was held in the context of the Tailor Made Training project that VU Amsterdam participates in together with the Malian farmer organization Association des Organisations Professionnelles Paysannes (AOPP).
During the workshop, which was attended by around 25 AOPP members from all over Mali, we followed up on the results of a previous workshop in 2015, where we co-developed a number of use cases around improving the lives of rural farmers in Mali. Specifically, we developed two prototype services accessible using simple mobile phones:
- An online marketplace for seeds. Farmers can call in to the system to place offerings of seeds or browse current offers of seeds of various quality levels in a specific region.
- A chicken vaccination service. For this service, an extension worker can register newly born chickens in the system. The system keeps an administration of when farmers need to vaccinate their chickens against specific diseases. The system then calls the farmer and plays a reminder message in his/her language.
These services were developed on Kasadaka, the cheap and low-resource rapid-prototyping platform for knowledge-rich and voice-accessible services. During the workshop we were able to further test the Kasadaka in the field. A field trip to local farmers and a milk cooperative in nearby Ouelessebougou gave us further context and information on how these services can support locals (see also the video embedded below). Chris van Aart from 2coolmonkeys demonstrated his progress on the Senepedia wiki and two Android applications that allow farmers and organizers to use geo-services to count cows, trees or other objects in the field.
In addition to these two services, we also presented seven services on the Kasadaka, developed by students of the VUA ICT4D M.Sc. course. These included a weather information service, two veterinary services, general-purpose knowledge sharing platforms, farmer alert services and a milk market. These services were all very well received and allowed the workshop participants to see the full potential of voice-enabled information services.
The presentation below shows more information, my personal highlights from the workshop (hence the title) as well as feedback received on the seven student projects.