I am an assistant professor (UD) at the Web & Media group at the Computer Science department of the Vrije Universiteit Amsterdam (VU). I am also a senior research fellow at the Netherlands Institute for Sound and Vision. In my research, I combine (Semantic) Web technologies with Human-Computer Interaction, Knowledge Representation and Information Extraction to tackle research challenges in various domains. These include Cultural Heritage, Digital Humanities and ICT for Development (ICT4D). More information on these projects can be found on this site or through my CV.
On 13 October 2016, the W4RA team organized and co-chaired a Green Climate Funds workshop together with Malian farmer organization AOPP (l’Association des Organisations professionnelles paysannes). The objective of the meeting was to form a consortium and prepare a project plan, which will be submitted in the framework of this United Nations program.
The workshop was attended by representatives from the Dutch Embassy, the Swedish and Norwegian embassies, and by development (donor) agencies from the EU, Germany, the United Nations Capital Development Fund, the Global Environment Facility (GEF) and a range of Malian and Dutch development organizations.
Mali is one of the poorest countries in the world, plagued by the effects of climate change and a civil war in the northern regions. The effects of land degradation and desertification are a serious threat to the food security of millions of people, especially those living in rural regions.
Recently, the United Nations prioritized its support to Mali in the framework of the Green Climate Funds, a new programme to fight the effects of climate change on a global scale. In response to a call for proposals, organizations in Mali are forming consortia to prepare project proposals for funding by the Green Climate Funds.
Through ongoing interdisciplinary research collaboration, W4RA has obtained extensive experience in socio-technical field-based action research in West Africa. Building on partnerships with local partners (AOPP, Sahel Eco and Radio Rurale – Mali; Réseau MARP – Burkina Faso; University for Development Studies – Ghana), VU’s research programme W4RA wants to contribute to regreening, local knowledge sharing, local innovation and emerging rural agro-forestry value chains.
Meanwhile, W4RA is training students in rural Africa through community service education. This is done through the ICT4D master course (artificial intelligence, information science, computer science) and various research projects (Network Institute Academy assistants, master research projects).
[This post is based on Maartje Kruijt’s Media Studies Bachelor thesis: “Supporting exploratory search with features, visualizations, and interface design: a theoretical framework”.]
In today’s network society there is a growing need to share, integrate and search in collections of various libraries, archives and museums. Researchers who interpret these interconnected media collections need tools to support them. In the exploratory phase of research, the media researcher has no clear focus and is uncertain what to look for in an integrated collection. Data visualization technology can be used to support the strategies and tactics that researchers use in exploratory research.
The DIVE tool is an event-based linked media browser that allows researchers to explore interconnected events, media objects, people, places and concepts (see screenshot). Maartje Kruijt’s research project involved investigating to what extent and in what way the construction of narratives can be made possible in DIVE, in such a way that it contributes to the interpretation process of researchers. Such narratives can be either automatically generated on the basis of existing event-event relationships, or be constructed manually by researchers.
The research proposes an extension of the DIVE tool where selections made during the exploratory phase can be presented in narrative form. This allows researchers to publish the narrative, but also to share narratives or reuse other people’s narratives. The interactive presentation of a narrative is complementary to the presentation in a text, but it can serve as a starting point for further exploration by other researchers who make use of the DIVE browser.
Within DIVE and Clariah, we are currently extending the user interface based on the recommendations made in the context of this thesis. You can read more about it in Maartje Kruijt’s thesis (Dutch). The user stories that describe the needs of media researchers are written in English and can be found in Appendix I.
[This blog post is co-written with Marieke van Erp and Rinke Hoekstra and is cross-posted from the Clariah website]
Linked Data, RDF and Semantic Web are popular buzzwords in tech-land and within CLARIAH. But they may not be familiar to everyone within CLARIAH. On 12 September, CLARIAH therefore organized a workshop at the Vrije Universiteit Amsterdam to discuss the use of Linked Data as technology for connecting data across the different CLARIAH work packages (WP3 linguistics, WP4 structured data and WP5 multimedia).
The goal of the workshop was twofold. First of all, to give an overview from the ‘tech’ side of these concepts and show how they are currently employed in the different work packages. At the same time we wanted to hear from Arts and Humanities researchers how these technologies would best suit their research and how CLARIAH can support them in familiarising themselves with Semantic Web tools and data.
Monday afternoon, at 13:00 sharp, around 40 people showed up for the workshop at the Boelelaan in Amsterdam. The workshop included plenary presentations that laid the groundwork for discussions in smaller groups centred around the different types of data from the different WPs (raw collective notes can be found on this piratepad).
Rinke Hoekstra presented an introduction to Linked Data: what it is, how it compares to other technologies, and what its potential is for CLARIAH. [Slides]
In the discussion that followed, some concerns were raised about the potential for Linked Data to deal with data provenance and data quality.
After this, humanities researchers from each of the three work packages discussed experiences, opportunities, and challenges around Linked Data. Our “Linked Data Champions” of this day were:
- WP3: Piek Vossen (Vrije Universiteit Amsterdam) [Slides]
- WP4: Richard Zijdeman (International Institute of Social History)
- WP5: Kaspar Beelen and Liliana Melgar (University of Amsterdam) [Slides]
Marieke van Erp, Rinke Hoekstra and Victor de Boer then discussed how Linked Data is currently being produced in the different work packages and showed an example of how these could be integrated (see image). [Slides]. If you want to try these out yourself, here are some example SPARQL queries to play with.
Break out sessions
Finally, in the break out sessions, the implications and challenges for the individual work packages were further discussed.
- For WP3, the discussion focused on formats. There are many natural language annotation formats in use, some with a long history, and these formats are often very closely connected to text analysis software. One of the reasons it may not be useful for WP3 to convert all tools and data to RDF is that performance cannot be guaranteed, and in some cases has already been shown not to be preserved when doing certain text analysis tasks in RDF. However, converting certain annotations, i.e. the end results of processing, to RDF could be useful here. We further talked about different types of use cases for WP3 that include LOD.
- The WP4 break-out session consisted of about a dozen researchers, representing all work packages. The focus of the talk was on the expectations of the tools and data that were demonstrated throughout the day. Several participants were interested in applying QBer, the tool that allows one to turn CSV files into Linked Data. The really exciting bit about this is that the interest was shared by people outside WP4, who usually work with text or audio-video sources. This signals not just an interest in interdisciplinary research, but also an interest in research based on various data types. A second issue discussed was the need for vocabularies ((hierarchical) lists of standard terms). For various research fields such vocabularies do not yet exist. While some vocabularies can be derived relatively easily from existing standards that experts use, this will prove more difficult for a large range of variables. The final issue discussed was the quality of datasets. Should tools be able to handle ‘messy’ data? The audience agreed that data cleaning is the responsibility of the researcher, but that tools should be accompanied by guidelines on the expected format of the data file.
- The WP5 discussion covered issues around data privacy and copyright, as well as how memory institutions and individual researchers can be persuaded to make their data available as LOD (see image).
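To make the idea of turning CSV files into Linked Data concrete, here is a minimal sketch using only the Python standard library. It is not how QBer works internally; the `http://example.org/` namespace, the column-to-predicate mapping and the sample data are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, chosen only for this illustration.
BASE = "http://example.org/"


def literal(value):
    """Serialise a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_ntriples(text, id_column):
    """Turn every non-empty cell into one triple: row -> column -> value."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}resource/{quote(row[id_column])}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the identifier itself and empty cells
            predicate = f"<{BASE}vocab/{quote(column)}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


# Made-up sample data: the second row has no value for "year".
SAMPLE = "id,occupation,year\np1,farmer,1850\np2,weaver,\n"
TRIPLES = csv_to_ntriples(SAMPLE, "id")

for triple in TRIPLES:
    print(triple)
```

In this sketch every column name becomes its own ad hoc predicate. Mapping columns to terms from a shared vocabulary instead is exactly where the need for standard vocabularies discussed in the WP4 session comes in.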
The day ended with some final considerations and some well-deserved drinks.
I chose to publish the proceedings of Downscale2016 using Figshare. This gives the proceedings a persistent home and includes a DOI. To cite the proceedings, use the text below. The proceedings are published under the CC-BY license.
Victor de Boer, Anna Bon, Cheah WaiShiang and Nana Baah Gyan (eds.) Proceedings of the 4th Workshop on Downscaling the Semantic Web (Downscale2016). Co-located with the 4th International Conference on ICT for Sustainability (ICT4S) Sep 1, 2016, Amsterdam, The Netherlands. doi:10.6084/m9.figshare.3827052.v1
As previously announced, the pilot implementation for the Big-Data-Europe platform for Societal Challenge 1 (the Health domain) provides the functionality of the Open PHACTS Discovery Platform. The Open PHACTS platform is built for researchers in Drug Discovery. It uses databases of physicochemical and pharmacological properties stored in an RDF Triple Store. This interconnected data is exposed through a Linked Data API. The system caches query results via a Memcached module. In the context of the SC1 pilot, most functionalities of the platform are now successfully replicated via Docker containers on the BDE infrastructure.
Please do try this at home! The pilot can be installed on Linux (through Docker compose) or Windows (through Docker toolbox). Installation instructions are available on the pilot’s GitHub page. By design the technology itself is independent from the domain. Once you are familiar with the code and have it running yourself, you should have enough experience to upload your own Linked Data and create your own API.
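For readers who want a feel for what caching query results means in practice, here is a minimal cache-aside sketch in Python. It is not the Open PHACTS or BDE code: a plain dictionary stands in for Memcached, and the class name, the five-minute expiry and the `run_query` callback are all assumptions made for illustration.

```python
import hashlib
import time


class QueryCache:
    """Cache-aside store for query results; a dict stands in for Memcached."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    @staticmethod
    def key_for(query):
        # Hash the query text so that keys have a fixed, short length.
        return hashlib.sha256(query.strip().encode("utf-8")).hexdigest()

    def get_or_compute(self, query, run_query):
        """Return a cached result, or run the query and remember its result."""
        key = self.key_for(query)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit that has not expired yet
        result = run_query(query)  # cache miss: ask the backend
        self._store[key] = (now + self._ttl, result)
        return result
```

A caller would pass the function that actually talks to the triple store as `run_query`. Repeated identical queries within the expiry window are then answered from memory without touching the backend.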
[This post describes Aschwin Stacia’s MSc. project and is based on his thesis]
There are many online and private film collections that lack structured annotations to facilitate retrieval. In his Master project work, Aschwin Stacia explored the effectiveness of a crowd- and nichesourced film tagging platform, built around a subset of the Eye Open Beelden film collection.
Specifically, the project aimed at soliciting annotations appropriate for various types of media scholars who each have their own information needs. Based on previous research and interviews, a framework categorizing these needs was developed. Based on this framework a data model was developed that matches the needs for provenance and trust of user-provided metadata.
A crowdsourcing and retrieval platform (FilmTagging) was developed based on this framework and data model. The frontend of the platform allows users to self-declare knowledge levels in different aspects of film and also annotate (describe) films. They can also use the provided tags and provenance information for retrieval and extract this data from the platform.
To test the effectiveness of the platform, Aschwin conducted an experiment in which 37 participants used it to make annotations (319 in total). The figure below shows the average self-reported knowledge levels.
The annotations and the platform were then positively evaluated by media scholars, as they could provide annotations that lead directly to film fragments useful for their research activities.
Nevertheless, capturing every scholar’s specific information needs is hard since the needs vary heavily depending on the research questions these scholars have.
- Read more details in Aschwin’s thesis [pdf].
- Have a look at the software at https://github.com/Aschwinx/Filmtagging , and maybe start your own Filmtagging instance
- Test the annotation platform yourself at http://astacia.eculture.labs.vu.nl/ or watch the screencast below
On 29 August, the 4th International Workshop on Downscaling the Semantic Web (Downscale2016) was held as a full-day workshop in Amsterdam, co-located with the ICT4S conference. The workshop attracted 12 participants and we received 4 invited paper contributions, which were presented and discussed in the morning session (slides can be found below). These papers describe issues regarding the sustainability of ICT4D approaches, specific downscaled solutions for two ICT4D use cases and a system for distributed publishing and consuming of Linked Data. The afternoon session was reserved for demonstrations and discussions. An introduction to the Kasadaka platform was followed by an in-depth how-to on developing voice-based information services using Linked Data. The papers and the descriptions of the demos are gathered in a proceedings (published online at figshare: doi:10.6084/m9.figshare.3827052.v1).
During the discussions the issue of sustainability was addressed. Different dimensions of sustainability were discussed (technical, economic, social and environmental). The participants agreed that a holistic approach is needed for successful and sustainable ICT4D and that most of these dimensions were indeed present in the four presentations and the design of the Kasadaka platform. There remains a question on how different architectural solutions for services (centralized, decentralized, cloud services) relate to each other in terms of sustainability and when each of them is the most suitable choice. Discussion then moved towards different technical opportunities for green power supplies, including solar panels.
The main presentations and slides are listed below:
- Downscale2016 introduction (Victor and Anna) (slides)
- Jari Ferguson and Kim Bosman. The Kasadaka Weather Forecast Service (slides)
- Aske Robenhagen and Bart Aulbers. The Mali Milk Service – a voice based platform for enabling farmer networking and connections with buyers. (slides)
- Anna Bon, Jaap Gordijn et al. A Structured Model-Based Approach To Preview Sustainability in ICT4D (slides)
- Mihai Gramada and Christophe Gueret. Low profile data sharing with the Entity Registry System (ERS) (slides)
[This post by Julia Salomons describes her Computer Science Master project]
‘Communication is key’ is a phrase known worldwide. Communication is how people exchange ideas, knowledge, feelings, thoughts and much more. It comes in many different forms: verbal, visual or electronic, to name a few. For many people, choosing which form of communication to use is an option. However, when someone suffers from hearing loss, they tend to lose that option.
Depending on where you are in the world, the support and care available to those who are deaf can vary greatly. In developing regions such as Sub-Saharan Africa (SSA), attitudes vary within the region, from acceptance to rejection. At one end of the spectrum, acceptance, individuals are allowed to express themselves how they want; at the other end, individuals are trapped in their environment, and in some cases they fear for their lives.
Our research uncovered a lack of communication between hearing and deaf individuals. Deaf individuals who were lucky enough to attend school, or to gain support from the government or other organisations, learned to communicate through sign language. However, even then, communication stops with other deaf individuals, which widens the gap between deaf and hearing individuals. This project focused on narrowing that gap by creating an educational mobile application, Learn to Sign, which helps hearing individuals learn sign language.
To get a good look at the application, watch a screencast of the application on Youtube or visit the project site at https://learn2signsite.wordpress.com/, where you can download the application. You can also download the thesis itself here.
[This post is based on the Information Sciences MSc. thesis by Onno Valkering]
To make widespread knowledge sharing possible in rural areas in developing countries, the notion of the Web has to be downscaled based on the specific low-resource infrastructure in place. In this paper, we introduce SPARQL over SMS, a solution for exchanging RDF data in which HTTP is substituted by SMS to enable Web-like exchange of data over cellular networks.
The solution uses converters that take outgoing SPARQL queries sent over HTTP and convert them into SMS messages sent to phone numbers (see architecture image). On the receiver-side, the messages are converted back to standard SPARQL requests.
The converters use various data compression strategies to ensure optimal use of the SMS bandwidth. These include both zip-based compression and the removal of redundant data through the use of common background vocabularies. The thesis presents the design and implementation of the solution, along with evaluations of the different data compression methods.
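The conversion and compression steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from the thesis: the list of shared namespaces, the `\x1f` token scheme, the use of Base64 to keep the payload within plain SMS text, and the 153-character limit per message part (the usual payload of one part of a concatenated GSM 7-bit SMS) are all choices made for this example.

```python
import base64
import zlib

# Hypothetical background vocabulary known to both sender and receiver.
SHARED_NAMESPACES = [
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://example.org/market/",
]

# Assumed payload size of one part of a concatenated GSM 7-bit SMS.
SMS_PART_CHARS = 153


def encode_query(query):
    """Shorten known namespaces, zip the text, and split it into SMS-sized parts."""
    for index, namespace in enumerate(SHARED_NAMESPACES):
        query = query.replace(namespace, f"\x1f{index}")
    packed = zlib.compress(query.encode("utf-8"), 9)
    text = base64.b64encode(packed).decode("ascii")
    return [text[i:i + SMS_PART_CHARS] for i in range(0, len(text), SMS_PART_CHARS)]


def decode_query(parts):
    """Reverse encode_query: join the parts, unzip, and restore the namespaces."""
    packed = base64.b64decode("".join(parts))
    query = zlib.decompress(packed).decode("utf-8")
    for index, namespace in enumerate(SHARED_NAMESPACES):
        query = query.replace(f"\x1f{index}", namespace)
    return query


# Made-up example query using the hypothetical market vocabulary.
QUERY = (
    "SELECT ?offer ?phone WHERE { "
    "?offer <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://example.org/market/Offering> . "
    "?offer <http://example.org/market/contactPhone> ?phone }"
)
PARTS = encode_query(QUERY)
print(len(QUERY), "characters ->", len(PARTS), "SMS part(s)")
```

The token scheme assumes that queries never contain the `\x1f` control character and that there are fewer than ten shared namespaces. For very short queries the zip step may add more overhead than it saves, which is one reason to evaluate the compression methods separately, as the thesis does.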
The application is validated in two real-world ICT for Development (ICT4D) cases that both use the Kasadaka platform: 1) An extension of the DigiVet application allows sending information related to veterinary symptoms and diagnoses across different distributed systems. 2) An extension of the RadioMarche application involves retrieving and adding current offerings in the market information system, including the phone numbers of the advertisers.
For more information:
- Download Onno’s Thesis. A version of the thesis is currently under review.
- The slides for Onno’s presentation are also available: Onno Valkering
- View the application code at https://github.com/onnovalkering/sparql-over-sms
[This post is based on Andre Baart’s B.Sc. thesis. The text is mostly written by him]
In developing (rural) communities, the adoption of mobile phones is widespread. This allows information to be offered to these communities through voice-based services. This research explores the possibilities of creating a flexible framework (Kasadaka) for hosting voice services in rural communities. The context of the developing world poses special requirements, which have been taken into account in this research. The framework creates a voice service that incorporates dynamic data from a data store. The framework allows for a low-effort adaptation to new and changing use cases. The service is hosted on cheap, low-powered hardware and is connected to the local GSM network through a dongle. We validated the working and flexibility of the framework by adapting it to a new use case. Setting up this new voice server was possible in less than one hour, which suggests that it is suitable for rapid prototyping. This framework enables further research into the effects and possibilities of hosting voice-based information services in the developing world. The image below shows the different components and the dataflow between these components when a call is made. Read more in Andre Baart’s thesis (pdf).
All information on how to get started with Kasadaka can be found on the project’s GitHub page: https://github.com/abaart/KasaDaka
The first two steps differ between call set-up and the rest of the call.
- Call set-up only: Asterisk receives the call from the GSM dongle, answers the call, and connects it to VXI. During the rest of the call: Asterisk receives the user’s input and forwards it to VXI.
- VXI requests the configured VoiceXML document from Apache. After call set-up, it sends the user input together with the request.
- Apache runs the Python program (based on Flask), in which data from the triple store has to be read or written. Python sends the SPARQL query to ClioPatria.
- ClioPatria runs the query on the data present, and sends the result of the query back to the Python program.
- Python renders the VoiceXML template. The dynamic data is now inserted in the VoiceXML document, and it is sent back to VXI.
- VXI starts interpreting the VoiceXML document. In the document there are references to audio files. It sends requests to Apache for the referenced files.
- Apache sends a request for the file to the file system.
- The file is read from the file system.
- Apache responds with the requested audio files.
- VXI puts all the audio files in the correct order and plays them back sequentially, sending the audio to the GSM dongle.
This cycle repeats until the call is terminated.
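The step in which Python inserts the dynamic data into the VoiceXML document can be sketched as follows. This is a simplified illustration, not the Kasadaka code: it uses the standard library instead of Flask templates, and the `audio` variable name, the example URLs and the shape of the document are assumptions. The input follows the standard SPARQL JSON results format.

```python
import xml.etree.ElementTree as ET

VXML_NAMESPACE = "http://www.w3.org/2001/vxml"


def render_voicexml(bindings, variable="audio"):
    """Build a VoiceXML document that plays one audio file per query result."""
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NAMESPACE})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    for binding in bindings:
        # Each binding maps a variable name to {"type": ..., "value": ...}.
        ET.SubElement(prompt, "audio", {"src": binding[variable]["value"]})
    return ET.tostring(vxml, encoding="unicode")


# Made-up query result in the SPARQL JSON results format.
SAMPLE_RESULTS = {
    "head": {"vars": ["audio"]},
    "results": {
        "bindings": [
            {"audio": {"type": "uri", "value": "http://localhost/audio/welcome.wav"}},
            {"audio": {"type": "uri", "value": "http://localhost/audio/rain.wav"}},
        ]
    },
}

DOCUMENT = render_voicexml(SAMPLE_RESULTS["results"]["bindings"])
print(DOCUMENT)
```

The VoiceXML interpreter would then request each referenced audio file in turn, as in the last steps of the list above.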