VU’s Network Institute runs a yearly Academy Assistant programme that funds small interdisciplinary research projects. Within these projects, Master’s students from different disciplines get the opportunity to do research under the supervision of VU staff members. As in previous years, I am participating as a supervisor in one of these projects, this time in collaboration with Petra Bos from the Applied Linguistics department. We found two enthusiastic students, Dana Hakman from Information Science and Cerise Muller from Applied Linguistics, and the project has just started.
Our project “ABC-Kb: A Knowledge base supporting the Assessment of language impairment in Bilingual Children” is aimed at supporting language therapists by (re-)structuring information about language development for bilingual children. Speech language therapists and clinical linguists face the challenge of diagnosing children as young as possible, also when their home language is not Dutch. For these children, scores on standard (Dutch) language tests are not reliable indicators of a language impairment. If diagnosticians had access to information on language development in the home language of these children, this would be tremendously helpful in the diagnostic process.
This project aims to develop a knowledge base (KB) collecting relevant information on the specificities of 60 different home languages (normal and atypical language development), and on contrastive analyses of any of these languages with Dutch. To this end, we leverage an existing wiki: meertaligheidentaalstoornissenvu.wikispaces.com
On Tuesday 13 June 2017, the second CLARIAH Linked Data workshop took place. After the first workshop in September, which was very much an introduction to Linked Data for the CLARIAH community, we wanted to organise a more hands-on workshop where researchers, curators and developers could get their hands dirty.
The main goal of the workshop was to introduce relevant tools to novice as well as more advanced users. After a short plenary introduction, we therefore split the group in two. For the novice users, the focus was on tools that come with a graphical user interface, like OpenRefine and Gephi, whereas we demonstrated API-based tools to the advanced users, such as the CLARIAH-incubated COW, grlc, Cultuurlink and ANANSI. Our setup, in which participants would convert their own dataset to Linked Data and then query and visualise it, was somewhat ambitious, as we had not taken all data formats and encodings into account. Overall, participants were able to get started with some data and ask questions specific to their use cases.
It is impossible to fully clean, convert and analyse a dataset in a single day, so the CLARIAH team will keep investigating ways to support researchers with their Linked Data needs. For now, you can check out the CultuurLink slides and tutorial materials from the workshop and keep an eye on this website for future CLARIAH LOD events.
In today’s network society there is a growing need to share, integrate and search in collections of various libraries, archives and museums. For researchers interpreting these interconnected media collections, tools need to be developed. In the exploratory phase of research, the media researcher has no clear focus and is uncertain what to look for in an integrated collection. Data visualization technology can be used to support the strategies and tactics that researchers use in exploratory research.
The DIVE tool is an event-based linked media browser that allows researchers to explore interconnected events, media objects, people, places and concepts (see screenshot). Maartje Kruijt’s research project involved investigating to what extent and in what way the construction of narratives can be made possible in DIVE, in such a way that it contributes to the interpretation process of researchers. Such narratives can be either automatically generated on the basis of existing event-event relationships, or be constructed manually by researchers.
The research proposes an extension of the DIVE tool where selections made during the exploratory phase can be presented in narrative form. This allows researchers to publish a narrative, but also to share narratives or reuse other people’s narratives. The interactive presentation of a narrative is complementary to its presentation in a text, and it can serve as a starting point for further exploration by other researchers who make use of the DIVE browser.
Within DIVE and CLARIAH, we are currently extending the user interface based on the recommendations made in the context of this thesis. You can read more about it in Maartje Kruijt’s thesis (Dutch). The user stories that describe the needs of media researchers are described in English and can be found in Appendix I.
Linked Data, RDF and Semantic Web are popular buzzwords in tech-land and within CLARIAH. But they may not be familiar to everyone within CLARIAH. On 12 September, CLARIAH therefore organized a workshop at the Vrije Universiteit Amsterdam to discuss the use of Linked Data as a technology for connecting data across the different CLARIAH work packages (WP3 linguistics, WP4 structured data and WP5 multimedia).
The goal of the workshop was twofold. First of all, to give an overview from the ‘tech’ side of these concepts and show how they are currently employed in the different work packages. At the same time we wanted to hear from Arts and Humanities researchers how these technologies would best suit their research and how CLARIAH can support them in familiarising themselves with Semantic Web tools and data.
Monday afternoon, at 13:00 sharp, around 40 people showed up for the workshop at the Boelelaan in Amsterdam. The workshop included plenary presentations that laid the groundwork for discussions in smaller groups centred around the different types of data from the different WPs (raw collective notes can be found on this piratepad).
Rinke Hoekstra presented an introduction to Linked Data: what is it, how does it compare to other technologies, and what is its potential for CLARIAH? [Slides]
In the discussion that followed, some concerns were raised about the potential of Linked Data to deal with data provenance and data quality.
After this, three humanities researchers from each of the work packages discussed experiences, opportunities, and challenges around Linked Data. Our “Linked Data Champions” of this day were:
WP4: Richard Zijdeman (International Institute of Social History)
WP5: Kaspar Beelen and Liliana Melgar (University of Amsterdam) [Slides]
Marieke van Erp, Rinke Hoekstra and Victor de Boer then discussed how Linked Data is currently being produced in the different work packages and showed an example of how these could be integrated (see image). [Slides]. If you want to try these out yourself, here are some example SPARQL queries to play with.
Break out sessions
Finally, in the break out sessions, the implications and challenges for the individual work packages were further discussed.
For WP3, the discussion focused on formats. There are many natural language annotation formats in use, some with a long history, and these formats are often very closely connected to text analysis software. One of the reasons it may not be useful for WP3 to convert all tools and data to RDF is that performance cannot be guaranteed, and in some cases it has already been shown to degrade when certain text analysis tasks are done in RDF. However, converting certain annotations, i.e. the end results of processing, to RDF could be useful here. We further talked about different types of use cases for WP3 that include LOD.
The WP4 break-out session consisted of about a dozen researchers, representing all work packages. The discussion focused on expectations of the tools and data that were demonstrated throughout the day. Several participants were interested in applying QBer, the tool that allows one to turn CSV files into Linked Data. The really exciting bit is that this interest was shared by people outside WP4, that is, by people who usually work with text or audio-video sources. This signals not just an interest in interdisciplinary research, but also an interest in research based on various data types. A second issue discussed was the need for vocabularies ((hierarchical) lists of standard terms). For various research fields such vocabularies do not yet exist. While some vocabularies can be derived relatively easily from existing standards that experts use, this will prove more difficult for a large range of variables. The final issue discussed was the quality of datasets. Should tools be able to handle ‘messy’ data? The audience agreed that data cleaning is the responsibility of the researcher, but that tools should be accompanied by guidelines on the expected format of the data file.
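To give a feel for what turning a CSV file into Linked Data involves, here is a minimal Python sketch. It is not how QBer works internally; it is an illustration under stated assumptions. The `http://example.org/` namespace, the function names and the sample data are all made up, every cell becomes a plain string literal, and the column headers are used directly as property names, whereas a real conversion would map them to an agreed vocabulary.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, used for illustration only.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape a string so it can be written as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column):
    """Turn each non-empty CSV cell into one N-Triples statement.

    The value in `id_column` identifies the row (the subject); every other
    column header becomes a property and the cell value a string literal.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<%sresource/%s>" % (BASE, quote(row[id_column], safe=""))
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the identifier itself and empty cells
            predicate = "<%svocab/%s>" % (BASE, quote(column, safe=""))
            triples.append('%s %s "%s" .' % (subject, predicate, escape_literal(value)))
    return triples


if __name__ == "__main__":
    sample = "id,name,birthplace\n1,Jan Jansen,Amsterdam\n2,Piet Pietersen,\n"
    for line in csv_to_ntriples(sample, "id"):
        print(line)
```

Note that the sketch does no cleaning at all: a missing cell is skipped and everything else is passed through unchanged, which is in line with the view above that cleaning remains the researcher's job.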
In the WP5 discussion, issues around data privacy and copyright were discussed, as well as how memory institutions and individual researchers can be persuaded to make their data available as LOD (see image).
The day ended with some final considerations and some well-deserved drinks.
On 29 August, the 4th International Workshop on Downscaling the Semantic Web (Downscale2016) was held as a full-day workshop in Amsterdam, co-located with the ICT4S conference. The workshop attracted 12 participants and we received 4 invited paper contributions, which were presented and discussed in the morning session (slides can be found below). These papers describe issues regarding the sustainability of ICT4D approaches, specific downscaled solutions for two ICT4D use cases, and a system for distributed publishing and consuming of Linked Data. The afternoon session was reserved for demonstrations and discussions. An introduction to the Kasadaka platform was followed by an in-depth how-to on developing voice-based information services using Linked Data. The papers and the descriptions of the demos are gathered in the proceedings (published online at figshare: doi:10.6084/m9.figshare.3827052.v1).
During the discussions the issue of sustainability was addressed. Different dimensions of sustainability were discussed (technical, economic, social and environmental). The participants agreed that a holistic approach is needed for successful and sustainable ICT4D and that most of these dimensions were indeed present in the four presentations and in the design of the Kasadaka platform. A question that remains is how different architectural solutions for services (centralized, decentralized, cloud services) relate to each other in terms of sustainability, and when each of them is most suitable. Discussion then moved towards different technical opportunities for green power supplies, including solar panels.
The main presentations and slides are listed below:
Downscale2016 introduction (Victor and Anna) (slides)
Jari Ferguson and Kim Bosman. The Kasadaka Weather Forecast Service (slides)
Aske Robenhagen and Bart Aulbers. The Mali Milk Service – a voice based platform for enabling farmer networking and connections with buyers. (slides)
Anna Bon, Jaap Gordijn et al. A Structured Model-Based Approach To Preview Sustainability in ICT4D (slides)
Mihai Gramada and Christophe Gueret. Low profile data sharing with the Entity Registry System (ERS) (slides)
[This post is based on the Information Sciences MSc. thesis by Onno Valkering]
To make widespread knowledge sharing possible in rural areas in developing countries, the notion of the Web has to be downscaled based on the specific low-resource infrastructure in place. In this paper, we introduce SPARQL over SMS, a solution for exchanging RDF data in which HTTP is substituted by SMS to enable Web-like exchange of data over cellular networks.
The solution uses converters that take outgoing SPARQL queries sent over HTTP and convert them into SMS messages sent to phone numbers (see architecture image). On the receiver-side, the messages are converted back to standard SPARQL requests.
The converters use various data compression strategies to ensure optimal use of the SMS bandwidth. These include both zip-based compression and the removal of redundant data through the use of common background vocabularies. The thesis presents the design and implementation of the solution, along with evaluations of the different data compression methods.
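The idea behind the converters can be sketched in a few lines of Python. This is an illustration only, not the implementation from the thesis: the shared prefix table, the `nn/tt:` segment header, the 160-character limit for a single SMS and the use of Base64 to keep the compressed bytes SMS-safe are all assumptions made for this example.

```python
import base64
import zlib

# Hypothetical table of common vocabulary namespaces, assumed to be known to
# both converters. Each IRI is swapped for a one-character code, on the
# assumption that these control characters never occur in a query.
SHARED_PREFIXES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "\x01",
    "http://www.w3.org/2000/01/rdf-schema#": "\x02",
    "http://xmlns.com/foaf/0.1/": "\x03",
}

SMS_CHARS = 160                               # assumed size of one SMS
PAYLOAD_CHARS = SMS_CHARS - len("00/00:")     # room left after the header
MAX_SEGMENTS = 99                             # limit of the two-digit header


def encode(query):
    """Shrink a SPARQL query and split it into SMS-sized text messages."""
    for iri, code in SHARED_PREFIXES.items():
        query = query.replace(iri, code)      # drop redundant vocabulary IRIs
    compressed = zlib.compress(query.encode("utf-8"), 9)
    packed = base64.b64encode(compressed).decode("ascii")
    chunks = [packed[i:i + PAYLOAD_CHARS]
              for i in range(0, len(packed), PAYLOAD_CHARS)]
    if len(chunks) > MAX_SEGMENTS:
        raise ValueError("query does not fit in %d messages" % MAX_SEGMENTS)
    return ["%02d/%02d:%s" % (n, len(chunks), chunk)
            for n, chunk in enumerate(chunks, start=1)]


def decode(messages):
    """Rebuild the original query from messages received in any order."""
    ordered = sorted(messages, key=lambda m: int(m[:2]))
    packed = "".join(m.split(":", 1)[1] for m in ordered)
    query = zlib.decompress(base64.b64decode(packed)).decode("utf-8")
    for iri, code in SHARED_PREFIXES.items():
        query = query.replace(code, iri)      # restore the full IRIs
    return query


QUERY = (
    "SELECT ?name WHERE { "
    "?p <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://xmlns.com/foaf/0.1/Person> ; "
    "<http://xmlns.com/foaf/0.1/name> ?name } LIMIT 5"
)

if __name__ == "__main__":
    messages = encode(QUERY)
    print(len(QUERY), "characters sent as", len(messages), "message(s)")
    assert decode(messages) == QUERY
```

For short queries the Base64 step can cancel out much of the gain from compression, which is one reason why comparing the different compression strategies, as the thesis does, matters.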
The application is validated in two real-world ICT for Development (ICT4D) cases that both use the Kasadaka platform: 1) An extension of the DigiVet application allows sending information related to veterinary symptoms and diagnoses across different distributed systems. 2) An extension of the RadioMarche application involves retrieving and adding current offerings in the market information system, including the phone numbers of the advertisers.
For more information:
Download Onno’s Thesis. A version of the thesis is currently under review.
[This post describes Karl Lundfall’s MSc Thesis research and is adapted from his thesis]
In the realm of database technologies, the reign of SQL is slowly coming to an end with the advent of many NoSQL (Not Only SQL) alternatives. Linked Data in the form of RDF is one of these, and is regarded as highly effective for connecting datasets. In this thesis, we looked into how the choice of database can affect the development, maintenance, and quality of a product by revising a solution for the social enterprise Text to Change Mobile (TTC).
TTC is a non-governmental organization equipping customers in developing countries with high-quality information and important knowledge they could not acquire for themselves. TTC offers mobile-based solutions such as SMS and call services and focuses on projects implying a social change coherent with the values shared by the company.
We revised a real-world system for linking datasets that was based on a much more mainstream NoSQL technology, altering the approach to use Linked Data instead. The result (see the figure on the left) was a more modular system living up to many of the promises of RDF.
On the other hand, we also found some obstacles to adopting Linked Data for this use case. We saw indications that more momentum needs to build up before RDF matures enough to be easily applied to use cases like this. The implementation we present demonstrates a different flavor of Linked Data than the common scenario of publishing data for public reuse, and by applying the technology in business contexts we might be able to expand the possibilities of Linked Data.
As a by-product of the research, a Node.js module for Prolog communication with ClioPatria was developed and made available at https://www.npmjs.com/package/prolog-db . This module might illustrate that new applications using RDF could contribute to a snowball effect, in which improved quality in RDF-powered applications attracts even more practitioners.
This year’s third issue of E-Data and Research magazine features an article about the Dutch Ships and Sailors project. The article (in Dutch) describes how our project provides new ways of interacting with Dutch maritime data. So far, four datasets are present in the DSS data cloud, but we are currently extending the data cloud with two new datasets. More on that later…
In the same issue, there is an article about the workshop around newspaper data as provided by the National Library. This includes a picture of me presenting the DIVE project.
Today, the second international VU symposium in ICT for Development was held. As last year, the workshop was a great success, with an international host of speakers and a variety of attendees (around 80 people joined). In this year’s symposium we looked at the opportunities and challenges for “Data for Development” from many angles. In his keynote speech, Gayo Diallo from Université de Bordeaux elaborated on how data from mobile telephony providers was used to identify issues with access to health care in Senegal. Marije Geldof discussed the successes and difficulties in using mobile data services for assisting health workers in Malawi.
After these longer presentations, a series of duo-presentations was held. In the first, the concept of upscaling and downscaling (big) data sharing solutions was discussed (Hans Akkermans and Christophe Gueret). In the second duo-presentation we heard from two Amsterdam-based organizations on the use of Open Data for aid transparency (Rolf Kleef) and on how to connect data from different mobile projects (Karl Lundfall). The final duo-presentation featured Cheah Waishiang on how to connect to local communities using ICT in Malaysia, and Chris van Aart, who described the approach of the app developer. Myrthe van der Wekken and Gossa Lo presented their research on Knowledge Sharing for the Rural Poor through a quick pitch and two very nice posters (see also their reports 1 and 2).
All in all, the symposium showed that progress is being made in the development context at every stage of the data value chain. However, there are enormous challenges to be overcome at each stage as well. Enough to work on for a next installment of this yearly symposium series. You can watch the entire symposium through the embedded video below (3 hrs). Below the video you can see the list of speakers and the timestamps in the video at which their talks start (clicking on a link will open a new window).
[youtube https://www.youtube.com/watch?v=s7JO_R9-x6k]
Gayo Diallo – Université de Bordeaux, Bordeaux, FR “Mobile Data in Senegal, a Health Decision Enabler” (6.58)
Marije Geldof – ICT4D professional The Hague, NL “Mobile health and the role of data in Malawi” (45.05)
Hans Akkermans – The Network Institute, VU Amsterdam, NL “Community-centric Data Services for Social & Economic Development in Africa” (1.12.00)
Christophe Guéret – DANS-KNAW The Hague, NL “Downscaling the (Semantic) Web: Decentralized Linked Open Data for World Citizens” (1.22.40)
Rolf Kleef – Open for Change, NL “Open Data for Development Agencies” (2.04.30)
Karl Lundfall – Text2Change, NL “Integration of Data Sources for Development” (2.15.18)
Cheah Waishiang – Universiti Malaysia Sarawak, Malaysia “Empowering & knowledge through digital storytelling in Borneo, Sarawak, Malaysia” (2.28.26)
Chris van Aart – 2CoolMonkeys, Utrecht, NL “Mr. Meteo, Weather forecasts for African farmers” (2.41.30)
DOWNSCALE 2013, the 2nd international workshop on downscaling the Semantic Web, was held on 19 September 2013 in Geneva, Switzerland and was co-located with the Open Knowledge Conference 2013. The workshop seeks to provide first steps in exploring appropriate requirements, technologies, processes and applications for the deployment of Semantic Web technologies in constrained scenarios, taking into consideration local contexts. An example is making Semantic Web platforms usable under limited computing power and limited access to the Internet, with context-specific interfaces.
The workshop accepted three full papers after peer review and featured five invited abstracts. In his keynote speech, Stephane Boyera of SBC4D gave a very nice overview of the potential use of the Semantic Web for Social & Economic Development. The accepted papers and abstracts can be found in the downscale2013 proceedings, which will also appear as part of the OKCon 2013 Open Book.
We broadcast the whole workshop live on the web, and you can actually watch the whole thing (or fragments) via the embedded videos below.
After the presentations, we had fruitful discussions about the main aspects of ‘downscaling’. The consensus seemed to be that downscaling involves the investigation and usage of Semantic Web technologies and Linked Data principles to allow for data, information and knowledge sharing in circumstances where ‘mainstream’ SW and LD are not feasible or simply do not work. The limitations behind these circumstances can be cultural, technical or physical, and they can be natural or artificial in origin.
The figure illustrates a first attempt to come to a common architecture. It includes three aspects that need to be considered when thinking about data sharing in exceptional circumstances:
Hardware/Infrastructure. This aspect includes issues with connectivity, low resource hardware, unavailability, etc.
Interfaces. This concerns the design and development of appropriate interfaces with respect to illiteracy of users or their specific usage. Building human-usable interfaces is a more general issue for Linked Data.
Pragmatic semantics. Developing LD solutions that consider which information is relevant in which (cultural) circumstances is crucial to its success. This might include filtering of information etc.
The right side of the picture illustrates the downscaling stack.