Msc project: Low-Bandwith Semantic Web

[This post is based on the Information Sciences MSc. thesis by Onno Valkering]

To make widespread knowledge sharing possible in rural areas in developing countries, the notion of the Web has to be downscaled based on the specific low-resource infrastructure in place. In this paper, we introduce SPARQL over SMS, a solution for exchanging RDF data in which HTTP is substituted by SMS to enable Web-like exchange of data over cellular networks.

SPARQL in an SMS architecture
SPARQL over SMS architecture

The solution uses converters that take outgoing SPARQL queries sent over HTTP and convert them into SMS messages sent to phone numbers (see architecture image). On the receiver-side, the messages are converted back to standard SPARQL requests.

The converters use various data compression strategies to ensure optimal use of the SMS bandwidth. These include both zip-based compression and the removal of redundant data through the use of common background vocabularies. The thesis presents the design and implementation of the solution, along with evaluations of the different data compression methods.

Test setup with two Kasadakas
Test setup with two Kasadakas

The application is validated in two real-world ICT for Development (ICT4D) cases that both use the Kasadaka platform: 1) An extension of the DigiVet application allows sending information related to veterinary symptoms and diagnoses accross different distributed systems. 2) An extension of the RadioMarche application involves the retrieval and adding of current offerings in the market information system, including the phone number of the advertisers.

For more information:

  • Download Onno’s Thesis. A version of the thesis is currently under review.
  • The slides for Onno’s presentation are also available: Onno Valkering
  • View the application code at https://github.com/onnovalkering/sparql-over-sms

 

Share This:

Kasadaka 1.0

[This post is based on Andre Baart’s B.Sc. thesis. The text is mostly written by him]

Generic overview of Kasadaka
Generic overview of Kasadaka

In developing (rural) communities, the adoption of mobile phones is widespread. This allows information to be offered to these communities through voice-based services. This research explores the possibilities of creating a flexible framework (Kasadaka) for hosting voice services in rural communities. The context of the developing world poses special requirements, which have been taken into account in this research. The framework creates a voice service that incorporates dynamic data from a data store. The framework allows for a low-effort adaptation to new and changing use cases. The service is hosted on cheap, low-powered hardware and is connected to the local GSM network through a dongle. We validated the working and flexibility of the framework by adapting it to a new use case. Setting up this new voice server was possible in less than one hour, proving that it is suitable for rapid prototyping. This framework enables further research into the effects and possibilities of hosting voice based information services in the developing world. The image below shows the different components and the dataflow between these components when a call is made. Read more in Andre Baart‘s thesis (pdf).

All information on how to get started with Kasadaka can be found on the project’s GitHub page: https://github.com/abaart/KasaDaka 

 

The different components and dataflow
The different components and dataflow (see below)

Text in italics only takes place when setting up the call.

  1. Asterisk receives the call from the GSM dongle, answers the call, and connects it to VXI.
    Asterisk receives the user’s input and forwards it to VXI.
  2. VXI requests the configured VoiceXML document from Apache.
    VXI requests the configured VoiceXML document from Apache. Together with the request, it sends the user input.
  3. Apache runs the Python program (based on Flask), in which data from the triple store has to be read or written. Python sends the SPARQL query to ClioPatria.
  4. ClioPatria runs the query on the data present, and sends the result of the query back to the Python program.
  5. Python renders the VoiceXML template. The dynamic data is now inserted in the VoiceXML document, and it is sent back to VXI.
  6. VXI starts interpreting the VoiceXML document. In the document there are references to audio files. It sends requests to Apache for the referenced files.
  7. Apache sends a request for the file to the file system.
  8. The file is read from the file system.
  9. Apache responds with the requested audio files.
  10. VXI puts all the audio files in the correct order and plays them back sequentially, sending the audio to the GSM dongle.

This cycle repeats until the call is terminated.

 

Share This:

Inviting submissions to Downscale2016

The Fourth International Workshop on Downscaling the (Semantic) Web (Downscale2016) will be co-located with the 4th International Conference on ICT for Sustainability (ICT4S). The workshop will be Aug 29 in Amsterdam, The Netherlands.

Downscale2016 follows success of previous Downscale workshops and will mostly focus on appropriate infrastructures. Instead of using large-scale centralised approaches to data management we look at breaking data-centric architectures into smaller components that consume less electricity, be cheaper to own, and more flexible than a “big server” while still mimicking, as a swarm, the features one such big server would provide. As such, the workshop matches ICT for Development (ICT4D) goals with ICT for Solutions (ICT4S) and we expect that the dialogue between ICT4S, Semantic Web and ICT4D researchers and practitioners will further each of the research fields.

We are currenty inviting both short papers (6 pages) or abstracts (2 pages) describing current or late­breaking research in ICT4D. These papers will undergo a light ­review procedure. For more information, visit the workshop web page.

Share This:

Connecting collections across national borders

Items from two collections shown side-by-sideAs audiovisual archives are digitizing their collections and making these collections available online, the need arises to also establish connections between different collections and to allow for cross-collection search and browsing. Structured vocabularies can be used as connecting points by aligning thesauri from different institutions. The project “Gemeenschappelijke Thesaurus voor Uniforme Ontsluiting” was funded by the Taalunie -a cross-national organization focusing on the Dutch language- and executed by the Netherlands Institute for Sound and Vision and the Flemish VIAA archive. It involved a case study where partial collections of the two archives were connected by aligning their thesauri. This involved the conversion of the VRT thesaurus to the SKOS format and linking it to Sound and Vision’s GTAA thesaurus.cultuurlink screenshotThe interactive alignment tool CultuurLINK, made by Dutch company Spinque was used to align the two thesauri (see the screenshot above).

 

The links between the collections can be explored using a cross-collection browser, also built by Spinque. This allows users to search and explore connections between the two collections. Unfortunately, the collections are not publicly available so the demonstrator is password-protected, but a publicly accessible screencast (below) shows the functionalities.

The full report can be accessed through the VIAA site. There, you can also find a blog post in Dutch.

Update: a paper about this has been accepted for publication:

  • Victor de Boer, Matthias Priem, Michiel Hildebrand, Nico Verplancke, Arjen de Vries and Johan Oomen. Exploring Audiovisual Archives through Aligned Thesauri. To appear in Proceedings of 10th Metadata and Semantics Research Conference. [Draft PDF]

Share This: