Open Tea at CIS-VU

Today, the fourth floor of the VU Metropolitan building was the scene of the Open Tea, one of a series of events organised by the Open for Change network. A couple of times a year, Open for Change organises such an event somewhere in the Netherlands, open to everyone interested in Open Development.

After meeting members at the OKFest in Helsinki, we proposed to hold an Open Tea event at VU University. Specifically, the Centre for International Cooperation (CIS-VU) was kind enough to host the event.

We got to talk about API2LOD for development data. API2LOD is a tool under development by Christophe Guéret and me that allows the creation of a wrapper that exposes data from any API as five-star linked data. Our first two datasets are related to development: the data from the knowledge base of the Institute for Development Studies (IDS) and data from the International Aid Transparency Initiative (IATI). Through our efforts, we hope to bring more data on development to the Web of Linked Data.
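To make the wrapper idea concrete, here is a minimal sketch of the general technique: take one JSON record as an API might return it and emit RDF triples in N-Triples syntax. This is not API2LOD's actual code. The base URI, the record and its field names are all made up for illustration, and the sketch assumes field names that are safe to use in a URI. It also only produces literals, whereas five-star data additionally needs links to other datasets.

```python
import json

# Hypothetical base URI; not an address used by API2LOD, IDS or IATI.
BASE = "http://example.org/api2lod-sketch/"


def escape_literal(value: str) -> str:
    """Escape a string so it can be used as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(record: dict, id_key: str = "id") -> list:
    """Turn one flat JSON object into a list of N-Triples lines.

    The record's identifier becomes the subject URI, every other key
    becomes a predicate URI and every value becomes a plain literal.
    """
    subject = f"<{BASE}resource/{record[id_key]}>"
    triples = []
    for key, value in sorted(record.items()):
        if key == id_key or value is None:
            continue
        predicate = f"<{BASE}vocab/{key}>"
        triples.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return triples


# Made-up record standing in for one item of an API response.
api_response = json.loads('{"id": "A123", "title": "Water access report", "year": 2011}')
for line in record_to_ntriples(api_response):
    print(line)
```

A real wrapper would also map the minted predicates to existing vocabularies and link the resources to other datasets.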

The second VU talk was about the efforts of the W4RA team to bring the Web to people in rural development areas using voice services, in particular RadioMarché and Foroba Blon.

Pelle Aardema and Rolf Kleef then gave us an update on the status of Open for Change and plans for the future. I am sure that VU Amsterdam’s Network Institute and CIS-VU can benefit greatly from continuing this collaboration, and we hope that we can add our knowledge and experience to the network.


SPARQL Queries for Verrijkt Koninkrijk

[Update: the links have been updated.] In this post, I list a number of SPARQL queries that show how external sources can be used to provide enriched access to the Verrijkt Koninkrijk text. The queries accompany a two-page abstract entitled “Enriched Access to a Large War Historical Text using the Back of the Book Index”, which I submitted to the SWAIE 2012 (Semantic Web and Information Extraction) workshop, which I will be attending.

These queries use the back-of-the-book index, which has been converted to SKOS and subsequently aligned with a number of data sources.

The queries can be entered in the interactive SPARQL interface of the Verrijkt Koninkrijk semantic server, which can be found at http://semanticweb.cs.vu.nl/verrijktkoninkrijk/flint/ (login: sparqltester, password: sparqltester).

Query1: GeoNames. Get all paragraphs containing references to a place in the Dutch province “Noord Holland”:

PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Index concepts matched to a GeoNames place whose province (parentADM1)
# is the given resource, followed through page references to paragraphs.
SELECT DISTINCT ?subj ?bc ?par
WHERE {
  ?subj <http://www.geonames.org/ontology#parentADM1> <http://sws.geonames.org/2749879/> .
  ?bc skos:closeMatch ?subj .
  ?bc skos:inScheme niod:BotBScheme .
  ?bc niod:pageRef ?pr .
  ?pr niod:parRef ?par .
}
LIMIT 100

Edit 3 Oct: I continued experimenting with some other SPARQL queries and used Willem van Hage and Tomi Kauppinen’s excellent SPARQL package for R to do some quick-and-dirty statistical analysis. I used a variant of the query above, but with the province as a variable. I put the results in a pie chart showing Loe de Jong’s mentions of places found in each of the twelve provinces of the Netherlands.

Frequencies of page references to places in each of the twelve provinces in “Het Koninkrijk”

And if you replace the predicate ‘parentADM1’ with ‘parentADM2’, you get the frequencies for the individual municipalities:

Frequencies of page references to municipalities in “Het Koninkrijk”

I will leave the historical interpretation of these charts to the reader. Note, however, that a major disclaimer is needed. There are numerous errors in the data, including OCR errors and concept mapping errors. I am sure that the municipality ‘Berkelland’ is not as important as it now seems. Also, the data should be normalized by province size to give a better idea of what is going on.

The point, however, is that, given the linked data, these analyses are ridiculously easy to perform with SPARQL and R.
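For readers who want to try the same kind of count without R, here is a rough sketch in Python (swapped in for R; it is not the script behind the charts). It rebuilds Query1 with the region left as a variable, lets you switch between parentADM1 and parentADM2, and tallies the rows per region from a result in the standard SPARQL JSON format. The variable name ?region, the gn: prefix and the choice to count distinct concept and page-reference pairs are assumptions of this sketch.

```python
from collections import Counter

PREFIXES = """PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX gn:   <http://www.geonames.org/ontology#>
"""


def build_region_query(level: str = "parentADM1") -> str:
    """Return a variant of Query1 with the region left as ?region.

    level is the GeoNames predicate to follow: parentADM1 for
    provinces, parentADM2 for municipalities.
    """
    if level not in ("parentADM1", "parentADM2"):
        raise ValueError(f"unsupported level: {level}")
    return PREFIXES + f"""
SELECT DISTINCT ?region ?bc ?pr
WHERE {{
  ?subj gn:{level} ?region .
  ?bc skos:closeMatch ?subj .
  ?bc skos:inScheme niod:BotBScheme .
  ?bc niod:pageRef ?pr .
}}
"""


def count_by_region(results: dict) -> Counter:
    """Count result rows per region in a SPARQL JSON results document."""
    return Counter(
        row["region"]["value"] for row in results["results"]["bindings"]
    )


print(build_region_query("parentADM2"))
```

The Counter that comes back from count_by_region can be handed to any plotting library, and dividing each count by a measure of region size would give the normalisation mentioned above.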

Query2: NIOD Thesaurus Beeldbank WO2. Get all combinations of BBWO2 images and paragraphs:

PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Image records annotated with a NIOD thesaurus concept that exactly
# matches an index concept, joined with the paragraphs that concept refers to.
SELECT DISTINCT ?img ?par
WHERE {
  ?object dc:subject ?subj ;
          dc:relation ?img .
  ?subj skos:inScheme niod:ConceptScheme .
  ?subj skos:exactMatch ?bc .
  ?bc skos:inScheme niod:BotBScheme .
  ?bc niod:pageRef ?pr .
  ?pr niod:parRef ?par .
}
LIMIT 100


Linked WW II Data made at the OpenCultuurData Hackathon

Michiel and me presenting the result at the hackathon

For OpenCultuurData, I assisted NIOD (Dutch Institute for War Documentation) as an ‘Open Data coach’. For the hackathon, organised on 16 June 2012 by hackdeoverheid, NIOD published part of its image archive Beeldbank WO2 as open data (see also their datablog). The dataset contains 140,000 images about WW II as well as their metadata. It is accessible through OAI-PMH.

Also for OpenCultuurData, the ‘Nationaal Comité 4 en 5 mei’ (VVM) presented their database about war monuments as open data (again, see their datablog). This database (available as an XML data dump) contains 3,500 monuments, most of which are related to WW II, including the Dam Square Monument.

For the hackathon of 16 June, Michiel Hildebrand and I decided to take these two datasets and convert them to ‘five star linked data’.

Conversion

For the conversion, we used the XML to RDF tool included in Cliopatria, VU’s semantic toolset. Using a few rewriting rules, we converted the OAI XML of NIOD’s Beeldbank WO2 as well as the XML of 4en5mei to RDF.

  • The NIOD data consists of 2,097,214 RDF triples, using 15 predicates, most of which are Dublin Core metadata fields. The image records are annotated with concepts from the NIOD thesaurus, which is currently under development within the Verrijkt Koninkrijk project.
  • The VVM data set contains 122,233 RDF triples and uses 37 predicates, most of which are specific to the dataset. We mapped these predicates to Dublin Core using subProperty predicates (for example, the 4en5mei:artist predicate is mapped to dc:creator). To be able to map address locations to other data sources, we upgraded addresses from literals to SKOS concepts.
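The two steps in the last bullet can be sketched roughly as follows. This is an illustration, not the Cliopatria rewrite rules that were actually used. It assumes that the mapping is expressed with rdfs:subPropertyOf, that the dataset-specific predicates live directly under the viervijfmei PURL base name, that address concepts get a hash-based URI, and that labels are tagged as Dutch. Only the artist to creator mapping comes from the post; the example address is made up.

```python
import hashlib

RDFS_SUBPROPERTY = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
SKOS_PREFLABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
DC = "http://purl.org/dc/elements/1.1/"
VVM = "http://purl.org/collection/nl/viervijfmei/"

# Only artist -> creator is named in the post; other entries would be added here.
PREDICATE_MAP = {"artist": "creator"}


def subproperty_triples(mapping: dict) -> list:
    """Declare each dataset-specific predicate a subproperty of a Dublin Core term."""
    return [
        f"<{VVM}{local}> <{RDFS_SUBPROPERTY}> <{DC}{dc_term}> ."
        for local, dc_term in mapping.items()
    ]


def address_concept(address: str):
    """Mint a URI for an address literal and describe it as a SKOS concept.

    The URI is derived from a hash of the normalised address, so the same
    address always yields the same concept (an assumption of this sketch).
    """
    key = hashlib.sha1(address.strip().lower().encode("utf-8")).hexdigest()[:12]
    uri = f"{VVM}address_{key}"
    label = address.replace("\\", "\\\\").replace('"', '\\"')
    return uri, [
        f"<{uri}> <{RDF_TYPE}> <{SKOS_CONCEPT}> .",
        f'<{uri}> <{SKOS_PREFLABEL}> "{label}"@nl .',
    ]


for triple in subproperty_triples(PREDICATE_MAP):
    print(triple)

# Made-up address literal, for illustration only.
concept_uri, concept_triples = address_concept("Dam, Amsterdam")
for triple in concept_triples:
    print(triple)
```

Once an address is a resource rather than a string, it can be the subject or object of mapping triples, which a literal cannot.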

Links

We semi-automatically produced the following links:

  • VVM city and community relations to GeoNames instances  (4,124 links)
  • VVM address relations to Amsterdam Museum thesaurus concepts (77 links)
  • NIOD thesaurus concepts to Amsterdam Museum concepts (488 links)
This Linked Data graph figure shows the two datasets, plus the vocabularies and datasets they link to.

In a previous effort, we produced links between the NIOD thesaurus and a) Cornetto and b) Dutch AAT. The result is shown in the mini-datacloud figure below.

URIs and access

For the datasets, we used PURL URIs. This is mainly a matter of convenience since we do not have direct access to either the NIOD or the VVM web servers. We used the basenames http://purl.org/collection/nl/niod/ and http://purl.org/collection/nl/viervijfmei/. HTTP requests are forwarded to a running instance of Cliopatria at http://semanticweb.cs.vu.nl/pvb. Here, a SPARQL endpoint can also be found.

Below is a list of example URIs:

The link between a 4en5mei monument and an Amsterdam Museum object, through a mapped address concept.
Status and next steps
This represents only a first effort to make these datasets linked open data. Some issues that we will look at in the near future are:
  • Link evaluation: none of the links were validated, so there is no guarantee of their quality.
  • More links: More possibilities for connecting the datasets remain. These include the enrichment of BeeldbankWO2 dc:coverage fields (to GeoNames) and mappings to Rijksmonumenten, Stadsarchief etc.
  • The NIOD data now lives on two separate Cliopatria servers (one associated with Amsterdam culture data and one with Verrijkt Koninkrijk). These should be merged.
  • We are also looking at use cases for applications that will use this linked data. We hope to submit one to the OpenCultuurData challenge.


Best Poster Award at ESWC 2012!


The ESWC 2012 conference ended with a bang for me and the rest of the W4RA team: We were awarded the Best Poster Award for our poster “Bringing the Web of Data to Developing Countries: Linked Market Data in the Sahel”. You can find the poster abstract here and the poster itself here. This is especially nice for two reasons.

First of all, we spent this year’s VU Semantic Web Outing learning about designing good posters, and afterwards we took this poster as an example. The winning poster is very much a collaborative effort of everybody at the VU Semantic Web groups. Thank you all for your effort.

Secondly, a goal of the poster is to get a community of peers interested in issues and applications of Linked Data for “Warm Countries”. I hope that this prize and the publicity will help realize this.

More information about how you can access the Linked Market data can be found elsewhere on this blog and on the W4RA blog. Want to join our effort and support Linked Data for Development? Contact us or keep an eye on our blog worldwidesemanticweb.wordpress.com.


ISWC 2012 tutorial on LD4D: Linked Data for Development

This weekend we got the great news that our tutorial proposal “LD4D: Linked Data for Development” has been accepted as a full-day tutorial at the ISWC 2012 conference, which will be held this November in Boston. The tutorial was co-authored by Christophe Guéret, Victor de Boer and Stefan Schlobach from VU Amsterdam and Walter Bender and Bernie Innocenti from United States OLPC.

Boston  photo by wallyg http://www.flickr.com/photos/wallyg/150894385/

“Linked Data for Development” (LD4D) is a sub-topic of Information and Communication Technologies for Development (ICT4D), referring to the specifics of using Linked Data principles in developing countries.

The tutorial consists of three parts. First, an in-depth discussion of the societal, cultural and technological problems related to ICT4D. Second, a hands-on requirements analysis based on two practical applications of Linked Data technology in developing countries: a social network application for schools in remote areas, running on XO laptops, and an application for markets of agricultural products. Finally, a brief practical part on programming Linked Data apps under resource bounds, showing potential problems with existing technology and some novel solutions.

We will spend the upcoming months cooking up a really nice tutorial and we hope to see you all at our tutorial in Boston!


Voice Access to Malian linked data

Statue talking on the phone (photo via Flickr by gadgetdan)

A quick update related to the Malian Linked Data post. The Voices project is mainly concerned with voice access to Web information, so that local users in developing countries can access the data themselves using simple 2G mobile phones. I have therefore experimented with providing some form of voice access to the linked market data. This resulted in a small prototype demonstrator.

The voice service is built using VoiceXML, the industry standard for developing voice applications. In a deployed version we cannot assume that text-to-speech (TTS) libraries are available for the local languages. Here, however, we only implement English-language access to the data, using English TTS.

The prototype voice application is running on the Voxeo Evolution platform. The platform includes a voice browser, which is able to interpret VoiceXML documents, includes (English) TTS and provides a number of ways to access the voice application. These include the Skype VoIP number +990009369996162208 and the local (Dutch) phone number +31208080855.

When any of these numbers is called, the voice application accesses a VoiceXML document hosted on a remote server. This document contains the dialogue structure for the application. In the current demonstrator, the caller is presented with three options: to browse the data by product, to browse it by region, or to listen to the latest offering. The caller presses the corresponding code on his or her keypad, which is sent as Dual Tone Multi-Frequency (DTMF) tones. The voice application interprets the choice and forwards the caller to a new voice menu.

For products, the caller must select the type of product (“press 1 for Tamarind”, “press 2 for Honey”, etc.); for regions, the caller is presented with a list of regions to choose from. Based on the choice, the application then accesses a PHP document on the remote server, with the choice passed as an HTTP GET variable.

From this choice, a SPARQL query is constructed. The query is then passed to the RadioMarche Linked Data server, which returns the appropriate results. For a product query, all (recent) offerings of that product are returned. The SPARQL result is then transformed into VoiceXML and read out to the caller.
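The round trip described above (keypad choice, SPARQL query, VoiceXML answer) can be sketched in a few lines. The demonstrator itself uses a PHP document; the sketch below uses Python instead and is not the prototype's code. The function names, the menu dictionary, the spoken wording and the idea of reading out the last part of the contact URI are assumptions. The properties rm:prod_name and rm:has_contact are the ones used in the RadioMarche example query elsewhere on this blog, and the labels stored in the data may differ from the English menu names used here.

```python
from xml.sax.saxutils import escape

# Keypad codes as announced in the voice menu ("press 1 for Tamarind", ...).
PRODUCTS = {"1": "Tamarind", "2": "Honey"}


def build_offering_query(product_label: str) -> str:
    """Build a SPARQL query for all offerings of one product."""
    safe = product_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX rm: <http://purl.org/collections/w4ra/radiomarche/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?offering ?contact
WHERE {{
  ?offering rm:prod_name ?product .
  ?product rdfs:label "{safe}" .
  ?offering rm:has_contact ?contact .
}}"""


def spoken_name(uri: str) -> str:
    """Use the last URI segment as a speakable name (a simplification)."""
    return uri.rstrip("/").rsplit("/", 1)[-1].replace("_", " ")


def results_to_voicexml(product_label: str, results: dict) -> str:
    """Turn a SPARQL JSON result into a VoiceXML document to be read out."""
    rows = results["results"]["bindings"]
    if rows:
        prompts = [
            f"Offering {i} for {product_label}, contact {spoken_name(row['contact']['value'])}."
            for i, row in enumerate(rows, start=1)
        ]
    else:
        prompts = [f"There are no offerings for {product_label}."]
    body = "\n".join(f"      <prompt>{escape(p)}</prompt>" for p in prompts)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        "    <block>\n"
        f"{body}\n"
        "    </block>\n"
        "  </form>\n"
        "</vxml>"
    )


choice = "2"  # the caller pressed 2
print(build_offering_query(PRODUCTS[choice]))
```

Keeping query construction and VoiceXML generation in separate functions makes it easy to test each step without a phone line or a running endpoint.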

The demonstrator is now in a very early prototype version, so not everything might work all the time.

The above paragraphs are also part of a paper submitted to the Downscale2012 workshop.


Malian linked data

Part of our promise in the CAISE paper as well as the ISWC outrageous idea paper was that we would make our market information data available as Linked Open Data. In a number of student projects, we can then investigate the benefits of sharing and re-using this data in all kinds of innovative ways. This remained an unrealised promise until now.

I am proud to announce that as of now, rural Africa (more specifically the RadioMarche data) is officially a part of the Linked Open Data cloud. This linked data will allow us to experiment with all kinds of data mashups. It will hopefully also function as a small exemplary step towards bridging the digital divide, as per the outrageous idea.

Currently, all the data from the RadioMarche server is in the Linked Data server. It contains the 12 communiqués, consisting of 31 offerings. All in all, there are 721 RDF triples (which is not that much, but nice anyway). Each offering has a URI, belongs to a communiqué (also a URI), and has a product type (URI) and a contact person (URI), who is associated with a zone (URI) and a village (URI). The villages, zones and product types are linked to DBpedia and GeoNames. This can be exploited, for example, by re-using the GeoNames geo-coordinates or the relations to other products in DBpedia.

As a URI basename, I now use http://purl.org/collections/w4ra/radiomarche/, though this will change in the future (I’ll lose the /collections/ part). Each of the URIs is redirected to our server. Based on the HTTP request, an HTML page showing the associated data is returned (in the case of a normal web browser). If an RDF request is made, the server responds with a set of RDF triples describing the resource.
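What “an RDF request” means in practice is HTTP content negotiation: the client states in the Accept header which representation it wants. Below is a small sketch using only the Python standard library. It assumes that the server answers application/rdf+xml requests with RDF; the post does not say which RDF serialisations are supported, so the media type may need adjusting. The fetch function performs a real network call and is therefore not invoked in the sketch.

```python
import urllib.request

OFFERING_URI = "http://purl.org/collections/w4ra/radiomarche/offering_49"


def make_request(uri: str, want_rdf: bool) -> urllib.request.Request:
    """Build a GET request that asks for RDF or for HTML via the Accept header."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch(uri: str, want_rdf: bool = True, timeout: float = 10):
    """Dereference the URI, following redirects.

    Returns the final URL after redirection and the content type
    reported by the server.
    """
    with urllib.request.urlopen(make_request(uri, want_rdf), timeout=timeout) as resp:
        return resp.geturl(), resp.headers.get("Content-Type")


rdf_request = make_request(OFFERING_URI, want_rdf=True)
print(rdf_request.get_header("Accept"))
```

Calling fetch(OFFERING_URI) and fetch(OFFERING_URI, want_rdf=False) against a live server should show the same URI resolving to two different representations.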

An example:
http://purl.org/collections/w4ra/radiomarche/offering_49 is the URI of a single offering. If you go there, you see it is related to communiqué http://purl.org/collections/w4ra/radiomarche/communique_4f0d6187f0c04. The offering is related to the Person http://purl.org/collections/w4ra/radiomarche/Tandin_Dembele, who in turn is related to the Zone http://purl.org/collections/w4ra/radiomarche/zone_Mafoune. On this last page, you see this zone (Mafoune) is linked to its counterpart in both DBpedia and GeoNames. You can click on the URIs, which shows you the remotely hosted data. With a bit of imagination, you can think of the potential use of this integration.

There is a SPARQL endpoint to the data, where you can try out SPARQL queries such as the one below, which lists all persons that have offered Shea butter in the past:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rm: <http://purl.org/collections/w4ra/radiomarche/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Persons who are the contact of an offering whose product is labelled 'Beurre de karite'
SELECT DISTINCT ?p
WHERE {
  ?p rdf:type rm:Person .
  ?o rm:has_contact ?p .
  ?o rm:prod_name ?pn .
  ?pn rdfs:label 'Beurre de karite' .
}
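To run such a query from a script instead of a web form, the SPARQL protocol lets a client send the query text as the query parameter of an HTTP GET request. The sketch below shows that with the Python standard library. The endpoint address is a placeholder, since the real one is not given in the text, and the request for JSON results assumes the server supports the application/sparql-results+json format. The run_query function makes a network call and is not invoked here.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: replace with the address of the actual SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rm: <http://purl.org/collections/w4ra/radiomarche/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?p
WHERE {
  ?p rdf:type rm:Person .
  ?o rm:has_contact ?p .
  ?o rm:prod_name ?pn .
  ?pn rdfs:label 'Beurre de karite' .
}"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint: str, query: str, timeout: float = 10) -> dict:
    """Send the query and parse the SPARQL JSON results document."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def persons(results: dict) -> list:
    """Extract the URIs bound to ?p from a SPARQL JSON results document."""
    return [row["p"]["value"] for row in results["results"]["bindings"]]


print(build_query_url(ENDPOINT, QUERY))
```

With a real endpoint address filled in, persons(run_query(ENDPOINT, QUERY)) would return the list of person URIs.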


Verrijkt Koninkrijk kick-off

On 16 February 2012, the kick-off meeting for the new CLARIN-NL project Verrijkt Koninkrijk was held at the NIOD offices in the center of Amsterdam. In this project, the partners (NIOD, UVA, VUA, and Meertens Institute) will enrich the 30 digitized volumes of Dr. Loe de Jong’s tour de force ‘Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog’. The project will develop a demonstrator showcasing the increased access for historical researchers as well as the general public.
The VUA will convert the OCR’ed back-of-the-book index as well as any produced Named Entity Recognition results to a structured vocabulary, which will be mapped to the enriched NIOD in-house thesaurus. The vocabularies will also be mapped to external Linked Data sources. We will identify historical science use cases that can be supported using this increased semantic access to the source data.
The project officially started in February 2012 and will run until January 2013.
