Verrijkt Koninkrijk at the Soeterbeeck E-humanities workshop

The Soeterbeeck monastery with two e-humanists

Last week, I presented our work on the Verrijkt Koninkrijk project at the E-humanities workshop in the Soeterbeeck monastery, which was organised by the University of Nijmegen and the e-humanities group of KNAW.

It was a very pleasant get-together with some nice talks and hands-on sessions. Alice Dijkstra from NWO presented a number of opportunities for getting funding for e-humanities projects. She mentioned some obvious candidates (vernieuwingsimpuls, …) and some less obvious ones (the hopefully upcoming CLARIAH programme, which would continue CLARIN and DARIAH).

The two hands-on sessions were nice, but they showed a more general issue with e-humanities: ‘nice tools’ are being developed, but these tools remain solutions to a single problem. In addition, they tend to be interesting from either a computer science or a historical science viewpoint; it is hard to do exciting computer science and historical science at the same time. This is reinforced by the fact that historical scientists rarely know what type of tools they want at the beginning of a project. A more interactive and cyclical approach makes sense for both parties. The BiographyNet idea of putting researchers from different backgrounds in the same room would be one solution. The other, in my view, is the development of more general-purpose query environments.

In my poster presentation I showed how I tried to do that with Verrijkt Koninkrijk, and I think a more or less generic data analysis interface is a good idea as well.

You can download the VK poster Abstract as well as the actual Poster.

Links to some of the web demos we tried:
http://collatex.net/demo/
http://voyeurtools.org/?skin=scatter
http://eccentricity.org/delta3d/

VOICES video

As the VOICES project is ending, we wanted to wrap up our results in the form of a nice video. The result shows the three systems (RadioMarche, Foroba Blon and Tabale) that have been deployed and tested in Mali, Africa. The video was shot by people from the project and edited by Pepijn Borgwat from Synergique and myself. There is an English and a French version; both are embedded below.

[vimeo http://www.vimeo.com/68218759 w=500&h=281] [vimeo http://www.vimeo.com/68218758 w=500&h=281]

The Web Of Voices (English) and Le Web Par La Voix (français) from Synergique

Web Science 2013 (and a bit of CHI)

Nana presenting the VOICES paper

Last week, coming back from Mali, I extended my stopover in Paris for a week to visit the Web Science 2013 conference, which was colocated with CHI 2013. It was my first time visiting the WebSci conference and I want to use this short post to share my impressions. My colleague Paul Groth visited both conferences as well and wrote an excellent trip report.

One aspect I really enjoyed was the fact that it was colocated with CHI and that participants were encouraged to go to each other's sessions. I visited a panel on speech technologies, which is very relevant in the light of our VOICES project. The CHI interactivity demo sessions were very cool and convinced me to try to send in something to this conference next year.

The Web Science conference started for me with the IFIP VASCO workshop followed by the Web Science education workshop. I think that for a young field such as Web Science it is a good idea to discuss the different programmes, and it was nice to hear about the different aspects of the curricula. I hope that next year, the VU Web Science minor is listed on the different international web pages as a good example of how to do a Web Science programme for non-computer science students.

Data-DJ Paul Groth

The conference itself featured nice keynote speeches by one of the “Fathers of the Internet”, Vint Cerf, and by the great Cory Doctorow on the horrors of DRM. It was very nice to hear about Internet law and copyright protection from someone who, as an artist, is on the potentially benefiting end of the discussion. Doctorow is well versed in the technical details and at the same time can tell a coherent story (no slides!).

Although I was very skeptical about the format beforehand, I feel the Pecha Kucha session was a very successful endeavour. Paul Groth presented our paper on online prayer [1] excellently and Nana Gyan did a great job telling the Web Science audience about our work in Mali [2]. Both papers were among the 8 papers that were nominated for best paper, so that is a great achievement!

Some of the other sessions dragged on a bit too long and although I appreciate the intent of trying to make the conference more interdisciplinary, I felt that some of the philosophically and sociologically oriented papers were not that well presented. In some cases it was hard to find where the “science” was.

Some papers I liked:

  • Takis Metaxas’s story on the analysis of “Narco-tweets” and citizen journalism in Mexico. Mr. Metaxas gave a very passionate account of his research in Mexico and played a voice recording of the anonymous citizen journalist @MelissaLotzer. A very good example of socially relevant research.
  • Another good presentation about web journalism was the talk by Souneil Park, “Challenges and Opportunities of Local Journalism: A case study of the 2012 Korean general election”.
  • I liked Harry Halpin’s talk on the question of whether or not the Web extends the mind, although I’m not sure if I can judge it on its philosophical merits.
  • I saw some nice posters, including one on the Open Annotation specs and one on a webtool for supporting archeologists.

All in all it was a nice conference, and it was very interesting to see that in a lot of presentations the meta-discussion on what constitutes Web Science, what its scope is and what its methodologies are was discussed at great length.

Hope to be there again next year.

Archeology Poster

[1] Fabian Eikelboom, Paul Groth, Victor de Boer and Laura Hollink (2013) A Comparison between Online and Offline Prayer. In Web Science 2013. [PDF]

[2] Nana Baah Gyan, Victor de Boer, Anna Bon, Chris van Aart, Stephane Boyera, Hans Akkermans, Mary Allen, Aman Grewal and Max Froumentin (2013) Voice-based Web access in rural Africa. In Web Science 2013.

Dutch Ships and Sailors project started

A very unofficial DSS logo I made

Last week saw the kickoff of the new Clarin NL-funded project “Dutch Ships and Sailors” (*). This project will run for one year and gives me the opportunity to work with historians from both VU and Huygens ING on applying Linked Data principles to Dutch maritime-historical data. From the official description:

As a sea-faring nation, a large portion of Dutch history is found on the water. However, much of the digitized historical source material is still scattered across many databases and archives. This curation and demonstrator project aims to bring together the rich maritime historical data preserved in the many different databases. We propose a (semantic) web-based infrastructure that will house various maritime-historical datasets. We will provide a tool chain and methodology for converting legacy datasets. The infrastructure includes common vocabularies to normalize and enrich existing data. Links are established between the datasets and to other relevant datasets on the Web. Although the infrastructure will be set up to facilitate 25+ identified datasets, we initially populate the infrastructure with four selected datasets. These will allow us to investigate two case studies in order to answer the historical research question “To what extent did patterns of shipping and recruitment in the Dutch maritime sector change over the course of the 18th and 19th centuries?”

(*) the project’s official title is Dutch Ships and Seamen, but we think this is potentially less problematic 🙂

The Verrijkt Koninkrijk Hackathon Report

On Friday, March 8th, we organized a Verrijkt Koninkrijk Linked Data Hackathon at the Intertain Lab of VU Amsterdam. The event was co-sponsored by the Network Institute. The goal of the hackathon was to allow third party developers to produce (ideas for) innovative applications beyond the Verrijkt Koninkrijk core research questions. We especially encouraged the use of the Linked Data produced in the project.

As organizers, we are very happy with the produced prototypes. The benefits are the following:

  • The produced applications show the (unexpected) reusability of the VK (Linked) Open Data. The applications produced or suggested give new browsing opportunities, links to other datasets or show how the data can be used in a completely novel context. The hackathon revealed that the data is indeed usable by external developers using the documentation provided. Some bugs were found, some of which could be fixed during the hackathon.
  • Important concepts around data quality were articulated by the users. Although it falls outside the scope of this project, subsequent curation of the data should involve considering ways of allowing experts or amateurs to correct errors in the data.
  • The VK project data was made known to researchers and developers from related projects, for example Agora and BiographyNet. We expect that this will ensure future use of the data by related projects.

We here present short descriptions of what the six hacker teams cooked up. Two prize winners were announced by the jury, for “best use of data” and “coolest app” respectively. The jury consisted of Kees Ribbens and Edwin Klijn from NIOD, Serge ter Braake and Victor de Boer from VU. More photos of the event can be seen at www.few.vu.nl/~vbr240/verrijktkoninkrijk/hackathon/.

TOUR APPLICATION AND TOUCH TABLE DEMO [Niels Ockeloen]  WINNER “COOLEST APP”

Niels used the data from the Named Entity index to create a history browser which allows the user to browse information about WWII on the basis of persons, locations, organisations, etc. (the NER classes). For this he reused the Agora Touch demonstrator. When a class is chosen, a list of entities is shown with images, which are resolved through the alignment with DBpedia. Niels used the LDtogo framework to map the selected data onto the API of the Agora demo.

VERRIJKT KONINKRIJK ON FACEBOOK [Albert Merono & Wouter Beek] WINNER “BEST USE OF DATA”

This group set out to recreate the network of important people of the Netherlands during WWII and their quotes in fake Facebook profiles, trying to imitate the reality of their time. We automatically feed these streams with the contents of the VK datasets: little Cliopatria and Python snippets retrieve data from SPARQL endpoints, resolve the structured XML texts, extract the quotes and expose them using the Facebook Graph API. View the project on GitHub and see the live demo at http://www.facebook.com/verrijkt.koninkrijk

INTEGRATION WITH AGORA RIJKSMUSEUM DATA [Lourens van der Meij]
Lourens aligned the VK data with that of Agora Rijksmuseum using the Amalgame alignment tool. This is used to link VK data to RM images using the Rijksmuseum API via http://eculture2.cs.vu.nl:43020/ (results shown here (pdf)). He furthermore started to use the Verrijkt Koninkrijk data to add links to VK from within our Agora demo, which is an event-centred browser for the Rijksmuseum content. Very rough results show an Agora demo entry for Duitsland.

CUBE-BASED BROWSING [Chris van Aart]
The application of Chris van Aart shows how the monument data from Vier en Vijf Mei can be browsed using the Cube browser on iOS. This allows for multi-faceted browsing of Dutch war monuments. By flipping the screen, one can actually look at the RDF data!

MAP LAYERS SHOWING THE LIBERATION OF NIJMEGEN [Michiel van Dijk]
Michiel built a web map application showing the liberation of Nijmegen in 1944. 1940s data and current maps can be superimposed on each other, showing for example which part of the city was damaged during the liberation. Further additions include 17th-, 19th- and 20th-century maps. A demo can be seen at www.numagapp.nl. An attempt was made to include Vier en Vijf Mei monument data in this dataset.

INCONTEXT DATA VISUALISATION [Willem Melder]
Willem presented the idea to visualise the VK data using the InContext RDF visualizer for enriched publications. Unfortunately, due to time constraints, Willem did not succeed in getting everything up and running. [screencast]

Verrijkt Koninkrijk Hackathon Instructions

This post will provide all the participants of the Verrijkt Koninkrijk Hackathon with the information and data they need to start building great applications.

Piratepad
On this Piratepad, I suggest we all note down the progress and results: [http://piratepad.net/tzvFB5AGuk]

The text

In this deliverable document [ dx1- deliverable pdf], you can find more detailed information about (the origin of) the data.

The Verrijkt Koninkrijk Data concerns Dr Loe de Jong’s Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog and was based on the PDFs as provided by NIOD at http://www.niod.knaw.nl/koninkrijk/ . The books have been OCRed and transformed to structured XML by researchers from the Universiteit van Amsterdam. This data is available through www.loedejongdigitaal.nl. A search interface is available at http://search.loedejongdigitaal.nl.  A resolver server was installed which responds by presenting the structure (in XML) when presented with a URL. For example, http://resolver.loedejongdigitaal.nl/nl.vk.d.1.6.1.43 is resolved to the XML fragment of that paragraph. Removing the last number of the identifier (43) results in its broader section, etcetera. Paragraphs are the smallest logical units (also, a page is not a logical unit).
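The identifier scheme lends itself to a small helper. The Python sketch below derives the chain of broader identifiers for a paragraph by dropping trailing numbers one at a time. The function names `identifier_chain` and `resolver_url` are made up for this illustration and are not part of the resolver, and the sketch assumes that every intermediate level resolves, which the description above only implies with "etcetera".

```python
RESOLVER_BASE = "http://resolver.loedejongdigitaal.nl/"


def identifier_chain(identifier):
    """Return the identifier followed by its successively broader sections."""
    parts = identifier.split(".")
    chain = []
    # Drop one trailing number at a time and stop once the
    # non-numeric prefix (e.g. "nl.vk.d") is all that remains.
    while parts and parts[-1].isdigit():
        chain.append(".".join(parts))
        parts = parts[:-1]
    return chain


def resolver_url(identifier):
    """Build the resolver URL for an identifier (hypothetical helper)."""
    return RESOLVER_BASE + identifier


for ident in identifier_chain("nl.vk.d.1.6.1.43"):
    print(resolver_url(ident))
```

The loop prints the paragraph URL from the example above first and then one URL per broader level.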

Linked Data

We provide two RDF ‘stepping stones’ into the book text: the ‘Back of the Book index’ and the ‘Named Entities index’. Both are SKOS vocabularies and consist of terms pointing to resolver.loedejongdigitaal.nl URIs. These vocabularies are linked to external sources as well. All RDF is available as Linked Data at the VK Semantic Layer (http://semanticweb.cs.vu.nl/verrijktkoninkrijk/). The base namespace for the VK/NIOD triples is http://purl.org/collections/nl/niod/ (abbreviated as niod:). Datasets, mapping sets and schemata are all loaded as separate named graphs (http://semanticweb.cs.vu.nl/verrijktkoninkrijk/browse/list_graphs).

A SPARQL endpoint is also available at http://semanticweb.cs.vu.nl/verrijktkoninkrijk/sparql/ and an interactive SPARQL editor is available at http://semanticweb.cs.vu.nl/verrijktkoninkrijk/flint/ (log in with “hacker”/“hacker” if needed).
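For those who prefer scripting over the interactive editor, here is a minimal Python sketch of how a request to the endpoint could be prepared. It follows the standard SPARQL Protocol convention of passing the query in a `query` parameter of a GET request. The helper name `build_sparql_request` is made up for this example, and whether the server honours the JSON `Accept` header is an assumption. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "http://semanticweb.cs.vu.nl/verrijktkoninkrijk/sparql/"


def build_sparql_request(query, endpoint=SPARQL_ENDPOINT):
    """Build (but do not send) a SPARQL Protocol GET request for `query`."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL JSON results; support for this on the server is assumed.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


EXAMPLE_QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
request = build_sparql_request(EXAMPLE_QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send it.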

Back of the book index (BotB index)

The BotB index consists of 15,234 SKOS Concepts, consolidated from the manual index. They link to RDF blank nodes using the niod:pageRef predicate. The blank node links to individual paragraphs using niod:parRef predicates.  An example is shown below.

hack2
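The two-step path from a BotB concept to its paragraphs can also be written down as a query. The Python sketch below only builds the SPARQL text; it uses the niod:pageRef and niod:parRef predicates described above. The function name and the concept URI are hypothetical and serve only to show the shape of the query.

```python
NIOD = "http://purl.org/collections/nl/niod/"


def botb_paragraphs_query(concept_uri, limit=100):
    """Return a SPARQL query listing the paragraphs a BotB concept points to."""
    return (
        f"PREFIX niod: <{NIOD}>\n"
        "SELECT DISTINCT ?par WHERE {\n"
        f"  <{concept_uri}> niod:pageRef ?pageref .\n"  # concept -> blank node
        "  ?pageref niod:parRef ?par .\n"               # blank node -> paragraph URI
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Hypothetical concept URI, not taken from the actual index.
query = botb_paragraphs_query(NIOD + "example-concept")
print(query)
```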

The BotB index is partially aligned with the NIOD thesaurus (see below), GeoNames, Cornetto and AATNed. The BotB index, the schema and the alignments are found in separate RDF Turtle files.

Named Entity index (NE index)

This SKOS vocabulary consists of 88,243 concepts, resulting from Named Entity Recognition. The NEs are of type person, location, organisation, misc, product and event. They link into the text through direct niod:pRef links. An example is shown below.

hack1

The NE concepts are partially aligned with DBPedia (through wikilinks established during the NER process), with GeoNames (locations only), with GTAA and with the NIOD thesaurus. There is also a mapping to the BotB index, which can be used to select a ‘higher quality’ subset.

Example SPARQL Queries

This page lists a number of SPARQL queries that exploit some of the links presented above. It accompanies a paper and deliverable.

Other data sources in the semantic layer

Pillarization

Within the project, we are very much interested in the concept of pillarization and how Loe de Jong describes it. For this reason, we have added a Turtle file that links Pillar concepts (Protestants, Jews, Communists, etc.) to persons, organisations etc. found in the BotB index. This is a manual list of 60 links, which was semi-automatically expanded to 254 links. You can find the original list here and the expanded one here.

We did a number of analyses using this data; a (Dutch) PDF document describing the results can be found here [zuilen (pdf)].

License

Het Koninkrijk is licensed under the Creative Commons Naamsvermelding 3.0 Nederland (Attribution 3.0 Netherlands) license. The VK Linked Data and the NIOD thesaurus are also available under that license. The VM monument data is available under the CC0 Publieke Domein Dedicatie (Public Domain Dedication).

Verrijkt Koninkrijk Hackathon

Hackathon

On Friday March 8, I will organize a small ‘hackathon’ workshop, sponsored by the Network Institute, in the context of the Verrijkt Koninkrijk project. In this project, we have created a linked data set for “Het Koninkrijk der Nederlanden in WoII” and linked it to some other datasets. We have shown some nice applications, but we would really like to show how this linked data can be used to create all kinds of nice applications, mashups and visualisations. If you, or any of your students, would like to join in, please let me know! A real hackathon has pizza, coke and prizes, so I will definitely make sure that these things are present in our hackathon as well!

The hackathon runs the whole Friday from 10:00 to 17:30 in the Intertain Lab of the VU, but you can obviously join for half a day or just pop in. If you want to have a sneak peek at the data: here is the Verrijkt Koninkrijk Semantic Layer and here’s the search interface (non-semantic, still very nice). Please send me a mail at (v.de.boerATvu.nl) or leave a message below if you are interested in hacking along, so I can order enough pizzas 🙂

Attendees

  • Chris van Aart (2CoolMonkeys)
  • Michiel van Dijk (2CoolMonkeys)
  • Michiel Hildebrand (VU Web and Media – Data2Semantics)
  • Niels Ockeloen (VU Web and Media – BiographyNet project)
  • Lourens van der Meij (VU Web and Media – Agora project)
  • Johan van Doornik (UvA ILPS)
  • Albert Meronyo (VU Knowledge Representation and Reasoning)
  • Wouter Beek (VU Knowledge Representation and Reasoning)
  • Kees Ribbens (NIOD)
  • Tim Veken (NIOD)
  • Willem Melder (Beeld en Geluid)

Verrijkt Koninkrijk Semantic Layer update: now with more DBPedia!

overview

The Verrijkt Koninkrijk Semantic Layer (which has gotten a small makeover) has now been expanded with 13,160 links to the Dutch version of DBPedia (nl.dbpedia.org). The Named Entities that have been identified by the UvA recognizers have been converted to SKOS and are loaded in the semantic layer. The Wikipedia links, also from the UvA algorithms, have been converted to owl:sameAs links to Dutch and English DBPedia. To allow for some nice SPARQL querying, I have fetched the RDF triples for the linked Dutch DBPedia concepts (interactive SPARQL endpoint at http://semanticweb.cs.vu.nl/verrijktkoninkrijk/flint/).

An example of such a nice SPARQL query is this one, which retrieves all paragraphs in Loe de Jong’s text (retrievable through the loedejongdigitaal.nl resolver) that mention a person that was (or later became) a Prime minister (limited to the first 100 results).

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbp-prop: <http://nl.dbpedia.org/property/>
PREFIX dbp-res: <http://nl.dbpedia.org/resource/>

# Paragraphs (?pref) mentioning a person whose Dutch DBPedia entry lists the office of prime minister
SELECT * WHERE {
  ?entity niod:nerClass niod:nerclass-per ;
          owl:sameAs ?dbpedia_entry ;
          niod:pRef ?pref .
  ?dbpedia_entry dbp-prop:functie dbp-res:Minister-president_van_Nederland .
}
LIMIT 100

Or this next one, that lists for each of the found named entities of type ‘event’ an image depicting that event.

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbp-prop: <http://nl.dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Named entities of type 'event' together with a depiction taken from DBPedia
SELECT DISTINCT ?event ?picture WHERE {
  ?event niod:nerClass niod:nerclass-eve ;
         owl:sameAs ?dbpedia_entry ;
         niod:pRef ?pref .
  ?dbpedia_entry foaf:depiction ?picture .
}
LIMIT 100

Again, all results should be taken with a grain of salt, since many OCR, conversion and linking-errors occur. The quality of the DBPedia conversion is unknown and outside of the scope of the Verrijkt Koninkrijk project.

[update: Links have been updated to http://semanticweb.cs.vu.nl/verrijktkoninkrijk/]

Open Tea at CIS-VU

Open Tea at CIS

Today, the fourth floor of the VU Metropolitan building was the scene of the Open Tea, one of a series of events organised by the Open for Change network. Open for Change organises such an event, open to everyone interested in Open Development, a couple of times a year somewhere in the Netherlands.

After meeting members at the OKFest in Helsinki, we proposed to hold an Open Tea event at VU University. Specifically, the Centre for International Cooperation (CIS-VU) was kind enough to host the event.

We got to talk about API2LOD for development data. API2LOD is a tool under development by Christophe Guéret and myself that allows the creation of a wrapper that exposes data from any API as five-star linked data. Our first two datasets are related to development: the data from the knowledge base of the Institute of Development Studies (IDS) and data from the International Aid Transparency Initiative (IATI). Through our efforts, we hope to bring more data on development to the Web of Linked Data.

The second VU talk was about the efforts of the W4RA team to bring the Web to people in rural development areas using voice services, in particular RadioMarché and Foroba Blon.

Pelle Aardema and Rolf Kleef then gave us an update on the status of Open for Change and its plans for the future. I am sure that VU Amsterdam’s Network Institute and CIS-VU can benefit greatly from continuing this collaboration, and we hope that we can add our knowledge and experience to the network.

SPARQL Queries for Verrijkt Koninkrijk

[update: the links have been updated] In this post, I list a number of SPARQL queries that show how external sources can be used to provide enriched access to the Verrijkt Koninkrijk text. The queries go with a two-page abstract entitled “Enriched Access to a Large War Historical Text using the Back of the Book Index”, which I submitted to SWAIE 2012, the Semantic Web and Information Extraction workshop, which I will be attending.

These queries use the back-of-the-book index that has been converted to SKOS and was subsequently aligned with a number of datasources.

The queries can be entered in the interactive SPARQL interface of the Verrijkt Koninkrijk semantic server, which can be found at http://semanticweb.cs.vu.nl/verrijktkoninkrijk/flint/ (login: sparqltester, password: sparqltester).

Query1: GeoNames. Get all paragraphs containing references to a place in the Dutch province “Noord Holland”:

PREFIX niod: <http://purl.org/collections/nl/niod/>
prefix dc:   <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?subj ?bc ?par
WHERE  {
?subj <http://www.geonames.org/ontology#parentADM1> <http://sws.geonames.org/2749879/>.
?bc skos:closeMatch ?subj.
?bc skos:inScheme niod:BotBScheme.
?bc niod:pageRef ?pr.
?pr niod:parRef ?par.
}
limit 100

Edit 3 Oct: I continued experimenting with some other SPARQL queries and used Willem van Hage and Tomi Kauppinen’s excellent SPARQL package for R to do some quick-and-dirty statistical analysis. I used a variant of the query above, but with the province as a variable. I put the results in a pie chart showing Loe de Jong’s mentions of places found in each of the twelve provinces of the Netherlands.

Frequencies of page references to places in each of the twelve provinces in “Het Koninkrijk”

And if you replace the predicate ‘parentADM1’ with ‘parentADM2’, you get the frequencies for the individual municipalities:

Frequencies of page references to municipalities in “Het Koninkrijk”
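The charts were made with R, but the same idea can be sketched in Python. The block below is an illustration rather than the code I ran: `region_query` is a guess at what "the query above with the province as a variable" looks like, with the ADM level as a parameter, and `count_by_region` tallies rows in the standard SPARQL JSON results layout. Both function names are made up, the sample bindings are invented, and the query does not restrict places to the Netherlands.

```python
from collections import Counter


def region_query(level=1):
    """SPARQL listing (region, page reference) pairs for GeoNames ADM level 1 or 2."""
    if level not in (1, 2):
        raise ValueError("level must be 1 (province) or 2 (municipality)")
    return f"""PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX gn: <http://www.geonames.org/ontology#>

SELECT DISTINCT ?region ?pr WHERE {{
  ?place gn:parentADM{level} ?region .
  ?bc skos:closeMatch ?place ;
      skos:inScheme niod:BotBScheme ;
      niod:pageRef ?pr .
}}"""


def count_by_region(bindings):
    """Tally page references per region from SPARQL JSON result bindings."""
    return Counter(row["region"]["value"] for row in bindings)


# Made-up bindings, purely to exercise the tally; not results from the VK data.
sample = [
    {"region": {"value": "urn:example:province-a"}, "pr": {"value": "_:b1"}},
    {"region": {"value": "urn:example:province-a"}, "pr": {"value": "_:b2"}},
    {"region": {"value": "urn:example:province-b"}, "pr": {"value": "_:b3"}},
]
print(count_by_region(sample).most_common())
```

Calling `region_query(2)` gives the municipality variant.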

I will leave the historical interpretation of these charts to the reader. Note however that a major disclaimer is needed. There are numerous errors in the data, including OCR errors and concept mapping errors. I am sure that the municipality ‘Berkelland’ is not as important as it now seems. Also, the data should be normalized by province size to give a better idea of what is going on.
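The normalization mentioned here is a one-liner once the counts are available. A minimal Python sketch, under the assumption that "size" can be any per-region measure (area or population, for instance): the function name and all numbers below are illustrative and are not results from the VK data.

```python
def normalise_by_size(counts, sizes):
    """Divide each region's mention count by that region's size."""
    missing = set(counts) - set(sizes)
    if missing:
        raise KeyError(f"no size given for: {sorted(missing)}")
    return {region: counts[region] / sizes[region] for region in counts}


# Illustrative numbers only; sizes may be area or population in any consistent unit.
counts = {"province-a": 200, "province-b": 50}
sizes = {"province-a": 4.0, "province-b": 0.5}
normalised = normalise_by_size(counts, sizes)
print(normalised)
```

In this toy example the region with fewer raw mentions ends up with the higher normalized value, which is exactly the kind of shift the disclaimer is about.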

The point, however, is that, given the linked data, these analyses are ridiculously easy to perform with SPARQL and R.

Query2: NIOD Thesaurus Beeldbank WO2. Get all combinations of BBWO2 images and paragraphs

PREFIX niod: <http://purl.org/collections/nl/niod/>
prefix dc:   <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?img ?par
WHERE {
?object dc:subject ?subj ;
dc:relation ?img .
?subj skos:inScheme niod:ConceptScheme.
?subj skos:exactMatch ?bc.
?bc skos:inScheme niod:BotBScheme.
?bc niod:pageRef ?pr.
?pr niod:parRef ?par.
}
limit 100
