Kickoff meeting Quantifying Historical Perspectives on WWII

Today, the kickoff meeting for the project Quantifying Historical Perspectives on WWII was held. This is one of the projects funded by the Data Science Research Center. In this VU-UvA collaboration project*, two students will investigate different perspectives on the Second World War. Specifically, they will employ a data science pipeline to search a wide range of media (Wikipedia, Verrijkt Koninkrijk, KB newspapers,…) and to identify and visualize different perspectives.


The students will build on previous work (Verrijkt Koninkrijk, …) and on existing analysis tools (xTAS, ThemeStreams) to provide insight into the volume, selection and depth of WWII-related topics across different media, times and locations.


* The project proposal was submitted by Daan Odijk from UvA and Laura Hollink, Jacco van Ossenbruggen and Victor de Boer from VUA.

Verrijkt Koninkrijk at the Soeterbeeck E-humanities workshop

The Soeterbeeck monastery with two e-humanists
Last week, I presented our work on the Verrijkt Koninkrijk project at the E-humanities workshop in the Soeterbeeck monastery, which was organised by the University of Nijmegen and the e-humanities group of the KNAW.

It was a very pleasant get-together with some nice talks and hands-on sessions. Alice Dijkstra from NWO presented a number of opportunities for obtaining funding for e-humanities projects. She mentioned some obvious candidates (vernieuwingsimpuls,…) and some less obvious ones (the hopefully upcoming CLARIAH programme, which would continue CLARIN and DARIAH).

The two hands-on sessions were nice, but they exposed a more general issue in e-humanities: ‘nice tools’ are being developed, but these tools remain solutions to a single problem. Moreover, they are interesting either from a computer science or from a historical science viewpoint, but it is hard to do exciting computer science and historical science at the same time. This is reinforced by the fact that historians rarely know at the start of a project what type of tools they want. A more interactive and cyclical approach makes sense for both parties. The BiographyNet idea of putting researchers from different backgrounds in the same room would be one solution. Another, in my view, is the development of more general-purpose query environments.

In my poster presentation I showed how I tried to do that with Verrijkt Koninkrijk, and I think a more or less generic data analysis interface is also a good idea.

You can download the VK poster abstract as well as the actual poster.

Links to some of the web demos we tried:

The Verrijkt Koninkrijk Hackathon Report

On Friday, March 8th, we organized a Verrijkt Koninkrijk Linked Data Hackathon at the Intertain Lab of VU Amsterdam. The event was co-sponsored by the Network Institute. The goal of the hackathon was to allow third-party developers to produce (ideas for) innovative applications beyond the Verrijkt Koninkrijk core research questions. We especially encouraged the use of the Linked Data produced in the project.


As organizers, we are very happy with the prototypes produced. The benefits are the following:

  • The produced applications show the (unexpected) reusability of the VK (Linked) Open Data. The applications produced or suggested offer new browsing opportunities, link to other datasets or show how the data can be used in a completely novel context. The hackathon revealed that the data is indeed usable by external developers with the documentation provided. Some bugs were found, some of which could be fixed during the hackathon.
  • Users articulated important concepts around data quality. Although it falls outside the scope of this project, subsequent curation of the data should consider ways of allowing experts or amateurs to correct errors in the data.
  • The VK project data has been made known to researchers and developers from related projects, for example Agora and BiographyNet. We expect this to promote future use of the data by related projects.

Here we present short descriptions of what the six hacker teams cooked up. The jury announced two prize winners, for “best use of data” and “coolest app” respectively. The jury consisted of Kees Ribbens and Edwin Klijn from NIOD, and Serge ter Braake and Victor de Boer from VU. More photos of the event can be seen at



Niels used the data from the Named Entity index to create a history browser that lets the user browse information about WWII on the basis of persons, locations, organisations, etc. (the NER classes). For this he reused the Agora Touch demonstrator. When a class is chosen, a list of entities is shown with images, which are resolved through the alignment with DBpedia. Niels used the LDtogo framework to map the selected data onto the API of the Agora demo.


This group set out to recreate the network of important people in the Netherlands during WWII, along with their quotes, as fake Facebook profiles, trying to imitate the reality of their time. These streams are automatically fed with the contents of the VK datasets: small Cliopatria and Python snippets retrieve data from SPARQL endpoints, resolve the structured XML texts, extract the quotes and expose them using the Facebook Graph API. View the project on GitHub and see the live demo at
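The team's own snippets are not reproduced in this post. As a rough illustration of the retrieval step only, the sketch below encodes a query as a SPARQL 1.1 Protocol GET request and groups quotes by person from a result set in the W3C SPARQL JSON results format. The endpoint URL, the query, the quote predicate and the variable names `person` and `quote` are all hypothetical, and the sample response is made up; the Facebook Graph API step is not covered.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: the post does not give the project's actual SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Hypothetical query; the quote predicate is a placeholder, not the real VK vocabulary.
QUERY = """
SELECT ?person ?quote WHERE {
  ?person <http://example.org/vk/quote> ?quote .
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL 1.1 Protocol 'query via GET' request URL."""
    return endpoint + "?" + urlencode({"query": query})


def extract_quotes(results_json: str) -> dict[str, list[str]]:
    """Group quote strings by person URI from a SPARQL 1.1 JSON result set."""
    data = json.loads(results_json)
    quotes: dict[str, list[str]] = {}
    for binding in data["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...}.
        person = binding["person"]["value"]
        quotes.setdefault(person, []).append(binding["quote"]["value"])
    return quotes


# Made-up response in the SPARQL JSON results format, for illustration only.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["person", "quote"]},
    "results": {"bindings": [
        {"person": {"type": "uri", "value": "http://example.org/person/1"},
         "quote": {"type": "literal", "value": "First quote"}},
        {"person": {"type": "uri", "value": "http://example.org/person/1"},
         "quote": {"type": "literal", "value": "Second quote"}},
    ]},
})

if __name__ == "__main__":
    print(build_request_url(ENDPOINT, QUERY))
    print(extract_quotes(SAMPLE_RESPONSE))
```

Grouping by person URI gives one list of quotes per profile, which is the shape a per-person stream would need.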

Lourens aligned the VK data with that of Agora Rijksmuseum using the Amalgame alignment tool. The alignment is used to link VK data to Rijksmuseum images via the Rijksmuseum API (results shown here (pdf)). He also started using the Verrijkt Koninkrijk data to add links to VK from within our AGORA demo, an event-centred browser for the Rijksmuseum content. Very rough results can be seen in the AGORA demo entry for Duitsland.

The application of Chris van Aart shows how the monument data from Vier en Vijf Mei can be browsed using the Cube browser on iOS. This allows multi-faceted browsing of Dutch war monuments. By flipping the screen, one can actually look at the RDF data!

Michiel built a web map application showing the liberation of Nijmegen in 1944. Data from the 1940s and current maps can be superimposed on each other, showing, for example, which part of the city was damaged during the liberation. Further additions include 17th-, 19th- and 20th-century maps, and an attempt was made to include Vier en Vijf Mei monument data in this dataset. A demo can be seen at

Willem presented the idea of visualising the VK data using the InContext RDF visualizer for enriched publications. Unfortunately, due to time constraints, Willem did not succeed in getting everything up and running. [screencast]




Linked WW II Data made at the OpenCultuurData Hackathon

Michiel and me presenting the result at the hackathon

For OpenCultuurData, I assisted NIOD (Dutch Institute for War Documentation) as an ‘Open Data coach’. For the hackathon, organised on 16 June 2012 by hackdeoverheid, NIOD published part of its image archive Beeldbank WO2 as open data (see also their datablog). The dataset contains 140,000 images about WW II together with their metadata. It is accessible through OAI-PMH.

Also for OpenCultuurData, the ‘Nationaal Comité 4 en 5 mei’ (VVM) released their database about war monuments as open data (again, see their datablog). This database (available as an XML data dump) contains 3,500 monuments, most of which are related to WW II, including the Dam Square Monument.

For the hackathon of 16 June, Michiel Hildebrand and I decided to take these two datasets and convert them to ‘five star linked data’.


For the conversion, we used the XML to RDF tool included in Cliopatria, VU’s semantic toolset. Using a few rewriting rules, we converted the OAI XML of NIOD’s Beeldbank WO2 as well as the XML of 4en5mei to RDF.
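The Cliopatria tool and the actual rewriting rules are not shown in this post. The Python sketch below only illustrates the general idea of rule-based XML-to-RDF rewriting on a simplified, made-up record: mint a subject URI from the record identifier and turn each Dublin Core element into an N-Triples statement. The base URI, the record layout and its values are assumptions for illustration, and the literal escaping is deliberately minimal.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical base URI for minted record URIs; the real PURL basenames are not listed in the post.
BASE = "http://example.org/beeldbank/"

# A simplified, made-up record loosely in the style of OAI-PMH oai_dc output.
SAMPLE_XML = f"""<record xmlns:dc="{DC}">
  <identifier>12345</identifier>
  <dc:title>Bevrijding van Nijmegen</dc:title>
  <dc:date>1944</dc:date>
</record>"""


def literal(text: str) -> str:
    """Serialise a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(xml_text: str) -> list[str]:
    """Rewrite one record: mint a subject URI from <identifier> and turn each dc:* child into a triple."""
    root = ET.fromstring(xml_text)
    subject = f"<{BASE}{root.findtext('identifier').strip()}>"
    triples = []
    for child in root:
        # ElementTree expands namespaced tags to '{namespace}localname'.
        if child.tag.startswith("{" + DC + "}") and child.text:
            local = child.tag.split("}", 1)[1]
            triples.append(f"{subject} <{DC}{local}> {literal(child.text.strip())} .")
    return triples


if __name__ == "__main__":
    for line in record_to_ntriples(SAMPLE_XML):
        print(line)
```

Because the Dublin Core element names are reused directly as predicates, most of the resulting NIOD predicates are Dublin Core fields, which matches the description of the NIOD data.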

  • The NIOD data consists of 2,097,214 RDF triples, using 15 predicates, most of which are Dublin Core metadata fields. The image records are annotated with concepts from the NIOD thesaurus, which is currently under development within the Verrijkt Koninkrijk project.
  • The VVM dataset contains 122,233 RDF triples and uses 37 predicates, most of which are specific to the dataset. We mapped these predicates to Dublin Core using rdfs:subPropertyOf statements (for example, the 4en5mei:artist predicate is mapped to dc:creator). To be able to map address locations to other data sources, we upgraded addresses from literals to SKOS concepts.
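The two modelling steps for the VVM data can be illustrated with a small in-memory sketch. It assumes triples are plain string tuples with prefixed names, which is a simplification rather than how Cliopatria stores data. The first function applies the RDFS entailment rule for subproperties (rdfs7), so a 4en5mei:artist statement also yields a dc:creator statement; the second replaces address literals with SKOS concepts. The predicate 4en5mei:address, the concept base URI and the sample values are hypothetical.

```python
from urllib.parse import quote

# Toy triple store: (subject, predicate, object) string tuples. Prefixed names such as
# "dc:creator" are used for readability; literals and URIs are not distinguished here.
Triple = tuple[str, str, str]
SUBPROP = "rdfs:subPropertyOf"


def infer_superproperties(triples: set[Triple]) -> set[Triple]:
    """Apply RDFS entailment rule rdfs7, (s p o) + (p subPropertyOf q) => (s q o),
    repeatedly until no new triples appear, so chains of subproperties are followed."""
    closure = set(triples)
    while True:
        mapping = [(p, q) for (p, rel, q) in closure if rel == SUBPROP]
        new = {(s, q, o) for (s, p, o) in closure for (sub, q) in mapping if p == sub}
        if new <= closure:
            return closure
        closure |= new


def upgrade_literals_to_concepts(triples: set[Triple], predicate: str, base: str) -> set[Triple]:
    """Replace literal objects of `predicate` with SKOS concept URIs that carry the
    original string as skos:prefLabel; equal strings end up sharing one concept."""
    result: set[Triple] = set()
    for s, p, o in triples:
        if p != predicate:
            result.add((s, p, o))
            continue
        concept = base + quote(o)
        result |= {(s, p, concept),
                   (concept, "rdf:type", "skos:Concept"),
                   (concept, "skos:prefLabel", o)}
    return result


if __name__ == "__main__":
    data = {("vvm:monument/1", "4en5mei:artist", "Some Artist"),
            ("4en5mei:artist", SUBPROP, "dc:creator")}
    print(sorted(infer_superproperties(data)))
```

Minting one concept per distinct address string means that monuments sharing an address share a node, which can then be linked to concepts in other vocabularies in a separate step.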


We semi-automatically produced the following links:

  • VVM city and community relations to GeoNames instances (4,124 links)
  • VVM address relations to Amsterdam Museum thesaurus concepts (77 links)
  • NIOD thesaurus concepts to Amsterdam Museum concepts (488 links)
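The post does not describe how these links were produced beyond "semi-automatically". One common approach, sketched here as an assumption and not as the method actually used, is to match normalised labels automatically and set ambiguous matches aside for a human to resolve. All URIs and labels below are made up.

```python
import unicodedata
from collections import defaultdict


def normalise(label: str) -> str:
    """Lower-case, strip accents and collapse whitespace so that 'Café ' and 'cafe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def candidate_links(
    source: dict[str, str], target: dict[str, str]
) -> tuple[list[tuple[str, str]], list[tuple[str, list[str]]]]:
    """Match source URIs to target URIs by normalised label.

    Returns (unique matches, ambiguous cases that need manual review).
    Source entries without any matching label are simply left unlinked.
    """
    index: dict[str, list[str]] = defaultdict(list)
    for uri, label in target.items():
        index[normalise(label)].append(uri)

    links, ambiguous = [], []
    for uri, label in source.items():
        hits = index.get(normalise(label), [])
        if len(hits) == 1:
            links.append((uri, hits[0]))
        elif len(hits) > 1:
            ambiguous.append((uri, hits))
    return links, ambiguous


if __name__ == "__main__":
    places = {"vvm:place/1": "Nijmegen", "vvm:place/2": "Bergen"}
    gazetteer = {"gn:1": "Nijmegen", "gn:2": "Bergen", "gn:3": "Bergen"}
    print(candidate_links(places, gazetteer))
```

Routing ambiguous labels (such as a place name shared by several gazetteer entries) to a person is what makes such a pipeline semi-automatic; even the unique matches still need evaluation.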
This Linked Data graph figure shows the two datasets, plus the vocabularies and datasets they link to.

In a previous effort, we produced links between the NIOD thesaurus and a) Cornetto and b) the Dutch AAT. The result is shown in the mini-datacloud figure below.

URIs and access

For the datasets, we used PURL URIs. This is mainly a matter of convenience, since we do not have direct access to either the NIOD or the VVM web servers. HTTP requests to these PURL basenames are forwarded to a running instance of Cliopatria, which also provides a SPARQL endpoint, at

Below is a list of example URIs:

The link between a 4en5mei monument and an Amsterdam Museum object, through a mapped address concept.
Status and next steps
This is only a first effort to publish these datasets as Linked Open Data. Some issues that we will look at in the near future are:
  • Link evaluation: none of the links were validated, so there is no guarantee of their quality.
  • More links: More possibilities for connecting the datasets remain. These include the enrichment of BeeldbankWO2 dc:coverage fields (to GeoNames) and mappings to Rijksmonumenten, Stadsarchief etc.
  • The NIOD data now lives on two separate Cliopatria servers (one associated with Amsterdam culture data and one with Verrijkt Koninkrijk). These should be merged.
  • We are also looking at use cases for applications that will use this linked data. We hope to submit one to the OpenCultuurData challenge.

