Today we held the kickoff meeting for the project Quantifying Historical Perspectives on WWII, one of the projects funded by the Data Science Research Center. In this VU-UvA collaboration project*, two students will investigate different perspectives on the Second World War. Specifically, they will use a data science pipeline to search a range of media (Wikipedia, Verrijkt Koninkrijk, KB newspapers,…) and to identify and visualize the perspectives found there.
The students will build on previous work (Verrijkt Koninkrijk, …) and on existing analysis tools (xTAS, ThemeStreams) to provide insight into the volume, selection and depth of WWII-related topics across different media, times and locations.
For the conversion, we used the XML-to-RDF tool included in Cliopatria, VU’s semantic toolset. Using a few rewriting rules, we converted the OAI XML of NIOD’s beeldbankWo2 as well as the XML of 4en5mei to RDF.
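To give a feel for what such a conversion does, here is a minimal Python sketch. It is not the Cliopatria tool and does not reproduce our rewriting rules; it only illustrates the general idea of turning the Dublin Core fields of an OAI record into triples. The sample record, the example.org base URI and the "record-" URI pattern are all made up for the illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
BASE = "http://example.org/niod/"  # hypothetical base URI, for illustration only

# Hypothetical sample record; real beeldbankWo2 records will differ.
SAMPLE = f"""<oai_dc:dc xmlns:oai_dc="{OAI_DC}" xmlns:dc="{DC}">
  <dc:identifier>12345</dc:identifier>
  <dc:title>Example photograph</dc:title>
  <dc:subject>Example subject</dc:subject>
</oai_dc:dc>"""


def record_to_triples(xml_text):
    """Turn one oai_dc record into (subject, predicate, literal) tuples."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(f"{{{DC}}}identifier")
    subject = BASE + "record-" + identifier  # assumed URI pattern
    triples = []
    for child in root:
        # ElementTree tags look like "{namespace}localname".
        namespace, _, local = child.tag[1:].partition("}")
        text = (child.text or "").strip()
        # Keep only non-empty Dublin Core fields; the identifier became the URI.
        if namespace != DC or local == "identifier" or not text:
            continue
        triples.append((subject, DC + local, text))
    return triples


triples = record_to_triples(SAMPLE)
for triple in triples:
    print(triple)
```

Each Dublin Core element becomes one triple whose predicate is the element's full URI, which is why most predicates in the resulting data are Dublin Core fields.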
The NIOD data consists of 2,097,214 RDF triples, using 15 predicates, most of which are Dublin Core metadata fields. The image records are annotated with concepts from the NIOD thesaurus, which is currently under development within the Verrijkt Koninkrijk project.
The VVM data set contains 122,233 RDF triples and uses 37 predicates, most of which are specific to the dataset. We mapped these predicates to Dublin Core using subProperty predicates (for example, the 4en5mei:artist predicate is mapped to dc:creator). To be able to map address locations to other data sources, we upgraded addresses from literals to SKOS concepts.
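The two steps described above can be sketched in plain Python, with triples represented as tuples rather than with an RDF library. This is an illustration under assumptions, not our actual conversion code: the example.org namespace, the "address" predicate name and the way concept URIs are minted are hypothetical. Only the artist-to-creator mapping comes from the data set itself.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBPROPERTY = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
SKOS_PREFLABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
DC = "http://purl.org/dc/elements/1.1/"
VVM = "http://example.org/4en5mei/"  # hypothetical namespace for the sketch


def subproperty_triples(mapping):
    """Declare each dataset-specific predicate a subproperty of a DC field."""
    return [(VVM + local, RDFS_SUBPROPERTY, DC + dc_field)
            for local, dc_field in mapping.items()]


def upgrade_literals(triples, predicate):
    """Replace literal objects of `predicate` with SKOS concept URIs."""
    upgraded = []
    concepts = {}  # literal value -> minted concept URI
    for s, p, o in triples:
        if p != predicate:
            upgraded.append((s, p, o))
            continue
        uri = concepts.get(o)
        if uri is None:
            # Assumed minting scheme: lower-cased, hyphenated, URL-quoted label.
            uri = VVM + "address/" + quote(o.lower().replace(" ", "-"))
            concepts[o] = uri
            upgraded.append((uri, RDF_TYPE, SKOS_CONCEPT))
            upgraded.append((uri, SKOS_PREFLABEL, o))
        upgraded.append((s, p, uri))
    return upgraded


schema = subproperty_triples({"artist": "creator"})

ADDRESS = VVM + "address"  # hypothetical predicate name
data = [
    (VVM + "monument/1", ADDRESS, "Dam 1"),
    (VVM + "monument/2", ADDRESS, "Dam 1"),
]
upgraded = upgrade_literals(data, ADDRESS)
for triple in schema + upgraded:
    print(triple)
```

Because both monuments share one concept instead of carrying two identical strings, a single link from that concept to another data source covers every record at that address.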
We semi-automatically produced the following links:
VVM city and community relations to GeoNames instances (4,124 links)
NIOD thesaurus concepts to Amsterdam Museum concepts (488 links)
In a previous effort, we produced links between the NIOD thesaurus and a) Cornetto and b) Dutch AAT. The result is shown in the mini-datacloud figure below.
URIs and access
For the datasets, we used PURL URIs. This is mainly a matter of convenience, since we do not have direct access to either the NIOD or the VVM web servers. We used the base names http://purl.org/collection/nl/niod/ and http://purl.org/collection/nl/viervijfmei/. HTTP requests are forwarded to a running instance of Cliopatria at http://semanticweb.cs.vu.nl/pvb, where a SPARQL endpoint can also be found.
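As a rough illustration of how the endpoint could be queried, the Python sketch below builds a standard SPARQL protocol GET request that counts triples per predicate. The "/sparql/" path is an assumption about where the endpoint sits under the Cliopatria instance, and the server may not be reachable, so the sketch only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint path under the Cliopatria instance; adjust if it differs.
ENDPOINT = "http://semanticweb.cs.vu.nl/pvb/sparql/"

# Count how often each predicate is used, most frequent first.
QUERY = """
SELECT ?p (COUNT(*) AS ?n)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?n)
"""


def build_request(endpoint, query):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# To run the query, pass `request` to urllib.request.urlopen and parse the
# JSON response; that step is left out because it needs network access.
```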