DHBenelux2023 trip report

Two weeks ago, I visited the 2023 edition of the Digital Humanities Benelux conference in Brussels. It turned out this was the 10th anniversary edition, which goes to show that the Luxembourgian, Belgian and Dutch DH community is alive and kicking! This year’s gathering at the Royal Library of Belgium brought together humanities and computer science researchers and practitioners from the BeNeLux and beyond. Participants got to meet interesting tools, datasets and use cases, all the while critically assessing issues around perspective, representation and bias in each.

On the workshop day, I attended part of a tutorial organized by people from Göttingen University on the use of Linked Data for historical data. They presented an OpenRefine- and Wikidata-centric pipeline that also includes QuickStatements, a batch Wikidata editing tool: https://quickstatements.toolforge.org/.
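To give an idea of what such a batch edit looks like, here is a minimal sketch (mine, not the tutorial's) that turns a few records into QuickStatements V1 commands. The two person records are made-up placeholders. P31 (instance of), Q5 (human) and P569 (date of birth) are real Wikidata identifiers. In a real pipeline the records would come out of OpenRefine after reconciliation.

```python
# Illustrative sketch: build QuickStatements V1 commands for new person items.
# The records below are made-up placeholders, not data from the tutorial.

def qs_commands(label, birth_year):
    """Return the V1 commands that create one item for a human with a birth year."""
    return [
        "CREATE",
        f'LAST\tLen\t"{label}"',                              # English label
        "LAST\tP31\tQ5",                                       # instance of: human
        f"LAST\tP569\t+{birth_year:04d}-01-01T00:00:00Z/9",    # date of birth, precision 9 = year
    ]

records = [("Example Person A", 1650), ("Example Person B", 1702)]

batch = "\n".join(
    line for label, year in records for line in qs_commands(label, year)
)
print(batch)
```

The printed text can be pasted into the QuickStatements batch window. It is worth trying a batch like this on a test item before running it on real data.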

The second half of that day I attended a workshop on the Kiara tool presented by the people behind the Dharpa project. The basic premise of the tool makes a lot of sense: while many DH people use Python notebooks, it is not always clear what operations specific blocks of code map to. Reusing other people’s code becomes difficult and reusing existing data transformation code is not trivial. Kiara’s solution is an environment in which pre-defined, well-documented modules are made available so that users can easily find, select and combine modules for data transformation. For any DH infrastructure, one has to decide how much flexibility to offer users. My hunch is that this limited set of operations will not be enough for arbitrary DH data science pipelines and that the full flexibility provided by Python notebooks will be needed. Nevertheless, we have to keep thinking about how infrastructures can support pipeline transparency and reusability, and cater to less digitally literate users.
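To make the idea of documented, combinable modules a bit more concrete, here is a small sketch of my own. It is not Kiara's API: the registry, the `module` decorator, the module names and `run_pipeline` are all hypothetical. It only illustrates the principle that every step has a name and a description, so a pipeline can explain itself.

```python
# Hypothetical sketch of a documented-module registry; this is not Kiara's API.
from typing import Callable, Dict, List

REGISTRY: Dict[str, Callable] = {}

def module(name: str):
    """Register a documented data-transformation step under a readable name."""
    def decorator(func: Callable) -> Callable:
        if not func.__doc__:
            raise ValueError(f"module {name!r} needs a docstring")
        REGISTRY[name] = func
        return func
    return decorator

@module("text.lowercase")
def lowercase(texts: List[str]) -> List[str]:
    """Convert every text to lower case."""
    return [t.lower() for t in texts]

@module("text.tokenize")
def tokenize(texts: List[str]) -> List[List[str]]:
    """Split every text into whitespace-separated tokens."""
    return [t.split() for t in texts]

def run_pipeline(step_names, data):
    """Apply the named modules in order and keep a human-readable log."""
    log = []
    for name in step_names:
        step = REGISTRY[name]
        data = step(data)
        log.append(f"{name}: {step.__doc__}")
    return data, log

result, log = run_pipeline(
    ["text.lowercase", "text.tokenize"], ["Digital Humanities Benelux"]
)
print(result)
print(log)
```

The trade-off mentioned above is visible here: the pipeline is transparent and reusable, but a user can only do what the registered modules allow.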

On the first day of the main conference, Roeland Ordelman presented our own work on the CLARIAH Media Suite: Towards ‘Stakeholder Readiness’ in the CLARIAH Media Suite: Future-Proofing an Audio-Visual Research Infrastructure. This talk was preceded by a very interesting talk from Loren Verreyen, who worked with a digital dataset of program guides (I know of similar datasets archived at Beeld en Geluid). Unfortunately, the much-awaited third talk on the Distracted Boyfriend meme was cancelled.

Interesting talks on the first day included a presentation by Paavo Van der Eecken on capturing uncertainty in manually annotating images. This work, “Thinking Outside of the Bounding Box: A Reconsideration of the Application of Computational Tools on Uncertain Humanities Data”, and its main premise that disagreement is a valuable signal, are reminiscent of the CrowdTruth approach.

A very nice duo presentation was given by Daria Kondakova and Jakob Kohler on Messy Myths: Applying Linked Open Data to Study Mythological Narratives. This paper uses the theoretical framework of Zgoll and its concept of hylemes to analyze mythological texts. Such hylemes are triple-like statements (subject-verb-object) that describe events in a text. In the context of the project, these hylemes were then converted to full-blown Linked Open Data to allow for linking and comparing versions of myths. A research prototype can be found here: https://dareiadareia-messy-myths.streamlit.app/
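As a rough illustration of how a subject-verb-object hyleme could end up as Linked Data, here is a sketch of my own that writes each hyleme as a small event resource in N-Triples. This is not the project's actual data model: the `https://example.org/myth/` namespace, the property names and the two toy myth versions are all made up for the example.

```python
# Illustrative sketch: the namespace and the example hylemes are made up.
from urllib.parse import quote

BASE = "https://example.org/myth/"  # hypothetical namespace

def iri(kind, name):
    """Build an IRI in the hypothetical namespace from a plain-text name."""
    return f"<{BASE}{kind}/{quote(name.replace(' ', '_'))}>"

def hyleme_to_ntriples(version, index, subject, verb, obj):
    """Model one hyleme as an event resource linked to its subject, verb and object."""
    event = iri("hyleme", f"{version}_{index}")
    return [
        f"{event} {iri('prop', 'inVersion')} {iri('version', version)} .",
        f"{event} {iri('prop', 'subject')} {iri('entity', subject)} .",
        f"{event} {iri('prop', 'verb')} {iri('action', verb)} .",
        f"{event} {iri('prop', 'object')} {iri('entity', obj)} .",
    ]

hylemes = {
    "version A": [("hero", "slays", "monster")],
    "version B": [("hero", "tames", "monster")],
}

triples = [
    line
    for version, items in hylemes.items()
    for i, (s, v, o) in enumerate(items, start=1)
    for line in hyleme_to_ntriples(version, i, s, v, o)
]
print("\n".join(triples))
```

Because both versions point to the same entity IRIs, a query can find where two tellings of a myth agree on the actors but differ in the action, which is the kind of comparison the linking is meant to enable.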

The GLOBALISE project was also present at the conference with a presentation about the East-Asian shipping vocabulary and a poster.


At the poster session, I had the pleasure of presenting a poster from students of the VU DH minor and their supervisors on a tool to identify and link occupations in biographical descriptions.

VU DH Minor students’ poster https://twitter.com/victordeboer/status/1664199079251832832

The keynote by Patricia Murrieta-Flores from Lancaster University introduced the concept of Cosmovision with respect to the archiving and enrichment of (colonial) heritage objects from Mesoamerica. This concept of Cosmovision is closely related to our polyvocality aims, and the connection to computer vision is inspiring, if very challenging.

It is great to see that DHBenelux continues to be a very open and engaging community of humanities and computer science people, bringing together datasets, tools, challenges and methods.

Student-supported project in the news

It was great to see that one of this year’s Digital Humanities in Practice projects led to a conversation between the students in that project, Helene Ayar and Edith Brooks, their external supervisors Willemien Sanders (UU) and Mari Wigham (NISV), and an advisor for another project, André Krouwel (VU). That conversation resulted in original research and the CLARIAH Media Suite data story “‘Who’s speaking?’- Politicians and parties in the media during the Dutch election campaign 2021”, in which the content of news programmes was analysed for politicians’ names, their gender and party affiliation.

The results are very interesting and subsequently appeared on the Dutch news site NOS.nl, showing that right-wing politicians are more represented on radio and TV: “Onderzoek: Rechts domineert de verkiezingscampagne op radio en tv” (“Research: the right dominates the election campaign on radio and TV”). Well done and congratulations!

DIVE+ receives the Grand Prize at the LODLAM Summit in Venice

We are excited to announce that DIVE+ has been awarded the Grand Prize at the LODLAM Summit, held at the Fondazione Giorgio Cini this week. The summit brought together ~100 experts in the vibrant and global community of Linked Open Data in Libraries, Archives and Museums. It has been organised every two years since 2011. Earlier editions were held in the US, Canada and Australia, making the 2017 edition the first in Europe.

The Grand Prize (USD 2,000) was awarded by the LODLAM community. It is a recognition of how DIVE+ demonstrates the social, cultural and technical impact of linked data. The Open Data Prize (USD 1,000) was awarded to WarSampo for its groundbreaking approach to publishing open data.

Fondazione Giorgio Cini. Image credit: Johan Oomen CC-BY

Five finalists were invited to present their work, selected from a total of 21 submissions after an open call published earlier this year. Johan Oomen, head of research at the Netherlands Institute for Sound and Vision, presented DIVE+ on day one of the summit. The slides of his pitch have been published, as well as the demo video that was submitted to the open call. Besides DIVE+ (Netherlands) and WarSampo (Finland), the finalists were Oslo public library (Norway), Fishing in the Data Ocean (Taiwan) and Genealogy Project (China). The diversity of the finalists is a clear indication that the use of linked data technology is gaining momentum. Throughout the summit, delegates have been capturing the outcomes of various breakout sessions. Please look at the overview of session notes and follow @lodlam on Twitter to keep track.

Pictured: Johan Oomen (@johanoomen) pitching DIVE+. Photo: Enno Meijers. 

DIVE+ is an event-centric linked data digital collection browser that aims to provide integrated and interactive access to multimedia objects from various heterogeneous online collections. It enriches the structured metadata of online collections with linked open data vocabularies, with a focus on events, people, locations and concepts that are depicted in or associated with particular collection objects. DIVE+ is the result of a truly interdisciplinary collaboration between computer scientists, humanities scholars, cultural heritage professionals and interaction designers. DIVE+ is integrated in the national CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) research infrastructure.

Pictured: each day experts shape the agenda for that day, following the OpenSpace format. Image credit: Johan Oomen (cc-by)

DIVE+ is a collaborative effort of VU University Amsterdam (Victor de Boer, Oana Inel, Lora Aroyo, Chiel van den Akker, Susan Legêne), the Netherlands Institute for Sound and Vision (Jaap Blom, Liliana Melgar, Johan Oomen), Frontwise (Werner Helmich), the University of Groningen (Berber Hagendoorn, Sabrina Sauer) and the Netherlands eScience Centre (Carlos Martinez). It is supported by CLARIAH and NWO.

The LODLAM Challenge was generously sponsored by Synaptica. We would also like to thank the organisers, especially Valentine Charles and Antoine Isaac of Europeana and Ingrid Mason of Aarnet for all of their efforts. LODLAM 2017 has been a truly unforgettable experience for the DIVE+ team.

Getting down with LOD tools at the 2nd CLARIAH Linked Data workshop

[cross-post from clariah.nl]

On Tuesday 13 June 2017, the second CLARIAH Linked Data workshop took place. After the first workshop in September, which was very much an introduction to Linked Data for the CLARIAH community, we wanted to organise a more hands-on workshop where researchers, curators and developers could get their hands dirty.

The main goal of the workshop was to introduce relevant tools to novice as well as more advanced users. After a short plenary introduction, we therefore split the group in two. For the novice users, the focus was on tools that come with a graphical user interface, like OpenRefine and Gephi. To the advanced users, we demonstrated API-based tools such as the CLARIAH-incubated COW, grlc, CultuurLink and ANANSI. Our setup, in which participants convert their own dataset to Linked Data and then query and visualise it, was somewhat ambitious, as we had not taken into account all data formats or encodings. Overall, participants were able to get started with some data and ask questions specific to their use cases.

It is impossible to fully clean, convert and analyse a dataset in a single day, so the CLARIAH team will keep investigating ways to support researchers with their Linked Data needs. For now, you can check out the CultuurLink slides and tutorial materials from the workshop and keep an eye on this website for future CLARIAH LOD events.

The Role of Narratives in DIVE

[This post is based on Maartje Kruijt’s Media Studies Bachelor thesis: “Supporting exploratory search with features, visualizations, and interface design: a theoretical framework”.]

In today’s network society there is a growing need to share, integrate and search in collections of various libraries, archives and museums. For researchers interpreting these interconnected media collections, tools need to be developed. In the exploratory phase of research, the media researcher has no clear focus and is uncertain what to look for in an integrated collection. Data visualization technology can be used to support the strategies and tactics that researchers use in doing exploratory research.

Dive screenshot

The DIVE tool is an event-based linked media browser that allows researchers to explore interconnected events, media objects, people, places and concepts (see screenshot). Maartje Kruijt’s research project involved investigating to what extent and in what way the construction of narratives can be made possible in DIVE, in such a way that it contributes to the interpretation process of researchers. Such narratives can be either automatically generated on the basis of existing event-event relationships, or be constructed manually by researchers.

The research proposes an extension of the DIVE tool where selections made during the exploratory phase can be presented in narrative form. This allows researchers to publish the narrative, but also to share narratives or reuse other people’s narratives. The interactive presentation of a narrative is complementary to the presentation in a text, and it can serve as a starting point for further exploration by other researchers who make use of the DIVE browser.

Within DIVE and CLARIAH, we are currently extending the user interface based on the recommendations made in the context of this thesis. You can read more about it in Maartje Kruijt’s thesis (Dutch). The user stories that describe the needs of media researchers are described in English and can be found in Appendix I.

CLARIAH Linked Data Workshop

[This blog post is co-written with Marieke van Erp and Rinke Hoekstra and is cross-posted from the CLARIAH website]

Linked Data, RDF and Semantic Web are popular buzzwords in tech-land and within CLARIAH. But they may not be familiar to everyone within CLARIAH. On 12 September, CLARIAH therefore organized a workshop at the Vrije Universiteit Amsterdam to discuss the use of Linked Data as a technology for connecting data across the different CLARIAH work packages (WP3 linguistics, WP4 structured data and WP5 multimedia).

Great turnout at Clariah LOD workshop

The goal of the workshop was twofold. First of all, to give an overview from the ‘tech’ side of these concepts and show how they are currently employed in the different work packages. At the same time we wanted to hear from Arts and Humanities researchers how these technologies would best suit their research and how CLARIAH can support them in familiarising themselves with Semantic Web tools and data.

The workshop
Monday afternoon, at 13:00 sharp, around 40 people showed up for the workshop at the Boelelaan in Amsterdam. The workshop included plenary presentations that laid the groundwork for discussions in smaller groups centred around the different types of data from the different WPs (raw collective notes can be found on this piratepad).

Rinke Hoekstra presented an introduction to Linked Data: what it is, how it compares to other technologies and what its potential is for CLARIAH. [Slides]
In the discussion that followed, some concerns were raised about the potential for Linked Data to deal with data provenance and data quality.
After this, humanities researchers from each of the three work packages discussed experiences, opportunities, and challenges around Linked Data. Our “Linked Data Champions” of this day were:

  • WP3: Piek Vossen (Vrije Universiteit Amsterdam) [Slides]
  • WP4: Richard Zijdeman (International Institute of Social History)
  • WP5: Kaspar Beelen and Liliana Melgar (University of Amsterdam) [Slides]

Marieke van Erp, Rinke Hoekstra and Victor de Boer then discussed how Linked Data is currently being produced in the different work packages and showed an example of how these could be integrated (see image). [Slides]. If you want to try these out yourself, here are some example SPARQL queries to play with.

hisco integrated data example

Break out sessions
Finally, in the break out sessions, the implications and challenges for the individual work packages were further discussed.

  • For WP3, the discussion focused on formats. There are many natural language annotation formats in use, some with a long history, and these formats are often very closely connected to text analysis software. One reason it may not be useful for WP3 to convert all tools and data to RDF is that performance cannot be guaranteed; in some cases it has already been shown to degrade when certain text analysis tasks are done in RDF. However, converting certain annotations, i.e. the end results of processing, to RDF could be useful here. We further talked about different types of use cases for WP3 that include LOD.
  • The WP4 break-out session consisted of about a dozen researchers, representing all work packages. The focus of the talk was on the expectations of the tools and data that were demonstrated throughout the day. Several people were interested in applying QBer, the tool that allows one to turn CSV files into Linked Data. The really exciting bit about this is that the interest was shared by people outside WP4, who usually work with text or audio-video sources. This signals not just an interest in interdisciplinary research, but also an interest in research based on various data types. A second issue discussed was the need for vocabularies ((hierarchical) lists of standard terms). For various research fields such vocabularies do not yet exist. While some vocabularies can be derived relatively easily from existing standards that experts use, it will prove more difficult for a large range of variables. The final issue discussed was the quality of datasets. Should tools be able to handle ‘messy’ data? The audience agreed that data cleaning is the responsibility of the researcher, but that tools should be accompanied by guidelines on the expected format of the data file.
  • In the WP5 discussion, issues around data privacy and copyright were discussed, as well as how memory institutions and individual researchers can be persuaded to make their data available as LOD (see image).

wp5 result
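For readers wondering what "turning CSV files into Linked Data" from the WP4 discussion boils down to, here is a deliberately naive sketch. It is not how QBer works internally: the `https://example.org/data/` namespace, the column-to-property mapping and the sample rows are assumptions made for this example only. Each row becomes a resource and each non-empty cell becomes a literal.

```python
# Illustrative sketch only: this is not how QBer works internally.
import csv
import io

BASE = "https://example.org/data/"  # hypothetical namespace

def csv_to_ntriples(csv_text, id_column):
    """Turn each CSV row into one resource with one literal per non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}row/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the identifier itself and empty cells
            # Escape backslashes and double quotes for N-Triples literals
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{BASE}column/{column}> "{literal}" .')
    return triples

sample = "id,occupation,year\n1,baker,1850\n2,,1851\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

Even this toy version shows why guidelines on the expected file format matter: it silently assumes a header row, a unique identifier column and clean cell values. Mapping a column such as occupation to a shared vocabulary is the harder step, and it is where the vocabularies discussed above come in.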

The day ended with some final considerations and some well-deserved drinks.

