I am an Associate Professor (UHD) in the User-Centric Data Science group at the Computer Science department of the Vrije Universiteit Amsterdam (VU). I am also a co-director of the Cultural AI Lab. In my research, I combine (Semantic) Web technologies with Human-Computer Interaction, Knowledge Representation and Information Extraction to tackle research challenges in various domains. These include Cultural Heritage, Digital Humanities and ICT for Development (ICT4D). I am currently involved in the following research projects:

  • HEDGE-IoT: IoT data conversion and enrichment; user-centric and explainable machine learning
  • HAICu: perspective-aware AI to make digital heritage collections more accessible
  • InTaVia: making linked cultural heritage and biographical data usable for end-users
  • Pressing Matter: developing data models to support societal reconciliation with the colonial past and its afterlives
  • InterConnect: machine learning on IoT and smart energy knowledge graphs
  • Hybrid Intelligence: Augmenting Human Intellect
  • CARPA: responsible production using crowdsourcing in Africa

For other and older research projects, see the “research” tab.

Hybrid Intelligence for Digital Humanities

For deep and meaningful integration of AI tools in the Digital Humanities (DH) discipline, we propose Hybrid Intelligence (HI) as a research paradigm. In DH research, the use of digital methods, and specifically that of Artificial Intelligence, is subject to a set of requirements and constraints. In our position paper, which we presented at the HHAI2024 conference in Malmö, we argue that these are well supported by the capabilities and goals of HI. Our paper identifies five such DH requirements. Successful AI systems need to be able to:

  1. collaborate with the (human) scholar;
  2. support data criticism;
  3. support tool criticism;
  4. be aware of and cater to various perspectives; and
  5. support distant and close reading.

In our paper, we take the CARE principles of Hybrid Intelligence (collaborative, adaptive, responsible and explainable) as a theoretical framework and map these to the DH requirements. In this mapping, we include example research projects. Finally, we address how insights from DH can be applied to HI and discuss open challenges for the combination of the two disciplines.

You can find the paper here: Victor de Boer and Lise Stork. “Hybrid Intelligence for Digital Humanities.” HHAI 2024: Hybrid Human AI Systems for the Social Good. pp. 94-104. Frontiers in Artificial Intelligence and Applications. Vol. 386. IOS Press. DOI: 10.3233/FAIA240186 

…and our presentation below:

ESWC2024 Trip report

Last week, I joined the 21st edition of the Extended Semantic Web Conference (ESWC2024), held in Heraklion, Crete. The 2004 edition was my first scientific conference ever, and I have attended many editions since, so this feels a bit like my ‘home conference’. General Chair Albert Meroño and his team did a great job and it was overall a very nice conference. Paul Groth wrote a very nice trip report here, but I wanted to collect some thoughts and personal highlights in a short blogpost anyway.

The workshops

The workshops overall were very well organized and the ones I joined were well attended. This has been different in previous editions! The PhD symposium was very lively and I had nice chats with PhD candidates during the symposium lunch.

I joined part of the Genesy Workshop, where there were various talks about the potential of combining generative AI (a definite and unsurprising theme of the conference) with Semantic Web processes and technologies. The paper by Bouchouras et al., “LLMs for the Engineering of a Parkinson Disease Monitoring and Alerting Ontology”, looked at using LLMs for Knowledge Engineering.

I was asked to give a keynote speech at the 2nd edition of the Workshop on Semantic Methods for Events and Stories (SEMMES), at ESWC2024. I talked about work on polyvocality in cultural heritage knowledge graphs. You can find my slides here.

There were very nice talks in the workshop, including the (best paper winning) “Let the fallen voussoirs of Notre-Dame de Paris speak: Scientific Narration and 3D Visualization of Virtual Reconstruction Hypotheses and Reasoning” from Guillem Anais, John Samuel, Gilles Gesquière, Livio De Luca and Violette Abergel, which looked at a combination of modelling, argumentation and visualisation for architectural reconstruction.

I then joined the SemDH workshop on Semantic Digital Humanities and its panel discussion in the afternoon, which was really nice. One observation is that many of the talks in SEMMES could have been very interesting for SemDH as well and vice versa. Maybe merging the two would make sense in the future?

The Keynotes

There were three nice keynote speeches, each with its own angle and interesting points.

Elena Simperl gave a somewhat personal history of Knowledge Engineering and the role that machines and humans have in this process. This served as a prelude for the special track on this topic organized by her, Paul Groth and others. Elena called for tools and data for proper benchmarking, introduced the ProVe tool for provenance verification and explored the roles of AI (LLMs) with respect to knowledge engineers, domain experts and prompt engineers.

Katariina Kari reflected on 7 Years of Building Enterprise Knowledge Graphs at Zalando and Ikea. This was a very interesting talk about the impact of Knowledge Graphs in industry (she mentioned seven-figure sales increases) and about what works (SKOS, SHACL, OntoClean, reasoning) and what doesn't work or isn't needed (OWL, top-level ontologies, big data).

Peter Clark of the Allen Institute for AI gave my favorite talk, on Structured Reasoning with Language. He discussed their research on Knowledge Graphs and reasoning, but also on Belief Graphs, which consist of atomic statements with textual entailment relations. LLMs can be used to ‘reason’ over such Belief Graphs to, for example, explain decisions or search results.

Main Conference

The main conference had many interesting talks in all the tracks. The industry track and resource track were quite competitive this year: in terms of quality and number of submissions, they seemed to me to be on par with the research track. Also, the special track on LLMs for Knowledge Engineering was a great success.

I was a bit hesitant with respect to this clear theme of the conference, fearing lots of “we did LLM” talks, but that was not the case at all. Most papers showed genuine interest in the strengths and weaknesses of various LLMs and how they can be used in several Semantic Web tasks and pipelines. There was clearly a renewed interest in methodologies (Neon, Ontology Engineering 101, Methontology, etc.) and how LLMs can fit in here. There were, for example, several talks on how LLMs can be used to generate competency questions: “Can LLMs Generate Competency Questions?” [pdf] by Youssra Rebboud et al. and “The Role of Generative AI in Competency Question Retrofitting” [pdf] by Reham Alharbi et al.

Roderick presenting our Resource paper

Roderick van der Weerdt presented our paper “OfficeGraph: A Knowledge Graph of Office Building IoT Measurements” [pdf], which was nominated for the best Resource paper award. Roderick did a great job presenting this nice result from the InterConnect project and it was well received. The winner of the Resource track best paper award, however, was “PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips” [pdf] by Nicolas Hubert et al. (in my view deservedly so).

The in-use track also had very nice papers, including a quite holistic system to map the German financial system with Knowledge Graphs [pdf] by Markus Schröder et al. Oh, and I won an award 🙂

With more focus on applications, in use, resources, methods for knowledge engineering and of course LLMs, some topics seem to get less attention. Ironically, I missed both Semantics and the Web: Semantics and reasoning did not get a lot of attention in the talks I attended and most applications were about singular knowledge graphs, rather than distributed datasets. Maybe this means that we have solved most of the challenges around these two topics, but possibly it also means that these two elements are less important for actual implementation of Knowledge Graphs. It makes one wonder about the name of the conference though…

With a truly great demo and poster session (near the beach), a great dinner, really nice people and the wonderful surroundings, ESWC2024 was a great success. See you next year in Portoroz!?

SEMMES keynote: more than one side to the story

I was honored to be asked to give the keynote address for the 2nd edition of the Workshop on Semantic Methods for Events and Stories (SEMMES), at ESWC2024. I talked about work on polyvocality in cultural heritage knowledge graphs:

There is more than one side to every story. This common saying is not only true for works of fiction. In the global data space that is the Semantic Web, views and perspectives from different people, organizations and cultures should be available. I identify three challenges towards such a polyvocal Semantic Web. I will talk about ways to identify various voices, to model different perspectives and to make these perspectives available to end users. I will give examples from the cultural heritage domain, both in how semantic technologies can be of use to make available various perspectives on people, objects and events there but also how insights from the domain can help to shape the polyvocal Semantic Web.

You can find my slides below

HEDGE-IoT project kickoff

The HorizonEurope project HEDGE-IoT started in January 2024. The 3.5-year project will build on existing technology to develop a Holistic Approach towards Empowerment of the DiGitalization of the Energy Ecosystem through adoption of IoT solutions. For VU, this project allows us to continue with the research and development initiated in the InterConnect project on data interoperability and explainable machine learning for smart buildings.

Researchers from the User-Centric Data Science group will participate in the project mostly in the context of the Dutch pilot, which will run in Arnhems Buiten, the former testing location of KEMA in the east of the Netherlands. In the pilot, we will collaborate closely with the other Dutch partners: TNO and Arnhems Buiten. At this site, an innovative business park is being realized that has its own power grid architecture, allowing for exchange of data and energy, opening the possibility for various AI-driven services for end-users.

VU will research a) how such data can be made interoperable and enriched with external information and knowledge and b) how such data can be made accessible to services and end-users through data dashboards that include explainable AI.

The image above shows the Arnhems Buiten buildings and the energy grid (source: Arnhems Buiten)

SUMAC keynote on Knowledge Graphs for Cultural Heritage and Digital Humanities

I was honored to be invited as a keynote speaker for the 5th edition of the SUMAC 2023 workshop (analySis, Understanding and proMotion of heritAge Contents) held in conjunction with ACM Multimedia in Ottawa, Canada. In the keynote, I sketched how Knowledge Graphs as a technology can be applied to the cultural heritage domain with examples of opportunities for new types of research in the field of digital humanities specifically with respect to analyses and visualisation of such (multi-modal) data.

In the talk, I discussed the promises and challenges of designing, constructing and enriching knowledge graphs for cultural heritage and digital humanities and how such integrated and multimodal data can be browsed, queried or analysed using state of the art machine learning.

I also addressed the issue of polyvocality, where multiple perspectives on (historical) information are to be represented. Especially in contexts such as that of (post-)colonial heritage, representing multiple voices is crucial.

You can find the complete abstract of my talk here and the (compressed) presentation slides themselves below.

Best NIAA project award for VR project

The award for the Best Network Institute Academy Assistant project for this year goes to the project titled “Between Art, Data, and Meaning – How can Virtual Reality expand visitors’ perspectives on cultural objects with colonial background?” This project was carried out by VU students Isabel Franke and Stefania Conte, supervised by Thilo Hartmann and UCDS researchers Claudia Libbi and myself. A project report and research paper are forthcoming, but you can see the poster below.

HAICu project funded

It has pleased NWO to award the HAICu consortium through the National Research Agenda programme. In the HAICu project, AI researchers, Digital Humanities researchers, heritage professionals and engaged citizens work together on scientific breakthroughs to open, link and analyze large-scale multimodal digital heritage collections in context.

At VU, researchers from the User-Centric Data Science group will research how to create compelling narratives as a way to present multiple perspectives in multimodal data and how to provide transparency regarding the origin of data and the ways in which it was created. These questions will be addressed in collaboration with the Museum for World Cultures, looking at how citizen-contributed descriptions can be combined with AI-generated labels into polyvocal narratives around objects related to the Dutch colonial past in Indonesia.

A look back at the HHAI2023 Doctoral Consortium

The next generation of Hybrid Human-AI researchers is here! As part of the second International Conference on Hybrid Human-Artificial Intelligence, which was held in June in Munich, Germany, Amy Loutfi of Örebro University and I organized a doctoral consortium. We put out a Call for Papers asking early to late stage PhD candidates working on Hybrid Human-AI research to submit their research proposals. We received 10 submissions and, after a smooth peer-reviewing process, we were able to invite 8 participants to the workshop in Munich.

A really nice room for a really nice symposium

The workshop started with a great keynote by Wendy Mackay of Inria, Paris-Saclay, and the Université Paris-Saclay. Wendy is a great authority on Human-Computer Interaction and the relation of that field to Artificial Intelligence and she gave a great talk about the importance of being sensitive to both ends of the AI-HCI scale.

Wendy Mackay

Next, the participants presented their research (plans) in 20-minute presentations, with plenty of time for questions and discussions. We were joined by multiple members of the community who provided interesting comments and discussion items after the talks. Each presenter was paired with another participant who would lead the discussion following the presentation. All in all, my impression was that this set-up led to a fruitful and pleasant atmosphere for in-depth discussions about the research.

The participants of the Doctoral Consortium (from left to right: Anastasiya Zakreuskaya, Johanna Wolff, Dhivyabharathi Ramasamy, Cosimo Palma, Regina Duarte, Victor de Boer, Wendy Mackay, Azade Farshad, Amir Homayounirad, and Nicole Orzan).

Below you find some pictures of the day. The entire programme, including (most of) the papers, can be found on the HHAI conference web page. The papers are published by IOS Press in the proceedings of the conference: Augmenting Human Intellect.

On behalf of Amy as well: Thank you Azade Farshad, Johanna Wolff, Regina Duarte, Amir Homayounirad, Anastasiya Zakreuskaya, Nicole Orzan, Dhivyabharathi Ramasamy, Cosimo Palma and Wendy Mackay for making the DC work. Thanks as well to the wonderful organization team of HHAI2023 for making everything run so smoothly!

DHBenelux2023 trip report

Two weeks ago, I visited the 2023 edition of the Digital Humanities Benelux conference in Brussels. It turned out this was the 10th anniversary edition, which goes to show that the Luxembourgian, Belgian and Dutch DH community is alive and kicking! This year's gathering at the Royal Library of Belgium brought together humanities and computer science researchers and practitioners from the BeNeLux and beyond. Participants got to meet interesting tools, datasets and use cases, all the while critically assessing issues around perspective, representation and bias in each.

On the workshop day, I attended part of a tutorial organized by people from Göttingen University on the use of Linked Data for historical data. They presented an OpenRefine- and Wikidata-centric pipeline that also includes a tool for batch editing Wikidata: https://quickstatements.toolforge.org/.

The second half of that day I attended a workshop on the Kiara tool presented by the people behind the Dharpa project. The basic premise of the tool makes a lot of sense: while many DH people use Python notebooks, it is not always clear what operations specific blocks of code map to. Reusing other people's code becomes difficult and reusing existing data transformation code is not trivial. The solution of Kiara is an environment in which pre-defined, well-documented modules are made available so that users can easily find, select and combine modules for data transformation. For any DH infrastructure, one has to make decisions about how much flexibility to offer users. My hunch is that this limited set of operations will not be enough for arbitrary DH-Data Science pipelines and that the full flexibility provided by Python notebooks will be needed. Nevertheless, we have to keep thinking about how infrastructures can support pipeline transparency and reusability, and cater to less digitally literate users.
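To make the trade-off a bit more concrete, here is a minimal sketch of the general idea of a registry of documented, reusable transformation modules. This is not Kiara's actual API: the names `REGISTRY`, `register` and `run_pipeline`, as well as the two toy modules, are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: module name -> (documentation, function).
# This only illustrates the idea; it is not how Kiara is implemented.
REGISTRY: Dict[str, Tuple[str, Callable[[List[str]], List[str]]]] = {}


def register(name: str, doc: str):
    """Decorator that adds a documented transformation to the registry."""
    def decorator(func):
        REGISTRY[name] = (doc, func)
        return func
    return decorator


@register("lowercase", "Convert every text to lower case.")
def lowercase(texts: List[str]) -> List[str]:
    return [t.lower() for t in texts]


@register("drop_empty", "Remove texts that are empty or whitespace only.")
def drop_empty(texts: List[str]) -> List[str]:
    return [t for t in texts if t.strip()]


def run_pipeline(texts: List[str], steps: List[str]):
    """Apply the named modules in order and keep a readable log of each step."""
    log = []
    for name in steps:
        doc, func = REGISTRY[name]
        texts = func(texts)
        log.append(f"{name}: {doc}")
    return texts, log


result, log = run_pipeline(["Amsterdam", "  ", "BRUSSELS"], ["drop_empty", "lowercase"])
print(result)
print(log)
```

The log gives the transparency that ad-hoc notebook cells lack, but a user can only do what the registry offers, which is exactly the flexibility question raised above.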

On the first day of the main conference, Roeland Ordelman presented our own work on the CLARIAH Media Suite: “Towards ‘Stakeholder Readiness’ in the CLARIAH Media Suite: Future-Proofing an Audio-Visual Research Infrastructure”. This talk was preceded by a very interesting talk from Loren Verreyen, who worked with a digital dataset of program guides (I know of similar datasets archived at Beeld en Geluid). Unfortunately, the much-awaited third talk on the Distracted Boyfriend meme was cancelled.

Interesting talks on the first day included a presentation by Paavo Van der Eecken on capturing uncertainty in manually annotating images. This work, “Thinking Outside of the Bounding Box: A Reconsideration of the Application of Computational Tools on Uncertain Humanities Data”, and its main premise that disagreement is a valuable signal are reminiscent of the CrowdTruth approach.

A very nice duo-presentation was given by Daria Kondakova and Jakob Kohler on “Messy Myths: Applying Linked Open Data to Study Mythological Narratives”. This paper uses the theoretical framework of Zgoll to back up the concept of hylemes to analyze mythological texts. Such hylemes are triple-like statements (subject-verb-object) that describe events in text. In the context of the project, these hylemes were then converted to full-blown Linked Open Data to allow for linking and comparing versions of myths. A research prototype can be found here: https://dareiadareia-messy-myths.streamlit.app/
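As a rough illustration of why such triple-like statements map so naturally onto Linked Data, here is a minimal sketch that turns subject-verb-object statements into N-Triples lines and compares two versions of a story. It is not the authors' implementation: the `Hyleme` class, the `http://example.org/myth/` namespace and the placeholder statements are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Set

# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/myth/"


@dataclass(frozen=True)
class Hyleme:
    """A triple-like statement: who (subject) does what (verb) to whom (obj)."""
    subject: str
    verb: str
    obj: str


def slug(text: str) -> str:
    """Turn a label into a simple IRI-friendly local name."""
    return "_".join(text.strip().lower().split())


def to_ntriple(h: Hyleme) -> str:
    """Serialise one hyleme as a single N-Triples line."""
    return f"<{BASE}{slug(h.subject)}> <{BASE}{slug(h.verb)}> <{BASE}{slug(h.obj)}> ."


def shared_hylemes(a: Iterable[Hyleme], b: Iterable[Hyleme]) -> Set[Hyleme]:
    """Return the statements that occur in both versions of a narrative."""
    return set(a) & set(b)


# Placeholder statements, not taken from any actual myth or from the paper.
version_a = [Hyleme("Hero", "slays", "Monster"), Hyleme("Hero", "returns to", "City")]
version_b = [Hyleme("Hero", "slays", "Monster"), Hyleme("Hero", "founds", "City")]

for h in version_a:
    print(to_ntriple(h))
print(shared_hylemes(version_a, version_b))
```

Once statements share identifiers like this, comparing versions of a story reduces to set operations or graph queries, which is presumably part of the appeal of the Linked Open Data route.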

The GLOBALISE project was also present at the conference with a presentation about the East-Asian shipping vocabulary and a poster.


At the poster session, I had the pleasure to present a poster from students of the VU DH minor and their supervisors on a tool to identify and link occupations in biographical descriptions.

VU DH Minor students’ poster https://twitter.com/victordeboer/status/1664199079251832832

The keynote by Patricia Murrieta-Flores from the University of Lancaster introduced the concept of Cosmovision with respect to the archiving and enrichment of (colonial) heritage objects from Mesoamerica. This concept of Cosmovision is closely related to our polyvocality aims, and the connection to computer vision is inspiring, if very challenging.

It is great to see that DHBenelux continues to be a very open and engaging community of humanities and computer science people, bringing together datasets, tools, challenges and methods.

Digital Humanities in Practice 22-23

As part of the VU Digital Humanities and Social Analytics Minor, this year we again had students do a capstone project in January to show off their DH and SA skills and knowledge. The students were matched with researchers and practitioners in the field to tackle a specific challenge in four weeks. We again thank these wonderful external supervisors for their effort. The students’ effort resulted in really impressive projects, showcased in the compilation video below.

In total, nine projects were executed and we list the titles and hosting organisations below.

  • Reception of Dutch films by critics and film fans (UvA-CREATE)
  • Rethinking provenance through networks (Leiden University-Humanities)
  • Gender and Facial Recognition (VU Amsterdam-Computer Science)
  • Impact measurement in VR (UTwente)
  • Locating Press Photos (NIOD)
  • Exploring Music Collections through data stories, exploratory interfaces and innovative applications (Netherlands Institute for Sound and Vision)
  • Predicting news headlines tests: What makes users click (VU-Social Science)
  • Semi-Automatic Refinement of Historic Occupations (VU and IISG)
  • 1000 bombs and grenades (Netwerk Oorlogsbronnen)

