I am an Associate Professor (UHD) at the User-Centric Data Science group at the Computer Science department of the Vrije Universiteit Amsterdam (VU). I am also a co-director of the Cultural AI Lab. In my research, I combine (Semantic) Web technologies with Human-Computer Interaction, Knowledge Representation and Information Extraction to tackle research challenges in various domains. These include Cultural Heritage, Digital Humanities and ICT for Development (ICT4D). I am currently involved in the following research projects:

For other and older research projects, see the “research” tab.

UI for Polyvocal Provenance Reporting

[This post is based on Bella Abelardo’s Master Information Science thesis, “Designing a User Interface for Provenance Reporting of Objects with Colonial Heritage”]

Bella’s thesis addresses a critical challenge in cultural institutions: representing multiple perspectives for colonial heritage items. Current systems often create a “singular truth” in provenance reports, and unstructured data hinders discoverability.

Bella’s goal was to create a user interface to help provenance researchers holistically document the “polyvocal knowledge” often present in colonial heritage objects. Her research explored improvements to the widely used TMS collection management system. To this end, she conducted interviews with various domain experts to gather design requirements and built a prototype, CultureSource.

Two figures showing the lo-fi design of the improved user interface (imgs: B. Abelardo)

The evaluation showed CultureSource’s potential to help researchers document multiple perspectives. Bella’s research provides key requirements—standardization, multiple perspectives, usability, and data management—for future user interfaces aimed at documenting complex, multi-layered histories.


Unlocking Smarter Customs: using Linked Data in Container Tracking

[This post is based on Auke Hofman’s Master Information Science thesis].
The Dutch Customs Administration handles an immense volume of data daily, primarily for risk assessment and critical safety, health, economy, and environment (VGEM) tasks. However, as Auke Hofman highlights in his Master Information Science thesis, “Opportunities and Challenges for Linked Data at Customs Administration of The Netherlands,” the current focus on declaration data, rather than real-time container events, creates a significant bottleneck, limiting transparency and effectiveness.

Auke’s research dives deep into how Customs can dramatically improve its risk assessment by shifting its attention to these crucial events. His main objective was to explore the opportunities and challenges of using Linked Data to enhance local container tracking. By integrating diverse data sources through Linked Data principles, he aimed to provide a more holistic view.

His methodology employed the Design Science Research Methodology (DSRM), iteratively developing and evaluating a Container Tracking System. He prioritized key requirements using the MoSCoW method, ensuring that the most pressing needs were addressed first. The evaluation itself was framed around user stories, offering practical use cases and demonstrating the system’s potential.

Auke built a prototype featuring two knowledge graphs with visualizations, data analysis capabilities, and a notification system. One graph was manually created, while the other leveraged the FEDeRATED prototype, a system designed for real-time data exchange between stakeholders. The evaluation successfully demonstrated the prototype’s ability to retrieve data from the FEDeRATED knowledge graph and apply complex business rules. While some user interface features were deprioritized in favour of machine learning algorithms and architectural views, the result illustrates how the prototype could be integrated into Customs’ existing infrastructure.
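
To give a concrete feel for what a business rule over container-event data can look like, here is a minimal, stdlib-only Python sketch. It stores events as plain (subject, predicate, object) triples and flags containers discharged somewhere other than their declared destination. All identifiers and the rule itself are invented for illustration; the actual prototype works over RDF knowledge graphs and the FEDeRATED data model.

```python
# Illustrative sketch (not the thesis code): container events stored as
# simple (subject, predicate, object) triples, with one business rule
# applied over them. All identifiers here are hypothetical.
triples = [
    ("container:123", "hasEvent", "event:a"),
    ("event:a", "eventType", "Discharged"),
    ("event:a", "atLocation", "port:Rotterdam"),
    ("container:123", "declaredDestination", "port:Hamburg"),
    ("container:456", "hasEvent", "event:b"),
    ("event:b", "eventType", "Discharged"),
    ("event:b", "atLocation", "port:Hamburg"),
    ("container:456", "declaredDestination", "port:Hamburg"),
]

def objects(s, p):
    """All objects for a given subject/predicate pair."""
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]

def flag_route_deviations():
    """Business rule: flag containers discharged somewhere other than
    their declared destination."""
    flagged = []
    containers = {s for (s, p, o) in triples if p == "declaredDestination"}
    for c in sorted(containers):
        declared = objects(c, "declaredDestination")
        for ev in objects(c, "hasEvent"):
            if "Discharged" in objects(ev, "eventType"):
                locs = objects(ev, "atLocation")
                if locs and locs[0] not in declared:
                    flagged.append(c)
    return flagged

print(flag_route_deviations())  # ['container:123']
```

In an RDF setting the same rule would typically be a SPARQL query or a rule in a reasoner, but the triple-matching logic is the same idea.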

Visualisation of the hand-constructed ontology

In conclusion, Auke Hofman’s thesis showcases, in a test environment, that Customs can significantly enrich container data by integrating it with other datasets using Linked Data principles. This not only allows for the application of sophisticated business rules but also paves the way for AI/ML-powered risk assessment capabilities such as anomaly detection and pattern extraction. His work emphasizes the transformative potential of Linked Data, while also acknowledging the essential need for manual effort in semantic data alignment before fully leveraging industry standards like FEDeRATED. This research marks a significant step towards a more intelligent and efficient Customs operation.

His thesis can be found below.


Nice news from ESWC

That’s nice! Thanks for the acknowledgement, chairs!


Exploring AI with Communities in Kuching, Sarawak

From April 7th to 12th, 2025, I had the pleasure of visiting Kuching, Sarawak, alongside my colleague Lea Krause and Master’s student Eva Heemskerk from Vrije Universiteit Amsterdam. This visit, made possible through our long-standing collaboration with UNIMAS, was a vibrant mix of education, cultural exchange, and engaging discussions on the future of AI in society. Dr. Cheah Wai Shiang, Associate Professor at UNIMAS, was again our main point of contact for engaging with various communities.

We kicked off our trip at SMK Agama Tun Ahmad Zaidi, where Lea introduced Form 3 students to the fascinating world of Natural Language Processing and Large Language Models using Hugging Face tools. The students experimented with sentiment analysis and chatbots—many for the first time.

Lea Krause assisting students with LLMs (photo by Celine Haren and Cheah Wai Shiang)

Later that day, I delivered a lecture at UNIMAS on “Knowledge Graphs for Cultural AI,” emphasizing how cultural context can shape ethical and inclusive AI systems. During the trip, we also gathered data about colonial cultural heritage by presenting participants with objects from the Sarawak region and asking for their perspectives on them. This was done in the context of our HAICu project.

At Sarawak Skills, I presented on embedding AI into technical and vocational education, followed by a thoughtful roundtable on how educators can guide students to use (Hybrid) AI critically.

The discussion panel at Sarawak Skills (photo: Celine Haren and Cheah Wai Shiang)

On our final day, we visited SK Muhibbah, a rural primary school, where we shared stories about the Netherlands and engaged with young students whose curiosity and enthusiasm reminded us of AI’s potential to reach and inspire across all communities.

The three of us with school teachers and Ms. Nurfauza, Dr. Cheah and Celine

That afternoon, we also met with the Minister of Education and the director of the new Sarawak AI institute to explore possible cooperation between VU, UNIMAS and the institute.

This visit was a truly enriching experience, made possible by the ERASMUS+ funding scheme. We very much look forward to deepening our collaborations in the region.

Visiting the Muhibbah village


Unlocking the Future: how unified IoT Communication transforms Smart Device Data into valuable information – The OfficeGraph resource

As VU participants in the HEDGE-IoT project, we wrote a blog post detailing the OfficeGraph knowledge graph. You can read it on the project website.


Linked Open Data for Cultural Heritage in The Palgrave Encyclopedia of Cultural Heritage and Conflict

Together with Sarah Shoilee, I wrote an encyclopedic article summarizing the promises and challenges of Linked Open Data for Cultural Heritage. It has now been published as part of The Palgrave Encyclopedia of Cultural Heritage and Conflict.

In the article, we describe the principles and technologies of Linked (Open) Data and how these have been applied in the heritage domain. We also include a section on LOD for Colonial Heritage, matching some of the work we are currently doing in the Pressing Matter and HAICu projects.

You can find the 7-page article here: https://link.springer.com/referenceworkentry/10.1007/978-3-030-61493-5_274-1

If you find it useful, you can cite the work as:

de Boer, V., Shoilee, S.B.A. (2025). Linked Open Data for Cultural Heritage. In: Saloul, I., Baillie, B. (eds) The Palgrave Encyclopedia of Cultural Heritage and Conflict. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-61493-5_274-1


Amsterdam Museum Linked Data (2011) on Github

Since some of the landing pages and online triple stores for the Amsterdam Museum Linked Data set were no longer available, I copied the original data to a GitHub repository: https://github.com/biktorrr/AHM

This dataset is the result of a conversion of the Amsterdam Museum collection database and structured vocabularies into Linked Data in the Europeana Data Model (EDM). More information can be found in our dataset paper: https://www.semantic-web-journal.net/content/amsterdam-museum-linked-open-data

In the past (2011), the data was available in the Europeana Semantic Layer, developed by the EuropeanaConnect project. At this point, the original data is provided as-is through this Git repository only. If you want to use it, load it into your favourite triple store. The data has been used as a benchmark in multiple Linked Data research projects, including the kgbench benchmark repository http://kgbench.info/, and I therefore feel it makes sense to ensure the original RDF data remains accessible.
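
For readers who want a quick feel for RDF data of this kind, here is a minimal, illustrative Python sketch that parses simple N-Triples lines with a regular expression. It is deliberately not a full N-Triples parser; for real use, load the repository’s files into a triple store or a library such as rdflib. The example triples below are invented for illustration.

```python
import re

# Minimal N-Triples line parser -- an illustrative sketch only; it handles
# IRIs and plain/typed/language-tagged literals, not blank nodes.
NT_LINE = re.compile(
    r'^(<[^>]+>)\s+(<[^>]+>)\s+'
    r'(<[^>]+>|"(?:[^"\\]|\\.)*"(?:@\w+|\^\^<[^>]+>)?)\s*\.$'
)

def parse_ntriples(text):
    """Yield (subject, predicate, object) tuples from N-Triples text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = NT_LINE.match(line)
        if m:
            yield m.groups()

# Invented example triples in the style of the dataset.
sample = '''
<http://example.org/am/proxy-1> <http://purl.org/dc/terms/title> "Stilleven" .
<http://example.org/am/proxy-1> <http://www.europeana.eu/schemas/edm/type> "IMAGE" .
'''
triples = list(parse_ntriples(sample))
print(len(triples))  # 2
```
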

If you use this data, please cite it as:

Victor de Boer, Jan Wielemaker, Judith van Gent, Marijke Oosterbroek, Michiel Hildebrand, Antoine Isaac, Jacco van Ossenbruggen, Guus Schreiber (2013). Amsterdam Museum Linked Open Data. Semantic Web – Interoperability, Usability, Applicability, 4(3), 237–243. https://www.semantic-web-journal.net/content/amsterdam-museum-linked-open-data


Information Extraction and Knowledge Graph Creation from Handwritten Historical Documents

[This post is based on the Bachelor Project AI of Annriya Binoy]

In her bachelor thesis “Evaluating Methodologies for Information Extraction and Knowledge Graph Creation from Handwritten Historical Documents”, Annriya Binoy provides a systematic evaluation of various methodologies for extracting and structuring information from historical handwritten documents, with the goal of identifying the most effective strategies.

As a case study, the research investigates several methods on scanned pages from the National Archive of the Netherlands, specifically the late-18th- and early-19th-century service records and pension registers of the Koninklijk Nederlands Indisch Leger (KNIL); see the example below. The task was defined as extracting birth events.


Four approaches are analyzed:

  1. Handwritten Text Recognition (HTR) using the Transkribus tool,
  2. a combination of Large Language Models (LLMs) and Regular Expressions (Regex),
  3. Regex alone, and
  4. Fuzzy Search.

HTR and the LLM-Regex combination show strong performance and adaptability, with F1 scores of 0.88. While Regex alone delivers high accuracy, it lacks coverage. Fuzzy Search proves effective in handling the transcription errors common in historical documents, offering a balance between accuracy and robustness. This research offers initial but practical solutions for the digitization and semantic enrichment of historical archives, and it also addresses the challenges of preserving contextual integrity when constructing knowledge graphs from extracted data.
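
To give a flavour of the Regex and Fuzzy Search approaches, the stdlib-only Python sketch below (not Annriya’s code) uses difflib to match the Dutch birth keyword “geboren” despite transcription noise, then extracts a nearby year with a regular expression. The example lines are invented.

```python
import re
import difflib

# Invented example transcriptions in the style of the source material.
lines = [
    "geboren te Amsterdam den 3 Maart 1787",
    "gebooren te Semarang in het jaar 1795",  # spelling variant
    "gcboren te Batavia 1802",                # HTR misread: 'e' -> 'c'
    "overleden te Soerabaja 1850",            # death record, not a birth
]

YEAR = re.compile(r"\b(1[78]\d{2})\b")  # years 1700-1899

def extract_birth_years(records, keyword="geboren", cutoff=0.8):
    """Return years from lines whose tokens fuzzily match the keyword."""
    years = []
    for line in records:
        tokens = line.lower().split()
        # Fuzzy Search: tolerate spelling variants and HTR misreads.
        if difflib.get_close_matches(keyword, tokens, n=1, cutoff=cutoff):
            m = YEAR.search(line)  # Regex: pull out the year
            if m:
                years.append(int(m.group(1)))
    return years

print(extract_birth_years(lines))  # [1787, 1795, 1802]
```

The `cutoff` parameter trades robustness against false matches; the thesis evaluates such trade-offs systematically with precision and recall.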

More details can be found in Annriya’s thesis below.


Exploring Culinary Links with NLP and Knowledge Graphs

[This post is based on Nour al Assali’s bachelor AI thesis]

Nour’s research explores the use of Natural Language Processing (NLP) and Knowledge Graphs to investigate the historical connections and cultural exchanges within global cuisines. The thesis “Flavours of History: Exploring Historical and Cultural Connections Through Ingredient Analysis Using NLP and Knowledge Graphs” describes a method for analyzing ingredient usage patterns across various cuisines by processing a dataset of recipes. Its goal is to trace the diffusion and integration of ingredients into different culinary traditions. The primary aim is to establish a digital framework for addressing questions related to culinary history and cultural interactions.

The methodology involves applying NLP to preprocess the recipe data, focusing on extracting and normalizing ingredient names. The pipeline contains steps for stop-word removal, tokenization and lemmatization, character replacements, and so on.
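
As a rough illustration of such a pipeline, the stdlib-only Python sketch below lower-cases, replaces accented characters, removes stop words, and strips a naive plural “s”. The stop-word list, replacement table, and plural rule are simplified stand-ins for the NLP tooling used in the thesis.

```python
import re

# Simplified stand-ins for the real stop-word list and character map.
STOP_WORDS = {"fresh", "chopped", "of", "a", "large", "ground"}
REPLACEMENTS = {"é": "e", "è": "e", "ñ": "n"}

def normalize_ingredient(raw):
    """Lower-case, replace accented characters, drop stop words,
    and strip a trailing plural 's' from each remaining token."""
    text = raw.lower()
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    tokens = re.findall(r"[a-z]+", text)  # also drops quantities like "2"
    kept = [t for t in tokens if t not in STOP_WORDS]
    kept = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in kept]
    return " ".join(kept)

print(normalize_ingredient("Chopped Pistachios"))  # pistachio
print(normalize_ingredient("jalapeño peppers"))    # jalapeno pepper
```

A real pipeline would use a proper lemmatizer rather than the crude plural rule, which mangles words like “leaves”.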

From these results, a Knowledge Graph is constructed to map relationships between ingredients, recipes, and cuisines. The approach also includes visualizing these connections, with an interactive map and other tools designed to provide insights into the data and answer key research questions. The figure below shows a visualisation of the top ingredients per cuisine.

Case studies on ingredients such as pistachios, tomatoes, basil, olives, and cardamom illustrate distinct usage patterns and origins. The findings reveal that certain ingredients—like pistachios, basil, and tomatoes—associated with specific regions have gained widespread international popularity, while others, such as olives and cardamom, maintain strong ties to their places of origin. This research underscores the influence of historical trade routes and cultural exchanges on contemporary culinary practices and offers a digital foundation for future investigations into culinary history and food culture.

The code and dataset used in this research are available on GitHub: https://github.com/Nour-alasali/BPAI. The complete thesis can be found below.


Generating Synthetic Time-Series Data For Smart-Building Knowledge Graphs Using Generative Adversarial Networks

[This blog post is based on Jesse van Haaster‘s bachelor thesis Artificial Intelligence at VU]

Knowledge Graphs represent data as triples, connecting related data points. This form of representation is widely used for various applications, such as querying information and drawing inferences from data. For fine-tuning such applications, actual KGs are needed. However, in certain domains like medical records or smart home devices, creating large-scale public knowledge graphs is challenging due to privacy concerns. To address this, generating synthetic knowledge graph data that mimics the original while preserving privacy is highly beneficial.

Jesse’s thesis explored the feasibility of generating meaningful synthetic time-series data for knowledge graphs. He did this specifically in the smart-building / IoT domain, building on our previous work on IoT knowledge graphs, including OfficeGraph.

To this end, two existing generative adversarial networks (GANs), CTGAN and TimeGAN, are evaluated on their ability to produce synthetic data that retains key characteristics of the original OfficeGraph dataset. Jesse compared, among other things, the distributions of values for key features such as humidity, temperature and CO2 levels, shown below.

Key value distributions for CTGAN-generated data vs original data
Key value distributions for TimeGAN-generated data vs original data

The experimental results indicate that while both models capture some important features, neither is able to replicate all of the original data’s properties. Further research is needed to develop a solution that fully meets the requirements for generating meaningful synthetic knowledge graph data.
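
The kind of distribution comparison described above can be sketched in stdlib Python with a two-sample Kolmogorov-Smirnov statistic, one common way to quantify the distance between two value distributions (the thesis may use other measures). The sample values below are invented.

```python
# Illustrative sketch: compare a synthetic feature's value distribution
# against the original, using a two-sample KS statistic.
def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

# Invented values standing in for, e.g., temperature readings.
original  = [20.1, 20.5, 21.0, 21.2, 21.8, 22.0, 22.4]
synthetic = [19.0, 19.5, 20.0, 20.4, 21.0, 21.5, 21.9]

d = ks_statistic(original, synthetic)
print(round(d, 3))  # closer to 0 means more similar distributions
```
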

More details can be found in Jesse’s thesis (below) and his GitHub repository: https://github.com/JaManJesse/SyntheticKnowledgeGraphGeneration
