I am an associate professor (UHD) in the User-Centric Data Science group at the Computer Science department of the Vrije Universiteit Amsterdam (VU). I am also a senior research fellow at the Netherlands Institute for Sound and Vision and co-director of the Cultural AI Lab. In my research, I combine (Semantic) Web technologies with Human-Computer Interaction, Knowledge Representation and Information Extraction to tackle research challenges in various domains, including Cultural Heritage, Digital Humanities and ICT for Development (ICT4D). I am currently involved in the following research projects:

  • InTaVia: making linked cultural heritage and biographical data usable for end-users
  • Pressing Matter: developing data models to support societal reconciliation with the colonial past and its afterlives
  • InterConnect: machine learning on IoT and smart energy knowledge graphs
  • CLARIAH: investigating how to use linked data for connecting linked media
  • Hybrid Intelligence: Augmenting Human Intellect
  • CARPA: responsible production using crowdsourcing in Africa

For other and older research projects, see the “research” tab.

Digital Humanities in Practice 22-23

As part of the VU Digital Humanities and Social Analytics Minor, this year we again had students do a capstone project in January to show off their DH and SA skills and knowledge. The students were matched with researchers and practitioners in the field to tackle a specific challenge in four weeks. We again thank these wonderful external supervisors for their efforts. The students' work resulted in really impressive projects, showcased in the compilation video below.

In total, nine projects were executed and we list the titles and hosting organisations below.

  • Reception of Dutch films by critics and film fans (UvA-CREATE)
  • Rethinking provenance through networks (Leiden University, Humanities)
  • Gender and Facial Recognition (VU Amsterdam, Computer Science)
  • Impact measurement in VR (UTwente)
  • Locating Press Photos (NIOD)
  • Exploring Music Collections through data stories, exploratory interfaces and innovative applications (Netherlands Institute for Sound and Vision)
  • Predicting news headlines tests: What makes users click (VU, Social Science)
  • Semi-Automatic Refinement of Historic Occupations (VU and IISG)
  • 1000 bombs and grenades (Netwerk Oorlogsbronnen)


Explainable AI using visual Machine Learning

The InterConnect project gathers 50 European entities to develop and demonstrate advanced solutions for connecting and converging digital homes and buildings with the electricity sector. Machine Learning (ML) algorithms play a significant role in the project. Most prominent are the services that perform some kind of forecasting, such as predicting the energy consumption of (smart) devices and households in general. The SAREF ontology allows us to standardize input formats for common ML approaches. Explainability can be increased in two ways: by selecting algorithms that are inherently interpretable (e.g. Decision Trees), and by using interactive web environments such as Jupyter Notebooks, which let users follow and visualize the algorithmic procedure step by step. Together, this forms an implementation example for explainable AI.
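
To give a flavour of what such a notebook cell can look like, here is a minimal sketch: a shallow Decision Tree is trained on a few made-up household measurements and its learned rules are printed in full, which is exactly the kind of step-by-step explainability described above. The feature names and values are illustrative stand-ins, not the project's actual SAREF-mapped data.

```python
# A minimal, illustrative notebook cell: the feature names and values below
# are made-up stand-ins, not the project's actual SAREF-mapped data.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy household measurements and the consumption we want to forecast (kWh).
data = pd.DataFrame({
    "hour": [0, 6, 9, 12, 18, 21],
    "outside_temp_c": [10, 9, 14, 18, 15, 12],
    "consumption_kwh": [0.3, 0.5, 1.2, 0.9, 2.1, 1.6],
})
X = data[["hour", "outside_temp_c"]]
y = data["consumption_kwh"]

# A shallow tree keeps the model small enough to read in its entirety.
model = DecisionTreeRegressor(max_depth=2).fit(X, y)

# Printing the learned decision rules is what makes the approach inherently
# explainable: users can follow every split that leads to a forecast.
print(export_text(model, feature_names=list(X.columns)))
```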

Read more, and watch our live demonstration video on the InterConnect project page.


Simulating creativity in GANs with IoT

[This blog post is based on the Artificial Intelligence MSc thesis project of Fay Beening, supervised by myself and Joost de Boo; more information can be found on Fay's website.]

Recently, generative art has been one of the fields where AI, especially deep learning, has caught the public eye. Algorithms and online tools such as Dall-E are able to produce astounding results based on large artistic datasets. One class of algorithms that has been at the root of this success is the Generative Adversarial Network (GAN), frequently used in online art-generating tools because of its ability to produce realistic artefacts.

But is this "real" art? Is this "real" creativity?

To address this, Fay investigated current theories on art and art education and found that these imply that true human creativity can be split into three types: 1) combinational, 2) explorative and 3) transformative creativity, but that it also requires real-world experiences and interactions with people and the environment. In her thesis, Fay therefore proposes to combine a GAN with an Internet of Things (IoT) setup to make it behave more creatively.

Arduino-based prototype (image from Fay's thesis)

She then designed a system that extends the original GAN with an interactive IoT system (implemented in an Arduino-based prototype) to simulate a more creative process. The prototype of the design showed a successful implementation of creative behaviour that can react to the environment and gradually change the direction of the generated images.
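
The sketch below illustrates the idea of that feedback loop in a few lines of Python; it is a toy illustration, not Fay's prototype code. The generator stub and the sensor function are hypothetical placeholders: each environmental reading nudges the latent vector, so the generated images gradually drift with the surroundings.

```python
# Toy illustration of the IoT-to-GAN feedback loop; the generator stub and
# the sensor function are hypothetical placeholders, not the prototype code.
import numpy as np

LATENT_DIM = 100
rng = np.random.default_rng(0)

def generator(z: np.ndarray) -> np.ndarray:
    """Stand-in for a trained GAN generator mapping a latent vector to an image."""
    return np.tanh(z[:64].reshape(8, 8))  # dummy 8x8 "image"

def read_sensor() -> float:
    """Stand-in for an Arduino sensor reading, normalised to [0, 1]."""
    return rng.uniform(0.0, 1.0)

z = rng.standard_normal(LATENT_DIM)
direction = rng.standard_normal(LATENT_DIM)  # fixed drift direction in latent space

# Each environmental reading nudges the latent vector, so the generated
# images gradually change direction in response to the surroundings.
for step in range(10):
    z += 0.1 * read_sensor() * direction
    image = generator(z)
    print(f"step {step}: mean pixel value {image.mean():.3f}")
```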

Images shown to the participant during the level-of-creativity task. Images 2 and 6 are generated by the creative GAN. Images 1 and 5 are human-made art. Images 3 and 4 are online GAN-generated art.

The generated art was evaluated on its creativity through task-based interviews with domain experts. The results show that the level to which the generated images are considered creative depends heavily on the participant's view of creativity.


Representing temporal vagueness on the Semantic Web for historical datasets

[This post is based on the Master Information Sciences project of Fabian Witeczek and reuses text from his thesis. The research is part of VU's effort in the InTaVia project and was co-supervised by Go Sugimoto.]

To properly represent temporal data on the Semantic Web, an ontology is needed that can express vague or imprecise dates. In his research, Fabian Witeczek developed an ontology that can be used to represent various forms of such vague dates. The engineering process started with a requirements analysis, which included collecting data records containing temporally vague dates from existing Digital Humanities Linked Data sets: BiographyNet and Europeana. The occurrences of vagueness were evaluated, and categories of vagueness were defined.

The categories were evaluated through a survey conducted with domain experts in the digital humanities. The experts were also questioned about the problems they encounter when working with temporally vague dates. The survey results confirmed the meaningfulness of the ontology requirements and of the five categories of vagueness: 1) unknown deviation, 2) within a time span, 3) before or after a specific date, 4) date options, and 5) complete vagueness.

Visualization of the vague date ontology

Based on the findings, the ontology was designed and implemented, scoped to year granularity only. Lastly, the ontology was tested and evaluated by linking its instances to instances of a historical dataset. The research concludes that the presented vague date ontology offers a clear way to specify how vague dates are and in which regard they are vague. However, applying the ontology in practice requires considerable effort from digital humanities researchers, because precision and deviation values need to be set for every record in a dataset.
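
As a rough illustration of what such a record could look like, the rdflib sketch below asserts a birth year that is only known within a time span, with an explicit deviation value at year granularity. The namespace and property names are hypothetical stand-ins; the actual modelling is defined in the ontology itself.

```python
# Illustrative only: the namespace and property names below are hypothetical
# stand-ins for the actual vague date ontology on Fabian's GitHub account.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

VAGUE = Namespace("http://example.org/vague-dates#")  # hypothetical namespace
EX = Namespace("http://example.org/data#")

g = Graph()
g.bind("vague", VAGUE)
g.bind("ex", EX)

# A birth year known only "within a time span" (category 2), at year
# granularity, with an explicit deviation of +/- 5 years.
g.add((EX.birthOfPersonX, RDF.type, VAGUE.VagueDate))
g.add((EX.birthOfPersonX, VAGUE.category, VAGUE.WithinTimeSpan))
g.add((EX.birthOfPersonX, VAGUE.estimatedYear, Literal("1643", datatype=XSD.gYear)))
g.add((EX.birthOfPersonX, VAGUE.deviationInYears, Literal(5)))

print(g.serialize(format="turtle"))
```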

Example SPARQL query using concepts from the vague dates ontology

More information can be found in the Master Thesis, linked below.

The ontology itself can be found on Fabian's GitHub account.


Comparing Synthetic Data Generation Tools for IoT Data

[This post is based on the Bachelor Information Sciences project of Darin Pavlov and reuses text from his thesis. The research is part of VU’s effort in the InterConnect project and was supervised by Roderick van der Weerdt]

The concepts and technologies behind the Internet of Things (IoT) make it possible to establish networks of interconnected smart devices. Such networks can produce large volumes of data transmitted through sensors and actuators. Machine Learning can play a key role in processing this data for use cases in specific domains such as automotive, healthcare, and manufacturing. However, access to data for developing and testing Machine Learning models is often hindered by data sensitivity and privacy issues.

One solution to this problem is to use synthetic data that resembles the real data as closely as possible. In his study, Darin Pavlov conducted a set of experiments investigating the effectiveness of synthetic IoT data generation with three different tools: Mostly AI, Gretel.ai, and SDV.

This table shows the results of one of the two Machine Learning detection tests, indicating how difficult it is to differentiate the synthetic data from the real data with a Machine Learning model. For two datasets, the result is calculated as 1 minus the average ROC AUC score.
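
The sketch below illustrates how such a detection test can be set up on made-up data: a classifier is trained to tell real records from synthetic ones, and the score is reported as 1 minus the mean ROC AUC, so a score near 0.5 means the synthetic data is hard to distinguish. The data and classifier choice here are illustrative, not the setup used in the thesis.

```python
# Illustrative detection test on made-up data, not the setup from the thesis:
# a classifier tries to tell real records from synthetic ones, and the score
# is reported as 1 minus the mean ROC AUC, so values near 0.5 mean the
# synthetic data is hard to distinguish from the real data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
real = rng.normal(loc=0.0, scale=1.0, size=(200, 5))       # stand-in real IoT records
synthetic = rng.normal(loc=0.1, scale=1.1, size=(200, 5))  # stand-in synthetic records

X = np.vstack([real, synthetic])
y = np.array([0] * len(real) + [1] * len(synthetic))  # 1 = synthetic

auc = cross_val_score(RandomForestClassifier(random_state=0),
                      X, y, cv=5, scoring="roc_auc")
print(f"detection score (1 - mean ROC AUC): {1 - auc.mean():.3f}")
```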

Darin compared the tools on various distinguishability metrics. He observed that Mostly AI outperforms the other two generators, although Gretel.ai shows similarly satisfactory results on the statistical metrics. The output of SDV, on the other hand, is poor on all metrics. Through this study, we aim to encourage future research in the quickly developing area of synthetic data generation in the context of IoT technology.

More details can be found in Darin’s thesis.


Keynote talk at Semantic AI workshop

Yesterday I had the honour and pleasure of giving one of the keynote speeches at the 1st Semantic AI workshop, co-located with SEMANTiCS2022 in Vienna. In my talk, "Knowledge Graphs for impactful Data Science in the Digital Humanities and IoT domain", I discussed challenges and lessons learned in various projects where 1) knowledge graphs, 2) machine learning and 3) user contexts interact in interesting ways. The slides for my talk can be found below.


Co-director of Cultural AI lab

I am happy and proud to announce that I will join Marieke van Erp and Laura Hollink as co-director of the Cultural AI Lab. The lab brings together researchers from various research institutes and heritage organizations to investigate how AI can be used to address humanities and heritage challenges, but also how methods, theories and insights from the cultural domain can be used to make AI better, fairer, more inclusive and more diverse.

I am very excited about this and look forward to the wonderful research collaborations!


News article on Knowledge Graphs for Maritime history

In the latest edition of the trade publication E-Data & Research, a nice article (in Dutch) was published about our research on knowledge graphs for maritime history. Thanks to Mathilde Jansen and, of course, my collaborators Stijn Schouten and Marieke van Erp! The image below shows the print article; the article can also be found online here.


Using the SAREF ontology for interoperability and machine learning in a Smart Home environment

Our abstract "Using the SAREF ontology for interoperability and machine learning in a Smart Home environment" was accepted for presentation at the ICT Open conference on 6-7 April 2022 in Amsterdam. In the abstract, we outline the current and future research that VU and TNO are conducting in the context of the InterConnect project, specifically around the construction of IoT knowledge graphs, machine learning and rule-based applications. We look forward to presenting it in April.


A Polyvocal and Contextualised Semantic Web

[This post is the text of a 1-minute pitch at the IWDS symposium for our poster "A Polyvocal and Contextualised Semantic Web", which was published as the paper: van Erp, Marieke, and Victor de Boer. "A Polyvocal and Contextualised Semantic Web." European Semantic Web Conference. Springer, Cham, 2021.]

Knowledge graphs are a popular way of representing and sharing data, information and knowledge in many domains on the Semantic Web. However, these knowledge graphs often represent singular, biased views on the world, which can lead to unwanted bias in AI systems that use this data. We therefore identify a need for a more polyvocal Semantic Web.

So. How do we get there?

  1. We need perspective-aware methods for identifying existing polyvocality in datasets and for acquiring it from text or users.
  2. We need datamodels and patterns to represent polyvocal data, information and knowledge (a small sketch of such a pattern follows after this list).
  3. We need visualisations and tools to make polyvocal knowledge accessible and usable for a wide variety of users, including domain experts and laypersons with varying backgrounds.
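
As a toy illustration of point 2, the sketch below attributes two descriptions of the same museum object to different voices, so that neither overwrites the other. The vocabulary is hypothetical and only meant to show the flavour of such a pattern.

```python
# Toy sketch of one possible polyvocal pattern; the pv: vocabulary here is
# hypothetical and only meant to show the flavour of such a data model.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/objects#")
PV = Namespace("http://example.org/polyvocal#")

g = Graph()
g.bind("pv", PV)
g.bind("ex", EX)

# Two descriptions of the same object, each carried by its own perspective
# node and attributed to a voice, so neither description overwrites the other.
for node, label, voice in [
    (EX.desc1, "ceremonial mask", EX.sourceCommunity),
    (EX.desc2, "ethnographic artefact", EX.museumCatalogue),
]:
    g.add((node, RDF.type, PV.Perspective))
    g.add((node, PV.describes, EX.object42))
    g.add((node, PV.label, Literal(label)))
    g.add((node, PV.attributedTo, voice))

print(g.serialize(format="turtle"))
```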

In the Cultural AI Lab, we investigate these challenges in several interrelated research projects. But we cannot, and should not, do this alone; we are looking for more voices to join us!
