As part of the VU Digital Humanities and Social Analytics Minor, this year we again had students do a capstone project in January to show off their DH and SA skills and knowledge. The students were matched with researchers and practitioners in the field to tackle a specific challenge in four weeks. We again thank these wonderful external supervisors for their effort. The students’ effort resulted in really impressive projects, showcased in the compilation video below.
In total, nine projects were executed and we list the titles and hosting organisations below.
Reception of Dutch films by critics and film fans
Rethinking provenance through networks
Gender and Facial Recognition
VU Amsterdam-Computer Science
Impact measurement in VR
Locating Press Photos
Exploring Music Collections through data stories, exploratory interfaces and innovative applications
Netherlands Institute for Sound and Vision
Predicting news headlines tests: What makes users click
The InterConnect project gathers 50 European entities to develop and demonstrate advanced solutions for connecting and converging digital homes and buildings with the electricity sector. Machine Learning (ML) algorithms play a significant role in the InterConnect project. Most prominent are the services that do some kind of forecasting, such as predicting energy consumption for (smart) devices and households in general. The SAREF ontology allows us to standardize input formats for common ML approaches. Explainability can be increased by selecting algorithms that are inherently explainable (e.g. Decision Trees). Interactive web environments like Jupyter Notebooks give users a convenient way to follow and visualize the algorithmic procedures step by step, and together these form an implementation example for explainable AI.
Recently, generative art has been one of the fields where AI, especially deep learning, has caught the public eye. Algorithms and online tools such as Dall-E are able to produce astounding results based on large artistic datasets. One class of algorithms that has been at the root of this success is the Generative Adversarial Network (GAN), frequently used in online art-generating tools because of its ability to produce realistic artefacts.
But is this “real” art? Is this “real” creativity?
To address this, Fay investigated current theories on art and art education and found that these imply that true human creativity can be split into three types: 1) combinational, 2) explorative and 3) transformative creativity, but that it also requires real-world experiences and interactions with people and the environment. Therefore, Fay proposes in her thesis to combine the GAN with an Internet of Things (IoT) setup to make it behave more creatively.
She then designed a system that extends the original GAN with an interactive IoT system (implemented in an Arduino-based prototype) to simulate a more creative process. The prototype of the design showed a successful implementation of creative behaviour that can react to the environment and gradually change the direction of the generated images.
The generated art was evaluated on its creativity through task-based interviews with domain experts. The results show that the level to which the generated images are considered to be creative depends heavily on the participant’s view of creativity.
[This post is based on the Master Information Sciences project of Fabian Witeczek and reuses text from his thesis. The research is part of VU’s effort in the Intavia project and was co-supervised by Go Sugimoto]
To properly represent temporal data on the Semantic Web, there is a need for an ontology to represent vague or imprecise dates. In the context of his research, Fabian Witeczek developed an ontology that can be used to represent various forms of such vague dates. The engineering process of the ontology started with a requirements analysis, for which data records containing temporally vague dates were collected from existing Digital Humanities Linked Data sets: Biographynet and Europeana. The occurrences of vagueness were evaluated, and categories of vagueness were defined.
The categories were evaluated through a survey conducted with experts in the digital humanities domain. The experts were also asked about the problems they face when working with temporally vague dates. The survey results confirmed the meaningfulness of the ontology requirements and the categories of vagueness, which were: 1) unknown deviation, 2) within a time span, 3) before or after a specific date, 4) date options, and 5) complete vagueness.
Based on the findings, the ontology was designed and implemented, scoping to year-granularity only. Lastly, the ontology was tested and evaluated by linking its instances to instances of a historical dataset. This research concludes that the presented vague date ontology offers a clear way to specify how vague dates are and in which regard they are vague. However, the ontology requires much effort to make it work in practice for researchers in digital humanities. This is due to precision and deviation values that need to be set for every record within the datasets.
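To make the five categories more concrete, here is a minimal Python sketch of how year-granularity vague dates could be modelled and turned into earliest/latest bounds. This is not the ontology from the thesis: the class names, the `deviation` and `before` fields, and the way each category maps to bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Vagueness(Enum):
    """The five categories named in the post (example readings are assumptions)."""
    UNKNOWN_DEVIATION = "unknown deviation"       # e.g. "circa 1650"
    WITHIN_TIME_SPAN = "within a time span"       # e.g. "between 1650 and 1660"
    BEFORE_OR_AFTER = "before or after a date"    # e.g. "before 1650"
    DATE_OPTIONS = "date options"                 # e.g. "1650 or 1652"
    COMPLETE = "complete vagueness"               # no usable date information


@dataclass(frozen=True)
class VagueYear:
    """Hypothetical record for one vague date at year granularity."""
    kind: Vagueness
    years: Tuple[int, ...] = ()
    deviation: Optional[int] = None   # assumed: +/- years, set per record
    before: Optional[bool] = None     # assumed: True = "before", False = "after"

    def bounds(self) -> Tuple[Optional[int], Optional[int]]:
        """Return (earliest, latest) possible year; None means unbounded."""
        if self.kind is Vagueness.COMPLETE:
            return (None, None)
        if self.kind is Vagueness.UNKNOWN_DEVIATION:
            d = self.deviation or 0
            return (self.years[0] - d, self.years[0] + d)
        if self.kind is Vagueness.BEFORE_OR_AFTER:
            year = self.years[0]
            return (None, year) if self.before else (year, None)
        # WITHIN_TIME_SPAN and DATE_OPTIONS: bounded by the listed years
        return (min(self.years), max(self.years))


circa = VagueYear(Vagueness.UNKNOWN_DEVIATION, (1650,), deviation=5)
print(circa.bounds())
```

The `deviation` field also illustrates the practical burden mentioned above: a value like this has to be chosen for every record before the bounds become usable.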
More information can be found in the Master Thesis, linked below.
[This post is based on the Bachelor Information Sciences project of Darin Pavlov and reuses text from his thesis. The research is part of VU’s effort in the InterConnect project and was supervised by Roderick van der Weerdt]
The concepts and technologies behind the Internet of Things (IoT) make it possible to establish networks of interconnected smart devices. Such networks can produce large volumes of data transmitted through sensors and actuators. Machine Learning can play a key role in processing this data for use cases in specific domains such as automotive, healthcare and manufacturing. However, access to data for developing and testing Machine Learning is often hindered by the sensitivity of the data, privacy issues, etc.
One solution to this problem is to use synthetic data that resembles real data as closely as possible. In his study, Darin Pavlov conducted a set of experiments investigating the effectiveness of synthetic IoT data generation by three different tools: Mostly AI, Gretel.ai and SDV.
Darin compared the tools on various distinguishability metrics. He observed that Mostly AI outperforms the other two generators, although Gretel.ai shows similarly satisfactory results on the statistical metrics. The output of SDV, on the other hand, is poor on all metrics. Through this study we aim to encourage future research within the quickly developing area of synthetic data generation in the context of IoT technology.
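The specific metrics used in the thesis are not listed here. As an illustration of what a statistical comparison between real and synthetic data can look like, the sketch below computes the two-sample Kolmogorov-Smirnov statistic for one numeric column. The function name and the sample values are made up for this example, and the choice of this particular statistic is an assumption.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest gap between the empirical CDFs of the two samples:
    0.0 means the samples have identical distributions, 1.0 means they do
    not overlap at all.
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    # The maximum gap between two step functions is reached at a data point.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Made-up sensor readings, for illustration only.
real_temps = [20.1, 20.4, 21.0, 21.3, 22.0]
synthetic_temps = [20.0, 20.5, 21.1, 21.2, 23.5]
print(ks_statistic(real_temps, synthetic_temps))
```

A lower value means the synthetic column is harder to tell apart from the real one on this single measure. A full comparison would look at many columns and at relationships between them.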
I am happy and proud to announce that I will join Marieke van Erp and Laura Hollink as co-director of the Cultural AI lab. The lab brings together researchers from various research institutes and heritage organizations to investigate both how AI can be used to address various humanities and heritage challenges and how we can use methods, theories and insights from the cultural domain to make better, fairer, more inclusive and diverse AI.
I am very excited about this and look forward to the wonderful research collaborations!
In the latest edition of the trade publication E-Data & Research, a nice article (in Dutch) about our research on knowledge graphs for maritime history has been published. Thanks to Mathilde Jansen and of course my collaborators Stijn Schouten and Marieke van Erp! The image below shows the print article; the article can be found online here.
Our abstract “Using the SAREF ontology for interoperability and machine learning in a Smart Home environment” was accepted for a presentation at the ICT Open conference on 6-7 April 2022 in Amsterdam. In the abstract, we outline the current and future research VU and TNO are conducting in the context of the InterConnect project, specifically around the construction of IoT knowledge graphs, machine learning and rule-based applications. We look forward to presenting it in April.
[This post is the text of a 1-minute pitch at the IWDS symposium for our poster “A Polyvocal and Contextualised Semantic Web”, which was published as the paper: Erp, Marieke van, and Victor de Boer. “A Polyvocal and Contextualised Semantic Web.” European Semantic Web Conference. Springer, Cham, 2021.]
Knowledge graphs are a popular way of representing and sharing data, information and knowledge in many domains on the Semantic Web. However, these knowledge graphs often represent singular (biased) views on the world, which can lead to unwanted bias in AI that uses this data. We therefore identify a need for a more polyvocal Semantic Web.
So. How do we get there?
We need perspective-aware methods for identifying existing polyvocality in datasets and for acquiring it from text or users.
We need data models and patterns to represent polyvocal data, information and knowledge.
We need visualisations and tools to make the polyvocal knowledge accessible and usable for a wide variety of users, including domain experts or laypersons with varying backgrounds.
In the Cultural AI Lab, we investigate these challenges in several interrelated research projects, but we cannot, and should not, do it alone, and we are looking for more voices to join us!