It has pleased NWO to award the HAICu consortium through the National Research Agenda programme. In the HAICu project, AI researchers, Digital Humanities researchers, heritage professionals and engaged citizens work together on scientific breakthroughs to open, link and analyze large-scale multimodal digital heritage collections in context.
At VU, researchers from the User-Centric Data Science group will research how to create compelling narratives as a way to present multiple perspectives in multimodal data and how to provide transparency regarding the origin of data and the ways in which it was created. These questions will be addressed in collaboration with the Museum for World Cultures on how citizen-contributed descriptions can be combined with AI-generated labels into polyvocal narratives around objects related to the Dutch colonial past in Indonesia.
The next generation of Hybrid-Human-AI researchers are here! As part of the second International Conference on Hybrid Human-Artificial Intelligence that was held in June in Munich, German, myself and Amy Loutfi of Örebro University organized a doctoral consortium. We put out a Call for Papers asking for early to late stage PhD candidates on the topic of Hybrid Human-AI research to submit their research proposals. We received 10 submissions and after a smooth peer-reviewing process we were able to invite 8 participants to the workshop in Munich.
The workshop started with a great keynote by Wendy Mackay of Inria, Paris-Saclay, and the Université Paris-Saclay. Wendy is a great authority on Human-Computer Interaction and the relation of that field to Artificial Intelligence and she gave a great talk about the importance of being sensitive to both ends of the AI-HCI scale.
Next, the participants presented their research (plans) in 20 minute presentations, with plenty time for questions and discussions. We were joined by multiple members of the community who provided interesting comments and discussion items after the talks. Each presenter was paired with another participant who would lead the discussion following the presentation. All in all my impression was that this set-up lead to a fruitful and nice atmosphere for in-depth discussions about the research.
Below you find some pictures of the day. The entire programme, including (most of) the papers can be found on the HHAI conference web page. The papers are published by IOS press in the proceedings of the conference: Augmenting Human Intellect.
On behalf of Amy as well: Thank you Azade Farshad, Johanna Wolff, Regina Duarte, Amir Homayounirad, Anastasiya Zakreuskaya, Nicole Orzan, Dhivyabharathi Ramasamy, Cosimo Palma and Wendy Mackay for making the DC work. Thanks as well to the wonderful organization team of HHAI2023 to make everything run so smooth!
Two weeks ago, I visited the 2023 edition of the Digital Humanities Benelux conference in Brussels. It turned out this was the 10th anniversary edition, which goes to show that the Luxembourgian, Belgian and Dutch DH community is alive and kicking! This years gathering at the Royal Library of Belgium brought together humanities and computer science researchers and practitioners from the BeNeLux and beyond. Participants got to meet interesting tools, datasets and use cases, all the while critically assessing issues around perspective, representation and bias in each.
On the workshop day, I attended part of a tutorial organized by people from Göttingen University on the use of Linked Data for historical data. They presented a OpenRefine and WikiData-centric pipeline also including a batch wikidata editing tool https://quickstatements.toolforge.org/.
The second half of that day I attended a workshop on the Kiara tool presented by the people behind the Dharpa project. The basic premise of the tool makes a lot of sense: while many DH people use Python notebooks, it is not always clear what operations specific blocks of code map to. Reusing other peoples code becomes difficult and reusing existing data transformation code is not trivial. The solution of Kiara is an environment in which pre-defined well-documented modules are made available so that users can easily, find, select and combine modules for data transformation. For any DH infrastructure, one has to make decisions in what flexibility to offer users. My hunch is that this limited set of operations will not be enough for arbitrary DH-Data Science pipelines and that full flexibility (provided by python notebooks) will be needed. Nevertheless, we have to keep thinking on how infrastructures provide support for pipeline transparency, reusability and cater to less digital literate users.
On the first day of the main conference, Roeland Ordelman presented our own work on the CLARIAH MediaSuite: Towards ’Stakeholder Readiness’ in the CLARIAH Media Suite: Future-Proofing an Audio-Visual Research Infrastructure. This talk was preceded by a very interesting talk from Loren Verreyen who worked with a digital dataset of program guides (I know of similar datasets archived at Beeld and Geluid). Unfortunately, the much awaited third talk on the Distracted Boyfriend meme was cancelled.
A very nice duo-presentation was given by Daria Kondakova and Jakob Kohler on Messy Myths: Applying Linked Open Data to Study Mythological Narratives. This paper uses the theoretical framework of Zgol to back up the concept of hylemes to analyze mythological texts. Such hylemes are triple-like statements (subject-verb-object) that describe events in text. In the context of the project, these hylemes were then converted to full-blown Linked Open Data to allow for linking and comparing versions of myths. A research prototype can be found here https://dareiadareia-messy-myths.streamlit.app/ .
The GLOBALISE project was also present at the conference with presentation about the East-Asian shipping vocabulary and a poster.
At the poster session, I had the pleasure to present a poster from students of the VU DH minor and their supervisors on a tool to identify and link occupations in biographical descriptions.
The keynote by Patricia Murrieta-Flores from University of Lancaster introduced the concept of Cosmovision with respect to the archiving and enrichment of (colonial) heritage objects from meso-America. This concept of Cosmovision is very related to our polyvocality aims and the connection to computer vision is inspiring if not very challenging.
It is great to see that DHBenelux continues to be a very open and engaging community of humanities and computer science people, bringing together datasets, tools, challenges and methods.
As part of the VU Digital Humanities and Social Analytics Minor, this year we again had students do a capstone project in January to show off their DH and SA skills and knowledge. The students were matched with researchers and practitioners in the field to tackle a specific challenge in four weeks. We again thank these wonderful external supervisors for their effort. The students’ effort resulted in really impressive projects, showcased in the compilation video below.
In total, nine projects were executed and we list the titles and hosting organisations below.
Reception of Dutch films by critics and film fans
UvA-CREATE
Rethinking provenance through networks
Leiden University-Humanities
Gender and Facial Recognition
VU Amsterdam-Computer Science
Impact measurement in VR
UTwente
Locating Press Photos
NIOD
Exploring Music Collections through data stories, exploratory interfaces and innovative applications
Netherlands Institute for Sound and Vision
Predicting news headlines tests: What makes users click
The InterConnect project gathers 50 European entities to develop and demonstrate advanced solutions for connecting and converging digital homes and buildings with the electricity sector. Machine Learning (ML) algorithms play a significant role in the InterConnect project. Most prominent are the services that do some kind of forecasting like predicting energy consumption for (Smart) devices and households in general. The SAREF ontology allows us to standardize input formats for common ML approaches and that explainability can be increased by selecting algorithms that inherently have these features (e.g. Decision Trees) and by using interactive web environments like Jupyter Notebooks a convenient solution for users is created where step by step the algorithmic procedures can be followed and visualized and forms an implementation example for explainable AI.
[This blog post is based on the Artificial Intelligence MSc thesis project from Fay Beening, supervised by myself and Joost de Boo, more information can be found on Fay’s website]
Recently, generative art has been one of the fields where AI, especially deep learning has caught the public eye. Algorithms and online tools such as Dall-E are able to produce astounding results based on large artistic datasets. One class of algorithms that has been at the root of this success is the Generative Adversarial Network (GAN), frequently used in online art-generating tools because of their ability to produce realistic artefacts.
but, is this “””real””” art? is this “””real””” creativity?
To address this, Fay investigated current theories on art and art education and found that these imply that true human creativity can be split into three types: 1) combinational, 2) explorative and 3) transformative creativity but that it also requires real-world experiences and interactions with people and the environment. Therefore, Fay in her thesis proposes to combine the GAN with an Internet of Things (IoT) setup to make it behave more creative.
Arduin-based prototype (image from Fay’s thesis)
She then designed a system that extends the original GAN with an interactive IoT system (implemented in an Arduino-based prototype) to simulate a more creative process. The prototype of the design showed a successful implementation of creative behaviour that can react to the environment and gradually change the direction of the generated images.
Images shown to the participant during the level of creativity task. Images 2 and 6 are creative GAN generated images. Images 1 and 5 are human-made art. Images 3 and 4 are online GAN generated art.
The generated art was evaluated based on their creativity by doing task-based interviews with domain experts. The results show that the the level to which the generated images are considered to be creative depends heavily on the participant’s view of creativity.
[This post is based on the Master Information Sciences project of Fabian Witeczek and reuses text from his thesis. The research is part of VU’s effort in the Intavia project and was co-supervised by Go Sugimoto]
To represent properly temporal data on the Semantic Web, there is a need for an ontology to represent vague or imprecise dates. In the context of his research, Fabian Witeczek developed an ontology that can be used to represent various forms of such vague dates. The engineering process of the ontology started with a requirements analysis that contained the collection of data records from existing Digital Humanities Linked Data sets containing temporally vague dates: Biographynet and Europeana. The occurrences of vagueness were evaluated, and categories of vagueness were defined.
The categories were evaluated through a survey conducted with domain experts in the digital humanities domain. The experts were also questioned about their problems when working with temporally vague dates. The survey results confirmed the meaningfulness of the ontology requirements and the categories of vagueness which were: 1) Unknown deviation, 2) within a time span, 3) before or after a specific date, 4) date options, and 5) complete vagueness.
Visualization of the vague date ontology
Based on the findings, the ontology was designed and implemented, scoping to year-granularity only. Lastly, the ontology was tested and evaluated by linking its instances to instances of a historical dataset. This research concludes that the presented vague date ontology offers a clear way to specify how vague dates are and in which regard they are vague. However, the ontology requires much effort to make it work in practice for researchers in digital humanities. This is due to precision and deviation values that need to be set for every record within the datasets.
Example SPARQL query using concepts from the vague dates ontology
More information can be found in the Master Thesis, linked below.
[This post is based on the Bachelor Information Sciences project of Darin Pavlov and reuses text from his thesis. The research is part of VU’s effort in the InterConnect project and was supervised by Roderick van der Weerdt]
The concepts and technologies behind the Internet of Things (IoT) make it possible to establish networks of interconnected smart devices. Such networks can produce large volumes of data transmitted through sensors and actuators. Machine Learning can play a key role in processing this data towards several use cases in specific domains automotive, healthcare, manufacturing, etc. However, access to data for developing and testing Machine Learning is often hindered due to sensitivity of data, privacy issues etc.
One solution for this problem is to use synthetic data, resembling as much as possible real data. In his study, Darin Pavlov conducted a set of experiments, investigating the effectiveness of synthetic IoT data generation by three different tools:
This table shows the results of one of the two Machine Learning detection tests showing how difficult it is to differentiate the synthetic data from the real one with a Machine Learning model. For two datasets, the result is calculated as 1 minus the average ROC AUC score
Darin compared the tools on various distinguishability metrics. He observed that Mostly AI outperforms the other two generators, although Gretel.ai shows similar satisfactory results on the statistical metrics. The output of SDV on the other hand is poor on all metrics. Through this study we aim to encourage future research within the quickly developing area of synthetic data generation in the context of IoT technology.
Yesterday I had the honour and pleasure to give one of the keynote speeches at the 1st workshop of Semantic AI, co-located with SEMANTiCS2022 in Vienna. I my talk “Knowledge Graphs for impactful Data Science In the Digital Humanities and IOT domain“, I talked about challenges and lessons learned in various projects where 1) Knowledge Graphs 2) Machine Learning and 3) User Contexts interact in interesting ways. The slides for my talk can be found below.
I am happy and proud I to announce that I will join Marieke van Erp and Laura Hollink as co-director of the Cultural AI lab. The lab brings together researchers from various research institutes and heritage organizations to investigate both how AI can be used to address various humanities and heritage challenges but also how we can use methods, theories and insights from the cultural domain to make better, fairer, more inclusive and diverse AI.
I am very excited about this and look forward to the wonderful research collaborations!