I am an Associate Professor (UHD) in the User-Centric Data Science group of the Computer Science department of the Vrije Universiteit Amsterdam (VU). I am also a co-director of the Cultural AI Lab. In my research, I combine (Semantic) Web technologies with Human-Computer Interaction, Knowledge Representation and Information Extraction to tackle research challenges in various domains. These include Cultural Heritage, Digital Humanities and ICT for Development (ICT4D). I am currently involved in the following research projects:
HEDGE-IoT: IoT data conversion and enrichment; user-centric and explainable machine learning
HAICu: Perspective-aware AI to make digital heritage collections more accessible.
InTaVia: making linked cultural heritage and biographical data usable for end-users
PressingMatter: developing data models to support societal reconciliation with the colonial past and its afterlives.
InterConnect: machine learning on IoT and smart energy knowledge graphs
The Horizon Europe project HEDGE-IoT started in January 2024. The 3.5-year project will build on existing technology to develop a Holistic Approach towards Empowerment of the DiGitalization of the Energy Ecosystem through adoption of IoT solutions. For VU, this project allows us to continue the research and development initiated in the InterConnect project on data interoperability and explainable machine learning for smart buildings.
Researchers from the User-Centric Data Science group will participate in the project mostly in the context of the Dutch pilot, which will run in Arnhems Buiten, the former testing location of KEMA in the east of the Netherlands. In the pilot, we will collaborate closely with the other Dutch partners: TNO and Arnhems Buiten. At this site, an innovative business park is being realized that has its own power grid architecture, allowing for exchange of data and energy, opening the possibility for various AI-driven services for end-users.
VU will research a) how such data can be made interoperable and enriched with external information and knowledge and b) how such data can be made accessible to services and end-users through data dashboards that include explainable AI.
The image above shows the Arnhems Buiten buildings and the energy grid (source: Arnhems Buiten)
I was honored to be invited as a keynote speaker for the 5th edition of the SUMAC 2023 workshop (analySis, Understanding and proMotion of heritAge Contents) held in conjunction with ACM Multimedia in Ottawa, Canada. In the keynote, I sketched how Knowledge Graphs as a technology can be applied to the cultural heritage domain with examples of opportunities for new types of research in the field of digital humanities specifically with respect to analyses and visualisation of such (multi-modal) data.
In the talk, I discussed the promises and challenges of designing, constructing and enriching knowledge graphs for cultural heritage and digital humanities, and how such integrated and multimodal data can be browsed, queried or analysed using state-of-the-art machine learning.
I also addressed the issue of polyvocality, where multiple perspectives on (historical) information are to be represented. Especially in contexts such as that of (post-)colonial heritage, representing multiple voices is crucial.
The award for the Best Network Institute Academy Assistant project for this year goes to the project titled “Between Art, Data, and Meaning – How can Virtual Reality expand visitors’ perspectives on cultural objects with colonial background?” This project was carried out by VU students Isabel Franke and Stefania Conte, supervised by Thilo Hartmann and UCDS researchers Claudia Libbi and myself. A project report and research paper are forthcoming, but you can see the poster below.
At VU, researchers from the User-Centric Data Science group will research how to create compelling narratives as a way to present multiple perspectives in multimodal data and how to provide transparency regarding the origin of data and the ways in which it was created. These questions will be addressed in collaboration with the Museum for World Cultures, where we will investigate how citizen-contributed descriptions can be combined with AI-generated labels into polyvocal narratives around objects related to the Dutch colonial past in Indonesia.
The next generation of Hybrid Human-AI researchers is here! As part of the second International Conference on Hybrid Human-Artificial Intelligence, held in June in Munich, Germany, Amy Loutfi of Örebro University and I organized a doctoral consortium. We put out a Call for Papers asking early to late stage PhD candidates working on Hybrid Human-AI research to submit their research proposals. We received 10 submissions and, after a smooth peer-reviewing process, we were able to invite 8 participants to the workshop in Munich.
The workshop started with a keynote by Wendy Mackay of Inria, Paris-Saclay, and the Université Paris-Saclay. Wendy is a leading authority on Human-Computer Interaction and the relation of that field to Artificial Intelligence, and she gave a great talk about the importance of being sensitive to both ends of the AI-HCI scale.
Next, the participants presented their research (plans) in 20-minute presentations, with plenty of time for questions and discussions. We were joined by multiple members of the community who provided interesting comments and discussion items after the talks. Each presenter was paired with another participant who would lead the discussion following the presentation. All in all, my impression was that this set-up led to a fruitful and pleasant atmosphere for in-depth discussions about the research.
On behalf of Amy as well: Thank you Azade Farshad, Johanna Wolff, Regina Duarte, Amir Homayounirad, Anastasiya Zakreuskaya, Nicole Orzan, Dhivyabharathi Ramasamy, Cosimo Palma and Wendy Mackay for making the DC work. Thanks as well to the wonderful organization team of HHAI2023 for making everything run so smoothly!
Two weeks ago, I visited the 2023 edition of the Digital Humanities Benelux conference in Brussels. It turned out this was the 10th anniversary edition, which goes to show that the Luxembourgian, Belgian and Dutch DH community is alive and kicking! This year's gathering at the Royal Library of Belgium brought together humanities and computer science researchers and practitioners from the BeNeLux and beyond. Participants got to meet interesting tools, datasets and use cases, all the while critically assessing issues around perspective, representation and bias in each.
On the workshop day, I attended part of a tutorial organized by people from Göttingen University on the use of Linked Data for historical data. They presented an OpenRefine and Wikidata-centric pipeline that also includes QuickStatements, a batch Wikidata editing tool: https://quickstatements.toolforge.org/.
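To give an impression of what the last step of such a pipeline can look like, here is a minimal sketch of my own (not the tutorial's code) that turns a small table, as it might be exported after reconciliation, into QuickStatements V1 commands. The people, column names and sample values are invented for illustration; P31 (instance of), Q5 (human) and P569 (date of birth) are regular Wikidata identifiers.

```python
import csv
import io

# Hypothetical export of a cleaned table; the people and columns are made up.
SAMPLE_CSV = """label,description,birth_year
Jane Example,fictional 19th-century printer,1839
John Sample,fictional 19th-century bookseller,1851
"""


def to_quickstatements(rows):
    """Turn table rows into tab-separated QuickStatements V1 commands."""
    lines = []
    for row in rows:
        year = int(row["birth_year"])
        lines.append("CREATE")  # start a new item
        lines.append("\t".join(["LAST", "Len", f'"{row["label"]}"']))  # English label
        lines.append("\t".join(["LAST", "Den", f'"{row["description"]}"']))  # English description
        lines.append("\t".join(["LAST", "P31", "Q5"]))  # instance of: human
        # Date of birth; the trailing /9 marks year precision.
        lines.append("\t".join(["LAST", "P569", f"+{year:04d}-01-01T00:00:00Z/9"]))
    return lines


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
commands = to_quickstatements(rows)
print("\n".join(commands))
```

The printed commands are meant to be reviewed by hand before pasting them into the tool, since a batch edit on Wikidata is public and hard to undo.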
The second half of that day I attended a workshop on the Kiara tool presented by the people behind the Dharpa project. The basic premise of the tool makes a lot of sense: while many DH people use Python notebooks, it is not always clear what operations specific blocks of code map to. Reusing other people's code becomes difficult, and reusing existing data transformation code is not trivial. The solution of Kiara is an environment in which pre-defined, well-documented modules are made available so that users can easily find, select and combine modules for data transformation. For any DH infrastructure, one has to decide how much flexibility to offer users. My hunch is that this limited set of operations will not be enough for arbitrary DH-Data Science pipelines and that the full flexibility provided by Python notebooks will be needed. Nevertheless, we have to keep thinking about how infrastructures can support pipeline transparency and reusability, and cater to less digitally literate users.
A very nice duo-presentation was given by Daria Kondakova and Jakob Kohler on Messy Myths: Applying Linked Open Data to Study Mythological Narratives. This paper uses the theoretical framework of Zgol to back up the concept of hylemes to analyze mythological texts. Such hylemes are triple-like statements (subject-verb-object) that describe events in text. In the context of the project, these hylemes were then converted to full-blown Linked Open Data to allow for linking and comparing versions of myths. A research prototype can be found here: https://dareiadareia-messy-myths.streamlit.app/.
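To make the idea of hylemes as triples a bit more concrete, here is a small sketch of my own, not the authors' implementation. It assumes a hypothetical namespace (https://example.org/myth/) and two invented myth versions, serialises each hyleme as an N-Triples line, and compares the versions by set intersection.

```python
from dataclasses import dataclass
from urllib.parse import quote

BASE = "https://example.org/myth/"  # hypothetical namespace, not the project's


@dataclass(frozen=True)
class Hyleme:
    """A triple-like statement (subject-verb-object) describing one event."""
    subject: str
    verb: str
    obj: str


def iri(kind, name):
    """Mint an IRI for an entity or action from its plain-text name."""
    return f"<{BASE}{kind}/{quote(name.lower().replace(' ', '_'))}>"


def to_ntriples(h):
    """Serialise one hyleme as a single N-Triples line."""
    return f"{iri('entity', h.subject)} {iri('action', h.verb)} {iri('entity', h.obj)} ."


# Two invented versions of the same (made-up) myth.
version_a = {Hyleme("Hero", "slays", "Dragon"), Hyleme("Hero", "finds", "Treasure")}
version_b = {Hyleme("Hero", "slays", "Dragon"), Hyleme("Hero", "loses", "Treasure")}

shared = version_a & version_b       # events both versions agree on
only_in_a = version_a - version_b    # events unique to version A

for h in sorted(shared, key=to_ntriples):
    print(to_ntriples(h))
```

Once the statements share IRIs like this, comparing versions becomes a matter of comparing sets of triples, which is what makes the Linked Data representation attractive for this kind of study.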
The keynote by Patricia Murrieta-Flores from University of Lancaster introduced the concept of Cosmovision with respect to the archiving and enrichment of (colonial) heritage objects from Mesoamerica. This concept of Cosmovision is closely related to our polyvocality aims, and the connection to computer vision is inspiring, if also very challenging.
As part of the VU Digital Humanities and Social Analytics Minor, this year we again had students do a capstone project in January to show off their DH and SA skills and knowledge. The students were matched with researchers and practitioners in the field to tackle a specific challenge in four weeks. We again thank these wonderful external supervisors for their effort. The students’ effort resulted in really impressive projects, showcased in the compilation video below.
In total, nine projects were executed and we list the titles and hosting organisations below.
Reception of Dutch films by critics and film fans
Rethinking provenance through networks
Gender and Facial Recognition
VU Amsterdam-Computer Science
Impact measurement in VR
Locating Press Photos
Exploring Music Collections through data stories, exploratory interfaces and innovative applications
Netherlands Institute for Sound and Vision
Predicting news headlines tests: What makes users click
The InterConnect project gathers 50 European entities to develop and demonstrate advanced solutions for connecting and converging digital homes and buildings with the electricity sector. Machine Learning (ML) algorithms play a significant role in the project. Most prominent are the services that do some kind of forecasting, such as predicting energy consumption for (smart) devices and households in general. The SAREF ontology allows us to standardize input formats for common ML approaches. Explainability can be increased by selecting algorithms that are inherently interpretable (e.g. Decision Trees). Interactive web environments such as Jupyter Notebooks offer users a convenient way to follow and visualize the algorithmic procedures step by step, and so form an implementation example for explainable AI.
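As a rough illustration of this combination, and not the project's actual code, the sketch below fits a depth-one regression tree (a decision stump) on a handful of invented measurements and prints the learned rule in readable form. The record keys loosely mirror SAREF Measurement properties (hasTimestamp, hasValue, isMeasuredIn); the values, the single "hour of day" feature and the helper names are my own assumptions.

```python
from datetime import datetime
from statistics import mean

# Invented measurements; keys loosely mirror SAREF Measurement properties.
measurements = [
    {"hasTimestamp": "2023-01-01T02:00:00", "hasValue": 0.2, "isMeasuredIn": "kWh"},
    {"hasTimestamp": "2023-01-01T04:00:00", "hasValue": 0.2, "isMeasuredIn": "kWh"},
    {"hasTimestamp": "2023-01-01T06:00:00", "hasValue": 0.3, "isMeasuredIn": "kWh"},
    {"hasTimestamp": "2023-01-01T18:00:00", "hasValue": 1.1, "isMeasuredIn": "kWh"},
    {"hasTimestamp": "2023-01-01T20:00:00", "hasValue": 1.2, "isMeasuredIn": "kWh"},
    {"hasTimestamp": "2023-01-01T22:00:00", "hasValue": 1.0, "isMeasuredIn": "kWh"},
]


def to_row(m):
    """Standardised input: (hour of day, consumption value)."""
    return datetime.fromisoformat(m["hasTimestamp"]).hour, float(m["hasValue"])


def fit_stump(rows):
    """Fit a depth-one regression tree: one threshold on the hour of day."""
    best = None
    hours = sorted({h for h, _ in rows})
    for lo, hi in zip(hours, hours[1:]):
        threshold = (lo + hi) / 2
        left = [v for h, v in rows if h <= threshold]
        right = [v for h, v in rows if h > threshold]
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors when predicting the mean on each side.
        error = sum((v - left_mean) ** 2 for v in left)
        error += sum((v - right_mean) ** 2 for v in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict(stump, hour):
    threshold, left_mean, right_mean = stump
    return left_mean if hour <= threshold else right_mean


def explain(stump):
    """Render the whole model as one human-readable rule."""
    threshold, left_mean, right_mean = stump
    return (f"IF hour <= {threshold:g} THEN predict {left_mean:.2f} kWh "
            f"ELSE predict {right_mean:.2f} kWh")


stump = fit_stump([to_row(m) for m in measurements])
print(explain(stump))
```

A real forecasting service would use deeper trees and more features, but the point stays the same: the model can be read as a set of rules, which a notebook can show next to the data it was trained on.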
Recently, generative art has been one of the fields where AI, especially deep learning, has caught the public eye. Algorithms and online tools such as Dall-E are able to produce astounding results based on large artistic datasets. One class of algorithms that has been at the root of this success is the Generative Adversarial Network (GAN), frequently used in online art-generating tools because of its ability to produce realistic artefacts.
But is this “real” art? Is this “real” creativity?
To address this, Fay investigated current theories on art and art education and found that these imply that true human creativity can be split into three types: 1) combinational, 2) explorative and 3) transformative creativity, but that it also requires real-world experiences and interactions with people and the environment. Therefore, Fay proposes in her thesis to combine the GAN with an Internet of Things (IoT) setup to make it behave more creatively.
Arduino-based prototype (image from Fay’s thesis)
She then designed a system that extends the original GAN with an interactive IoT system (implemented in an Arduino-based prototype) to simulate a more creative process. The prototype of the design showed a successful implementation of creative behaviour that can react to the environment and gradually change the direction of the generated images.
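One way to picture "gradually changing the direction" of the output is the toy sketch below. It is my own simplification, not the design from Fay's thesis: it assumes that normalised sensor readings define a target point in the GAN's latent space and that the latent vector moves a small step toward it on every update. The sensor values, the four-dimensional latent vector and the update rate are all hypothetical, and the generator itself is left out.

```python
def nudge(latent, target, rate=0.1):
    """Move the latent vector a small step toward the sensor-derived target."""
    return [(1 - rate) * z + rate * t for z, t in zip(latent, target)]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


latent = [0.0, 0.0, 0.0, 0.0]      # hypothetical starting point in latent space
readings = [0.8, -0.2, 0.5, 0.1]   # hypothetical normalised sensor values

history = [distance(latent, readings)]
for _ in range(10):
    latent = nudge(latent, readings)
    # In a full system, `latent` would be fed to the GAN generator here.
    history.append(distance(latent, readings))

print([round(d, 3) for d in history])
```

With a small rate the images would drift slowly rather than jump, which matches the idea of a system that reacts to its environment without losing continuity between successive images.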
Images shown to the participant during the level of creativity task. Images 2 and 6 are creative GAN generated images. Images 1 and 5 are human-made art. Images 3 and 4 are online GAN generated art.
The creativity of the generated art was evaluated through task-based interviews with domain experts. The results show that the level to which the generated images are considered to be creative depends heavily on the participant’s view of creativity.
[This post is based on the Master Information Sciences project of Fabian Witeczek and reuses text from his thesis. The research is part of VU’s effort in the InTaVia project and was co-supervised by Go Sugimoto]
To properly represent temporal data on the Semantic Web, there is a need for an ontology to represent vague or imprecise dates. In the context of his research, Fabian Witeczek developed an ontology that can be used to represent various forms of such vague dates. The engineering process of the ontology started with a requirements analysis, for which data records containing temporally vague dates were collected from existing Digital Humanities Linked Data sets: Biographynet and Europeana. The occurrences of vagueness were evaluated, and categories of vagueness were defined.
The categories were evaluated through a survey conducted with experts in the digital humanities domain. The experts were also asked about the problems they encounter when working with temporally vague dates. The survey results confirmed the meaningfulness of the ontology requirements and the categories of vagueness, which were: 1) unknown deviation, 2) within a time span, 3) before or after a specific date, 4) date options, and 5) complete vagueness.
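The five categories can be illustrated with a small data model. This is a sketch of my own in Python, not the ontology itself: the class and field names are hypothetical, dates are simplified to whole years, and "before" and "after" are treated as exclusive bounds.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Vagueness(Enum):
    UNKNOWN_DEVIATION = "unknown deviation"
    WITHIN_TIME_SPAN = "within a time span"
    BEFORE_OR_AFTER = "before or after a specific date"
    DATE_OPTIONS = "date options"
    COMPLETE = "complete vagueness"


@dataclass(frozen=True)
class VagueDate:
    """Hypothetical record for one vague date at year granularity."""
    kind: Vagueness
    years: Tuple[int, ...] = ()
    deviation: Optional[int] = None   # +/- years, chosen by the researcher
    direction: Optional[str] = None   # "before" or "after"

    def bounds(self) -> Tuple[Optional[int], Optional[int]]:
        """Earliest and latest possible year; None means unbounded."""
        if self.kind is Vagueness.UNKNOWN_DEVIATION:
            d = self.deviation or 0
            return self.years[0] - d, self.years[0] + d
        if self.kind is Vagueness.WITHIN_TIME_SPAN:
            return self.years[0], self.years[1]
        if self.kind is Vagueness.BEFORE_OR_AFTER:
            if self.direction == "before":
                return None, self.years[0] - 1
            return self.years[0] + 1, None
        if self.kind is Vagueness.DATE_OPTIONS:
            return min(self.years), max(self.years)
        return None, None  # complete vagueness


circa = VagueDate(Vagueness.UNKNOWN_DEVIATION, (1850,), deviation=5)
before = VagueDate(Vagueness.BEFORE_OR_AFTER, (1850,), direction="before")
options = VagueDate(Vagueness.DATE_OPTIONS, (1848, 1852))
print(circa.bounds(), before.bounds(), options.bounds())
```

Note that the `deviation` value has to be supplied per record: a date such as "circa 1850" only becomes usable for filtering or comparison once someone decides how wide "circa" is.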
Visualization of the vague date ontology
Based on the findings, the ontology was designed and implemented, scoped to year granularity only. Lastly, the ontology was tested and evaluated by linking its instances to instances of a historical dataset. This research concludes that the presented vague date ontology offers a clear way to specify how vague dates are and in which regard they are vague. However, the ontology requires much effort to make it work in practice for researchers in digital humanities, because precision and deviation values need to be set for every record within the datasets.
Example SPARQL query using concepts from the vague dates ontology
More information can be found in the Master Thesis, linked below.