I am an assistant professor (UD) in the User-Centric Data Science group at the Computer Science department of the Vrije Universiteit Amsterdam (VU). I am also a senior research fellow at the Netherlands Institute for Sound and Vision. In my research, I combine (Semantic) Web technologies with Human-Computer Interaction, Knowledge Representation and Information Extraction to tackle research challenges in various domains. These include Cultural Heritage, Digital Humanities and ICT for Development (ICT4D). More information on these projects can be found on this site or through my CV.
It is so nice when two often very distinct research lines come together. In my case, Digital Humanities and ICT for Development rarely meet directly. But they sure did come together when Gossa Lô started with her Master AI thesis. Gossa, a long-time collaborator in the W4RA team, chose to focus on the opportunities for Machine Learning and Natural Language Processing for West-African folk tales. Her research involved constructing a corpus of West-African folk tales, performing various classification and text generation experiments and even included a field trip to Ghana to elicit information about folk tale structures. The work, done as part of an internship at Bolesian.ai, resulted in a beautiful Master AI thesis, which was awarded a very high grade.
As a follow-up, we decided to try to rewrite the thesis into an article and submit it to a DH or ICT4D journal. This proved more difficult. Both DH and ICT4D are very multidisciplinary in nature and the combination of both proved a bit too much for many journals, with our article being either too technical, not technical enough, or too much out of scope.
But now, the article “Exploring West African Folk Narrative Texts Using Machine Learning” has been published (Open Access) in a special issue of Information on Digital Humanities!
The paper examines how machine learning (ML) and natural language processing (NLP) can be used to identify, analyze, and generate West African folk tales. Two corpora of West African and Western European folk tales were compiled and used in three experiments on cross-cultural folk tale analysis:
- In the text generation experiment, two types of deep learning text generators are built and trained on the West African corpus. We show that although the texts range between semantic and syntactic coherence, each of them contains West African features.
- The second experiment further examines the distinction between the West African and Western European folk tales by comparing the performance of an LSTM (acc. 0.79) with a BoW classifier (acc. 0.93), indicating that the two corpora can be clearly distinguished in terms of vocabulary. An interactive t-SNE visualization of a hybrid classifier (acc. 0.85) highlights the culture-specific words for both.
- The third experiment describes an ML analysis of narrative structures. Classifiers trained on parts of folk tales according to the three-act structure are quite capable of distinguishing these parts (acc. 0.78). Common n-grams extracted from these parts not only underline cross-cultural distinctions in narrative structures, but also show the overlap between verbal and written West African narratives.
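To give a feel for why a bag-of-words (BoW) model can separate two corpora on vocabulary alone, here is a minimal sketch. It is not the classifier from the paper: it assumes a simple multinomial Naive Bayes model with add-one smoothing, and the four example sentences, the label names and the function names are all invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; word order is discarded (bag of words)."""
    return text.lower().split()


def train(docs):
    """Count words per label. `docs` is a list of (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Return the label with the highest Naive Bayes log-probability."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior of the label plus smoothed log likelihood of each word.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy sentences, NOT taken from the actual corpora.
TOY_DOCS = [
    ("the spider tricked the hyena near the village", "west_african"),
    ("the tortoise and the spider shared the yam", "west_african"),
    ("the princess met the king in the castle", "western_european"),
    ("the wolf waited for the princess in the forest", "western_european"),
]

word_counts, label_counts = train(TOY_DOCS)
print(predict("the spider ate the yam", word_counts, label_counts))
print(predict("the princess in the castle", word_counts, label_counts))
```

Because the model only looks at which words occur, culture-specific vocabulary (here, made-up cues such as "spider" or "castle") is enough to drive the decision, which is the intuition behind the vocabulary-based distinction reported above.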
All resources, including data and code, can be found at https://github.com/GossaLo/afr-neural-folktales
[This post is based on Enya Nieland’s MSc thesis “Generating Earcons from Knowledge Graphs”]
Knowledge Graphs are becoming enormously popular, which means that users interacting with such complex networks are diversifying. This requires new and innovative ways of interacting. Several methods for visualizing, summarizing or exploring knowledge have been proposed and developed. In this student project we investigated the potential for interacting with knowledge graphs through a different modality: sound.
The research focused on the question of how to generate meaningful sound or music from (knowledge) graphs. The generated sounds should provide users with some insight into the properties of the network. Enya framed this challenge with the idea of “earcons”, the auditory version of icons.
Enya eventually developed a method that automatically produces these types of earcons for random knowledge graphs. Each earcon consists of three notes that differ in pitch and duration. As an example, listen to the three earcons which are shown in the figure on the left.
The earcon parameters are derived from network metrics such as minimum, maximum and average indegree or outdegree. A tool with a user interface allowed users to design the earcons based on these metrics.
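As a rough illustration of how network metrics could drive a three-note earcon, consider the sketch below. It is not Enya's actual method: the linear mapping from a metric to a MIDI pitch, the pitch range (48 to 84), the clipping ceiling of 10 and the fixed note duration are all assumptions made for this example, as are the function names and the tiny edge list.

```python
from collections import Counter


def indegree_stats(edges):
    """Return (min, max, average) indegree over all nodes in a directed edge list."""
    nodes = {node for edge in edges for node in edge}
    indegree = Counter(target for _, target in edges)
    values = [indegree[node] for node in nodes]
    return min(values), max(values), sum(values) / len(values)


def to_midi_pitch(value, low=48, high=84, ceiling=10.0):
    """Map a metric linearly onto a MIDI pitch range (assumed mapping)."""
    clipped = max(0.0, min(float(value), ceiling))
    return round(low + (high - low) * clipped / ceiling)


def earcon(edges, duration=0.5):
    """Build three (pitch, duration) notes: min, max and average indegree."""
    return [(to_midi_pitch(value), duration) for value in indegree_stats(edges)]


# Hypothetical toy graph: node "c" has three incoming edges.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]

print(indegree_stats(EDGES))
print(earcon(EDGES))
```

In this sketch all notes share one duration. The earcons in the thesis also vary duration, so a fuller version would derive the duration from a metric as well.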
The different variants were evaluated in an extensive user test with 30 respondents to find out which variants were the most informative. The results show that the individual elements of earcons can indeed provide insights into these metrics, but that combining them is confusing to the listener. In this case, simpler is better.
This tool could complement a tool such as LOD Laundromat by providing an instant insight into the complexity of KGs. It could additionally benefit people who are visually impaired and want to get an insight into the complexity of Knowledge Graphs.
The Virtual Human Rights Lawyer is a joint project of Vrije Universiteit Amsterdam and the Netherlands Office of the Public International Law & Policy Group to help victims of serious human rights violations obtain access to justice at the international level. It enables users to find out how and where they can access existing global and regional human rights mechanisms in order to obtain some form of redress for the human rights violations they face or have faced.
On 1 October 2019, the Horizon2020 Interconnect project started. The goal of this huge and ambitious project is to achieve a relevant milestone in the democratization of efficient energy management, through a flexible and interoperable ecosystem where distributed energy resources can be soundly integrated with effective benefits to end-users.
To this end, its 51 partners (!) will develop an interoperable IoT and smart-grid infrastructure, based on Semantic technologies, that includes various end-user services. The results will be validated using 7 pilots in EU member states, including one in the Netherlands with 200 apartments.
The role of VU is to extend and validate, in close collaboration with TNO, the SAREF ontology for IoT as well as other relevant ontologies. VU will lead a task on developing Machine Learning solutions on Knowledge Graphs and extend the solutions towards usable middle layers for user-centric ML services in the pilots, specifically in the aforementioned Dutch pilot, where VU will collaborate with TNO, VolkerWessel iCity and Hyrde.
Last week, I attended the SEMANTiCS2019 conference in Karlsruhe, Germany. This was the 15th edition of the conference that brings together Academia and Industry around the topic of Knowledge Engineering and Semantic Technologies and the good news was that this year’s conference was the biggest ever with 426 unique participants.
I was not able to join the workshop day or the DBpedia Day on Monday and Thursday respectively, but was there for the main programme. The first day opened with a keynote from Oracle’s Michael J. Sullivan about Hybrid Knowledge Management Architecture and how Oracle is betting on Semantic Technology to work in combination with data lake architectures.
The second keynote, by Michel Dumontier of Maastricht University, covered the principles of FAIR publishing of data and current advances in actually measuring the FAIRness of datasets.
During one of the parallel sessions I attended the presentation of the eventual best paper winner: “RSP-QL*: Enabling Statement-Level Annotations in RDF Streams” by Robin Keskisärkkä, Eva Blomqvist, Leili Lind, and Olaf Hartig. This was a very nice talk for a very nice and readable paper. The paper describes how the current RDF stream reasoning language RSP-QL can be extended with the principles of RDF*, which allow for statements about statements without traditional reification. The paper nicely mixes formal semantics, an elegant solution, working code, and a clear use case and evaluation. Congratulations to the winners.
Other winners included the best poster, which was won by our friends over at UvA.
The second day for me was taken up by the Special Track on Cultural Heritage and Digital Humanities, which consisted of research papers, use case presentations and posters that relate to the use of Semantic technologies in this domain. The program was quite nice, as the embedded tweets below hopefully show.
All in all, this year’s edition of SEMANTiCS was a great one. I hope next year will be even more interesting (I will be general chairing it).
In the past year, together with Ingrid Vermeulen (VU Amsterdam) and Chris Dijkshoorn (Rijksmuseum Amsterdam), I had the pleasure to supervise two students from VU, Babette Claassen and Jeroen Borst, who participated in a Network Institute Academy Assistant project around art provenance and digital methods. The growing number of datasets and digital services around art-historical information presents new opportunities for conducting provenance research at scale. The Linked Art Provenance project investigated to what extent it is possible to trace provenance of art works using online data sources.
In the interdisciplinary project, Babette (Art Market Studies) and Jeroen (Artificial Intelligence) collaborated to create a workflow model, shown below, to integrate provenance information from various online sources such as the Getty Provenance Index. This included an investigation of the potential of automatically extracting structured data from these online sources.
This model was validated through a case study, in which we investigated whether we can capture information from selected sources about an auction (1804), during which the paintings from the former collection of Pieter Cornelis van Leyden (1732-1788) were dispersed. An example work, the Lacemaker, is shown above. Interviews with various art historians validated the produced workflow model.
The workflow model also provides a basic guideline for provenance research and together with the Linked Open Data process can possibly answer relevant research questions for studies in the history of collecting and the art market.
More information can be found in the final report.
Last week, while abroad, I received the very sad news that Maarten van Someren passed away. Maarten was one of the core teachers and AI researchers at Universiteit van Amsterdam for 36 years and for many people in AI in the Netherlands, he was a great teacher and mentor. For me personally, as my co-promotor he was one of the persons who shaped me into the AI researcher and teacher I am today.
Before Maarten asked me to do a PhD project under his and Bob Wielinga’s supervision, I had known him for several years as UvA’s most prolific AI teacher. Maarten was involved in many courses (many of them in Machine Learning) and in coordinating roles. I fondly look back at Maarten explaining Decision Trees, the A* algorithm and Vapnik–Chervonenkis dimensions. He was one of the staff members who really was a bridge between research and education and gave students the idea that we were actually part of the larger AI movement in the Netherlands.
After I finished my Master’s at UvA in 2003, I bumped into Maarten in the UvA elevator and he asked me whether I would be interested in doing a PhD project on Ontology Learning. Maarten explained that I would start out being supervised by both him and Bob Wielinga, but that after a while one of them would take the lead, depending on the direction the research took. In the years that followed, I tried to make sure that direction was such that both Bob and Maarten remained my supervisors as I felt I was learning so much from them. From Maarten I learned how to always stay critical about the assumptions in your research. Maarten for example kept insisting that I explain why we would need semantic technologies in the first place, rather than taking this as an assumption. Looking back, this has tremendously helped me sharpen my research and I am very thankful for his great help. I was happy to work further with him as a postdoc on the SiteGuide project before moving to VU.
In the last years, I met Maarten several times at shared UvA-VU meetings and I was looking forward to collaborations in AI education and research. I am very sad that I will no longer be able to collaborate with him. AI in the Netherlands has lost a very influential person in Maarten.
[This post describes the research of Michelle de Böck and is based on her MSc Information Sciences thesis.]
Digitization of cultural heritage content allows for the digital archiving, analysis and other processing of that content. The practice of scanning and transcribing books, newspapers and images, 3d-scanning artworks or digitizing music has opened up this heritage for example for digital humanities research or even for creative computing. However, with respect to the performing arts, including theater and more specifically dance, digitization is a serious research challenge. Several dance notation schemes exist, with the most established one being Labanotation, developed in 1920 by Rudolf von Laban. Labanotation uses a vertical staff notation to record human movement in time with various symbols for limbs, head movement, types and directions of movements.
Whereas good translations to digital formats exist for musical scores (e.g. MIDI), these are lacking for Labanotation. While there are structured formats (LabanXML, MovementXML), the majority of content still only exists either in non-digitized form (on paper) or in scanned images. The research challenge of Michelle de Böck’s thesis therefore was to identify design features for a system capable of recognizing Labanotation from scanned images.
Michelle designed such a system and implemented this in MATLAB, focusing on a few movement symbols. Several approaches were developed and compared, including approaches using pre-trained neural networks for image recognition (AlexNet). This approach outperformed others, resulting in a classification accuracy of 78.4%. While we are still far from developing a full-fledged OCR system for Labanotation, this exploration has provided valuable insights into the feasibility and requirements of such a tool.
As part of the ESWC 2019 conference program, the ESWC PhD Symposium was held in wonderful Portoroz, Slovenia. The aim of the symposium, this year organized by Maria-Esther Vidal and myself, is to provide a forum for PhD students in the area of Semantic Web to present their work and discuss their projects with peers and mentors.
Even though we received only 5 submissions this year, all of them were of high quality, so the full-day symposium featured five talks by both early and middle/late stage PhD students. The draft papers can be found on the symposium web page and our opening slides can be found here. Students were mentored by amazing mentors to improve their papers and presentation slides. A big thank you to those mentors: Paul Groth, Rudi Studer, Maria Maleshkova, Philippe Cudre-Mauroux, and Andrea Giovanni Nuzzolese.
The program also featured a keynote by Stefan Schlobach, who talked about the road to a PhD “and back again”. He discussed a) setting realistic goals, b) finding your path towards those goals and c) being a responsible scientist and person after the goal is reached.
Students also presented their work through a poster session, and the posters can also be found at the main conference poster session on Tuesday 4 June.
On 23 May, as part of the VU ICT4D course, W4RA and SIKS organized the annual symposium “Perspectives on ICT4D” for the 6th time. This year’s theme was how to tackle “Global Challenges” in a collaborative, trans-disciplinary way. Food Security is one of these Global Challenges, and Lia van Wesenbeeck, Director of the Amsterdam Centre for World Food Studies, gave a great presentation on “Tackling World Food Challenges”.
Our international speaker on the same topic, Mr. Seydou Tangara, coordinator of the AOPP, was unfortunately not able to join due to visa problems. He was replaced by prof. Hans Akkermans, who presented the Vienna manifesto on digital humanism and its relation to ICT4D.
Andre Baart from UvA talked about the CARPA project and the challenges in developing applications for people in Mali, while Jaap Gordijn discussed the need for business modelling for developing sustainable services, with interesting case studies from Sarawak, Malaysia.
The ICT4D students presented their voice application services during the coffee break. They demonstrated applications ranging from equipment-lending services to seed markets and weather services.