I am an assistant professor (UD) in the User-Centric Data Science group at the Computer Science department of the Vrije Universiteit Amsterdam (VU). I am also a senior research fellow at the Netherlands Institute for Sound and Vision. In my research, I combine (Semantic) Web technologies with Human-Computer Interaction, Knowledge Representation and Information Extraction to tackle research challenges in various domains. These include Cultural Heritage, Digital Humanities and ICT for Development (ICT4D). I am currently involved in the following research projects:
InTaVia: making linked cultural heritage and biographical data usable for end-users
PressingMatter: developing data models to support societal reconciliation with the colonial past and its afterlives
Interconnect: machine learning on IoT and smart energy knowledge graphs
CLARIAH: investigating how to use linked data for connecting Linked media
This year, we organized the SEMANTiCS2021 conference in Amsterdam. Due to the ongoing COVID-19 restrictions, we opted for a hybrid conference. And hybrid it was! With 200 onsite and 264 online tickets sold, this was as much a mix between online and onsite as it was a mix between industry and academia. The research track consisted of 19 papers, and the industry track was made up of 24 presentations. With four wonderful keynote speakers, a poster session and various special tracks and workshops, this was quite a full programme!
As far as I am concerned, a true success! See my Twitter-generated impression below.
[This post presents research done by Daan Raven in the context of his Master Project Information Sciences]
There is a long tradition in the Cultural Heritage domain of representing structured, machine-interoperable knowledge using semantic methods and tools. However, research into developing and using ontologies specific to the works of individual artists is still lacking. Such knowledge graphs would improve access to heritage information by making reasoning and inferencing possible. In his research, Daan Raven developed and applied a re-usable method, building on the ‘Methontology’ method for ontology development. We describe the steps of specification, conceptualization, integration, implementation and evaluation in a case study concerning ceramic-glass sculptor Barbara Nanning.
This work was presented at Digital Humanities Benelux 2021. The abstract and presentation as well as other digital resources related to the project can be found below:
As supervisor of many MSc and BSc theses, I find myself giving writing tips and guidelines quite often. Inspired by Jan van Gemert’s guidelines, I compiled my own document with tips and guidelines for writing a CS/AI/IS bachelor or master thesis. These are things that I personally care about, and other lecturers might have different ideas. Also, this is by no means a complete list and I will treat it as a living document. You can find it here: https://tinyurl.com/victorthesiswriting
This Monday, Accenture and the UN organized the Knowledge Graphs for Social Good workshop, part of the Knowledge Graph conference. My submission to this workshop, “Knowledge Graphs for the Rural Poor”, was about ICT for Development research previously done within the FP7 VOICES project and in collaboration with students. In the contribution, we argue that there are three challenges in making Knowledge Graphs relevant and accessible for the Rural Poor:
Make KGs usable in low-resource, low-connectivity contexts;
Make KGs accessible for users with various (cultural) backgrounds and levels of literacy;
Develop knowledge sharing cases and applications relevant for the rural poor.
I contributed to a (Dutch) article written in response to the new European AI guidelines. The article, written by members of the Cultural AI lab, argues that we need both cultural data and cultural understanding to build truly responsible AI. It has been published in the Public Spaces blog.
It was great to see that one of this year’s Digital Humanities in Practice projects led to a conversation between the students in that project, Helene Ayar and Edith Brooks, their external supervisors Willemien Sanders (UU) and Mari Wigham (NISV), and an advisor for another project, André Krouwel (VU). That conversation resulted in original research and the CLARIAH MediaSuite data story “‘Who’s speaking?’- Politicians and parties in the media during the Dutch election campaign 2021”, in which the content of news programmes was analysed for politicians’ names, their gender and party affiliation.
This year’s edition of the VU Digital Humanities in Practice course was of course a virtual one. In this course, students of the Minor Digital Humanities and Social Analytics put everything that they have learned in that minor into practice, tackling a real-world DH or Social Analytics challenge. As in previous years, we had wonderful projects provided and supervised by colleagues from various institutes. We had projects related to the Odissei and Clariah research infrastructures, projects supervised by KNAW-HUC and Stadsarchief Amsterdam, and projects from Utrecht University, UvA, Leiden University and our own Vrije Universiteit. We had a project related to Kieskompas and even a project supervised by researchers from Bologna University. A wide variety of challenges, datasets and domains! We would like to thank all the supervisors and the students for making this course a success.
The compilation video below shows all the projects’ results. It combines 2-minute videos produced by each of the 10 student groups.
After a very nice virtual poster session, everybody got to vote on the Best Poster Award. The winners are group 3, whose video you can also see in the video above. Below we list all the projects and the external supervisors.
Extracting named entities from Social Science data.
At this year’s Metadata and Semantics Research Conference (MTSR2020), I just presented our work on Linked Data Scopes: an ontology to describe data manipulation steps. The paper was co-authored with Ivette Bonestroo, one of our Digital Humanities minor students as well as Rik Hoekstra and Marijn Koolen from KNAW-HUC. The paper builds on earlier work by the latter two co-authors and was conducted in the context of the CLARIAH-plus project.
With the rise of data-driven methods in the humanities, it becomes necessary to develop reusable and consistent methodological patterns for dealing with the various data manipulation steps. This increases the transparency and replicability of the research. Data scopes present a qualitative framework for such methodological steps. In this work we present a Linked Data model to represent and share Data Scopes. The model consists of a central Data Scope element, with linked elements for data Selection, Linking, Modeling, Normalisation and Classification. We validate the model by representing the data scope for 24 articles from two domains: Humanities and Social Science.
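To give a feel for the shape of such a model, here is a minimal Python sketch of a central data scope with linked step elements, serialised as subject-predicate-object triples. It is an illustration only: the `ex:` prefix, the property names and the class and method names are all hypothetical and are not the vocabulary of the actual Linked Data Scopes ontology. Only the five step types are taken from the description above.

```python
from dataclasses import dataclass, field

# The five linked element types named in the text.
STEP_KINDS = ("Selection", "Linking", "Modeling", "Normalisation", "Classification")


@dataclass
class Step:
    kind: str          # one of STEP_KINDS
    description: str   # free-text account of what was done


@dataclass
class DataScope:
    """Central element describing the data manipulation steps of one article."""
    identifier: str                      # hypothetical identifier, e.g. "ex:article1"
    steps: list = field(default_factory=list)

    def add(self, kind: str, description: str) -> None:
        if kind not in STEP_KINDS:
            raise ValueError(f"unknown step kind: {kind}")
        self.steps.append(Step(kind, description))

    def triples(self):
        """Yield (subject, predicate, object) tuples; all 'ex:' terms are made up."""
        yield (self.identifier, "rdf:type", "ex:DataScope")
        for i, step in enumerate(self.steps, start=1):
            node = f"{self.identifier}/step{i}"
            yield (self.identifier, f"ex:has{step.kind}", node)
            yield (node, "ex:description", step.description)


scope = DataScope("ex:article1")
scope.add("Selection", "Kept only records from one archive")
scope.add("Normalisation", "Harmonised spelling variants of person names")
for triple in scope.triples():
    print(triple)
```

A real implementation would use an RDF library and the published ontology terms; the point here is only the hub-and-spoke structure of one scope with typed, linked steps.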
[This blog post is based on the Master thesis Information Sciences of Bram Schmidt, conducted at the KNAW Humanities cluster and IISG. It reuses text from his thesis]
Place names (toponyms) are very ambiguous and may change over time. This makes it hard to link mentions of places to their corresponding modern entity and coordinates, especially in a historical context. We focus on a historical toponym disambiguation approach to entity linking, based on identified context toponyms.
The thesis specifically looks at the American Gazetteer. Its entries contain fundamental information about major places in the vicinity of the place described. By identifying and exploiting these context toponyms, we aim to estimate the most likely position of the historical entry and link it to its contemporary counterpart.
In this case study, Bram Schmidt therefore examined the toponym recognition performance of the state-of-the-art Named Entity Recognition (NER) tools spaCy and Stanza on historical texts. We also tested two new heuristics to facilitate efficient entity linking to the geographical database GeoNames.
We tested our method against a subset of manually annotated records of the gazetteer. Results show that both NER tools perform insufficiently at automatically identifying relevant toponyms in the free text of a historical lemma. However, exploiting correctly identified context toponyms by calculating the minimal distance among them proves to be successful, and combining the approaches into one algorithm yields an improved recall score.
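The minimal-distance idea can be sketched in a few lines of Python. This is one possible reading of the heuristic and not the algorithm from the thesis: for each candidate location of an ambiguous entry, we sum the great-circle distance to the nearest candidate of every context toponym and keep the candidate with the smallest total. The function names, the candidate identifiers and the approximate coordinates are assumptions made for illustration; in practice the candidates would come from a GeoNames lookup.

```python
from math import asin, cos, radians, sin, sqrt

Coord = tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def pick_candidate(candidates: dict[str, Coord], context: dict[str, list[Coord]]) -> str:
    """Return the id of the candidate closest, in total, to the context toponyms.

    Each context toponym may itself be ambiguous, so we take the distance
    to its nearest candidate. Context toponyms without candidates are skipped.
    """
    def cost(coord: Coord) -> float:
        return sum(
            min(haversine_km(coord, c) for c in coords)
            for coords in context.values()
            if coords
        )

    return min(candidates, key=lambda cid: cost(candidates[cid]))


# Hypothetical example: an entry for "Cambridge" that mentions Boston and Charlestown.
cambridge = {
    "cambridge_ma": (42.37, -71.11),   # approximate coordinates
    "cambridge_uk": (52.21, 0.12),
}
context_toponyms = {
    "Boston": [(42.36, -71.06), (52.98, -0.03)],  # Massachusetts and Lincolnshire
    "Charlestown": [(42.38, -71.06)],
}
best = pick_candidate(cambridge, context_toponyms)
print(best)
```

Note that this only helps when the context toponyms were recognised correctly in the first place, which is exactly where the NER tools struggled on the historical text.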