I am an assistant professor (UD) at the User-Centric Data Science group at the Computer Science department of the Vrije Universiteit Amsterdam (VU). I am also a senior research fellow at Netherlands Institute for Sound and Vision. In my research, I combine (Semantic) Web technologies with Human-Computer Interaction, Knowledge Representation and Information Extraction to tackle research challenges in various domains. These include Cultural Heritage, Digital Humanities and ICT for Development (ICT4D). More information on these projects can be found on this site or through my CV .
[This post describes research by Fahad Ali and is based on his Msc. thesis]
Contextual constraints (lack of infrastructure, low-literacy etc.) play an important role in ICT for Development (ICT4D) projects. The Kasadaka project offers a technological platform for knowledge sharing applications in rural areas in Sub-Saharan Africa. However, lack of stable internet connections restrict exchange of data between distributed Kasadaka instances, which leads us to research alternative ways of machine-to-machine (m2m) communication.
Fahad Ali’s research focuses on mobile elements and using wifi sneakernets for this m2m to enable information sharing between geographically distributed devices. He developed a Raspberry Pi-based device called the Wifi-donkey that can be mounted on a vehicle and facilitates information exchange with nearby devices, using the built-in wifi card of the rPi 3.The solution is based on Piratebox offline file-sharing and communications system built with free software and uses off-the-shelf Linux software components and configuration settings to allow it to discover and connect to nearby Kasadaka devices based using Wifi technologies.
We evaluated the solution by simulating a low resource setting and testing it by performing so-called “pass-bys” in an Amsterdam residential area. In these cases, SPARQL queries are exchanged between host and client devices and we measure amount of RDF triples transferred. This setup matches earlier case requirements as described in Onno Valkering’s work.Results show that the system works fairly reliably in the simulated setting. The machine-to-machine communication method can be used in various ICT4D projects that require some sort of data sharing functionality.
You can find out more about Fahad’s work through the following resources:
- Fahad Ali’s Msc Thesis [PDF]
- The codebase for the configuration files on github.com/fahad105/masterproject/
- His final presentation slides (embedded below)
Last week, I visited the 11th Metadata and Semantics Research Conference (MTSR2017) in Tallinn, Estonia. This conference brings together computer scientists. information scientists and people from the domain of digital libraries to discuss their work in metadata and semantics. The 2017 edition of the conference draws around 70 people which is a great size for a single-track conference with lively discussions. The paper included interesting tracks on Cultural Heritage and Library (meta)data as well as one on Digital Humanities.
On the last day I presented our paper “Enriching Media Collections for Event-based Exploration” [draft pdf], co-authored with the people in the CLARIAH and DIVE+ team working on data enrichment and APIs: Liliana Melgar, Oana Inel Carlos Martinez Ortiz, Lora Aroyo and Johan Oomen. The slides for the presentation can be found here on slideshare. We were very happy to hear that our paper was presented the MTSR2017 Best Paper Award!
In the paper, we present a methodology to publish, represent, enrich, and link heritage collections so that they can be explored by domain expert users. We present four methods to derive events from media object descriptions. We also present a case study where four datasets with mixed media types are made accessible to scholars and describe the building blocks for event-based proto-narratives in the knowledge graph
This post describes the MSc theses of Ana-Liza Tjon-a-Pauw and Josien Jansen.
As a semantic web researcher, it is hard to sometimes not see ontologies and triples in aspects of my private life. In this case, through my contacts with dancers and choreographers, I have since a long time been interested in exploring knowledge representation for dance. After a few failed attempts to get a research project funded, I decided to let enthusiastic MSc. students have a go to continue with this exploration. This year, two Information Sciences students, Josien Jansen and Ana-Liza Tjon-a-Pauw, were willing to take up this challenge, with great success. With their background as dancers they did not only have the necessary background knowledge at but also access to dancers who could act as study and test subjects.
The questions of the two projects was therefore: 1) How can we model and represent dance in a sensible manner so that computers can make sense of choreographs and 2) How can we communicate those choreographies to the dancers?
Josien’s thesis addressed this first question. Investigating to what extent choreographers can be supported by semi-automatic analysis of choreographies through the generation of new creative choreography elements. She conducted an online questionnaire among 54 choreographers. The results show that a significant subgroup is willing to use an automatic choreography assistant in their creative process. She further identified requirements for such an assistant, including the semantic levels at which should operate and communicate with the end-users. The requirements are used for a design of a choreography assistant “Dancepiration”, which we implemented as a mobile application. The tool allows choreographers to enter (parts of) a choreography and uses multiple strategies for generating creative variations in three dance styles. Josien evaluated the tool in a user study where we test a) random variations and b) variations based on semantic distance in a dance ontology. The results show that this latter variant is better received by participants. We furthermore identify many differences between the varying dance styles to what extent the assistant supports creativity.
In her thesis, Ana-Liza dove deeper into the human-computer interaction side of the story. Where Josien had classical ballet and modern dance as background and focus, Ana-Liza looked at Dancehall and Hip-Hop dance styles. For her project, Ana-Liza developed four prototypes that could communicate pieces of computer-generated choreography to dancers through Textual Descriptions, 2-D Animations, 3-D Animations, and Audio Descriptions. Each of these presentation methods has its own advantages and disadvantages, so Ana-Liza made an extensive user survey with seven domain experts (dancers). Despite the relatively small group of users, there was a clear preference for the 3-D animations. Based on the results, Ana-Liza also designed an interactive choreography assistant (IDCAT).
The combined theses formed the basis of a scientific article on dance representation and communication that was accepted for publication in the renowned ACE entertainment conference, co-authored by us and co-supervisor Frank Nack.
You can find more information here:
VU’s Network Institute has a yearly Academy Assistant programme where small interdisciplinary research projects are funded. Within these projects, Master students from different disciplines are given the opportunity to work on these projects under supervision of VU staff members. As in previous years, this year, I also participate as a supervisor in one of these projects, in collaboration with Petra Bos from the Applied Linguistics department. And after having found two enthusiastic students: Dana Hakman from Information Science and Cerise Muller from Applied Linguistics, the project has just started.
Our project “ABC-Kb: A Knowledge base supporting the Assessment of language impairment in Bilingual Children” is aimed at supporting language therapists by (re-)structuring information about language development for bilingual children. Speech language therapists and clinical linguists face the challenge of diagnosing children as young as possible, also when their home language is not Dutch. Their achievements on standard (Dutch) language tests will not be reliable indicators for a language impairment. If diagnosticians had access to information on the language development in the Home Language of these children, this would be tremendously helpful in the diagnostic process.
This project aims to develop a knowledge base (KB) collecting relevant information on the specificities of 60 different home languages (normal and atypical language development), and on contrastive analyses of any of these languages with Dutch. To this end, we leverage an existing wiki: meertaligheidentaalstoornissenvu.wikispaces.com
This months’ edition of Europeana Insight features articles from this year’s LODLAM Challenge finalists, which include the winner: DIVE+. The online article “DIVE+: EXPLORING INTEGRATED LINKED MEDIA” discusses the DIVE+ User studies, data enrichment, exploratory interface and impact on the cultural heritage domain.
The paper was co-authored by Victor de Boer, Oana Inel, Lora Aroyo, Chiel van den Akker, Susane Legene, Carlos Martinez, Werner Helmich, Berber Hagendoorn, Sabrina Sauer, Jaap Blom, Liliana Melgar and Johan Oomen
This year, I was conference chair of the SEMANTiCS conference, which was held 11-14 Sept in Amsterdam. The conference was in my view a great success, with over 310 visitors across the four days, 24 parallel sessions including academic and industry talks, six keynotes, three awards, many workshops and lots of cups of coffee. I will be posting more looks back soon, but below is a storify item giving an idea of all the cool stuff that happened in the past week.
The extended abstract “Investigations into Speech Synthesis and Deep Learning-based colorization for audiovisual archives” has been accepted for publication at the NEM (New Eureopean Media) Summit 2017 to be held in Madrid end-of-November. This paper is based on Rudy Marsman’s thesis “Speech technology and colorization for audiovisual archives” and describes his research on using AI technologies in the context of an the Netherlands Institute for Sound and Vision. Specifically, Rudy experimented with developing speech synthesis software based on a library of narrated news videos (using the voice of the late Philip Bloemendal) and with the use of pre-trained deep learning colorization networks to colorize archival videos.
You can read more in the draft paper [PDF]:
Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen New life for old media: Investigations into Speech Synthesis and Deep Learning-based colorization for audiovisual archives. Extended Abstract proceedings of NEM summit 2017
Update: the slides as presented by Johan Oomen at NEM
[This post is based on Anggarda Prameswari’s Information Sciences MSc. Thesis]
For her M.Sc. Project, conducted at the Netherlands Institute for Sound and Vision (NISV), Information Sciences student Anggarda Prameswari (pictured right) investigated a local crowdsourcing application to allow NISV to gather crowd annotations for archival audio content. Crowdsourcing and other human computation techniques have proven their use for collecting large numbers of annotations, including in the domain of cultural heritage. Most of the time, crowdsourcing campaigns are done through online tools. Local crowdsourcing is a variant where annotation activities are based on specific locations related to the task.
Anggarda, in collaboration with NISV’s Themistoklis Karavellas, developed a platform called “Elevator Annotator”, to be used on-site. The platform is designed as a standalone Raspberry Pi-powered box which can be placed in an on-site elevator for example. It features a speech recognition software and a button-based UI to communicate with participants (see video below).
The effectiveness of the platform was evaluated in two different locations (at NISV and at Vrije Universiteit) and with two different modes of interaction (voice input and button-based input) through a local crowdsourcing experiment. In this experiments, elevator-travellers were asked to participate in an experiment. Agreeing participants were then played a short sound clip from the collection to be annotated and asked to identify a musical instrument.
The results show that this approach is able to achieve annotations with reasonable accuracy, with up to 4 annotations per hour. Given that these results were acquired from one elevator, this new form of crowdsourcing can be a promising method of eliciting annotations from on-site participants.
Furthermore, a significant difference was found between participants from the two locations. This indicates that indeed, it makes sense to think about localized versions of on-site crowdsourcing.
- Elevator Annotator website, including instructions and code to build your own Elevator Annotator
- Read all the details in Anggarda’s Elevator Annotator Thesis
- Look at the slides (embedded below)
At the Digital Humanities Benelux 2017 conference, the e-humanities Events working group organized a panel with the titel “A Pragmatic Approach to Understanding and Utilizing Events in Cultural Heritage”. In this panel, researchers from Vrije Universiteit Amsterdam, CWI, NIOD, Huygens ING, and Nationaal Archief presented different views on Events as objects of study and Events as building blocks for historical narratives.
— Lora Aroyo (@laroyo) July 5, 2017
The session was packed and the introductory talks were followed by a lively discussion. From this discussion it became clear that consensus on the nature of Events or what typology of Events would be useful is not to be expected soon. At the same time, a simple and generic data model for representing Events allows for multiple viewpoints and levels of aggregations to be modeled. The combined slides of the panel can be found below. For those interested in more discussion about Events: A workshop at SEMANTICS2017 will also be organized and you can join!
We are excited to announce that DIVE+ has been awarded the Grand Prize at the LODLAM Summit, held at the Fondazione Giorgio Cini this week. The summit brought together ~100 experts in the vibrant and global community of Linked Open Data in Libraries, Archives and Museums. It is organised bi-annually since 2011. Earlier editions were held in the US, Canada and Australia, making the 2017 edition the first in Europe.
The Grand Prize (USD$2,000) was awarded by the LODLAM community. It’s recognition of how DIVE+ demonstrates social, cultural and technical impact of linked data. The Open Data Prize (of USD$1,000) was awarded to WarSampo for its groundbreaking approach to publish open data
.Five finalists were invited to present their work, selected from a total of 21 submissions after an open call published earlier this year. Johan Oomen, head of research at the Netherlands Institute for Sound and Vision presented DIVE+ on day one of the summit. The slides of his pitch have been published, as well as the demo video that was submitted to the open call. Next to DIVE+ (Netherlands) and WarSampo (Finland) the finalists were Oslo public library (Norway), Fishing in the Data Ocean (Taiwan) and Genealogy Project (China). The diversity of the finalists is a clear indication that the use of linked data technology is gaining momentum. Throughout the summit, delegates have been capturing the outcomes of various breakout sessions. Please look at the overview of session notes and follow @lodlam on Twitter to keep track.
DIVE+ is an event-centric linked data digital collection browser aimed to provide an integrated and interactive access to multimedia objects from various heterogeneous online collections. It enriches the structured metadata of online collections with linked open data vocabularies with focus on events, people, locations and concepts that are depicted or associated with particular collection objects. DIVE+ is the result of a true interdisciplinary collaboration between computer scientists, humanities scholars, cultural heritage professionals and interaction designers. DIVE+ is integrated in the national CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) research infrastructure.
DIVE+ is a collaborative effort of the VU University Amsterdam (Victor de Boer, Oana Inel, Lora Aroyo, Chiel van den Akker, Susane Legene), Netherlands Institute for Sound and Vision (Jaap Blom, Liliana Melgar, Johan Oomen), Frontwise (Werner Helmich), University of Groningen (Berber Hagendoorn, Sabrina Sauer) and the Netherlands eScience Centre (Carlos Martinez). It is supported by CLARIAH and NWO.
The LODLAM Challenge was generously sponsored by Synaptica. We would also like to thank the organisers, especially Valentine Charles and Antoine Isaac of Europeana and Ingrid Mason of Aarnet for all of their efforts. LODLAM 2017 has been a truly unforgettable experience for the DIVE+ team.