Digital Humanities in Practice 2018/2019

Last friday, the students of the class of 2018/2019 of the course Digital Humanities and Social Analytics in Practice presented the results of their capstone internship project. This course and project is the final element of the Digital Humanities and Social Analytics minor programme in which students from very different backgrounds gain skills and knowledge about the interdisciplinary topic.

Poster presentation of the DHiP projects

The course took the form of a 4-week internship at an organization working with humanities or social science data and challenges and student groups were asked to use these skills and knowledge to address a research challenge. Projects ranged from cleaning, indexing, visualizing and analyzing humanities data sets to searching for bias in news coverage of political topics. The students showed their competences not only in their research work but also in communicating this research through great posters.

The complete list of student projects and collaborating institutions is below:

  • “An eventful 80 years’ war” at Rijksmuseum identifying and mapping historical events from various sources.
  • An investigation into the use of structured vocabularies also at the Rijksmuseum
  • “Collecting and Modelling Event WW2 from Wikipedia and Wikidata” in collaboration with Netwerk Oorlogsbronnen (see poster image below)
  • A project where an search index for Development documents governed by the NICC foundation was built.
  • “EviDENce: Ego Documents Events modelliNg – how individuals recall mass violence” – in collaboration with KNAW Humanities Cluster (HUC)
  • “Historical Ecology” – where students searched for mentions of animals in historical newspapers – also with KNAW-HUC
  • Project MIGRANT: Mobilities and connection project in collaboration with KNAW-HUC and Huygens ING
  • Capturing Bias with media data analysis – an internal project at VU looking at indentifying media bias
  • Locating the CTA Archive Amsterdam where a geolocation service and search tool was built
  • Linking Knowledge Graphs of Symbolic Music with the Web – also an internal project at VU working with Albert Merono
One of the posters visualizing the events and persons related to the occupation of the Netherlands in WW2
Update: The student posters are now online at https://github.com/biktorrr/dhip2019posters

Share This:

Architectural Digital Humanities student projects

In the context of our ArchiMediaL project on Digital Architectural History, a number of student projects explored opportunities and challenges around enriching the colonialarchitecture.eu dataset. This dataset lists buildings and sites in countries outside of Europe that at the time were ruled by Europeans (1850-1970).

Patrick Brouwer wrote his IMM bachelor thesis “Crowdsourcing architectural knowledge: Experts versus non-experts” about the differences in annotation styles between architecture historical experts  and non-expert crowd annotators. The data suggests that although crowdsourcing is a viable option for annotating this type of content. Also, expert annotations were of a higher quality than those of non-experts. The image below shows a screenshot of the user study survey.

Rouel de Romas also looked at crowdsourcing , but focused more on the user interaction and the interface involved in crowdsourcing. In his thesis “Enriching the metadata of European colonial maps with crowdsourcing”  he -like Patrick- used the Accurator platform, developed by Chris Dijkshoorn. A screenshot is seen below.  The results corroborate the previous study that the in most cases the annotations provided by the participants do meet the requirements provided by the architectural historian; thus, crowdsourcing is an effective method to enrich the metadata of European colonial maps.

Finally, Gossa Lo looked at automatic enrichment using OCR techniques on textual documents for her Mini-Master projcet. She created a specific pipeline for this, which can be seen in the image below. Her code and paper are available on this Github page:https://github.com/biktorrr/aml_colonialnlp

Share This:

Who uses DBPedia anyway?

[this post is based on Frank Walraven‘s Master thesis]

Who uses DBPedia anyway? This was the question that started a research project for Frank Walraven. This question came up during one of the meetings of the Dutch DBPedia chapter, of which VUA is a member. If usage and users are better understood, this can lead to better servicing of those users, by for example prioritizing the enrichment or improvement of specific sections of DBPedia Characterizing use(r)s of a Linked Open Data set is an inherently challenging task as in an open Web world, it is difficult to know who are accessing your digital resources. For his Msc project research, which he conducted at the Dutch National Library supervised by Enno Meijers , Frank used a hybrid approach using both a data-driven method based on user log analysis and a short survey of know users of the dataset. As a scope Frank selected just the Dutch DBPedia dataset.

For the data-driven part of the method, Frank used a complete user log of HTTP requests on the Dutch DBPedia. This log file (see link below) consisted of over 4.5 Million entries and logged both URI lookups and SPARQL endpoint requests. For this research only a subset of the URI lookups were concerned.

As a first analysis step, the requests’ origins IPs were categorized. Five classes can be identified (A-E), with the vast majority of IP addresses being in class “A”: Very large networks and bots. Most of the IP addresses in these lists could be traced back to search engine

indexing bots such as those from Yahoo or Google. In classes B-F, Frank manually traced the top 30 most encounterd IP-addresses, concluding that even there 60% of the requests came from bots, 10% definitely not from bots, with 30% remaining unclear.

The second analysis step in the data-driven method consisted of identifying what types of pages were most requested. To cluster the thousands of DBPedia URI request, Frank retriev

ed the ‘categories’ of the pages. These categories are extracted from Wikipedia category links. An example is the “Android_TV” resource, which has two categories: “Google” and “Android_(operating_system)”. Following skos:broader links, a ‘level 2 category’ could also be found to aggregate to an even higher level of abstraction. As not all resources have such categories, this does not give a complete image, but it does provide some ideas on the most popular categories of items requested. After normalizing for categories with large amounts of incoming links, for example the category “non-endangered animal”, the most popular categories where 1. Domestic & International movies, 2. Music, 3. Sports, 4. Dutch & International municipality information and 5. Books.

Frank also set up a user survey to corroborate this evidence. The survey contained questions about the how and why of the respondents Dutch DBPedia use, including the categories they were most interested in. The survey was distributed using the Dutch DBPedia websitea and via twitter however only attracted 5 respondents. This illustrates

the difficulty of the problem that users of the DBPedia resource are not necessarily easily reachable through communication channels. The five respondents were all quite closely related to the chapter but the results were interesting nonetheless. Most of the users used the DBPedia SPARQL endpoint. The full results of the survey can be found through Frank’s thesis, but in terms of corroboration the survey revealed that four out of the five categories found in the data-driven method were also identified in the top five resulting from the survey. The fifth one identified in the survey was ‘geography’, which could be matched to the fifth from the data-driven method.Frank’s research shows that although it remains a challenging problem, using a combination of data-driven and user-driven methods, it is indeed possible to get an indication into the most-used categories on DBPedia. Within the Dutch DBPedia Chapter, we are currently considering follow-up research questions based on Frank’s research.

Share This:

The Benefits of Linking Metadata for Internal and External users of an Audiovisual Archive

[This post describes the Master Project work of Information Science students Tim de Bruyn and John Brooks and is based on their theses]

Audiovisual archives adopt structured vocabularies for their metadata management. With Semantic Web and Linked Data now becoming more and more stable and commonplace technologies, organizations are looking now at linking these vocabularies to external sources, for example those of Wikidata, DBPedia or GeoNames.

However, the benefits of such endeavors to the organizations are generally underexplored. For their master project research, done in the form of an internship at the Netherlands Institute for Sound and Vision (NISV), Tim de Bruyn and John Brooks conducted a case study into the benefits of linking the “Common Thesaurus for Audiovisual Archives(or GTAA) and the general-purpose dataset Wikidata. In their approach, they identified various use cases for user groups that are both internal (Tim) as well as external (John) to the organization. Not only were use cases identified and matched to a partial alignment of GTAA and Wikidata, but several proof of concept prototypes that address these use cases were developed. 

 

For the internal users, three cases were elaborated, including a calendar service where personnel receive notifications when an author of a work has passed away 70 years ago, thereby changing copyright status of the work. This information is retrieved from the Wikidata page of the author, aligned with the GTAA entry (see fig 1 above).

A second internal case involves the new ‘story platform’ of NISV. Here Tim implemented a prototype enduser application to find stories related to the one currently shown to the user, based on persons occuring in that story (fig 2).

The external cases centered around the users of the CLARIAH Media Suite. For this extension, several humanities researchers were interviewed to identify worthwile extensions with Wikidata information. Based on the outcomes of these interviews, John Brooks developed the Wikidata retrieval service (fig 3).

The research presented in the two theses are a good example of User-Centric Data Science, where affordances provided by data linkages are aligned with various user needs. The various tools were evaluated with end users to ensure they match their actual needs. The research was reported in a research paper which will be presented at the MTSR2018 conference: (Victor de Boer, Tim de Bruyn, John Brooks, Jesse de Vos. The Benefits of Linking Metadata for Internal and External users of an Audiovisual Archive. To appear in Proceedings of MTSR 2018 [Draft PDF])

Find out more:

See my slides for the MTSR presentation below

 

Share This:

Developing a Sustainable Weather Information System in Rural Burkina Faso

[This post describes the Information Sciences Master Project of Hameedat Omoine and is based on her thesis.] 

In the quest to improve the lives of farmers and improve agricultural productivity in rural Burkina Faso, meteorological data has been identified as one of the is key information needs for local farmers. Various online weather information services are available, but many are not tailored specifically to tis target user group. In a research case study, Hameedat Omoine designed a weather information system that collects not only weather but also related agricultural information and provides the farmers with this information to allow them to improve agricultural productivity and the livelihood of the people of rural Burkina Faso.

The research and design of the system was conducted at and in collaboration with 2CoolMonkeys, a Utrecht-based Open data and App-development company with expertise in ICT for Development (ICT4D).

Following the design science research methodology, Hameedat investigated the requirements for a weather information system, and the possible options for ensuring the sustainability of the system. Using a structured approach, she developed the application and evaluated it in the field with potential Burkinabe end users. The mobile interface of the application featured weather information and crop advice (seen in the  images above). A demonstration video is shown below

Hameedat developed multiple alternative models to investigate the sustainability of the application. For this she used the e3value approach and language. The image below shows a model for the case where a local radio station is involved.

Share This:

Testimonials Digital Humanities minor at DHBenelux2018

At the DHBenelux 2018 conference, students from the VU minor “Digital Humanities and Social Analytics” presented their final DH in Practice work. In this video, the students talk about their experience in the minor and the internship projects. We also meet other participants of the conference talking about the need for interdisciplinary research.

 

Share This:

An Augmented Reality App to Annotate Art

[This post is based on the Bachelor project by Jurjen Braam and reuses content from his thesis]

The value of Augmented Reality applications has been shown for a number of different tasks. Most of these show that AR applications add to the immersiveness of an experience. For his Bachelor Project, VU student Jurjen Braam researched to what extent AR technology makes sense for the task of annotating artworks.

To this end, Jurjen built a mobile application which allows experts or laypeople to add textual annotations to artworks in three different modes. One mode doesnt show the artwork, but allows for textual input, the 2nd mode shows the work in an image and allows for localised annotations. The last mode is the AR mode, which projects the artwork in the physical space, using the device camera and screen.

Three modes of the Application (Text, 2D, AR)

Jurjen evaluated the three modes through a small user study, which showed that immersion and enjoyment was highest in the AR mode but that this mode was least efficient. Also, participants indicated that for annotation tasks, larger screens would be preferable.

User evaluation in action

This research was a unique endeavour combining a proven technology (AR) and well-known task (Annotation) which identified interesting possibilities for follow-up research.

Share This:

A Voice Service Development Kit for the Kasadaka platform

[This post is written by André Baart and describes his MSc thesis]

While the internet usage in the developing world is still low, the adoption of simple mobile phones is widespread. A way to offer the advantages of the internet to these populations is voice-based information systems. The KasaDaka voice-services platform is aimed at providing voice-services in the context of ICT for Development (ICT4D). The platform is based on a Raspberry Pi and a GSM modem, which enables affordable voice-service hosting, using the locally available GSM network. The platform takes into account the special requirements of the ICT4D context, such as limited internet connectivity and low literacy rates.

This research focuses on lowering the barrier to entry of voice-service development, by reducing the skill set needed to do so. A Voice Service Development Kit (VSDK) is developed that allows the development of voice-services by deploying and customizing provided building-blocks. These building blocks each represent a type of interaction that is often found in voice-services. (for example a menu, user voice input or the playback of a message) The researcher argues that the simplification of voice-service development is an essential step towards sustainable voice-services in the ICT4D context; As this increases the potential number of local voice-service developers, hremoving the dependency on foreign (and thus expensive) developers and engineers. This simplification should ideally be achieved by providing a graphical interface to voice-service development.

The VSDK was evaluated during the ICT4D course at the Vrije Universiteit Amsterdam, where students built applications for various ICT4D use-cases using the VSDK. Afterwards a survey was conducted, which provided insight on the students’ experiences with voice-service development and the VSDK. From the results of the evaluation is concluded that the building-block approach to voice-service development used in the VSDK, is successful for the development of simple voice-services. It allows newcomers to (voice-service) development, to quickly develop (simple) voice-services from a graphical interface, without requiring programming experience.

The VSDK combined with the existing KasaDaka platform provides a good solution to the hosting and development of voice-services in the ICT4D context.

More details can be found in the complete thesis.A slidedeck is included below. You can find the VSDK code on Andre’s Github: http://github.com/abaart/KasaDaka-VSDK

 

Share This:

Machine-to-machine communication in rural conditions: Realizing KasadakaNet

[This post describes research by Fahad Ali and is based on his Msc. thesis]

Contextual constraints (lack of infrastructure, low-literacy etc.) play an important role in ICT for Development (ICT4D) projects. The Kasadaka project offers a technological platform for knowledge sharing applications in rural areas in Sub-Saharan Africa. However, lack of stable internet connections restrict exchange of data between distributed Kasadaka instances, which leads us to research alternative ways of machine-to-machine (m2m) communication.

Example of a KasadakaNet situation, with a wifi-donkey mounted on a bus, visiting a city and two remote villages, creating a so-called sneakernet

Fahad Ali’s research focuses on mobile elements and using wifi sneakernets for this m2m to enable information sharing between geographically distributed devices. He developed a Raspberry Pi-based device called the Wifi-donkey that can be mounted on a vehicle and facilitates information exchange with nearby devices, using the built-in wifi card of the rPi 3.The solution is based on Piratebox offline file-sharing and communications system built with free software and uses off-the-shelf Linux software components and configuration settings to allow it to discover and connect to nearby Kasadaka devices based using Wifi technologies.

Experimental setup: the wifi-donkey taped to an Amsterdam balcony to test range and bandwith.

We evaluated the solution by simulating a low resource setting and testing it by performing so-called “pass-bys” in an Amsterdam residential area. In these cases, SPARQL queries are exchanged between host and client devices and we measure amount of RDF triples transferred. This setup matches earlier case requirements as described in Onno Valkering’s work.Results show that the system works fairly reliably in the simulated setting. The machine-to-machine communication method can be used in various ICT4D projects that require some sort of data sharing functionality.

You can find out more about Fahad’s work through the following resources:

Share This:

Dancing and Semantics

This post describes the MSc theses of Ana-Liza Tjon-a-Pauw and Josien Jansen. 

As a semantic web researcher, it is hard to sometimes not see ontologies and triples in aspects of my private life. In this case, through my contacts with dancers and choreographers, I have since a long time been interested in exploring knowledge representation for dance. After a few failed attempts to get a research project funded, I decided to let enthusiastic MSc. students have a go to continue with this exploration. This year, two Information Sciences students, Josien Jansen and Ana-Liza Tjon-a-Pauw, were willing to take up this challenge, with great success. With their background as dancers they did not only have the necessary background knowledge at but also access to dancers who could act as study and test subjects.

The questions of the two projects was therefore: 1) How can we model and represent dance in a sensible manner so that computers can make sense of choreographs and 2) How can we communicate those choreographies to the dancers?

Screenshot of the mobile choreography assistant prototype

Josien’s thesis addressed this first question. Investigating to what extent choreographers can be supported by semi-automatic analysis of choreographies through the generation of new creative choreography elements. She conducted an online questionnaire among 54 choreographers. The results show that a significant subgroup is willing to use an automatic choreography assistant in their creative process. She further identified requirements for such an assistant, including the semantic levels at which should operate and communicate with the end-users. The requirements are used for a design of a choreography assistant “Dancepiration”, which we implemented as a mobile application. The tool allows choreographers to enter (parts of) a choreography and uses multiple strategies for generating creative variations in three dance styles. Josien  evaluated the tool in a user study where we test a) random variations and b) variations based on semantic distance in a dance ontology. The results show that this latter variant is better received by participants. We furthermore identify many differences between the varying dance styles to what extent the assistant supports creativity.

Four participants during the 2nd user experiment. From left to right this shows variations presented through textual, 2D animation, 3D animation, and auditory instructions.

In her thesis, Ana-Liza dove deeper into the human-computer interaction side of the story. Where Josien had classical ballet and modern dance as background and focus, Ana-Liza looked at Dancehall and Hip-Hop dance styles. For her project, Ana-Liza developed four prototypes that could communicate pieces of computer-generated choreography to dancers through Textual Descriptions, 2-D Animations, 3-D Animations, and Audio Descriptions. Each of these presentation methods has its own advantages and disadvantages, so Ana-Liza made an extensive user survey with seven domain experts (dancers). Despite the relatively small group of users, there was a clear preference for the 3-D animations. Based on the results, Ana-Liza also designed an interactive choreography assistant (IDCAT).

The combined theses formed the basis of a scientific article on dance representation and communication that was accepted for publication in the renowned ACE entertainment conference, co-authored by us and co-supervisor Frank Nack.

You can find more information here:

Share This: