Exploring West African Folk Narrative Texts Using Machine Learning

It is so nice when two often very distinct research lines come together. In my case, Digital Humanities and ICT for Development rarely meet directly. But they sure did come together when Gossa Lô started her Master's thesis in AI. Gossa, a long-time collaborator in the W4RA team, chose to focus on the opportunities of Machine Learning and Natural Language Processing for West African folk tales. Her research involved constructing a corpus of West African folk tales and performing various classification and text generation experiments, and it even included a field trip to Ghana to elicit information about folk tale structures. The work, done as part of an internship at Bolesian.ai, resulted in a beautiful Master's thesis, which was awarded a very high grade.

As a follow-up, we decided to rewrite the thesis into an article and submit it to a DH or ICT4D journal. This proved more difficult than expected. Both DH and ICT4D are very multidisciplinary in nature, and the combination of the two proved a bit too much for many journals: our article was either too technical, not technical enough, or too far out of scope.

But now, the article “Exploring West African Folk Narrative Texts Using Machine Learning” has been published (Open Access) in a special issue of Information on Digital Humanities!

Experiment 1: RNN network architecture of word-level (left) and character-level (right) models

t-SNE visualization of the second experiment

The paper examines how machine learning (ML) and natural language processing (NLP) can be used to identify, analyze, and generate West African folk tales. Two corpora of West African and Western European folk tales were compiled and used in three experiments on cross-cultural folk tale analysis:

  1. In the text generation experiment, two types of deep learning text generators are built and trained on the West African corpus. We show that although the generated texts vary in semantic and syntactic coherence, each of them contains West African features.
  2. The second experiment further examines the distinction between the West African and Western European folk tales by comparing the performance of an LSTM (acc. 0.79) with a BoW classifier (acc. 0.93), indicating that the two corpora can be clearly distinguished in terms of vocabulary. An interactive t-SNE visualization of a hybrid classifier (acc. 0.85) highlights the culture-specific words for both.
  3. The third experiment describes an ML analysis of narrative structures. Classifiers trained on parts of folk tales according to the three-act structure are quite capable of distinguishing these parts (acc. 0.78). Common n-grams extracted from these parts not only underline cross-cultural distinctions in narrative structures, but also show the overlap between verbal and written West African narratives.
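The vocabulary-based distinction underlying the second experiment can be illustrated with a minimal bag-of-words sketch in pure Python. This is not the paper's actual classifier (which uses an LSTM, a BoW model, and a hybrid of the two on the full corpora); the toy sentences below are invented for illustration only:

```python
from collections import Counter

# Toy stand-ins for the two folk tale collections (invented sentences,
# not taken from the actual corpora).
west_african = [
    "anansi the spider tricked the python in the forest",
    "the tortoise carried the calabash to the village chief",
]
western_european = [
    "the princess lived in a castle with the king",
    "a knight rode through the forest to the castle",
]

def bow(text):
    """Bag-of-words representation: word -> count."""
    return Counter(text.split())

def centroid(texts):
    """Merge the word counts of a collection into one vocabulary profile."""
    total = Counter()
    for t in texts:
        total += bow(t)
    return total

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

profiles = {"WA": centroid(west_african), "WE": centroid(western_european)}

def classify(text):
    """Assign the label whose vocabulary profile is most similar."""
    v = bow(text)
    return max(profiles, key=lambda label: cosine(v, profiles[label]))

print(classify("anansi went to the village"))   # -> WA
print(classify("the king rode to the castle"))  # -> WE
```

Even this crude nearest-profile approach separates the toy examples, which gives some intuition for why a plain BoW classifier can outperform an LSTM when the two corpora differ mainly in vocabulary.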
Example output of the word-level model text generator on translated W-African folk tale fragments
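The common-n-gram extraction used in the third experiment can likewise be sketched with the standard library. The opening fragments below are invented stand-ins for the "first act" parts of tales, not actual corpus data:

```python
from collections import Counter

# Invented opening fragments standing in for first-act tale segments.
openings = [
    "once upon a time there lived a poor farmer",
    "once upon a time in a small village there lived a hunter",
]

def ngrams(tokens, n):
    """All contiguous word n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

counts = Counter()
for text in openings:
    counts.update(ngrams(text.split(), 3))

# The most frequent trigrams surface formulaic opening phrases.
print(counts.most_common(3))
```

Applied per narrative act, frequent n-grams like "once upon a" act as structural markers, which is how such phrases can underline cross-cultural distinctions in narrative structure.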

All resources, including data and code, can be found at https://github.com/GossaLo/afr-neural-folktales


Virtual Human Rights Lawyer project

The Virtual Human Rights Lawyer is a joint project of Vrije Universiteit Amsterdam and the Netherlands Office of the Public International Law & Policy Group to help victims of serious human rights violations obtain access to justice at the international level. It enables users to find out how and where they can access existing global and regional human rights mechanisms in order to obtain some form of redress for the human rights violations they face or have faced.

In the video above Marieke de Hoon of VU’s Law faculty and Charlotte Gerritsen (Artificial Intelligence) talk about the goals for the project.


“New life for old media” to be presented at NEM Summit 2017

The extended abstract “Investigations into Speech Synthesis and Deep Learning-based colorization for audiovisual archives” has been accepted for publication at the NEM (New European Media) Summit 2017, to be held in Madrid at the end of November. The paper is based on Rudy Marsman’s thesis “Speech technology and colorization for audiovisual archives” and describes his research on using AI technologies in the context of the Netherlands Institute for Sound and Vision. Specifically, Rudy experimented with developing speech synthesis software based on a library of narrated news videos (using the voice of the late Philip Bloemendal) and with using pre-trained deep learning colorization networks to colorize archival videos.

You can read more in the draft paper [PDF]:

Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen. New life for old media: Investigations into Speech Synthesis and Deep Learning-based colorization for audiovisual archives. Extended Abstract Proceedings of NEM Summit 2017.

Update: the slides as presented by Johan Oomen at NEM are available.
