Simulating creativity in GANs with IoT

[This blog post is based on the MSc Artificial Intelligence thesis project of Fay Beening, supervised by myself and Joost de Boo; more information can be found on Fay’s website]

Recently, generative art has been one of the fields where AI, and especially deep learning, has caught the public eye. Algorithms and online tools such as Dall-E are able to produce astounding results based on large artistic datasets. One class of algorithms at the root of this success is the Generative Adversarial Network (GAN), frequently used in online art-generating tools because of its ability to produce realistic artefacts.

But is this “””real””” art? Is this “””real””” creativity?

To address this, Fay investigated current theories on art and art education and found that these imply that true human creativity can be split into three types: 1) combinational, 2) explorative and 3) transformative creativity, but that it also requires real-world experiences and interactions with people and the environment. In her thesis, Fay therefore proposes to combine the GAN with an Internet of Things (IoT) setup to make it behave more creatively.

Arduino-based prototype (image from Fay’s thesis)

She then designed a system that extends the original GAN with an interactive IoT system (implemented in an Arduino-based prototype) to simulate a more creative process. The prototype demonstrated creative behaviour that reacts to the environment and gradually changes the direction of the generated images.
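To make the idea of a GAN “reacting to the environment” concrete, here is a minimal sketch of how sensor readings could steer a generator’s latent code over time. This is not Fay’s actual implementation: the generator, the sensor reader, the latent dimensionality and the step size below are all placeholder assumptions.

```python
import numpy as np

# Hypothetical stand-ins: a pretrained GAN generator and an IoT sensor reading.
# In the prototype these roles are played by the trained GAN and an Arduino-based
# sensor setup; here they are dummies for illustration only.
def generate_image(latent):          # placeholder for generator(latent)
    return np.tanh(latent)           # dummy "image"

def read_sensor():                   # placeholder for e.g. a serial read from an Arduino
    return np.random.uniform(-1, 1)  # dummy light/sound level in [-1, 1]

rng = np.random.default_rng(0)
latent_dim = 128
z = rng.normal(size=latent_dim)           # current position in latent space
direction = rng.normal(size=latent_dim)   # direction the environment can push towards
direction /= np.linalg.norm(direction)

for step in range(10):
    sensor_value = read_sensor()
    # Nudge the latent code along the direction, scaled by the sensor reading,
    # so the generated images drift gradually as the environment changes.
    z = z + 0.05 * sensor_value * direction
    image = generate_image(z)
```

Each iteration nudges the latent vector along a fixed direction, scaled by the current sensor value, so consecutive images drift gradually rather than jumping around at random.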

Images shown to the participant during the level of creativity task. Images 2 and 6 are creative GAN generated images. Images 1 and 5 are human-made art. Images 3 and 4 are online GAN generated art.

The generated art was evaluated on its creativity through task-based interviews with domain experts. The results show that the level to which the generated images are considered creative depends heavily on the participant’s view of creativity.


Thesis writing guidelines

As supervisor of many MSc and BSc theses, I find myself giving writing tips and guidelines quite often. Inspired by Jan van Gemert’s guidelines, I compiled my own document with tips and guidelines for writing a CS/AI/IS bachelor or master thesis. These are things that I personally care about; other lecturers might have different ideas. Also, this is by no means a complete list and I will use it as a living document. You can find it here: https://tinyurl.com/victorthesiswriting


Student-supported project in the news

It was great to see that one of this year’s Digital Humanities in Practice projects led to a conversation between the students in that project, Helene Ayar and Edith Brooks, their external supervisors, Willemien Sanders (UU) and Mari Wigham (NISV), and an advisor for another project, André Krouwel (VU). That conversation resulted in original research and the CLARIAH MediaSuite data story “‘Who’s speaking?’ - Politicians and parties in the media during the Dutch election campaign 2021”, in which the content of news programmes was analysed for politicians’ names, their gender and party affiliation.

The results are very interesting and subsequently appeared on the Dutch news site NOS.nl, showing that right-wing politicians are more strongly represented on radio and TV: “Onderzoek: Rechts domineert de verkiezingscampagne op radio en tv”. Well done and congratulations!


Historical Toponym Disambiguation

[This blog post is based on the Information Sciences Master’s thesis of Bram Schmidt, conducted at the KNAW Humanities Cluster and the IISG. It reuses text from his thesis]

Place names (toponyms) are highly ambiguous and may change over time. This makes it hard to link mentions of places to their corresponding modern entities and coordinates, especially in a historical context. We focus on a historical toponym disambiguation approach: entity linking based on identified context toponyms.

The thesis specifically looks at the American Gazetteer. Its entries contain fundamental information about major places in the vicinity of the described place. By identifying and exploiting these context toponyms, we aim to estimate the most likely position of a historical entry and link it to its contemporary counterpart.

Example of a toponym in the Gazetteer

In this case study, Bram Schmidt therefore examined the toponym recognition performance of the state-of-the-art Named Entity Recognition (NER) tools spaCy and Stanza on historical texts, and tested two new heuristics to facilitate efficient entity linking to the geographical database GeoNames.

Experiments with different geo-distance heuristics show that such heuristics can indeed be used to disambiguate place names.

We tested our method against a subset of manually annotated records of the gazetteer. The results show that both NER tools perform insufficiently at automatically identifying the relevant toponyms in the free text of a historical lemma. However, exploiting correctly identified context toponyms by calculating the minimal distance among them proved successful, and combining the approaches into one algorithm improved the recall score.
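As an illustration of this minimal-distance idea (a simplified sketch, not Bram’s exact algorithm), the snippet below picks, for an ambiguous toponym, the candidate coordinate whose summed distance to the already-resolved context toponyms is smallest. All coordinates and place names are illustrative assumptions.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def disambiguate(candidates, context_points):
    """Pick the candidate coordinate whose summed distance to the
    (already resolved) context toponyms is smallest."""
    return min(candidates,
               key=lambda c: sum(haversine_km(c, ctx) for ctx in context_points))

# Toy example: two GeoNames-style candidates for "Boston", given context toponyms
# taken from the same gazetteer lemma (coordinates are approximate and illustrative).
boston_candidates = [(42.36, -71.06),   # Boston, Massachusetts
                     (52.98, -0.03)]    # Boston, Lincolnshire (UK)
context = [(41.76, -72.69), (42.10, -72.59)]  # e.g. Hartford, Springfield
print(disambiguate(boston_candidates, context))  # -> the Massachusetts candidate
```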

Bram’s thesis was co-supervised by Marieke van Erp and Romke Stapel. His thesis can be found here [pdf].


Automating Authorship Attribution

[This blog post was written by Nizar Hirzalla and describes his VU Master AI project conducted at the Koninklijke Bibliotheek (KB), co-supervised by Sara Veldhoen]

Authorship attribution is the process of correctly attributing a publication to its corresponding author, which is often done manually in real-life settings. This task becomes inefficient when there are many options to choose from because authors share the same name. Authors can be characterized by features found in their associated publications, which suggests that machine learning could potentially automate this process. However, authorship attribution introduces a typical class imbalance problem, due to the vast number of possible labels in a supervised machine learning setting. To complicate the issue even more, we use problematic input data, as this mimics the type of data available to many institutions: data that is heterogeneous and sparse in nature.

Inside the KB (photo S. ter Burg)

The thesis investigates how to automate authorship attribution given these known problems and this type of input data, and whether automation is possible in the first place. The thesis considers children’s literature and publications that can have between 5 and 20 potential authors (authors with the exact same name). We implement different types of machine learning methodologies for this task. In addition, we consider all available types of data (as provided by the National Library of the Netherlands), as well as the integration of contextual information.

Furthermore, we consider different computational representations of the textual input (such as the title of the publication), in order to find the most effective representation of sparse text that can serve as input for a machine learning model. These experiments are preceded by a pipeline that consists of pre-processing the data, feature engineering and selection, converting the data to other vector space representations and integrating linked data. This pipeline is shown to improve performance when used with the heterogeneous data inputs.
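As a rough illustration of the TFIDF-based author classification route (one of the representations and methodologies compared in the thesis), the sketch below trains a linear classifier on publication titles. The titles, author labels and model choice are invented assumptions, not the KB data or Nizar’s actual pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the real catalogue data: publication titles (sparse text)
# labelled with an author identifier; titles and labels are invented.
titles = ["De avonturen van de kleine beer",
          "De kleine beer gaat op reis",
          "Gedichten over de zee",
          "Nog meer gedichten over de zee"]
authors = ["author_A", "author_A", "author_B", "author_B"]

# TFIDF representation of the titles feeding a simple linear classifier,
# mirroring the "author classification" methodology in spirit.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(titles, authors)
print(model.predict(["De beer op reis"]))  # likely author_A
```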

Implemented neural network architectures for TFIDF (left) and Word2Vec (right) based text classification

Ultimately the thesis shows that automation can be achieved in up to 90% of the cases, and that in a general sense it can significantly reduce the costs and time consumption of authorship attribution in a real-world setting, thus facilitating more efficient work procedures. Along the way, the thesis also arrives at the following key findings:

  1. Two machine learning methodologies are compared: author classification and similarity learning. Author classification gives the best raw performance (F1 = 0.92), but similarity learning provides more robust predictions and increased explainability (F1 = 0.88). For a real-life setting with end users the latter is recommended, as it is more suitable for integrating machine learning into cataloguers’ workflows, at only a small cost in performance.
  2. The addition of contextual information increases performance, but the gain depends on the type of information included. Publication metadata and biographical author information are considered for this purpose. Publication metadata gives the best performance (predominantly the publisher and year of publication), while biographical author information, in contrast, negatively affects performance.
  3. We consider BERT, word embeddings (Word2Vec and fastText) and TFIDF as representations of the textual input. BERT ultimately gives the best performance, with up to a 200% increase compared to word embeddings. BERT is a sophisticated transformer-based language model, which yields a richer semantic representation of the text that can be used to identify the associated authors.
  4. Based on surveys and interviews, we also find that end users mostly attach importance to author-related information when engaging in manual authorship attribution. Looking more closely at the machine learning models, we see that these primarily base their predictions on publication metadata features. We find that such differences in the perception of information should ultimately not lead to negative experiences, as multiple options exist for harmonizing both parties’ usage of information.
Summary of the final performance of the best-performing models from the different implemented methodologies


Hearing (Knowledge) Graphs

[This post is based on Enya Nieland‘s MSc thesis “Generating Earcons from Knowledge Graphs”]

Three earcons with varying pitch, rhythm, and both pitch and rhythm

Knowledge Graphs are becoming enormously popular, which means that users interacting with such complex networks are diversifying. This requires new and innovative ways of interacting. Several methods for visualizing, summarizing or exploring knowledge have been proposed and developed. In this student project we investigated the potential for interacting with knowledge graphs through a different modality: sound.

The research focused on the question of how to generate meaningful sound or music from (knowledge) graphs. The generated sounds should provide users with some insight into the properties of the network. Enya framed this challenge around the idea of “earcons”: the auditory version of an icon.

Enya eventually developed a method that automatically produces these types of earcons for arbitrary knowledge graphs. Each earcon consists of three notes that differ in pitch and duration. As an example, listen to the three earcons shown in the figure on the left.

Earcon where pitch varies
Earcon where note duration varies
Earcon where both pitch and rhythm vary

The earcon parameters are derived from network metrics such as the minimum, maximum and average indegree or outdegree. A tool with a user interface allowed users to design earcons based on these metrics.
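As a simplified sketch of this mapping (not Enya’s actual tool), the snippet below computes indegree statistics of a toy graph with networkx and converts each metric into a (pitch, duration) pair, one note per metric; the scaling ranges and the MIDI base pitch are assumptions.

```python
import networkx as nx

# Toy directed graph standing in for a knowledge graph.
g = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")])

indegrees = [d for _, d in g.in_degree()]
metrics = {"min": min(indegrees),
           "avg": sum(indegrees) / len(indegrees),
           "max": max(indegrees)}

def to_note(value, low=0, high=5, base_pitch=60):
    """Map a metric value onto a MIDI pitch (60 = middle C) and a duration."""
    scaled = (value - low) / (high - low)
    pitch = int(base_pitch + scaled * 24)   # spread over two octaves
    duration = 0.25 + scaled * 0.75         # between a 16th note and a full beat
    return pitch, duration

earcon = [to_note(metrics[k]) for k in ("min", "avg", "max")]
print(earcon)  # three (pitch, duration) pairs, one note per metric
```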

The pipeline for creating earcons
The GUI

The different variants were evaluated in an extensive user test with 30 respondents to find out which variants were the most informative. The results show that the individual elements of earcons can indeed provide insight into these metrics, but that combining them is confusing to the listener. In this case, simpler is better.

This tool could be an addition to a tool such as the LOD Laundromat, providing an instant insight into the complexity of KGs. It could additionally benefit people who are visually impaired and want to gain insight into the complexity of Knowledge Graphs.


Linked Art Provenance

In the past year, together with Ingrid Vermeulen (VU Amsterdam) and Chris Dijkshoorn (Rijksmuseum Amsterdam), I had the pleasure of supervising two VU students, Babette Claassen and Jeroen Borst, who participated in a Network Institute Academy Assistant project on art provenance and digital methods. The growing number of datasets and digital services around art-historical information presents new opportunities for conducting provenance research at scale. The Linked Art Provenance project investigated to what extent it is possible to trace the provenance of art works using online data sources.

Caspar Netscher, the Lacemaker, 1662, oil on canvas. London: the Wallace Collection, P237

In this interdisciplinary project, Babette (Art Market Studies) and Jeroen (Artificial Intelligence) collaborated to create a workflow model, shown below, for integrating provenance information from various online sources such as the Getty Provenance Index. This included an investigation into the potential use of automatic extraction of structured data from these online sources.

The model was validated through a case study in which we investigated whether we could capture information from selected sources about an auction (1804) at which the paintings from the former collection of Pieter Cornelis van Leyden (1732-1788) were dispersed. An example work, the Lacemaker, is shown above. Interviews with various art historians validated the produced workflow model.

The workflow model also provides a basic guideline for provenance research and, together with Linked Open Data, can possibly help answer relevant research questions for studies of the history of collecting and the art market.

More information can be found in the Final report.


Exploring Automatic Recognition of Labanotation Dance Scores

[This post describes the research of Michelle de Böck and is based on her MSc Information Sciences thesis.]

Digitization of cultural heritage content allows for the digital archiving, analysis and other processing of that content. The practice of scanning and transcribing books, newspapers and images, 3D-scanning artworks or digitizing music has opened up this heritage, for example for digital humanities research or even for creative computing. However, for the performing arts, including theater and more specifically dance, digitization is a serious research challenge. Several dance notation schemes exist, the most established one being Labanotation, developed in 1920 by Rudolf von Laban. Labanotation uses a vertical staff notation to record human movement in time, with various symbols for limbs, head movement, and types and directions of movements.

Generated variations of movements used for training the recognizers

Whereas for musical scores good translations to digital formats exist (e.g. MIDI), for Labanotation these are lacking. While there are structured formats (LabanXML, MovementXML), the majority of content still exists only in non-digitized form (on paper) or in scanned images. The research challenge of Michelle de Böck’s thesis therefore was to identify design features for a system capable of recognizing Labanotation from scanned images.

Examples of Labanotation files used in the evaluation of the system.

Michelle designed such a system and implemented it in MATLAB, focusing on a small set of movement symbols. Several approaches were developed and compared, including approaches using pre-trained neural networks for image recognition (AlexNet). This approach outperformed the others, resulting in a classification accuracy of 78.4%. While we are still far from a full-fledged OCR system for Labanotation, this exploration has provided valuable insights into the feasibility and requirements of such a tool.
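For readers who want a feel for the transfer-learning approach, here is a minimal Python/PyTorch analogue of fine-tuning a pre-trained AlexNet for symbol classification. Michelle’s implementation was in MATLAB; the number of classes, the dummy batch and the hyperparameters below are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained AlexNet and replace its final layer so it
# predicts a handful of Labanotation movement-symbol classes (assumed: 5).
num_symbol_classes = 5
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
for param in model.features.parameters():  # freeze the convolutional backbone
    param.requires_grad = False
model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_symbol_classes)

# A single training step on a dummy batch of 224x224 score crops.
optimizer = torch.optim.Adam(model.classifier[6].parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
images = torch.randn(8, 3, 224, 224)       # stand-in for scanned symbol crops
labels = torch.randint(0, num_symbol_classes, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```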


Architectural Digital Humanities student projects

In the context of our ArchiMediaL project on Digital Architectural History, a number of student projects explored opportunities and challenges around enriching the colonialarchitecture.eu dataset. This dataset lists buildings and sites in countries outside of Europe that were ruled by Europeans at the time (1850-1970).

Patrick Brouwer wrote his IMM bachelor thesis “Crowdsourcing architectural knowledge: Experts versus non-experts” about the differences in annotation styles between architectural history experts and non-expert crowd annotators. The data suggests that crowdsourcing is a viable option for annotating this type of content, although expert annotations were of a higher quality than those of non-experts. The image below shows a screenshot of the user study survey.

Rouel de Romas also looked at crowdsourcing, but focused more on the user interaction and the interface involved. In his thesis “Enriching the metadata of European colonial maps with crowdsourcing” he, like Patrick, used the Accurator platform developed by Chris Dijkshoorn; a screenshot is shown below. The results corroborate the previous study: in most cases the annotations provided by the participants meet the requirements set by the architectural historian, and crowdsourcing is thus an effective method to enrich the metadata of European colonial maps.

Finally, Gossa Lo looked at automatic enrichment using OCR techniques on textual documents for her Mini-Master project. She created a specific pipeline for this, which can be seen in the image below. Her code and paper are available on this GitHub page: https://github.com/biktorrr/aml_colonialnlp


Who uses DBPedia anyway?

[This post is based on Frank Walraven‘s Master’s thesis]

Who uses DBPedia anyway? This was the question that started a research project for Frank Walraven. The question came up during one of the meetings of the Dutch DBPedia chapter, of which VUA is a member. If usage and users are better understood, this can lead to better servicing of those users, for example by prioritizing the enrichment or improvement of specific sections of DBPedia. Characterizing the use(r)s of a Linked Open Data set is an inherently challenging task, as in an open Web world it is difficult to know who is accessing your digital resources. For his MSc project research, which he conducted at the Dutch National Library under the supervision of Enno Meijers, Frank used a hybrid approach combining a data-driven method based on user log analysis with a short survey of known users of the dataset. As a scope, Frank selected just the Dutch DBPedia dataset.

For the data-driven part of the method, Frank used a complete user log of HTTP requests on the Dutch DBPedia. This log file (see link below) consisted of over 4.5 million entries and logged both URI lookups and SPARQL endpoint requests. For this research, only a subset of the URI lookups was considered.

As a first analysis step, the requests’ origin IP addresses were categorized. Five classes were identified (A-E), with the vast majority of IP addresses falling into class “A”: very large networks and bots. Most of the IP addresses in these lists could be traced back to search engine indexing bots such as those from Yahoo or Google. For the remaining classes, Frank manually traced the top 30 most encountered IP addresses, concluding that even there 60% of the requests came from bots, 10% definitely did not come from bots, and for the remaining 30% this was unclear.
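As an illustration of this kind of log analysis (not Frank’s actual scripts), here is a minimal sketch that counts requests per IP address in a combined-format access log and flags obvious crawler user agents; the log file name, its exact format and the bot keywords are assumptions.

```python
import re
from collections import Counter

# Matches the IP and user agent of a combined-format access log line
# (assumption: the Dutch DBPedia log uses a similar format).
LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<request>[^"]*)" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"$')
BOT_HINTS = ("bot", "spider", "crawler", "slurp")   # e.g. Googlebot, Yahoo! Slurp

ip_counts, bot_counts = Counter(), Counter()
with open("nl_dbpedia_access.log") as log:          # hypothetical file name
    for line in log:
        m = LINE.match(line)
        if not m:
            continue
        ip, agent = m.group("ip"), m.group("agent").lower()
        ip_counts[ip] += 1
        if any(hint in agent for hint in BOT_HINTS):
            bot_counts[ip] += 1

print(ip_counts.most_common(30))                                   # top 30 IP addresses
print(sum(bot_counts.values()) / max(sum(ip_counts.values()), 1))   # share of bot requests
```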

The second analysis step in the data-driven method consisted of identifying which types of pages were requested most. To cluster the thousands of DBPedia URI requests, Frank retrieved the ‘categories’ of the requested pages. These categories are extracted from Wikipedia category links. An example is the “Android_TV” resource, which has two categories: “Google” and “Android_(operating_system)”. Following skos:broader links, a ‘level 2 category’ could also be found, aggregating to an even higher level of abstraction. As not all resources have such categories, this does not give a complete picture, but it does provide an idea of the most popular categories of requested items. After normalizing for categories with large numbers of incoming links, for example the category “non-endangered animal”, the most popular categories were: 1. domestic and international movies, 2. music, 3. sports, 4. Dutch and international municipality information, and 5. books.
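A hedged sketch of how such categories can be retrieved, using the “Android_TV” example above: the query below asks a DBpedia SPARQL endpoint for a resource’s dct:subject categories and their skos:broader ‘level 2’ categories. The endpoint URL and the exact resource URI are assumptions for illustration.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Endpoint and resource URI are assumptions; the thesis worked on the Dutch DBPedia.
endpoint = SPARQLWrapper("http://nl.dbpedia.org/sparql")
endpoint.setReturnFormat(JSON)
endpoint.setQuery("""
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?category ?broader WHERE {
  <http://nl.dbpedia.org/resource/Android_TV> dct:subject ?category .
  OPTIONAL { ?category skos:broader ?broader }   # 'level 2' category
}
""")
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["category"]["value"], row.get("broader", {}).get("value"))
```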

Frank also set up a user survey to corroborate this evidence. The survey contained questions about the how and why of the respondents’ Dutch DBPedia use, including the categories they were most interested in. The survey was distributed via the Dutch DBPedia website and via Twitter, but attracted only 5 respondents. This illustrates the difficulty of the problem: users of the DBPedia resource are not necessarily easy to reach through these communication channels. The five respondents were all quite closely related to the chapter, but the results were interesting nonetheless. Most of the respondents used the DBPedia SPARQL endpoint. The full results of the survey can be found in Frank’s thesis, but in terms of corroboration the survey revealed that four of the five categories found in the data-driven method were also identified in the top five resulting from the survey. The fifth category identified in the survey was ‘geography’, which could be matched to the fifth from the data-driven method.

Frank’s research shows that, although it remains a challenging problem, a combination of data-driven and user-driven methods can indeed give an indication of the most-used categories on DBPedia. Within the Dutch DBPedia chapter, we are currently considering follow-up research questions based on Frank’s research.
