Master Project Andrea Bravo Balado: Linking Historical Ship Records to Newspaper Archives

[This post was written by Andrea Bravo Balado and is cross-posted at her own blog. It describes her MSc project, which I supervised.]

Linking historical datasets and making them available on the Web has increasingly become a subject of research in the field of digital humanities. In the Netherlands, history is intimately tied to maritime activity, which has been essential to the economic, social and cultural development of Dutch society. As such an important sector, it has been well documented by shipping companies, governments, newspapers and other institutions.

janwillemsen: photo of historic ships in Rotterdam (click to view on Flickr)

In this master project we assume that, given the importance of maritime activity in everyday life in the 19th and 20th centuries, announcements of the departures and arrivals of ships, or mentions of accidents or other events, can be found in newspapers.

We have taken a two-stage approach: first, a heuristic-based method for record linkage, and then machine-learning algorithms for article classification, which are used for filtering in combination with domain features. Evaluation of the linking method has shown that certain domain features were indicative of mentions of ships in newspapers. Moreover, the classifiers scored near-perfect precision in predicting ship-related articles.
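As a rough illustration of the two-stage idea, here is a minimal sketch in Python. It is not the method from the thesis: the record fields, the whole-word name match, the publication-year window, the captain-name feature and the keyword-based stand-in for a trained classifier are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class ShipRecord:
    # Hypothetical fields; the real ship records may look different.
    ship_name: str
    captain: str
    year: int


@dataclass
class Article:
    article_id: str
    year: int
    text: str


def candidate_links(record, articles, max_year_gap=1):
    """Stage 1 (heuristic): link a record to articles that mention the ship
    name as a whole word and were published close to the record's year."""
    name = re.compile(r"\b" + re.escape(record.ship_name) + r"\b", re.IGNORECASE)
    links = []
    for article in articles:
        if abs(article.year - record.year) > max_year_gap:
            continue
        if not name.search(article.text):
            continue
        # Assumed domain feature: the captain's name also appears in the text.
        has_captain = bool(record.captain) and record.captain.lower() in article.text.lower()
        links.append((article.article_id, 1 + int(has_captain)))
    # Highest-scoring candidates first.
    return sorted(links, key=lambda link: link[1], reverse=True)


def keyword_classifier(text):
    """Stand-in for a trained classifier: accepts texts with shipping vocabulary."""
    lowered = text.lower()
    return any(word in lowered for word in ("arrived", "departed", "captain", "cargo"))


def filter_links(links, articles, is_ship_article):
    """Stage 2: keep only links whose article the classifier accepts."""
    by_id = {article.article_id: article for article in articles}
    return [(aid, score) for aid, score in links if is_ship_article(by_id[aid].text)]
```

A ship called "Maria" illustrates why the second stage matters: the name alone also matches articles about people, and the classifier filters those out again.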

Enriching historical ship records with links to newspaper archives is significant for the digital history community, since it connects two datasets that would otherwise have required extensive annotation work and many man-hours to align. Our work is part of the Dutch Ships and Sailors Linked Data Cloud project. Check out Andrea’s thesis [PDF].

[googleapps domain="docs" dir="presentation/d/1HSzQIWc5SX4AGjOsOlja6gF-n44OwGJRxixklUSQ6Gs/embed" query="start=false&loop=false&delayms=30000" width="680" height="411" /]


Master project Rianne Nieland: Talking to Linked Data

[This post was written by Rianne Nieland. It describes her MSc project, which I supervised.]

Many people in developing countries cannot access information on the Web, because they have no Internet access and often have low literacy. A solution could be to provide voice-based access to data on the Web by using the GSM network.

In my master project I have investigated how to make general-purpose data sets efficiently available through voice interfaces for GSM. To achieve this, I have developed two voice interfaces, one for Wikipedia and one for DBpedia. The two interfaces use different kinds of input data sources, namely normal web data and Linked Data, so that they can be compared.

To develop the two voice interfaces, I first elicited requirements from the literature and then developed a user interface and conversion algorithms for Wikipedia and DBpedia concepts. Users then evaluated the two voice interfaces in user tests, so that they could be compared on speed, error rate and usability.

[Rianne’s thesis presentation slides can be found on Slideshare and are embedded below. Her thesis is attached here: Eindversie-Paper-Rianne-Nieland-2057069]


[slideshare id=37310122&w=476&h=400&sc=no]


CSWS2013 summer school and keynote in Shanghai

Last week, Knud Moeller from datalysator and I were invited to give a set of lectures about Linked Data at the CSWS 2013 summer school in Shanghai, China. As far as we are concerned, the summer school was a success. About 60 students received three mornings' worth of lectures about the principles and practice of Linked Data from the two of us. In the afternoons, they heard talks about Semantic Web efforts from the likes of Baidu and Google.

Interested students

Because Twitter, Facebook, Slideshare and WordPress are unavailable or unreachable in China, the lecture material is available online as PDFs through an HTML page on my VU homepage.

I also had the honour of giving a keynote speech about Linked Data for Cultural Heritage and Digital History in the main conference. Those slides can be found on Slideshare.


ICT 4 Development course final presentations

[crosspost from]

This Friday, a brand new course at the VU University Amsterdam came to a satisfying close. The ICT 4 Development course (ICT4D) was offered to VUA Computer Science students for the first time this year and I feel it was a success. The course, a collaboration between the Computer Science department and the Center for International Cooperation of the same university, aimed to teach students how to go about designing and deploying ICT projects in developing areas.

Student group presenting their XO deployment plan

To this end, the students learned about the importance of considering local socio-economic contexts, but also got to experience two technologies often used for development projects. The students received a crash course in the Sugar operating system for the XO laptop from the One Laptop Per Child project and were given a tutorial on VoiceXML for developing voice-based applications. Students formed groups and chose one of these technologies to solve a real-world problem in its development context.

The course ended today with student group presentations. Three groups presented an XO deployment. One of these involved an agricultural program in Namibia that teaches children about growing local food next to their schools. The XO laptop can assist this education by providing tips for growing the crops. The two other presentations focused on XO deployments in the neighbouring countries Iran and Iraq and included mockups and prototypes for XO programs (activities) that assist children both inside and outside school. There is even a good chance that the program in Iraq will actually be deployed, and one of the teachers (who happened to be the mother of one of the students) was present at the presentation.

Student group presenting their VoiceXML module

The fourth group developed an additional voice module for the RadioMarché system currently deployed in Mali, allowing local farmers to call in with their mobile phones when they want to sell produce. A voice menu enables them to tell the system how much of a specific product they have to offer and how much money they want in return.
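To make the menu flow concrete, here is a small Python sketch of the logic such a voice menu has to implement. It is an illustration only, not the students' VoiceXML module: the product list, the keypad-style input and the function name are assumptions made for this example.

```python
# Hypothetical menu options; the real system's product list is not given here.
PRODUCTS = {"1": "honey", "2": "shea butter", "3": "tamarind"}


def collect_offer(keypresses):
    """Turn three caller inputs into an offer, or return None if any is invalid.

    keypresses holds, in order: the product key, the quantity and the asking
    price, each as a string of digits entered on the phone keypad.
    """
    if len(keypresses) != 3:
        return None
    product_key, quantity, price = keypresses
    if product_key not in PRODUCTS:
        return None
    # Accept only plain ASCII digits so that int() cannot fail below.
    for value in (quantity, price):
        if not (value.isascii() and value.isdigit()):
            return None
    if int(quantity) == 0:
        return None
    return {
        "product": PRODUCTS[product_key],
        "quantity": int(quantity),
        "price": int(price),
    }
```

In a deployed voice application each of these three inputs would follow a spoken prompt, and an invalid input would lead to the prompt being repeated rather than to the call being dropped.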

All in all, this trip around the world showed how much the students have learned. We hope some of the projects will actually lead to real deployments and are looking forward to teaching the course again next year.


Nichesourcing pluvial data digitization for the Sahel

Example pluvial records digitized through Binyam's nichesourcing effort (photos: W. Tuijp)

At EKAW 2012 I presented a position paper, co-authored with a number of VU colleagues, on nichesourcing as a next phase in crowdsourcing practice. In nichesourcing, tasks are not distributed to the faceless crowd but rather to small groups of amateur experts who share a set of characteristics. These characteristics ensure that they can perform tasks that require specific knowledge with higher quality; furthermore, their connection with the context makes them more motivated. The presentation slides are archived on Slideshare; the paper itself can be found here.

The paper and presentation feature two use cases. One use case concerns the Master’s project by Binyam Tesfa, supervised by me and Pieter De Leenheer. Binyam investigated a nichesourcing approach for digitizing pluvial data from the Sahel region in Africa. He developed and published a nichesourcing application on the web targeting the African diaspora (African expats currently living in the North). Binyam evaluated its success in terms of attracting dedicated participants and digitizing a considerable amount of data. Within one week of the release of our nichesourcing application, the participants produced more than 5000 cells of structured digitized pluvial data. We also found that the anticipated niche (people with an African affiliation) participated in the digitization with dedication. Binyam’s thesis can be found here: Nichesourcing: a case study for pluvial data digitization for the Sahel by B. Tesfa [PDF]

The other use case presented is the Rijksmuseum print annotation use case, in which 700,000 prints are to be annotated by amateur experts. Prints depicting flowers are distributed to flower enthusiasts, prints of castles to castle geeks, etc. For this use case, the people in the COMMIT/SEALINCMedia project are currently developing a nichesourcing methodology and application.


VU Computer Science videos

Screenshot of nine videos (click to go to media page)

A while ago, the VU Computer Science department commissioned the production of ten videos in which VU computer scientists explain their research. Pepijn Borgwat and his friends at Synergique (with a little help from me) made these ten videos, each clocking in at around 2 minutes. The videos will be used for marketing and educational purposes, or even for the dissemination of scientific results.

The videos are available in Dutch and English at the VUScience YouTube channel and at Pepijn’s Vimeo page. They are embedded below.



W4RA student mini-workshop

Group photo of the First Web alliance for Regreening in Africa Student mini-symposium

Today, we held an informal workshop for students involved in MSc projects related to the VOICES project and other activities associated with “the Web for Warm Countries”. The goal of this meeting was for the students to inform each other about the current status of their research projects and to sketch the bigger picture. I gave a short talk describing the various running projects (VOICES, Furoba Blon, IDSWrapper, SemanticXO) as well as possible future projects (ICONS, the ICT4D course).

Six students presented us with their updates:

  • Henk Kroon told us a bit about his efforts to create a client application that uses the Linked Data based on RadioMarché.
  • Rokhsareh Nakhaei presented us with extensive models for her design of a serious game that will be used to gather voice fragments in different languages.
  • Albert Chifura is talking to many stakeholders to identify sustainable business models for the M-Event use case of VOICES.
  • Binyam Tesfa is also developing a crowdsourcing application. He is doing this for digitizing pluvial data from the Sahel. He targets a specific niche (the African ‘diaspora’) to do this.
  • Deepak Chetri is doing literature research into the design of voice-based interfaces for low-literate users in developing countries.
  • Gavarni Winter is the newest addition to the W4RA family; he is still contemplating his specific research questions.

Also present were Pieter De Leenheer, supervisor for a number of projects, and Wendelien Tuyp from CIS, who could answer a number of questions about the African context. From my point of view, the meeting was a success and we agreed to organize a second installment later this year.
