Big Data Europe Platform paper at ICWE 2017

With the launch of the Big Data Europe platform behind us, we are telling the world about our nice platform and the many pilots in the societal challenge domains that we have executed and evaluated. We wrote everything down in one comprehensive paper, which was accepted at the 17th International Conference on Web Engineering (ICWE 2017), to be held in Rome next month.

High-level BDE architecture (copied from the paper by Auer et al.)

The paper “The BigDataEurope Platform – Supporting the Variety Dimension of Big Data” is co-written by a very large team (see below). It presents the BDE platform: an easy-to-deploy, easy-to-use and adaptable (cluster-based and standalone) platform for the execution of big data components and tools like Hadoop, Spark, Flink, Flume and Cassandra. To facilitate the processing of heterogeneous data, a particular innovation of the platform is the Semantic Layer, which makes it possible to process RDF data directly and to map and transform arbitrary data into RDF. The platform is based upon requirements gathered from seven of the societal challenges put forward by the European Commission in the Horizon 2020 programme and targeted by the BigDataEurope pilots. It is validated through pilot applications in each of these seven domains. A draft version of the paper can be found here.


The full reference is:

Sören Auer, Simon Scerri, Aad Versteden, Erika Pauwels, Angelos Charalambidis, Stasinos Konstantopoulos, Jens Lehmann, Hajira Jabeen, Ivan Ermilov, Gezim Sejdiu, Andreas Ikonomopoulos, Spyros Andronopoulos, Mandy Vlachogiannis, Charalambos Pappas, Athanasios Davettas, Iraklis A. Klampanos, Efstathios Grigoropoulos, Vangelis Karkaletsis, Victor de Boer, Ronald Siebes, Mohamed Nadjib Mami, Sergio Albani, Michele Lazzarini, Paulo Nunes, Emanuele Angiuli, Nikiforos Pittaras, George Giannakopoulos, Giorgos Argyriou, George Stamoulis, George Papadakis, Manolis Koubarakis, Pythagoras Karampiperis, Axel-Cyrille Ngonga Ngomo, Maria-Esther Vidal. The BigDataEurope Platform – Supporting the Variety Dimension of Big Data. Proceedings of the International Conference on Web Engineering (ICWE), ICWE 2017, LNCS, Springer, 2017.



Paper about automatic labeling in IJDL

Our paper “Evaluating Unsupervised Thesaurus-based Labeling of Audiovisual Content in an Archive Production Environment” was accepted for publication in the International Journal on Digital Libraries (IJDL). This paper, co-authored with Roeland Ordelman and Josefien Schuurman, reports on a series of information extraction experiments carried out at the Netherlands Institute for Sound and Vision (NISV). Specifically, we report on a two-stage evaluation of unsupervised labeling of audiovisual content using subtitles. We look at how such an approach can provide acceptable results given requirements with respect to archival quality, authority and service levels to external users.


For this, we developed a text extraction pipeline (TESS), pictured here, which extracts key terms and matches them to the NISV thesaurus, the GTAA. This journal paper is an extended version of the paper previously accepted at the TPDL conference. Here we provide an analysis of the term extraction after it was taken into production, focusing on performance variation with respect to term types and television programs. Having implemented the procedure in our production workflow allows us to gradually develop the system further and also to assess the effect of the transformation from manual to automatic annotation from an end-user perspective.

The paper will appear on the journal site shortly. A final draft version of the paper can be found here: deboer_ijdl2016evaluating_draft [PDF].




Two TPDL papers accepted!

Today, the TPDL (International Conference on Theory and Practice of Digital Libraries) results came in and both papers on which I am a co-author got accepted. Today is a good day 🙂

In the first paper, we present work done during my stay at the Netherlands Institute for Sound and Vision on automatic term extraction from subtitles. The interesting thing about this paper is that it mainly looks at how these algorithms function in a ‘real’ context, that is, within a larger media ecosystem. The paper was co-authored with Roeland Ordelman and Josefien Schuurman.

Screenshot of the QHP tool

On the second paper, I am one of the co-authors. In the paper “Supporting Exploration of Historical Perspectives across Collections”, we present an exploratory search application that highlights different perspectives on World War II across collections (including Verrijkt Koninkrijk). The work is funded by an Amsterdam Data Science seed project, carried out with Daan Odijk, research assistants Cristina Gârbacea and Thomas Schoegje, VU/CWI colleagues Laura Hollink and Jacco van Ossenbruggen, and historian Kees Ribbens (NIOD). You can read more about it on Daan’s blog.
