Comparing Synthetic Data Generation Tools for IoT Data

[This post is based on the Bachelor Information Sciences project of Darin Pavlov and reuses text from his thesis. The research is part of VU’s effort in the InterConnect project and was supervised by Roderick van der Weerdt]

The concepts and technologies behind the Internet of Things (IoT) make it possible to establish networks of interconnected smart devices. Such networks can produce large volumes of data transmitted by sensors and actuators. Machine Learning can play a key role in processing this data for use cases in specific domains such as automotive, healthcare and manufacturing. However, access to data for developing and testing Machine Learning models is often hindered by data sensitivity and privacy concerns.

One solution to this problem is to use synthetic data that resembles the real data as closely as possible. In his study, Darin Pavlov conducted a set of experiments investigating the effectiveness of synthetic IoT data generation with three different tools: Mostly AI, Gretel.ai and SDV.

The table below shows the results of one of the two Machine Learning detection tests, which measure how difficult it is for a Machine Learning model to differentiate the synthetic data from the real data. For each of the two datasets, the score is calculated as 1 minus the average ROC AUC score.
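To make the metric concrete, here is a minimal Python sketch of such a detection test (not the code used in the thesis; the classifier and cross-validation setup are assumptions): a model is trained to tell real records from synthetic ones, and the reported score is 1 minus its average ROC AUC, so a score near 0.5 means the synthetic data is hard to distinguish from the real data.

```python
# Minimal sketch of a detection test: train a classifier to separate real
# from synthetic records and report 1 - average ROC AUC (higher = harder
# to tell apart). Classifier choice and CV setup are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def detection_score(real: np.ndarray, synthetic: np.ndarray, n_splits: int = 5) -> float:
    X = np.vstack([real, synthetic])
    y = np.concatenate([np.zeros(len(real)), np.ones(len(synthetic))])  # 1 = synthetic
    aucs = []
    for train_idx, test_idx in StratifiedKFold(n_splits, shuffle=True, random_state=0).split(X, y):
        clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        aucs.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))
    return 1.0 - float(np.mean(aucs))  # ~0.5: indistinguishable, ~0.0: easy to tell apart
```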

Darin compared the tools on various distinguishability metrics. He observed that Mostly AI outperforms the other two generators, although Gretel.ai shows similarly satisfactory results on the statistical metrics. The output of SDV, on the other hand, is poor on all metrics. Through this study we aim to encourage future research within the quickly developing area of synthetic data generation in the context of IoT technology.

More details can be found in Darin’s thesis.


Keynote talk at Semantic AI workshop

Yesterday I had the honour and pleasure of giving one of the keynote speeches at the first Semantic AI workshop, co-located with SEMANTiCS2022 in Vienna. In my talk “Knowledge Graphs for impactful Data Science In the Digital Humanities and IOT domain”, I talked about challenges and lessons learned in various projects where 1) Knowledge Graphs, 2) Machine Learning and 3) User Contexts interact in interesting ways. The slides for my talk can be found below.


Co-director of Cultural AI lab

I am happy and proud to announce that I will join Marieke van Erp and Laura Hollink as co-director of the Cultural AI lab. The lab brings together researchers from various research institutes and heritage organizations to investigate both how AI can be used to address humanities and heritage challenges, and how we can use methods, theories and insights from the cultural domain to make better, fairer, more inclusive and more diverse AI.

I am very excited about this and look forward to the wonderful research collaborations!


News article on Knowledge Graphs for Maritime history

The latest edition of the trade publication E-Data & Research features a nice article (in Dutch) about our research on knowledge graphs for maritime history. Thanks to Mathilde Jansen and of course my collaborators Stijn Schouten and Marieke van Erp! The image below shows the print article; it can also be found online here.


Using the SAREF ontology for interoperability and machine learning in a Smart Home environment

Our abstract “Using the SAREF ontology for interoperability and machine learning in a Smart Home environment” was accepted for presentation at the ICT Open conference on 6-7 April 2022 in Amsterdam. In the abstract, we outline the current and future research VU and TNO are conducting in the context of the InterConnect project, specifically around the construction of IoT knowledge graphs, machine learning and rule-based applications. We look forward to presenting it in April.


A Polyvocal and Contextualised Semantic Web

[This post is the text of a 1-minute pitch at the IWDS symposium for our poster “A Polyvocal and Contextualised Semantic Web”, which was published as the paper: Erp, Marieke van, and Victor de Boer. “A Polyvocal and Contextualised Semantic Web.” European Semantic Web Conference. Springer, Cham, 2021.]

Knowledge graphs are a popular way of representing and sharing data, information and knowledge in many domains on the Semantic Web. These knowledge graphs, however, often represent singular, biased views on the world, which can lead to unwanted bias in AI systems that use this data. We therefore identify the need for a more polyvocal Semantic Web.

So. How do we get there?

  1. We need perspective-aware methods for identifying existing polyvocality in datasets and for acquiring it from text or users.
  2. We need data models and patterns to represent polyvocal data, information and knowledge (a minimal illustrative sketch follows this list).
  3. We need visualisations and tools to make polyvocal knowledge accessible and usable for a wide variety of users, including domain experts and laypersons with varying backgrounds.
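To make point 2 slightly more concrete, below is a minimal, purely illustrative Python (rdflib) sketch of one possible pattern: each voice gets its own named graph, and provenance is attached to those graphs. The namespaces and terms are invented and not taken from an existing Cultural AI data model.

```python
# Hypothetical pattern for polyvocal data: one named graph per "voice",
# each annotated with its provenance. All names are illustrative.
from rdflib import Dataset, Namespace, URIRef, Literal
from rdflib.namespace import RDFS, PROV

EX = Namespace("http://example.org/")
ds = Dataset()

# Voice 1: a museum catalogue describes the object one way
museum = ds.graph(URIRef("http://example.org/graph/museum"))
museum.add((EX.object42, RDFS.label, Literal("ceremonial mask", lang="en")))

# Voice 2: a source community describes the same object in its own terms
community = ds.graph(URIRef("http://example.org/graph/community"))
community.add((EX.object42, RDFS.label, Literal("ancestor figure", lang="en")))

# Record which voice each named graph represents (in the default graph)
ds.add((URIRef("http://example.org/graph/museum"), PROV.wasAttributedTo, EX.MuseumCatalogue))
ds.add((URIRef("http://example.org/graph/community"), PROV.wasAttributedTo, EX.SourceCommunity))

print(ds.serialize(format="trig"))
```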

In the Cultural AI Lab, we investigate these challenges in several interrelated research projects, but we cannot, and should not, do this alone, and we are looking for more voices to join us!


Knowledge Graphs for Social Good presentation at UDS Ghana

Last week, I was invited to give a guest lecture at the University for Development Studies in Tamale, Ghana. Vrije Universiteit has a very interesting and fruitful collaboration with this great university. In my presentation “Knowledge Graphs for Social Good”, I introduced the principles and practice of knowledge graphs and their role in AI. I also talked about how knowledge graphs can be (and are) used for social impact. Finally, I discussed four challenges we encountered in our own efforts to make knowledge graphs meaningful for rural users in the Global South.

We expect the recording to be shared; for now, the slides are embedded below and can be downloaded from Google Slides.


Linked Data, SPARQL and GraphDB tutorial

For several courses, I made a set of video lectures around Linked Data principles and practice, specifically in the context of Digital Humanities.

http://www.victordeboer.com/linked-data-for-digital-humanities/

The tutorial contains videos covering:

  1. The principles of Linked Data
  2. The RDF-Turtle syntax (a small rdflib/Turtle example follows this list)
  3. Making RDF using OntoRefine
  4. SPARQL
  5. Hands-on exercises with Dutch Ships and Sailors
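As a small taster of items 1 and 2, here is an illustrative Python (rdflib) snippet, not part of the video lectures themselves, that builds two triples about a ship and prints them in RDF-Turtle syntax; the namespace and resource names are made up.

```python
# Illustrative only: build a couple of triples and print them as RDF-Turtle.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/ships/")

g = Graph()
g.bind("ex", EX)
g.add((EX.Batavia, RDF.type, EX.Ship))                          # ex:Batavia a ex:Ship .
g.add((EX.Batavia, RDFS.label, Literal("Batavia", lang="nl")))  # with a Dutch label

print(g.serialize(format="turtle"))
```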

It also includes a sub-tutorial on using GraphDB, OntoRefine and SPARQL to:

  • Download and install GraphDB
  • Get some interesting data in CSV
  • Convert to triples using OntoRefine
  • Find potential links in DBpedia (see the sketch after this list)
  • Link your data using SPARQL -> import data
  • Try out interesting SPARQL queries
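To give an idea of the DBpedia linking step, here is a small illustrative Python sketch (using SPARQLWrapper; not taken from the tutorial itself) that looks up a resource by its Dutch label on the public DBpedia endpoint. A matching URI could then be connected to your own data with an owl:sameAs triple.

```python
# Illustrative only: find candidate DBpedia resources for a given label.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?resource WHERE {
        ?resource rdfs:label "Amsterdam"@nl .
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)

for result in sparql.query().convert()["results"]["bindings"]:
    print(result["resource"]["value"])  # candidate URI to link with owl:sameAs
```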


SEMANTiCS2021 in Amsterdam

This year, we organized the SEMANTiCS2021 conference in Amsterdam. Due to the ongoing COVID-19 restrictions, we opted for a hybrid conference. And hybrid it was! With 200 onsite and 264 online tickets sold, this was as much a mix between online and onsite as it was a mix between industry and academia. The research track consisted of 19 papers, and the industry track was made up of 24 presentations. With four wonderful keynote speakers, a poster session and various special tracks and workshops, this was quite a full programme!

As far as I am concerned, a true success! See my Twitter-generated impression below.


Modeling Ontologies for Individual Artists

[This post presents research done by Daan Raven in the context of his Master Project Information Sciences]

There is a long tradition in the Cultural Heritage domain of representing structured, machine-interoperable knowledge using semantic methods and tools. However, research into developing and using ontologies specific to the works of art of individual artists is still lacking. Such knowledge graphs would improve access to heritage information by making reasoning and inferencing possible. In his research, Daan Raven developed and applied a re-usable method, building on the ‘Methontology’ method for ontology development. We describe the steps of specification, conceptualization, integration, implementation and evaluation in a case study concerning ceramic-glass sculptor Barbara Nanning.

This work was presented at Digital Humanities Benelux 2021. The abstract and presentation as well as other digital resources related to the project can be found below:

Below are some examples of competency questions with pointers to SPARQL queries in YASGUI.

  • Which artworks in the Verre Églomisé collection of Nanning are currently stored in her private collection? https://api.triplydb.com/s/wKZG4UFq5
  • Show me a timeline of all processes that require the use of an annealing kiln: https://api.triplydb.com/s/j4Qk0tHzK
  • Show me all process steps that require the use of an annealing kiln and that have a landing page: https://api.triplydb.com/s/N5mo4uTM3
  • Show me (in Gallery) all objects made by “Jiří Pačinek Glass Lindava” (person in Wikidata): https://api.triplydb.com/s/C6LsEgiZF
  • Show me (in Geo) the locations of creation steps for various works (uses GeoNames): https://api.triplydb.com/s/THTkhOYjd
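As an indication of how such a competency question maps onto a query, here is a purely hypothetical sketch of what the annealing-kiln question could look like in SPARQL (wrapped in Python with SPARQLWrapper); the endpoint, namespace and property names are invented and will differ from the actual ontology behind the saved queries above.

```python
# Hypothetical sketch: translating a competency question into SPARQL.
# Endpoint, namespace and property names are placeholders, not the real ontology.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX ex: <http://example.org/nanning/>
SELECT ?step ?start WHERE {
    ?step a ex:ProcessStep ;
          ex:requiresTool ex:AnnealingKiln ;
          ex:startDate ?start .
} ORDER BY ?start
"""

sparql = SPARQLWrapper("https://example.org/sparql")  # placeholder endpoint
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["start"]["value"], row["step"]["value"])
```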
