FUnctionality Sharing In Open eNvironments
Heinz Nixdorf Chair for Distributed Information Systems

Combining Image and Caption Analysis for Classifying Charts in Biodiversity Texts

Title: Combining Image and Caption Analysis for Classifying Charts in Biodiversity Texts
Authors: Pawandeep Kaur and Dora Kiesel
Source: IVAPP 2020
Place: Malta
Date: 2020-02-27
Type: Publication

Chart type classification through caption analysis is a new area of study. Distinct keywords in captions that relate to the visualization vocabulary (e.g., for a scatterplot: dot, y-axis, x-axis, bubble) and keywords from the specific domain (e.g., species richness, species abundance, and phylogenetic associations in the case of biodiversity research) serve as parameters to train a text classifier. For better chart comprehension, a classifier should understand these parameters well, in addition to the visual characteristics of the chart. Such conceptual (semantic) chart classifiers are then useful not only for chart classification but also for other visualization studies. One application of such a classifier is a domain knowledge base visualization recommendation system, in which the text classifier recommends visualization types or schemas based on the classification of the text provided along with the dataset. Motivated by this use case, in this paper we explore our idea of semantic chart classifiers. We use state-of-the-art natural language processing (NLP) and computer vision algorithms to create a visualization classifier for the biodiversity domain. With an average test accuracy (F1-score) of 92.2% over all 15 classes, we show that our classifiers can differentiate between chart types both conceptually and visually.
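
To make the caption side of this idea concrete, the following is a minimal sketch of a keyword-driven caption classifier. It is not the method used in the paper: it assumes a plain bag-of-words multinomial naive Bayes model written with the Python standard library, and the captions, class labels, and the helper names `tokenize`, `train`, and `predict` are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Toy captions invented for illustration; they are NOT from the paper's corpus.
TRAIN = [
    ("Scatterplot of species richness against elevation; "
     "each dot is one plot on the x-axis and y-axis", "scatterplot"),
    ("Bubble size shows species abundance, dots are coloured by habitat",
     "scatterplot"),
    ("Bars show mean species abundance per site with error bars", "bar chart"),
    ("Histogram bars of body size frequency across sampled species",
     "bar chart"),
    ("Phylogenetic tree with branches and clades of the sampled genera",
     "tree"),
    ("Dendrogram showing phylogenetic associations; "
     "branch lengths reflect divergence", "tree"),
]


def tokenize(text):
    """Lowercase the text and keep words, including hyphenated ones (x-axis)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def train(examples):
    """Count words per class for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for a caption."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "Each dot marks a plot; x-axis is elevation"))
print(predict(model, "Phylogenetic tree of the clades"))
```

In this sketch, visualization terms (dot, x-axis, bars) and domain terms (phylogenetic, abundance) are treated alike as word counts. A fuller system along the lines described above would pair such a text model with an image classifier and train on a real labelled caption corpus.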