[1910.11040] Toward a view-based data cleaning architecture
We believe that our view-based architecture can provide some insights for data managers that are difficult to obtain using automatic solutions.
Big data analysis has become an active area of study
with the growth of machine learning techniques.
Proper analysis depends on high-quality data,
so maintaining data quality is essential.
Research on data cleaning is therefore also important.
It is difficult to automatically detect and correct inconsistent values
in data that require expert knowledge or
in data created by many contributors, such as
integrated data from heterogeneous data sources.
An example of such data is metadata for scientific datasets,
which should be confirmed by data managers while handling the data.
To support the efficient cleaning of data by data managers,
we propose a data cleaning architecture
in which data managers interactively browse and correct portions of data through views.
In this paper, we explain our view-based data cleaning architecture and
discuss some remaining issues.
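
As a rough illustration of cleaning data through a view, the following minimal sketch uses SQLite from Python. It is not the architecture described in this paper. The table `metadata`, the view `suspect_units`, the trigger `fix_unit`, the sample rows, and the one-entry unit vocabulary are all hypothetical and chosen only for the example. The view exposes the portion of the data that looks inconsistent, and an `INSTEAD OF` trigger writes a correction made through the view back to the base table.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with a base table and a cleaning view.

    All names and rows here are hypothetical illustration data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE metadata (id INTEGER PRIMARY KEY, title TEXT, unit TEXT);
        INSERT INTO metadata (id, title, unit) VALUES
            (1, 'Sea surface temperature', 'degC'),
            (2, 'Air temperature', 'Celsius'),
            (3, 'Soil temperature', 'degC');

        -- The view shows only rows whose unit is outside the assumed vocabulary.
        CREATE VIEW suspect_units AS
            SELECT id, title, unit FROM metadata WHERE unit NOT IN ('degC');

        -- Corrections made through the view are applied to the base table.
        CREATE TRIGGER fix_unit INSTEAD OF UPDATE OF unit ON suspect_units
        BEGIN
            UPDATE metadata SET unit = NEW.unit WHERE id = OLD.id;
        END;
        """
    )
    return conn


conn = build_demo()

# The data manager browses only the suspicious portion of the data.
before = conn.execute("SELECT id, unit FROM suspect_units").fetchall()

# The correction is issued against the view, not the base table.
conn.execute("UPDATE suspect_units SET unit = 'degC' WHERE id = 2")

# After the fix, the row no longer appears in the view.
after = conn.execute("SELECT id, unit FROM suspect_units").fetchall()
base_unit = conn.execute("SELECT unit FROM metadata WHERE id = 2").fetchone()[0]
```

In this sketch the view definition plays the role of the manager's current focus: changing its `WHERE` clause changes which portion of the data is presented for review, while the base table stays the single source of truth.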