Dataset Comparison
Dataframes compare support
Profiling compare is supported from ydata-profiling version 3.5.0 onwards. It is not (yet!) available for Spark DataFrames.
`ydata-profiling` can be used to compare multiple versions of the same dataset. This is useful when comparing data from multiple time periods, such as two years. Another common scenario is to view the dataset profile for training, validation and test sets in machine learning.
The following syntax can be used to compare two datasets:
The comparison report uses the `title` attribute from `Settings` as a label throughout. The colors are configured in `settings.html.style.primary_colors`. The numeric precision parameter `settings.report.precision` can be adjusted to obtain some additional space in reports.
In order to compare more than two reports, the following syntax can be used:
Note
This functionality is only guaranteed to support the comparison of two datasets. With more than two, it is still possible to obtain the statistics, but the report may have formatting issues. One of the settings that can be changed is `settings.report.precision`. As a rule of thumb, the value 10 can be used for a single report and 8 for comparing two reports.