Available Settings
A set of options is available to customize the behaviour of ydata-profiling and the appearance of the generated report. The depth of customization makes it possible to tailor the analysis to the specific dataset being analysed. The available settings are listed below. To learn how to change them, see :doc:`changing_settings`.
General settings
Global report settings:
Parameter | Type | Default | Description |
---|---|---|---|
title | string | Pandas Profiling Report | Title for the report, shown in the header and title bar. |
pool_size | integer | 0 | Number of workers in thread pool. When set to zero, it is set to the number of CPUs available. |
progress_bar | boolean | True | If True, ydata-profiling will display a progress bar. |
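As a minimal sketch, the general settings can be collected in a dictionary and passed to `ProfileReport` as keyword arguments. The helper name `build_report` and the chosen values are illustrative assumptions, not part of the library.

```python
# General report settings, keyed by the parameter names from the table above.
general_settings = {
    "title": "Customer data overview",  # illustrative title
    "pool_size": 4,                     # 0 would use all available CPUs
    "progress_bar": False,              # hide the progress bar
}


def build_report(df):
    """Create a report for a pandas DataFrame `df` with the settings above."""
    # Imported lazily so the settings can be inspected without the package installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, **general_settings)
```

Calling `build_report(df)` on a pandas DataFrame returns the report object, which can then be exported, for example with `to_file("report.html")`.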
Variable summary settings
Settings related to the information displayed for each variable.
Parameter | Type | Default | Description |
---|---|---|---|
sort | None, asc or desc | None | Sort the variables asc (ascending), desc (descending) or None (keeps the original order). |
variables.descriptions | dict | {} | Descriptions to display alongside the descriptive statistics of each variable ({'var_name': 'Description'}). |
vars.num.quantiles | list[float] | [0.05,0.25,0.5,0.75,0.95] | The quantiles to calculate. Note that 0.25, 0.5 and 0.75 are required for the computation of other metrics (median and IQR). |
vars.num.skewness_threshold | integer | 20 | Warn if the skewness is above this threshold. |
vars.num.low_categorical_threshold | integer | 5 | If the number of distinct values is smaller than this number, then the series is considered to be categorical. Set to 0 to disable. |
vars.num.chi_squared_threshold | float | 0.999 | Set to 0 to disable chi-squared calculation. |
vars.cat.length | boolean | True | Check the string length and aggregate values (min, max, mean, median). |
vars.cat.characters | boolean | False | Check the distribution of characters and their Unicode properties. Often informative, but may be computationally expensive. |
vars.cat.words | boolean | False | Check the distribution of words. Often informative, but may be computationally expensive. |
vars.cat.cardinality_threshold | integer | 50 | Warn if the number of distinct values is above this threshold. |
vars.cat.imbalance_threshold | float | 0.5 | Warn if the imbalance score is above this threshold. |
vars.cat.n_obs | integer | 5 | Display this number of observations. |
vars.cat.chi_squared_threshold | float | 0.999 | Same as vars.num.chi_squared_threshold, but for categorical variables. |
vars.bool.n_obs | integer | 3 | Same as vars.cat.n_obs, but for boolean variables. |
vars.bool.imbalance_threshold | float | 0.5 | Warn if the imbalance score is above this threshold. |
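The dotted parameter names in the table describe a nested structure. The sketch below shows one way to turn dotted names into nested dictionaries, assuming (as described on the settings page referenced above) that `ProfileReport` accepts nested dictionaries as keyword arguments. The helpers `nest` and `build_report` are hypothetical names introduced for illustration.

```python
def nest(dotted_settings):
    """Turn {'vars.num.quantiles': [...]} into {'vars': {'num': {'quantiles': [...]}}}."""
    nested = {}
    for dotted_key, value in dotted_settings.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for name in parents:
            # Walk down the hierarchy, creating intermediate levels as needed.
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


variable_settings = nest({
    "vars.num.quantiles": [0.1, 0.25, 0.5, 0.75, 0.9],  # keeps the required 0.25, 0.5, 0.75
    "vars.cat.words": True,                              # enable word distribution
    "vars.cat.cardinality_threshold": 100,               # warn above 100 distinct values
})


def build_report(df):
    """Create a report for a pandas DataFrame `df` with the settings above."""
    # Imported lazily so the helper can be used without the package installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, **variable_settings)
```

Here `variable_settings` holds a single top-level key, `vars`, whose value groups the numeric and categorical options.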
Setting dataset schema type
Configure the schema type for a given dataset.
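A minimal sketch of a schema definition, using the `type_schema` argument of `ProfileReport`. The column names are hypothetical and the helper `build_report` is an illustrative name.

```python
# Map column names to the variable type the report should assume.
# "Survived" and "Embarked" are hypothetical columns of the dataset.
type_schema = {
    "Survived": "categorical",
    "Embarked": "categorical",
}


def build_report(df):
    """Create a report for a pandas DataFrame `df` with an explicit schema."""
    # Imported lazily so the schema can be inspected without the package installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, title="My report", type_schema=type_schema)
```

Columns not listed in the schema are left to the automatic type inference.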
Missing data overview plots
Settings related to the missing data section and the visualizations it can include.
Parameter | Type | Default | Description |
---|---|---|---|
missing_diagrams.bar | boolean | True | Display a bar chart with counts of missing values for each column. |
missing_diagrams.matrix | boolean | True | Display a matrix of missing values. Similar to the bar chart, but may also give an overview of the co-occurrence of missing values in rows. |
missing_diagrams.heatmap | boolean | True | Display a heatmap of missing values, which measures nullity correlation (i.e. how strongly the presence or absence of one variable affects the presence of another). |
Configuration example: disable heatmap for large datasets
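A minimal sketch of this configuration, assuming the `missing_diagrams` options are passed to `ProfileReport` as a dictionary keyword argument. The helper `build_report` is an illustrative name.

```python
# Keep the bar chart and matrix, but skip the heatmap.
missing_diagrams = {
    "bar": True,
    "matrix": True,
    "heatmap": False,
}


def build_report(df):
    """Create a report for a pandas DataFrame `df` without the missing-values heatmap."""
    # Imported lazily so the settings can be inspected without the package installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, missing_diagrams=missing_diagrams)
```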
Correlations
Settings regarding correlation metrics and thresholds.
The default value is auto, but the following correlation matrices are available:
Parameter | Description |
---|---|
auto | Calculates the column pairwise correlation depending on the type schema: numerical to numerical variable: Spearman correlation coefficient; categorical to categorical variable: Cramer's V association coefficient; numerical to categorical: Cramer's V association coefficient with the numerical variable discretized automatically. |
spearman | Spearman's correlation measures the strength and direction of the monotonic association between two variables. Useful for evaluating the strength of the relationship between ordinal or continuous variables. |
pearson | The Pearson correlation coefficient is the most common way of measuring a linear correlation. It is a number between -1 and 1 that measures the strength and direction of the linear relationship between two variables. |
kendall | The Kendall rank correlation coefficient is a statistic used to measure the ordinal association between two measured quantities. Kendall's tau is often used when the data do not meet one of the requirements of Pearson's correlation. |
phi_k | Phi K is especially suitable for working with mixed-type variables. Using this coefficient we can find (un)expected correlations and evaluate their statistical significance. |
cramers | Cramer's V is an association measure commonly used to examine the association between categorical variables when the contingency table is larger than 2x2. |
For each correlation matrix you can use the following configurations:
Parameter | Type | Default | Description |
---|---|---|---|
correlations.auto.calculate | boolean | True | Whether to compute 'auto' correlation |
correlations.auto.warn_high_correlations | boolean | True | Show warning for correlations higher than the threshold |
correlations.auto.threshold | float | 0.9 | Warning threshold |
correlations.pearson.calculate | boolean | False | Whether to calculate Pearson correlation |
correlations.pearson.warn_high_correlations | boolean | True | Show warning for correlations higher than the threshold |
correlations.pearson.threshold | float | 0.9 | Warning threshold |
correlations.spearman.calculate | boolean | False | Whether to calculate Spearman correlation |
correlations.spearman.warn_high_correlations | boolean | False | Show warning for correlations higher than the threshold |
correlations.spearman.threshold | float | 0.9 | Warning threshold |
correlations.kendall.calculate | boolean | False | Whether to calculate Kendall rank correlation |
correlations.kendall.warn_high_correlations | boolean | False | Show warning for correlations higher than the threshold |
correlations.kendall.threshold | float | 0.9 | Warning threshold |
correlations.phi_k.calculate | boolean | False | Whether to calculate Phi K correlation |
correlations.phi_k.warn_high_correlations | boolean | False | Show warning for correlations higher than the threshold |
correlations.phi_k.threshold | float | 0.9 | Warning threshold |
correlations.cramers.calculate | boolean | False | Whether to calculate Cramer's V association coefficient |
correlations.cramers.warn_high_correlations | boolean | True | Show warning for correlations higher than the threshold |
correlations.cramers.threshold | float | 0.9 | Warning threshold |
For instance, to disable all correlation computations (might be relevant for large datasets):
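A minimal sketch of such a configuration, assuming the `correlations` options are passed to `ProfileReport` as a nested dictionary keyword argument. The names `CORRELATION_NAMES`, `no_correlations` and `build_report` are introduced here for illustration.

```python
# The correlation matrices listed in the tables above.
CORRELATION_NAMES = ["auto", "pearson", "spearman", "kendall", "phi_k", "cramers"]

# Set `calculate` to False for every correlation matrix.
no_correlations = {name: {"calculate": False} for name in CORRELATION_NAMES}


def build_report(df):
    """Create a report for a pandas DataFrame `df` without any correlation matrix."""
    # Imported lazily so the settings can be inspected without the package installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, correlations=no_correlations)
```

To keep a single matrix, override its entry before building the report, for example `no_correlations["pearson"]["calculate"] = True`.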
Interactions
Settings related to the interactions section.
Parameter | Type | Default | Description |
---|---|---|---|
interactions.continuous | boolean | True | Generate a 2D scatter plot (or hexagonal binned plot) for all continuous variable pairs. |
interactions.targets | list | [] | When a list of variable names is given, only interactions between these and all other variables are computed. |
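As a minimal sketch, the interaction plots can be restricted to a few target variables, assuming the `interactions` options are passed to `ProfileReport` as a dictionary keyword argument. The column names `age` and `income` and the helper `build_report` are hypothetical.

```python
# Only compute interactions between the target columns and all other variables.
interactions = {
    "continuous": True,
    "targets": ["age", "income"],  # hypothetical column names
}


def build_report(df):
    """Create a report for a pandas DataFrame `df` with targeted interactions."""
    # Imported lazily so the settings can be inspected without the package installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, interactions=interactions)
```

Limiting the targets reduces the number of plots, since interactions are otherwise generated for all pairs of continuous variables.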
Report's appearance
Settings related to the appearance and style of the report.
Parameter | Type | Default | Description |
---|---|---|---|
html.minify_html | boolean | True | If True, the output HTML is minified using the htmlmin package. |
html.use_local_assets | boolean | True | If True, all assets (stylesheets, scripts, images) are stored locally. If False, a CDN is used for some stylesheets and scripts. |
html.inline | boolean | True | If True, all assets are contained in the report. If False, then a web export is created, where all assets are stored in the '[REPORT_NAME]_assets/' directory. |
html.navbar_show | boolean | True | Whether to include a navigation bar in the report. |
html.style.theme | string | None | Select a bootswatch theme. Available options: flatly (dark blue) and united (orange). |
html.style.logo | string | (empty) | A base64 encoded logo, to display in the navigation bar. |
html.style.primary_color | string | #337ab7 | The primary color to use in the report. |
html.style.full_width | boolean | False | By default, the width of the report is fixed. If set to True, the full width of the screen is used. |
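A minimal sketch of an appearance configuration, assuming the `html` options are passed to `ProfileReport` as a nested dictionary keyword argument. The helper `build_report` and the chosen values are illustrative.

```python
# Appearance settings: `html.style.full_width` becomes html -> style -> full_width.
html_settings = {
    "minify_html": False,   # keep the HTML readable, e.g. for debugging
    "navbar_show": True,    # keep the navigation bar
    "style": {
        "full_width": True,  # use the full width of the screen
    },
}


def build_report(df):
    """Create a full-width report for a pandas DataFrame `df`."""
    # Imported lazily so the settings can be inspected without the package installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, html=html_settings)
```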