Dataset description
When sharing reports with coworkers or publishing online, it might be
important to include metadata of the dataset, such as author, copyright
holder or descriptions. ydata-profiling allows complementing a report
with that information. Inspired by schema.org's
Dataset, the currently supported properties
are description, creator, author, url, copyright_year and
copyright_holder.
The following example shows how to generate a report with a
description, copyright_holder, copyright_year and url.
In the generated report, these properties are found in the Overview,
under About.
| Add profile report description | 
|---|
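|  | from pathlib import Path
import ydata_profiling  # importing the package adds profile_report to pandas DataFrames
report = df.profile_report(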
    title="Masked data",
    dataset={
        "description": "This profiling report was generated using a sample of 5% of the original dataset.",
        "copyright_holder": "StataCorp LLC",
        "copyright_year": 2020,
        "url": "http://www.stata-press.com/data/r15/auto2.dta",
    },
)
report.to_file(Path("stata_auto_report.html"))
 | 
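Because the dataset argument is a plain dictionary, a misspelled key is easy to miss. The following is a minimal sketch, not part of ydata-profiling: `build_dataset_metadata` and `SUPPORTED_DATASET_KEYS` are hypothetical names, and the set of keys is simply the list of supported properties given above.

```python
from typing import Any, Dict

# Properties listed in this section as supported by the `dataset` argument.
SUPPORTED_DATASET_KEYS = {
    "description",
    "creator",
    "author",
    "url",
    "copyright_year",
    "copyright_holder",
}


def build_dataset_metadata(**properties: Any) -> Dict[str, Any]:
    """Hypothetical helper: reject unknown keys and drop properties left as None."""
    unknown = sorted(set(properties) - SUPPORTED_DATASET_KEYS)
    if unknown:
        raise ValueError(f"Unsupported dataset properties: {unknown}")
    return {key: value for key, value in properties.items() if value is not None}


metadata = build_dataset_metadata(
    description="This profiling report was generated using a sample of 5% of the original dataset.",
    copyright_holder="StataCorp LLC",
    copyright_year=2020,
    url="http://www.stata-press.com/data/r15/auto2.dta",
    creator=None,  # left unset, so it is omitted from the result
)
print(sorted(metadata))
```

The resulting dictionary can then be passed as `dataset=metadata` to `df.profile_report`, exactly like the literal dictionary in the example above.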
Column descriptions
In addition to providing dataset details, users often want to include
column-specific descriptions when sharing reports with team members and
stakeholders. ydata-profiling supports creating these descriptions, so
that the report includes a built-in data dictionary. By default, the
descriptions are presented in the Overview section of the report, next
to each variable.
| Generate a report with per-variable descriptions | 
|---|
|  | profile = df.profile_report(
    variables={
        "descriptions": {
            # variable name: variable description
            "files": "Files in the filesystem",
            "datec": "Creation date",
            "datem": "Modification date",
        }
    }
)
profile.to_file("report.html")
 | 
Alternatively, column descriptions can be loaded from a JSON file:
| dataset_column_definition.json | 
|---|
|  | {
    "column name 1": "column 1 definition",
    "column name 2": "column 2 definition"
}
 | 
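A definitions file of this shape can be written with the standard `json` module. The sketch below reuses the column names from the per-variable example above and writes to a temporary directory so that it has no side effects; the location of the file in a real project is up to you.

```python
import json
import tempfile
from pathlib import Path

# Column names taken from the per-variable example; descriptions are free text.
definitions = {
    "files": "Files in the filesystem",
    "datec": "Creation date",
    "datem": "Modification date",
}

# A temporary directory keeps the sketch self-contained; in practice the
# file would typically live next to the dataset or in the project repository.
with tempfile.TemporaryDirectory() as tmp_dir:
    definition_file = Path(tmp_dir) / "dataset_column_definition.json"
    definition_file.write_text(json.dumps(definitions, indent=4), encoding="utf-8")

    # Read the file back to confirm it round-trips to the same mapping.
    loaded = json.loads(definition_file.read_text(encoding="utf-8"))

print(loaded == definitions)
```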
| Generate a report with descriptions per variable from a JSON definitions file | 
|---|
|  | import json
import pandas as pd
import ydata_profiling
definition_file = "dataset_column_definition.json"
# Read the variable descriptions
with open(definition_file, "r") as f:
    definitions = json.load(f)
# By default, the descriptions are presented in the Overview section, next to each variable
report = df.profile_report(variables={"descriptions": definitions})
# We can disable showing the descriptions next to each variable
report = df.profile_report(
    variables={"descriptions": definitions}, show_variable_description=False
)
report.to_file("report.html")
 | 
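When definitions come from an external file, it can be useful to check them against the actual column names before generating the report, since a typo in a key would otherwise go unnoticed. This is a minimal sketch under that assumption; `split_definitions` is a hypothetical helper, not a ydata-profiling function, and a plain list stands in for `df.columns`.

```python
from typing import Dict, Iterable, List, Tuple


def split_definitions(
    definitions: Dict[str, str], columns: Iterable[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical helper: return (descriptions for known columns, unmatched keys)."""
    known = set(columns)
    matched = {name: text for name, text in definitions.items() if name in known}
    unmatched = sorted(name for name in definitions if name not in known)
    return matched, unmatched


# With a DataFrame, `columns` would be `df.columns`; a plain list stands in here.
columns = ["files", "datec", "datem"]
definitions = {
    "files": "Files in the filesystem",
    "datec": "Creation date",
    "date_m": "Modification date",  # deliberate typo for illustration
}

matched, unmatched = split_definitions(definitions, columns)
print(unmatched)  # ['date_m']
```

The `matched` dictionary can be passed as the `descriptions` value, while `unmatched` points at keys that need correcting in the JSON file.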
Dataset schema
In addition to providing dataset details, users often want to set the
data type of each variable explicitly. This is particularly important when
integrating ydata-profiling with the information already in a data
catalog. When using ProfileReport, users can set the type_schema
property to control the data types used in the profile. By default,
the type schema is inferred automatically with visions.
| Set the variable type schema to generate the profile report | 
|---|
|  | import pandas as pd
from ydata_profiling import ProfileReport
from ydata_profiling.utils.cache import cache_file
file_name = cache_file(
    "titanic.csv",
    "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv",
)
df = pd.read_csv(file_name)
type_schema = {"Survived": "categorical", "Embarked": "categorical"}
# We can set the type_schema only for the variables whose types we are certain of. All others are inferred automatically.
report = ProfileReport(df, title="Titanic EDA", type_schema=type_schema)
report.to_file("report.html")
 |
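For the data catalog case mentioned above, the type schema can be derived from the catalog's own column types. The following is a sketch under stated assumptions: the catalog entry format and the catalog type names (`enum`, `integer`, `float`, `timestamp`) are hypothetical, and of the profiling type names only `"categorical"` appears in the example above; `"numeric"` and `"datetime"` are assumed and should be checked against the installed ydata-profiling version.

```python
from typing import Dict, List

# Assumed mapping from (hypothetical) catalog type names to ydata-profiling
# type names. Only "categorical" is confirmed by the example in this section.
CATALOG_TO_PROFILING_TYPE = {
    "enum": "categorical",
    "integer": "numeric",
    "float": "numeric",
    "timestamp": "datetime",
}


def type_schema_from_catalog(entries: List[Dict[str, str]]) -> Dict[str, str]:
    """Build a type_schema, skipping columns whose catalog type has no mapping."""
    schema = {}
    for entry in entries:
        profiling_type = CATALOG_TO_PROFILING_TYPE.get(entry["type"])
        if profiling_type is not None:
            schema[entry["name"]] = profiling_type
    return schema


# Hypothetical catalog export for a few Titanic columns.
catalog_entries = [
    {"name": "Survived", "type": "enum"},
    {"name": "Embarked", "type": "enum"},
    {"name": "Age", "type": "float"},
    {"name": "Name", "type": "string"},  # no mapping: left to automatic inference
]

type_schema = type_schema_from_catalog(catalog_entries)
print(type_schema)
# {'Survived': 'categorical', 'Embarked': 'categorical', 'Age': 'numeric'}
```

Columns without a mapping are deliberately left out of the schema so that their types are still inferred automatically, matching the behaviour described in the example above.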