Dataset description & Metadata

Dataset description

When sharing reports with coworkers or publishing them online, it can be important to include metadata about the dataset, such as the author, the copyright holder or a description. ydata-profiling allows complementing a report with that information. Inspired by schema.org's Dataset, the currently supported properties are description, creator, author, url, copyright_year and copyright_holder.

The following example shows how to generate a report with a description, copyright_holder, copyright_year and url. In the generated report, these properties are found in the Overview, under About.

Add profile report description
report = df.profile_report(
    title="Masked data",
    dataset={
        "description": "This profiling report was generated using a sample of 5% of the original dataset.",
        "copyright_holder": "StataCorp LLC",
        "copyright_year": 2020,
        "url": "",
    },
)

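The supported properties listed above can be checked before a report is generated. The following is a minimal sketch and not part of ydata-profiling: `build_dataset_metadata` is a hypothetical helper that rejects keys outside the documented set and returns a dictionary suitable for the `dataset` argument. The sample values are placeholders.

```python
# Properties documented as supported for the dataset metadata.
SUPPORTED_DATASET_PROPERTIES = {
    "description",
    "creator",
    "author",
    "url",
    "copyright_year",
    "copyright_holder",
}


def build_dataset_metadata(**properties):
    """Hypothetical helper: validate metadata keys and return a plain dict."""
    unknown = sorted(set(properties) - SUPPORTED_DATASET_PROPERTIES)
    if unknown:
        raise ValueError(f"Unsupported dataset properties: {unknown}")
    # Drop empty values so that blank entries are not passed to the report.
    return {
        key: value for key, value in properties.items() if value not in (None, "")
    }


metadata = build_dataset_metadata(
    description="Sample of 5% of the original dataset.",
    creator="Data team",  # placeholder value
    copyright_year=2020,
    url="",  # empty, so it is dropped
)
print(metadata)
```

The resulting dictionary can then be passed on, for example as `df.profile_report(title="Masked data", dataset=metadata)`.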

Column descriptions

In addition to providing dataset details, users often want to include column-specific descriptions when sharing reports with team members and stakeholders. ydata-profiling supports these descriptions, so that the report includes a built-in data dictionary. By default, the descriptions are presented in the Overview section of the report, next to each variable.

Generate a report with per-variable descriptions
profile = df.profile_report(
    variables={
        "descriptions": {
            "files": "Files in the filesystem, # variable name: variable description",
            "datec": "Creation date",
            "datem": "Modification date",
        }
    }
)

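Because the descriptions are keyed by column name, a misspelled key is easy to miss. As a minimal sketch, the hypothetical helper `check_descriptions` below compares the description keys with the column names. The `size` column and the `owner` description are illustrative additions, and in practice the column list would come from something like `list(df.columns)`.

```python
def check_descriptions(columns, descriptions):
    """Hypothetical helper: compare description keys with the column names."""
    columns = list(columns)
    # Columns that have no entry in the data dictionary.
    undocumented = [name for name in columns if name not in descriptions]
    # Data dictionary entries that match no column (for example, typos).
    unknown = [name for name in descriptions if name not in columns]
    return undocumented, unknown


columns = ["files", "datec", "datem", "size"]  # e.g. list(df.columns)
descriptions = {
    "files": "Files in the filesystem",
    "datec": "Creation date",
    "datem": "Modification date",
    "owner": "File owner",
}

undocumented, unknown = check_descriptions(columns, descriptions)
print("Columns without a description:", undocumented)
print("Descriptions without a column:", unknown)
```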

Alternatively, column descriptions can be loaded from a JSON file:

{
    "column name 1": "column 1 definition",
    "column name 2": "column 2 definition"
}

Generate a report with descriptions per variable from a JSON definitions file
import json
import pandas as pd
import ydata_profiling

definition_file = "dataset_column_definition.json"

# Read the variable descriptions
with open(definition_file, "r") as f:
    definitions = json.load(f)

# By default, the descriptions are presented in the Overview section, next to each variable
report = df.profile_report(variables={"descriptions": definitions})

# We can disable showing the descriptions next to each variable
report = df.profile_report(
    variables={"descriptions": definitions}, show_variable_description=False
)

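The definitions file is a flat JSON object that maps each column name to its description. As a minimal sketch, the snippet below writes such a file with the standard `json` module and reads it back. A temporary directory is used only to keep the example free of side effects, and the descriptions are reused from the earlier example.

```python
import json
import tempfile
from pathlib import Path

descriptions = {
    "files": "Files in the filesystem",
    "datec": "Creation date",
    "datem": "Modification date",
}

# A temporary directory keeps the example free of side effects.
with tempfile.TemporaryDirectory() as tmp_dir:
    definition_file = Path(tmp_dir) / "dataset_column_definition.json"

    # Write the data dictionary as a flat JSON object: {"column": "description"}.
    with open(definition_file, "w", encoding="utf-8") as f:
        json.dump(descriptions, f, indent=4)

    # Read it back in the same way as the report code above.
    with open(definition_file, "r", encoding="utf-8") as f:
        definitions = json.load(f)

print(definitions == descriptions)
```

Keeping the file flat, with string keys and string values, means the loaded dictionary can be used as the `descriptions` value without further conversion.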

Dataset schema

In addition to providing dataset details, users often want to set the variable types explicitly. This is particularly important when integrating ydata-profiling with the information already available in a data catalog. When using the ydata-profiling ProfileReport, users can set the type_schema property to control the data types used in the generated profile. By default, the type_schema is automatically inferred with visions.

Set the variable type schema to generate the profile report
import json
import pandas as pd

from ydata_profiling import ProfileReport
from ydata_profiling.utils.cache import cache_file

# cache_file takes a local file name and a source URL and returns the cached path
file_name = cache_file(...)
df = pd.read_csv(file_name)

type_schema = {"Survived": "categorical", "Embarked": "categorical"}

# We can set the type_schema only for the variables whose types we are certain of. All the others will be inferred automatically.
report = ProfileReport(df, title="Titanic EDA", type_schema=type_schema)
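When the types already live in a data catalog, the type_schema can be derived from it rather than written by hand. The following is a minimal sketch under stated assumptions: the catalog export, its type labels (`enum`, `float`, `string`) and the helper `build_type_schema` are all hypothetical. Only the `"categorical"` type name is taken from the example above, so every other column is left to automatic inference.

```python
# Hypothetical catalog export: column name -> catalog type label.
catalog_types = {
    "Survived": "enum",
    "Embarked": "enum",
    "Age": "float",
    "Name": "string",
}

# Only labels we are confident about are mapped. "categorical" is the type
# name used in the example above; everything else is left to inference.
CATALOG_TO_PROFILING = {"enum": "categorical"}


def build_type_schema(catalog, columns):
    """Hypothetical helper: keep known columns whose catalog type is mapped."""
    return {
        name: CATALOG_TO_PROFILING[label]
        for name, label in catalog.items()
        if name in columns and label in CATALOG_TO_PROFILING
    }


columns = ["Survived", "Embarked", "Age", "Name"]  # e.g. list(df.columns)
type_schema = build_type_schema(catalog_types, columns)
print(type_schema)
```

The resulting dictionary has the same shape as the hand-written one and can be passed as `ProfileReport(df, title="Titanic EDA", type_schema=type_schema)`.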