When sharing reports with coworkers or publishing them online, it can be
important to include metadata about the dataset, such as the author, the
copyright holder or a description. ydata-profiling allows complementing a
report with that information. Inspired by schema.org's Dataset, the
currently supported properties are description, creator, author, url,
copyright_year and copyright_holder.
The following example shows how to generate a report with a
description, copyright_holder, copyright_year and url.
In the generated report, these properties are found in the Overview,
under About.
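
    from pathlib import Path

    import ydata_profiling  # adds profile_report() to pandas DataFrames

    # df is assumed to be an existing pandas DataFrame
    report = df.profile_report(
        title="Masked data",
        dataset={
            "description": "This profiling report was generated using a sample of 5% of the original dataset.",
            "copyright_holder": "StataCorp LLC",
            "copyright_year": 2020,
            "url": "http://www.stata-press.com/data/r15/auto2.dta",
        },
    )
    report.to_file(Path("stata_auto_report.html"))
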
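Because only a fixed set of properties is supported, it can help to check a metadata dictionary before passing it as the dataset argument. The sketch below is one possible approach. The names build_dataset_metadata and SUPPORTED_PROPERTIES are hypothetical and not part of ydata-profiling, and the set of keys is simply the list of properties named above.

```python
# Hypothetical helper, not part of ydata-profiling: it only checks keys
# against the properties listed in this section.
SUPPORTED_PROPERTIES = {
    "description",
    "creator",
    "author",
    "url",
    "copyright_year",
    "copyright_holder",
}


def build_dataset_metadata(**properties):
    """Return a dict suitable for the `dataset` argument, rejecting unknown keys."""
    unknown = sorted(set(properties) - SUPPORTED_PROPERTIES)
    if unknown:
        raise ValueError(f"Unsupported dataset properties: {', '.join(unknown)}")
    # Drop properties left as None so they do not appear in the metadata
    return {key: value for key, value in properties.items() if value is not None}


metadata = build_dataset_metadata(
    description="This profiling report was generated using a sample of 5% of the original dataset.",
    copyright_holder="StataCorp LLC",
    copyright_year=2020,
    url="http://www.stata-press.com/data/r15/auto2.dta",
)
# metadata can then be passed as df.profile_report(dataset=metadata)
```

Rejecting unknown keys early catches typos such as copyright_owner before a report is generated and shared.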
Column descriptions
In addition to providing dataset details, users often want to include
column-specific descriptions when sharing reports with team members and
stakeholders. ydata-profiling supports creating these descriptions, so
that the report includes a built-in data dictionary. By default, the
descriptions are presented in the Overview section of the report, next
to each variable.

    import json

    import pandas as pd
    import ydata_profiling

    definition_file = "dataset_column_definition.json"

    # Read the variable descriptions
    with open(definition_file, "r") as f:
        definitions = json.load(f)

    # df is assumed to be an existing pandas DataFrame.
    # By default, the descriptions are presented in the Overview section, next to each variable
    report = df.profile_report(variables={"descriptions": definitions})

    # We can disable showing the descriptions next to each variable
    report = df.profile_report(
        variables={"descriptions": definitions}, show_variable_description=False
    )

    report.to_file("report.html")

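The example reads dataset_column_definition.json but does not show its contents. Here is a minimal sketch, assuming the file is a flat JSON object that maps each column name to its description. The column names and descriptions are made up for illustration, and the file is written to a temporary directory.

```python
import json
import tempfile
from pathlib import Path

# Illustrative column descriptions (hypothetical columns, not from a real dataset)
definitions = {
    "make": "Make and model of the car",
    "price": "Price in US dollars",
    "mpg": "Mileage in miles per gallon",
}

# Write the data dictionary to a JSON file in a temporary directory
definition_file = Path(tempfile.mkdtemp()) / "dataset_column_definition.json"
with open(definition_file, "w", encoding="utf-8") as f:
    json.dump(definitions, f, indent=2)

# Read it back the same way the report example does
with open(definition_file, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded["price"])
```

Keeping the descriptions in a separate file lets the same data dictionary be reused across reports and maintained outside the profiling script.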
Dataset schema
In addition to providing dataset details, users often want to set the
type schema of the dataset. This is particularly important when integrating
ydata-profiling report generation with the information already in a data
catalog. When using ydata-profiling's ProfileReport, users can set the
type_schema property to control the data types used in the generated
profile. By default, the type_schema is automatically inferred with visions.
Set the variable type schema to generate the profile report:

    import pandas as pd

    from ydata_profiling import ProfileReport
    from ydata_profiling.utils.cache import cache_file

    file_name = cache_file(
        "titanic.csv",
        "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv",
    )
    df = pd.read_csv(file_name)

    # Set the type_schema only for the variables whose types we are certain of.
    # All the others will be inferred automatically.
    type_schema = {"Survived": "categorical", "Embarked": "categorical"}

    report = ProfileReport(df, title="Titanic EDA", type_schema=type_schema)
    report.to_file("report.html")
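
When the schema comes from a data catalog, it may list columns that are not present in the DataFrame being profiled. One way to handle that, sketched below with a hypothetical helper schema_for_columns (not part of ydata-profiling), is to keep only the entries that match existing columns and let the rest be inferred. The catalog contents and the Cabin_Deck column are invented for illustration. How ydata-profiling itself treats unknown column names in type_schema is not covered here.

```python
def schema_for_columns(catalog_schema, columns):
    """Keep only the catalog entries whose column exists in the dataset."""
    available = set(columns)
    return {name: kind for name, kind in catalog_schema.items() if name in available}


# Hypothetical catalog entry; "Cabin_Deck" stands in for a column missing from the data
catalog_schema = {
    "Survived": "categorical",
    "Embarked": "categorical",
    "Cabin_Deck": "categorical",
}
columns = ["PassengerId", "Survived", "Pclass", "Embarked"]  # e.g. list(df.columns)

type_schema = schema_for_columns(catalog_schema, columns)
# type_schema can then be passed as ProfileReport(df, type_schema=type_schema)
```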