History & community

The ydata-profiling project owes its success to the work of its creators and contributors. This page highlights some of its development history. For the full picture, have a look at the contributor history.

YData is the company behind the package and has been responsible for releases such as time-series support, dataset comparison and Spark support.

Thank you to our amazing contributors

A big thank you to all our amazing contributors!

Contributors wall made with contrib.rocks.

Inception

In 2016, Jos Polfliet was working for SAS Institute and was getting bored with doing the same types of exploratory data analysis over and over again. After automating his own logic, he found it useful and decided to open-source it under the MIT License. The package was named pandas-profiling, a contraction of pandas and data profiling. The idea was to let users perform automated exploratory data analysis, going beyond what the df.describe() function offered, by abusing Jupyter's HTML output. Since then, the package has saved the machine learning community human years of repetitive plotting and summary statistics.

Second life

Since May 2019, principal development has been taken over by Simon Brugman. The startup he co-founded was an early adopter of the package, and he invested heavily in growing it, drawing on experience from using it in industry. Simon led the package through a huge refactor (99.5% of the code was changed), two major releases and several great collaborations, most notably with Ian Eaves on visions.

Profiling as part of Data-Centric AI

Since February 2022, YData has committed to the continuous support and improvement of pandas-profiling. Our drive as maintainers is to make data scientists fall in love with the ease and quality of profiling delivered by one of the best profiling packages. Since 2022, the YData team has delivered several new releases, including major features such as time-series dataset analysis, comparison of two datasets and, most recently, integration with the big data engine Spark.

pandas-profiling has been named one of the Top 20 ML packages by Google.

A huge thank you to two of the most iconic contributors: Simon Brugman, who made it possible to compare two datasets, and Edwin Chan, who took the scale of profiling to another level with Spark.

Where are we now?

At the time of writing, pandas-profiling is receiving a new face and name: ydata-profiling. Following the most recent major feature, Spark support, we decided to move away from "pandas" to a name that opens the possibility of new integrations and developments.

It is the most popular tool in the world for data exploration in Python, with more than 11k GitHub stars, 50 million downloads and users across every industry, including many at FAANG companies, banks, insurance companies, startups and universities.

What's next?

ydata-profiling is committed to the mission of helping data scientists adopt a data-centric approach to the development of AI. Continuous development and support will remain part of the journey of one of the data science community's most beloved open-source projects.

New features are on the way, and we want to learn about your needs and expectations so the future can be even brighter. Join the DCAI community and let us know your thoughts.