History & community
The ydata-profiling project became what it is today thanks to the work of its creators. This page highlights a bit of the development history. For the full picture, have a look at the contributor history.
YData is the company behind this package and is responsible for releases that added time-series support, dataset comparison, and Spark support.
Thank you to our amazing contributors
A big thank you to all our amazing contributors!
Contributors wall made with contrib.rocks.
Inception
In 2016, Jos Polfliet was working for SAS Institute and was getting bored with doing the same types of exploratory data analysis over and over again. After automating his own logic, he noticed it was useful and decided to open-source it under the MIT License. The package was named pandas-profiling as a contraction of pandas and data profiling. The idea was to enable the user to perform automated exploratory data analysis, beyond what the df.describe() function was offering, by abusing Jupyter's HTML output. Since that start, the package has saved the Machine Learning community human years of repetitive plotting and summary statistics.
Second life
Since May 2019, principal development has been taken over by Simon Brugman. The startup that he co-founded was an early adopter of the package, and he invested heavily in growing the package with experience gained from using it in industry. Simon led the package through a huge refactor (99.5% was changed), two major releases, and great collaborations, most notably with Ian Eaves on visions.
Profiling as part of Data-Centric AI
Since February 2022, YData has been committed to the continuous support and improvement of pandas-profiling. Our drive as maintainers of the package is to make data scientists fall in love with the ease and quality of profiling delivered by one of the best profiling packages. Since 2022, the YData team has already delivered several new releases that included major features such as time-series dataset analysis, comparison of two datasets and, most recently, integration with the big data engine Spark.
pandas-profiling has been named one of the Top 20 ML packages by Google.
A huge thank you to two of the most iconic contributors: Simon Brugman, who made it possible to compare two datasets, and Edwin Chan, who took the scale of profiling to another level with Spark.
Where are we now?
At the time of writing, pandas-profiling is receiving a new face and a new name: ydata-profiling. Following the most recent major feature, Spark support, we have decided to move from pandas to a name that opens the possibility of new integrations and developments.
This is the most popular tool in the world for data exploration in Python, with more than 11k GitHub stars, 50 million downloads, and users across industries, including many at FAANG, banks and insurance companies, startups, and universities.
What's next?
ydata-profiling is committed to the mission of helping data scientists adopt a Data-Centric approach to the development of AI. Continuous development and support will be part of the future of one of the open-source packages most beloved by the data science community.
New features are expected, and it will be important to hear about your needs and expectations so the future can be even brighter. Join the DCAI community and let us know your thoughts.