Dataset statistics
Number of variables | 15 |
---|---|
Number of observations | 45726 |
Missing cells | 29682 |
Missing cells (%) | 4.3% |
Duplicate rows | 10 |
Duplicate rows (%) | < 0.1% |
Total size in memory | 24.7 MiB |
Average record size in memory | 566.8 B |
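The percentages above follow from the raw counts. As a minimal sketch in plain Python (not the profiler's own code; `overview_stats` and the toy records are hypothetical), the missing-cell and duplicate-row figures could be derived as follows, assuming `None` marks a missing cell and every repeat of an earlier row counts as one duplicate:

```python
from collections import Counter


def overview_stats(rows, columns):
    """Recompute a few overview figures for a list of dict records.

    Assumptions (not taken from the report): None marks a missing cell,
    and every repeat of an earlier row counts as one duplicate.
    """
    n_obs = len(rows)
    n_cells = n_obs * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    row_counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(count - 1 for count in row_counts.values())
    return {
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
    }


# Hypothetical toy records, not rows from the profiled dataset.
toy = [
    {"name": "A", "reclat": 1.0},
    {"name": "A", "reclat": 1.0},
    {"name": "B", "reclat": None},
]
stats = overview_stats(toy, ["name", "reclat"])

# The reported 4.3% is the missing-cell count over variables x observations.
reported_missing_pct = 100 * 29682 / (15 * 45726)
```

With the figures from the table, `reported_missing_pct` rounds to the 4.3% shown under "Missing cells (%)".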
Variable types
Text | 3 |
---|---|
Numeric | 5 |
Categorical | 4 |
DateTime | 1 |
Boolean | 1 |
Unsupported | 1 |
source has constant value "NASA" | Constant |
Dataset has 10 (< 0.1%) duplicate rows | Duplicates |
reclat is highly overall correlated with reclat_city and 1 other fields | High correlation |
reclat_city is highly overall correlated with reclat and 1 other fields | High correlation |
reclong is highly overall correlated with reclat and 1 other fields | High correlation |
nametype is highly imbalanced (98.2%) | Imbalance |
fall is highly imbalanced (83.4%) | Imbalance |
reclat has 7315 (16.0%) missing values | Missing |
reclong has 7315 (16.0%) missing values | Missing |
GeoLocation has 7315 (16.0%) missing values | Missing |
reclat_city has 7315 (16.0%) missing values | Missing |
mass (g) is highly skewed (γ1 = 76.91847245) | Skewed |
unhashable is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
reclat has 6438 (14.1%) zeros | Zeros |
reclong has 6214 (13.6%) zeros | Zeros |
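The "Skewed" alert reports a skewness coefficient γ1. As a rough illustration of what that number measures, the sketch below computes the plain moment-based skewness, m3 / m2^1.5. The function name `moment_skewness` is hypothetical, and the profiler may apply a small-sample correction that this version omits, so values will not match the report exactly.

```python
import math


def moment_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return math.nan  # skewness is undefined for constant data
    return m3 / m2 ** 1.5


# Hypothetical toy inputs: a symmetric sample and one with a long right tail.
symmetric = moment_skewness([1.0, 2.0, 3.0])
right_tailed = moment_skewness([1.0, 1.0, 1.0, 10.0])
```

A symmetric sample gives a value of zero, while a few very large values (as with mass (g)) push the coefficient strongly positive.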
Reproduction
Analysis started | 2024-09-07 10:05:24.881877 |
---|---|
Analysis finished | 2024-09-07 10:05:29.789002 |
Duration | 4.91 seconds |
Software version | ydata-profiling v0.0.dev0 |
Download configuration | config.json |
name
Text
Distinct | 45716 |
---|---|
Distinct (%) | > 99.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.3 MiB |
Value | Count | Frequency (%) |
yamato | 7269 | 5.7% |
range | 6575 | 5.2% |
africa | 4502 | 3.6% |
northwest | 4499 | 3.5% |
hills | 3995 | 3.2% |
queen | 3445 | 2.7% |
alexandra | 3444 | 2.7% |
mountains | 3004 | 2.4% |
al | 2663 | 2.1% |
grove | 2496 | 2.0% |
Other values (37726) | 84860 |
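The word table above counts lowercased tokens across all names. A minimal sketch of that kind of count is given below; the whitespace tokenisation and the helper name `word_frequencies` are assumptions, and the profiler's own rule may differ.

```python
from collections import Counter


def word_frequencies(names, top=3):
    """Lowercase each name, split on whitespace, and count the tokens.

    Returns (word, count, percent of all tokens) for the most common words.
    The tokenisation rule is an assumption of this sketch.
    """
    counts = Counter()
    for name in names:
        counts.update(name.lower().split())
    total = sum(counts.values())
    return [(word, n, 100 * n / total) for word, n in counts.most_common(top)]


# Hypothetical sample in the style of the dataset's names.
sample = ["Yamato 791", "Yamato 792", "Queen Alexandra Range 93001"]
table = word_frequencies(sample)
```

Because one name contributes several tokens, the percentages are shares of all tokens rather than shares of rows, which is why the counts in the table sum to more than the number of observations.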
Most occurring characters
Value | Count | Frequency (%) |
(space) | 81032 | 10.0% |
a | 72715 | 8.9% |
e | 48167 | 5.9% |
n | 38392 | 4.7% |
0 | 34943 | 4.3% |
r | 33097 | 4.1% |
i | 32658 | 4.0% |
l | 31873 | 3.9% |
t | 30898 | 3.8% |
o | 30428 | 3.7% |
Other values (86) | 378919 |
Most occurring categories
Value | Count | Frequency (%) |
(unknown) | 813122 |
Most occurring scripts
Value | Count | Frequency (%) |
(unknown) | 813122 |
Most occurring blocks
Value | Count | Frequency (%) |
(unknown) | 813122 |
id
Real number (ℝ)
Distinct | 45716 |
---|---|
Distinct (%) | > 99.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 26883.906 |
Minimum | 1 |
---|---|
Maximum | 57458 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 357.4 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 2388.75 |
Q1 | 12681.25 |
median | 24256.5 |
Q3 | 40653.5 |
95-th percentile | 54890.75 |
Maximum | 57458 |
Range | 57457 |
Interquartile range (IQR) | 27972.25 |
Descriptive statistics
Standard deviation | 16863.446 |
---|---|
Coefficient of variation (CV) | 0.62726917 |
Kurtosis | -1.1601308 |
Mean | 26883.906 |
Median Absolute Deviation (MAD) | 13264 |
Skewness | 0.26653007 |
Sum | 1.2292935 × 10^9 |
Variance | 2.843758 × 10^8 |
Monotonicity | Not monotonic |
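Several of the id statistics are derived from one another, so they can be cross-checked with simple arithmetic. The sketch below recomputes them from the rounded figures in the tables above (with the observation count taken from the dataset statistics), so small rounding differences are expected.

```python
import math

# Figures copied from the id tables and the dataset statistics above.
n_obs = 45726
mean, std = 26883.906, 16863.446
q1, q3 = 12681.25, 40653.5
minimum, maximum = 1, 57458

derived = {
    "range": maximum - minimum,   # reported: 57457
    "iqr": q3 - q1,               # reported: 27972.25
    "cv": std / mean,             # reported: 0.62726917
    "variance": std ** 2,         # reported: 2.843758e8
    "sum": mean * n_obs,          # reported: 1.2292935e9
}

# Compare against the reported values within a loose relative tolerance.
consistent = (
    math.isclose(derived["cv"], 0.62726917, rel_tol=1e-6)
    and math.isclose(derived["variance"], 2.843758e8, rel_tol=1e-6)
    and math.isclose(derived["sum"], 1.2292935e9, rel_tol=1e-6)
)
```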
Value | Count | Frequency (%) |
1 | 2 | < 0.1% |
2 | 2 | < 0.1% |
6 | 2 | < 0.1% |
10 | 2 | < 0.1% |
370 | 2 | < 0.1% |
379 | 2 | < 0.1% |
390 | 2 | < 0.1% |
392 | 2 | < 0.1% |
398 | 2 | < 0.1% |
417 | 2 | < 0.1% |
Other values (45706) | 45706 |
Minimum 10 values
Value | Count | Frequency (%) |
1 | 2 | < 0.1% |
2 | 2 | < 0.1% |
4 | 1 | < 0.1% |
5 | 1 | < 0.1% |
6 | 2 | < 0.1% |
7 | 1 | < 0.1% |
8 | 1 | < 0.1% |
9 | 1 | < 0.1% |
10 | 2 | < 0.1% |
11 | 1 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
57458 | 1 | < 0.1% |
57457 | 1 | < 0.1% |
57456 | 1 | < 0.1% |
57455 | 1 | < 0.1% |
57454 | 1 | < 0.1% |
57453 | 1 | < 0.1% |
57436 | 1 | < 0.1% |
57435 | 1 | < 0.1% |
57434 | 1 | < 0.1% |
57433 | 1 | < 0.1% |
nametype
Categorical
IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 2.7 MiB |
Valid | 45651 |
---|---|
Relict | 75 |
Common Values
Value | Count | Frequency (%) |
Valid | 45651 | 99.8% |
Relict | 75 | 0.2% |
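The 98.2% imbalance alert for nametype can be reproduced from these two counts if the score is taken to be one minus the Shannon entropy normalised by log2 of the number of classes. That definition is an assumption of this sketch rather than something the report states, and `imbalance_score` is a hypothetical helper, but the result agrees with the reported figure.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen for this sketch
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score([45651, 75])  # counts for Valid and Relict
```

A perfectly balanced two-class column would score 0 under this definition, so a value close to 1 signals that almost every row carries the same label.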
Length
Common Values (Plot)
Value | Count | Frequency (%) |
valid | 45651 | 99.8% |
relict | 75 | 0.2% |
Most occurring characters
Value | Count | Frequency (%) |
l | 45726 | 20.0% |
i | 45726 | 20.0% |
V | 45651 | 20.0% |
a | 45651 | 20.0% |
d | 45651 | 20.0% |
R | 75 | < 0.1% |
e | 75 | < 0.1% |
c | 75 | < 0.1% |
t | 75 | < 0.1% |
Most occurring categories
Value | Count | Frequency (%) |
(unknown) | 228705 |
Most occurring scripts
Value | Count | Frequency (%) |
(unknown) | 228705 |
Most occurring blocks
Value | Count | Frequency (%) |
(unknown) | 228705 |
recclass
Text
Distinct | 466 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 2.6 MiB |
Value | Count | Frequency (%) |
l6 | 8341 | 17.6% |
h5 | 7165 | 15.1% |
l5 | 4818 | 10.2% |
h6 | 4530 | 9.6% |
h4 | 4223 | 8.9% |
ll5 | 2766 | 5.8% |
ll6 | 2046 | 4.3% |
l4 | 1256 | 2.7% |
iron | 1070 | 2.3% |
h4/5 | 428 | 0.9% |
Other values (434) | 10712 |