Overview

Brought to you by YData

Dataset statistics

Number of variables: 15
Number of observations: 45726
Missing cells: 29682
Missing cells (%): 4.3%
Duplicate rows: 10
Duplicate rows (%): < 0.1%
Total size in memory: 24.7 MiB
Average record size in memory: 566.8 B
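The percentage rows follow from the raw counts. Below is a minimal sketch that assumes "Missing cells (%)" is measured over all cells (variables × observations) and "Duplicate rows (%)" over all rows. The helper name `pct` is illustrative only.

```python
def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Counts copied from "Dataset statistics" above.
n_vars = 15
n_obs = 45726
missing_cells = 29682
duplicate_rows = 10

missing_pct = pct(missing_cells, n_vars * n_obs)  # share of all cells
duplicate_pct = pct(duplicate_rows, n_obs)        # share of all rows

print(f"Missing cells (%): {missing_pct:.1f}%")    # Missing cells (%): 4.3%
print(f"Duplicate rows (%): {duplicate_pct:.3f}%")  # 0.022%, shown as < 0.1%
```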

Variable types

Text: 3
Numeric: 5
Categorical: 4
DateTime: 1
Boolean: 1
Unsupported: 1

Alerts

source has constant value "NASA" [Constant]
Dataset has 10 (< 0.1%) duplicate rows [Duplicates]
reclat is highly overall correlated with reclat_city and 1 other field [High correlation]
reclat_city is highly overall correlated with reclat and 1 other field [High correlation]
reclong is highly overall correlated with reclat and 1 other field [High correlation]
nametype is highly imbalanced (98.2%) [Imbalance]
fall is highly imbalanced (83.4%) [Imbalance]
reclat has 7315 (16.0%) missing values [Missing]
reclong has 7315 (16.0%) missing values [Missing]
GeoLocation has 7315 (16.0%) missing values [Missing]
reclat_city has 7315 (16.0%) missing values [Missing]
mass (g) is highly skewed (γ1 = 76.91847245) [Skewed]
unhashable is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
reclat has 6438 (14.1%) zeros [Zeros]
reclong has 6214 (13.6%) zeros [Zeros]
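Each Imbalance alert reports a single percentage per column. One plausible definition, assumed here rather than taken from the ydata-profiling source, is one minus the Shannon entropy of the value counts, normalised by log2 of the number of classes. With the nametype counts reported further down (Valid 45651, Relict 75), this definition reproduces the 98.2% shown. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """1 - normalised Shannon entropy of class counts (assumed definition).

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# nametype value counts from this report: Valid 45651, Relict 75
score = imbalance_score([45651, 75])
print(f"{score:.1%}")
```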

Reproduction

Analysis started: 2024-09-07 10:05:24.881877
Analysis finished: 2024-09-07 10:05:29.789002
Duration: 4.91 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
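The reported duration is the difference between the two timestamps. The following is a small standard-library check:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2024-09-07 10:05:24.881877")
finished = datetime.fromisoformat("2024-09-07 10:05:29.789002")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 4.91 seconds
```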

Variables

name
Text

Distinct: 45716
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB

Length

Max length: 28
Median length: 25
Mean length: 17.782487
Min length: 2

Characters and Unicode

Total characters: 813122
Distinct characters: 96
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 45706
Unique (%): > 99.9%
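Distinct counts different values. Unique counts values that occur exactly once. For name, the 45726 rows hold 45716 distinct names, and 45716 - 10 = 45706 of them occur once. This is consistent with each of the 10 duplicate rows repeating one name. The sketch below computes both counts. The helper `distinct_and_unique` and the toy column are hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) for a column.

    distinct: number of different values
    unique:   number of values that occur exactly once
    """
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical toy column: "Aachen" repeats, as it would in a duplicated row.
sample = ["Aachen", "Aarhus", "Abee", "Aachen"]
print(distinct_and_unique(sample))  # (3, 2)
```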

Sample

1st row: Aachen
2nd row: Aarhus
3rd row: Abee
4th row: Acapulco
5th row: Achiras
Most occurring words
Value                   Count   Frequency (%)
yamato                   7269   5.7%
range                    6575   5.2%
africa                   4502   3.6%
northwest                4499   3.5%
hills                    3995   3.2%
queen                    3445   2.7%
alexandra                3444   2.7%
mountains                3004   2.4%
al                       2663   2.1%
grove                    2496   2.0%
Other values (37726)    84860   66.9%
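The word table counts lowercased tokens across all names. The sketch below approximates it with `collections.Counter` by splitting on whitespace. The exact tokenisation the report uses, such as punctuation or stop-word handling, is an assumption. The sample names are hypothetical.

```python
from collections import Counter

def word_counts(values):
    """Count whitespace-separated words, case-folded to lower case.

    Assumption: approximates the report's word table; its real tokeniser
    may treat punctuation or stop words differently.
    """
    counter = Counter()
    for value in values:
        counter.update(value.lower().split())
    return counter

# Hypothetical names in the style of this column.
names = ["Yamato 790001", "Yamato 790002", "Northwest Africa 869",
         "Queen Alexandra Range 93001"]
counts = word_counts(names)
print(counts.most_common(1))  # [('yamato', 2)]
```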

Most occurring characters

Value                   Count   Frequency (%)
(space)                 81032   10.0%
a                       72715   8.9%
e                       48167   5.9%
n                       38392   4.7%
0                       34943   4.3%
r                       33097   4.1%
i                       32658   4.0%
l                       31873   3.9%
t                       30898   3.8%
o                       30428   3.7%
Other values (86)      378919   46.6%
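Character frequencies count every character, including spaces and digits. This is why the space character leads the table: most names contain more than one word. The following is a minimal sketch with `collections.Counter`. The helper `char_frequencies` is hypothetical.

```python
from collections import Counter

def char_frequencies(values, top=3):
    """Count every character (spaces and digits included) across all values
    and return the `top` most common as (char, count, percent of all chars)."""
    counter = Counter()
    for value in values:
        counter.update(value)  # iterating a str yields its characters
    total = sum(counter.values())
    return [(ch, n, round(100 * n / total, 1)) for ch, n in counter.most_common(top)]

print(char_frequencies(["Aachen", "Aarhus", "Abee"]))
```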

Most occurring categories

Value                   Count   Frequency (%)
(unknown)              813122   100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table above, because all 813122 characters fall in this single category.

Most occurring scripts

Value                   Count   Frequency (%)
(unknown)              813122   100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table above, because all 813122 characters fall in this single script.

Most occurring blocks

Value                   Count   Frequency (%)
(unknown)              813122   100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table above, because all 813122 characters fall in this single block.

id
Real number (ℝ)

Distinct: 45716
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26883.906
Minimum: 1
Maximum: 57458
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 357.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 2388.75
Q1: 12681.25
Median: 24256.5
Q3: 40653.5
95th percentile: 54890.75
Maximum: 57458
Range: 57457
Interquartile range (IQR): 27972.25
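The quartiles and IQR can be reproduced with linear interpolation between order statistics. This is the default method for `numpy.percentile` and `pandas.Series.quantile`. That the report uses this method is an assumption. For illustration only, the sample below uses the smallest distinct ids listed for this variable. The helper `quantile` is hypothetical.

```python
import math

def quantile(values, q):
    """Quantile with linear interpolation between the two closest ranks
    (same rule as the default 'linear' method in numpy/pandas)."""
    data = sorted(values)
    pos = (len(data) - 1) * q          # fractional 0-based rank
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac

ids = [1, 2, 4, 5, 6, 7, 8, 9, 10, 11]   # small illustrative sample
q1, q3 = quantile(ids, 0.25), quantile(ids, 0.75)
print(q1, q3, q3 - q1)                   # IQR = Q3 - Q1

reported_iqr = 40653.5 - 12681.25        # Q3 - Q1 from the table above
print(reported_iqr)
```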

Descriptive statistics

Standard deviation: 16863.446
Coefficient of variation (CV): 0.62726917
Kurtosis: -1.1601308
Mean: 26883.906
Median Absolute Deviation (MAD): 13264
Skewness: 0.26653007
Sum: 1.2292935 × 10⁹
Variance: 2.843758 × 10⁸
Monotonicity: Not monotonic
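Several of these statistics are simple functions of others. CV is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below shows MAD in its unscaled form, median(|x - median(x)|); this is an assumption about the report's definition.

```python
import statistics

# Values copied from the "Descriptive statistics" block for id.
std = 16863.446
mean = 26883.906

cv = std / mean        # coefficient of variation: spread relative to the mean
variance = std ** 2    # variance is the square of the standard deviation
print(f"CV = {cv:.6f}, variance = {variance:.6e}")

def median_absolute_deviation(values):
    """Unscaled MAD: median of absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)

print(median_absolute_deviation([1, 2, 3, 4, 100]))  # 1
```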
[Figure: Histogram with fixed size bins (bins=50)]
Common Values
Value                   Count   Frequency (%)
1                           2   < 0.1%
2                           2   < 0.1%
6                           2   < 0.1%
10                          2   < 0.1%
370                         2   < 0.1%
379                         2   < 0.1%
390                         2   < 0.1%
392                         2   < 0.1%
398                         2   < 0.1%
417                         2   < 0.1%
Other values (45706)    45706   > 99.9%
Minimum 10 values
Value                   Count   Frequency (%)
1                           2   < 0.1%
2                           2   < 0.1%
4                           1   < 0.1%
5                           1   < 0.1%
6                           2   < 0.1%
7                           1   < 0.1%
8                           1   < 0.1%
9                           1   < 0.1%
10                          2   < 0.1%
11                          1   < 0.1%
Maximum 10 values
Value                   Count   Frequency (%)
57458                       1   < 0.1%
57457                       1   < 0.1%
57456                       1   < 0.1%
57455                       1   < 0.1%
57454                       1   < 0.1%
57453                       1   < 0.1%
57436                       1   < 0.1%
57435                       1   < 0.1%
57434                       1   < 0.1%
57433                       1   < 0.1%
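The extreme-value lists show the smallest and largest distinct values together with their counts. The sketch below uses `heapq.nsmallest` and `heapq.nlargest` over the keys of a `Counter`. The helper `extreme_values` and the toy data are hypothetical.

```python
import heapq
from collections import Counter

def extreme_values(values, n=10):
    """Return the n smallest and n largest distinct values with their counts."""
    counts = Counter(values)
    smallest = heapq.nsmallest(n, counts)  # iterating a Counter yields its keys
    largest = heapq.nlargest(n, counts)
    return ([(v, counts[v]) for v in smallest],
            [(v, counts[v]) for v in largest])

ids = [1, 1, 2, 2, 4, 57458, 57457]  # hypothetical ids, 1 and 2 duplicated
low, high = extreme_values(ids, n=2)
print(low)   # [(1, 2), (2, 2)]
print(high)  # [(57458, 1), (57457, 1)]
```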

nametype
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB
Valid: 45651
Relict: 75

Length

Max length: 6
Median length: 5
Mean length: 5.0016402
Min length: 5
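For a two-level categorical column, the length statistics follow directly from the value counts. The total is 45651 × 5 + 75 × 6 = 228705 characters over 45726 rows, which gives the reported mean length. The following is a standard-library check:

```python
# nametype value counts, copied from this report.
value_counts = {"Valid": 45651, "Relict": 75}

n = sum(value_counts.values())                                   # rows
total_chars = sum(len(v) * c for v, c in value_counts.items())   # all characters
mean_length = total_chars / n
print(n, total_chars, f"{mean_length:.7f}")
```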

Characters and Unicode

Total characters: 228705
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Valid
2nd row: Valid
3rd row: Valid
4th row: Valid
5th row: Valid

Common Values

Value                   Count   Frequency (%)
Valid                   45651   99.8%
Relict                     75   0.2%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]
Most occurring words
Value                   Count   Frequency (%)
valid                   45651   99.8%
relict                     75   0.2%

Most occurring characters

Value                   Count   Frequency (%)
l                       45726   20.0%
i                       45726   20.0%
V                       45651   20.0%
a                       45651   20.0%
d                       45651   20.0%
R                          75   < 0.1%
e                          75   < 0.1%
c                          75   < 0.1%
t                          75   < 0.1%
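The character table can also be derived from the value counts. Each occurrence of Valid contributes one V, a, l, i and d. Each occurrence of Relict contributes one R, e, l, i, c and t. This is why l and i reach 45726 while V, a and d stop at 45651. The following is a sketch; the helper name `char_counts_from_value_counts` is hypothetical.

```python
from collections import Counter

def char_counts_from_value_counts(value_counts):
    """Character frequencies of a column computed from its value counts:
    each character of a value is counted once per occurrence of that value."""
    chars = Counter()
    for value, count in value_counts.items():
        for ch, k in Counter(value).items():
            chars[ch] += k * count
    return chars

chars = char_counts_from_value_counts({"Valid": 45651, "Relict": 75})
total = sum(chars.values())
for ch, n in chars.most_common(5):
    print(ch, n, f"{100 * n / total:.1f}%")
```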

Most occurring categories

Value                   Count   Frequency (%)
(unknown)              228705   100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters table above, because all 228705 characters fall in this single category.

Most occurring scripts

Value                   Count   Frequency (%)
(unknown)              228705   100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters table above, because all 228705 characters fall in this single script.

Most occurring blocks

Value                   Count   Frequency (%)
(unknown)              228705   100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters table above, because all 228705 characters fall in this single block.
Distinct: 466
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Length

Max length: 26
Median length: 2
Mean length: 3.0525303
Min length: 1

Characters and Unicode

Total characters: 139580
Distinct characters: 62
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 145
Unique (%): 0.3%

Sample

1st row: L5
2nd row: H6
3rd row: EH4
4th row: Acapulcoite
5th row: L6
Most occurring words
Value                   Count   Frequency (%)
l6                       8341   17.6%
h5                       7165   15.1%
l5                       4818   10.2%
h6                       4530   9.6%
h4                       4223   8.9%
ll5                      2766   5.8%
ll6                      2046   4.3%
l4                       1256   2.7%
iron                     1070   2.3%
h4/5                      428   0.9%
Other values (434)      10712   22.6%