Overview

Dataset statistics

Number of variables: 15
Number of observations: 45726
Missing cells: 29682
Missing cells (%): 4.3%
Duplicate rows: 10
Duplicate rows (%): < 0.1%
Total size in memory: 24.0 MiB
Average record size in memory: 550.8 B
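
The percentages above follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and the duplicate percentage over all rows; the variable names are illustrative only:

```python
# Counts copied from the dataset statistics above.
n_vars = 15
n_obs = 45726
missing_cells = 29682
duplicate_rows = 10

# Assumption: percentages are relative to all cells and all rows respectively.
total_cells = n_vars * n_obs
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_obs

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.3f}%")
```

The duplicate share is well below 0.1%, which is why the report prints it as "< 0.1%".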

Variable types

Text: 3
Numeric: 5
Categorical: 4
DateTime: 1
Boolean: 1
Unsupported: 1

Alerts

source has constant value "NASA" (Constant)
Dataset has 10 (< 0.1%) duplicate rows (Duplicates)
reclat is highly overall correlated with reclong and 1 other field (High correlation)
reclong is highly overall correlated with reclat and 1 other field (High correlation)
reclat_city is highly overall correlated with reclat and 1 other field (High correlation)
nametype is highly imbalanced (98.2%) (Imbalance)
fall is highly imbalanced (83.4%) (Imbalance)
reclat has 7315 (16.0%) missing values (Missing)
reclong has 7315 (16.0%) missing values (Missing)
GeoLocation has 7315 (16.0%) missing values (Missing)
reclat_city has 7315 (16.0%) missing values (Missing)
mass (g) is highly skewed (γ1 = 76.91847245) (Skewed)
unhashable is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
reclat has 6438 (14.1%) zeros (Zeros)
reclong has 6214 (13.6%) zeros (Zeros)
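
The two Imbalance alerts can be reproduced from the class counts listed under nametype (45651 / 75) and fall (44609 / 1117). The sketch below assumes the score is one minus the Shannon entropy normalized by log2 of the number of classes. That formula matches the reported 98.2% and 83.4%, but it is an assumption, not a statement about the profiler's internals, and the function name imbalance_score is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - normalized Shannon entropy (0 = balanced, 1 = one class only)."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes <= 1:
        return 1.0  # a single class is treated as maximally imbalanced
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

nametype_score = imbalance_score([45651, 75])   # Valid / Relict
fall_score = imbalance_score([44609, 1117])     # Found / Fell

print(f"nametype: {nametype_score:.1%}")
print(f"fall: {fall_score:.1%}")
```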

Reproduction

Analysis started: 2023-09-12 08:37:54.979373
Analysis finished: 2023-09-12 08:38:01.952463
Duration: 6.97 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json

Variables

name
Text

Distinct: 45716
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 MiB

Length

Max length: 28
Median length: 25
Mean length: 17.782487
Min length: 2

Characters and Unicode

Total characters: 813122
Distinct characters: 96
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
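
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard unicodedata module. It is not the profiler's own code; the helper name category_counts is hypothetical. The two-letter codes correspond to the category names used in this report (Ll = Lowercase Letter, Lu = Uppercase Letter, Nd = Decimal Number, Zs = Space Separator, Ps / Pe = Open / Close Punctuation, Pd = Dash Punctuation, Po = Other Punctuation).

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the Unicode general category of every character in text."""
    return Counter(unicodedata.category(ch) for ch in text)

# One of the names from the sample rows of this dataset.
counts = category_counts("Adzhi-Bogdo (stone)")
for code, n in sorted(counts.items()):
    print(code, n)
```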

Unique

Unique: 45706
Unique (%): > 99.9%
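
Distinct and Unique are different measures. A minimal sketch, assuming Distinct is the number of different values and Unique is the number of values that occur exactly once; the helper name is hypothetical. Under that reading, 45726 rows in which 10 names each appear twice give 45716 distinct and 45706 unique values, which matches the figures reported for name.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Toy column: "Aachen" appears twice, the others once.
result = distinct_and_unique(["Aachen", "Aarhus", "Abee", "Aachen"])
print(result)
```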

Sample

1st row: Aachen
2nd row: Aarhus
3rd row: Abee
4th row: Acapulco
5th row: Achiras

Value | Count | Frequency (%)
yamato | 7269 | 5.7%
range | 6575 | 5.2%
africa | 4502 | 3.6%
northwest | 4499 | 3.5%
hills | 3995 | 3.2%
queen | 3445 | 2.7%
alexandra | 3444 | 2.7%
mountains | 3004 | 2.4%
al | 2663 | 2.1%
grove | 2496 | 2.0%
Other values (37726) | 84860 | 66.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 81032 | 10.0%
a | 72715 | 8.9%
e | 48167 | 5.9%
n | 38392 | 4.7%
0 | 34943 | 4.3%
r | 33097 | 4.1%
i | 32658 | 4.0%
l | 31873 | 3.9%
t | 30898 | 3.8%
o | 30428 | 3.7%
Other values (86) | 378919 | 46.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 440949 | 54.2%
Decimal Number | 205415 | 25.3%
Uppercase Letter | 84942 | 10.4%
Space Separator | 81032 | 10.0%
Close Punctuation | 295 | < 0.1%
Open Punctuation | 295 | < 0.1%
Dash Punctuation | 98 | < 0.1%
Other Punctuation | 96 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 72715 | 16.5%
e | 48167 | 10.9%
n | 38392 | 8.7%
r | 33097 | 7.5%
i | 32658 | 7.4%
l | 31873 | 7.2%
t | 30898 | 7.0%
o | 30428 | 6.9%
s | 20972 | 4.8%
m | 12393 | 2.8%
Other values (39) | 89356 | 20.3%

Uppercase Letter
Value | Count | Frequency (%)
A | 14120 | 16.6%
M | 11173 | 13.2%
R | 7599 | 8.9%
Y | 7327 | 8.6%
N | 5796 | 6.8%
H | 5676 | 6.7%
G | 4682 | 5.5%
L | 4630 | 5.5%
D | 3777 | 4.4%
Q | 3478 | 4.1%
Other values (21) | 16684 | 19.6%

Decimal Number
Value | Count | Frequency (%)
0 | 34943 | 17.0%
9 | 24444 | 11.9%
8 | 22179 | 10.8%
1 | 21986 | 10.7%
2 | 19839 | 9.7%
7 | 19347 | 9.4%
3 | 17379 | 8.5%
4 | 16001 | 7.8%
5 | 14812 | 7.2%
6 | 14485 | 7.1%

Other Punctuation
Value | Count | Frequency (%)
' | 67 | 69.8%
. | 29 | 30.2%

Space Separator
Value | Count | Frequency (%)
(space) | 81032 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 295 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 295 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 98 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 525891 | 64.7%
Common | 287231 | 35.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 72715 | 13.8%
e | 48167 | 9.2%
n | 38392 | 7.3%
r | 33097 | 6.3%
i | 32658 | 6.2%
l | 31873 | 6.1%
t | 30898 | 5.9%
o | 30428 | 5.8%
s | 20972 | 4.0%
A | 14120 | 2.7%
Other values (70) | 172571 | 32.8%

Common
Value | Count | Frequency (%)
(space) | 81032 | 28.2%
0 | 34943 | 12.2%
9 | 24444 | 8.5%
8 | 22179 | 7.7%
1 | 21986 | 7.7%
2 | 19839 | 6.9%
7 | 19347 | 6.7%
3 | 17379 | 6.1%
4 | 16001 | 5.6%
5 | 14812 | 5.2%
Other values (6) | 15269 | 5.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 812638 | 99.9%
None | 484 | 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 81032 | 10.0%
a | 72715 | 8.9%
e | 48167 | 5.9%
n | 38392 | 4.7%
0 | 34943 | 4.3%
r | 33097 | 4.1%
i | 32658 | 4.0%
l | 31873 | 3.9%
t | 30898 | 3.8%
o | 30428 | 3.7%
Other values (58) | 378435 | 46.6%

None
Value | Count | Frequency (%)
é | 204 | 42.1%
ş | 125 | 25.8%
Ö | 63 | 13.0%
á | 11 | 2.3%
ö | 11 | 2.3%
ä | 10 | 2.1%
ó | 8 | 1.7%
ü | 8 | 1.7%
ñ | 8 | 1.7%
ã | 5 | 1.0%
Other values (18) | 31 | 6.4%

id
Real number (ℝ)

Distinct: 45716
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26883.906
Minimum: 1
Maximum: 57458
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 357.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2388.75
Q1: 12681.25
Median: 24256.5
Q3: 40653.5
95-th percentile: 54890.75
Maximum: 57458
Range: 57457
Interquartile range (IQR): 27972.25

Descriptive statistics

Standard deviation: 16863.446
Coefficient of variation (CV): 0.62726917
Kurtosis: -1.1601308
Mean: 26883.906
Median Absolute Deviation (MAD): 13264
Skewness: 0.26653007
Sum: 1.2292935 × 10^9
Variance: 2.843758 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1 | 2 | < 0.1%
6 | 2 | < 0.1%
10 | 2 | < 0.1%
370 | 2 | < 0.1%
379 | 2 | < 0.1%
390 | 2 | < 0.1%
392 | 2 | < 0.1%
398 | 2 | < 0.1%
417 | 2 | < 0.1%
2 | 2 | < 0.1%
Other values (45706) | 45706 | > 99.9%

Minimum 10 values
Value | Count | Frequency (%)
1 | 2 | < 0.1%
2 | 2 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 2 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 2 | < 0.1%
11 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
57458 | 1 | < 0.1%
57457 | 1 | < 0.1%
57456 | 1 | < 0.1%
57455 | 1 | < 0.1%
57454 | 1 | < 0.1%
57453 | 1 | < 0.1%
57436 | 1 | < 0.1%
57435 | 1 | < 0.1%
57434 | 1 | < 0.1%
57433 | 1 | < 0.1%

nametype
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB
Valid: 45651
Relict: 75

Length

Max length: 6
Median length: 5
Mean length: 5.0016402
Min length: 5

Characters and Unicode

Total characters: 228705
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Valid
2nd row: Valid
3rd row: Valid
4th row: Valid
5th row: Valid

Common Values

Value | Count | Frequency (%)
Valid | 45651 | 99.8%
Relict | 75 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
valid | 45651 | 99.8%
relict | 75 | 0.2%

Most occurring characters

Value | Count | Frequency (%)
l | 45726 | 20.0%
i | 45726 | 20.0%
V | 45651 | 20.0%
a | 45651 | 20.0%
d | 45651 | 20.0%
R | 75 | < 0.1%
e | 75 | < 0.1%
c | 75 | < 0.1%
t | 75 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 182979 | 80.0%
Uppercase Letter | 45726 | 20.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
l | 45726 | 25.0%
i | 45726 | 25.0%
a | 45651 | 24.9%
d | 45651 | 24.9%
e | 75 | < 0.1%
c | 75 | < 0.1%
t | 75 | < 0.1%

Uppercase Letter
Value | Count | Frequency (%)
V | 45651 | 99.8%
R | 75 | 0.2%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 228705 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
l | 45726 | 20.0%
i | 45726 | 20.0%
V | 45651 | 20.0%
a | 45651 | 20.0%
d | 45651 | 20.0%
R | 75 | < 0.1%
e | 75 | < 0.1%
c | 75 | < 0.1%
t | 75 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 228705 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
l | 45726 | 20.0%
i | 45726 | 20.0%
V | 45651 | 20.0%
a | 45651 | 20.0%
d | 45651 | 20.0%
R | 75 | < 0.1%
e | 75 | < 0.1%
c | 75 | < 0.1%
t | 75 | < 0.1%

recclass
Text

Distinct: 466
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Length

Max length: 26
Median length: 2
Mean length: 3.0525303
Min length: 1

Characters and Unicode

Total characters: 139580
Distinct characters: 62
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 145
Unique (%): 0.3%

Sample

1st row: L5
2nd row: H6
3rd row: EH4
4th row: Acapulcoite
5th row: L6

Value | Count | Frequency (%)
l6 | 8341 | 17.6%
h5 | 7165 | 15.1%
l5 | 4818 | 10.2%
h6 | 4530 | 9.6%
h4 | 4223 | 8.9%
ll5 | 2766 | 5.8%
ll6 | 2046 | 4.3%
l4 | 1256 | 2.7%
iron | 1070 | 2.3%
h4/5 | 428 | 0.9%
Other values (434) | 10712 | 22.6%

Most occurring characters

Value | Count | Frequency (%)
L | 28467 | 20.4%
H | 18396 | 13.2%
5 | 16419 | 11.8%
6 | 16132 | 11.6%
4 | 6930 | 5.0%
e | 3972 | 2.8%
i | 3834 | 2.7%
r | 3648 | 2.6%
t | 3327 | 2.4%
3 | 3278 | 2.3%
Other values (52) | 35177 | 25.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 57793 | 41.4%
Decimal Number | 44118 | 31.6%
Lowercase Letter | 29926 | 21.4%
Other Punctuation | 3293 | 2.4%
Dash Punctuation | 1835 | 1.3%
Space Separator | 1747 | 1.3%
Math Symbol | 320 | 0.2%
Open Punctuation | 274 | 0.2%
Close Punctuation | 274 | 0.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 3972 | 13.3%
i | 3834 | 12.8%
r | 3648 | 12.2%
t | 3327 | 11.1%
n | 2520 | 8.4%
o | 2458 | 8.2%
c | 1767 | 5.9%
u | 1469 | 4.9%
a | 1409 | 4.7%
l | 1016 | 3.4%
Other values (12) | 4506 | 15.1%

Uppercase Letter
Value | Count | Frequency (%)
L | 28467 | 49.3%
H | 18396 | 31.8%
I | 2753 | 4.8%
C | 1785 | 3.1%
E | 1261 | 2.2%
A | 985 | 1.7%
M | 913 | 1.6%
B | 754 | 1.3%
O | 542 | 0.9%
V | 350 | 0.6%
Other values (10) | 1587 | 2.7%

Decimal Number
Value | Count | Frequency (%)
5 | 16419 | 37.2%
6 | 16132 | 36.6%
4 | 6930 | 15.7%
3 | 3278 | 7.4%
2 | 646 | 1.5%
7 | 251 | 0.6%
8 | 216 | 0.5%
9 | 111 | 0.3%
1 | 100 | 0.2%
0 | 35 | 0.1%

Other Punctuation
Value | Count | Frequency (%)
/ | 1174 | 35.7%
. | 1064 | 32.3%
, | 1031 | 31.3%
? | 24 | 0.7%

Math Symbol
Value | Count | Frequency (%)
~ | 319 | 99.7%
< | 1 | 0.3%

Dash Punctuation
Value | Count | Frequency (%)
- | 1835 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1747 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 274 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 274 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 87719 | 62.8%
Common | 51861 | 37.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
L | 28467 | 32.5%
H | 18396 | 21.0%
e | 3972 | 4.5%
i | 3834 | 4.4%
r | 3648 | 4.2%
t | 3327 | 3.8%
I | 2753 | 3.1%
n | 2520 | 2.9%
o | 2458 | 2.8%
C | 1785 | 2.0%
Other values (32) | 16559 | 18.9%

Common
Value | Count | Frequency (%)
5 | 16419 | 31.7%
6 | 16132 | 31.1%
4 | 6930 | 13.4%
3 | 3278 | 6.3%
- | 1835 | 3.5%
(space) | 1747 | 3.4%
/ | 1174 | 2.3%
. | 1064 | 2.1%
, | 1031 | 2.0%
2 | 646 | 1.2%
Other values (10) | 1605 | 3.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 139580 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
L | 28467 | 20.4%
H | 18396 | 13.2%
5 | 16419 | 11.8%
6 | 16132 | 11.6%
4 | 6930 | 5.0%
e | 3972 | 2.8%
i | 3834 | 2.7%
r | 3648 | 2.6%
t | 3327 | 2.4%
3 | 3278 | 2.3%
Other values (52) | 35177 | 25.2%

mass (g)
Real number (ℝ)

SKEWED 

Distinct: 12576
Distinct (%): 27.6%
Missing: 131
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 13278.426
Minimum: 0
Maximum: 60000000
Zeros: 19
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 357.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.1
Q1: 7.2
Median: 32.61
Q3: 202.9
95-th percentile: 4000
Maximum: 60000000
Range: 60000000
Interquartile range (IQR): 195.7

Descriptive statistics

Standard deviation: 574926.01
Coefficient of variation (CV): 43.297752
Kurtosis: 6798.3984
Mean: 13278.426
Median Absolute Deviation (MAD): 30.51
Skewness: 76.918472
Sum: 6.0542985 × 10^8
Variance: 3.3053992 × 10^11
Monotonicity: Not monotonic
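
Several of the figures above are derived from one another. The following sketch recomputes them from the reported values, assuming the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, variance = standard deviation squared); it only checks internal consistency and does not touch the underlying data.

```python
# Values copied from the mass (g) statistics above.
mean = 13278.426
std = 574926.01
q1, q3 = 7.2, 202.9

cv = std / mean        # coefficient of variation
iqr = q3 - q1          # interquartile range
variance = std ** 2    # variance from the standard deviation

print(f"CV: {cv:.4f}")
print(f"IQR: {iqr:.1f}")
print(f"Variance: {variance:.4e}")
```

A CV above 43 together with a median of 32.61 g and a maximum of 60000000 g is consistent with the Skewed alert: a handful of very large masses dominate the mean and the variance.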
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1.3 | 171 | 0.4%
1.2 | 140 | 0.3%
1.4 | 138 | 0.3%
2.1 | 130 | 0.3%
2.4 | 126 | 0.3%
1.6 | 120 | 0.3%
0.5 | 119 | 0.3%
1.1 | 116 | 0.3%
3.8 | 114 | 0.2%
1.5 | 111 | 0.2%
Other values (12566) | 44310 | 96.9%
(Missing) | 131 | 0.3%

Minimum 10 values
Value | Count | Frequency (%)
0 | 19 | < 0.1%
0.01 | 2 | < 0.1%
0.013 | 1 | < 0.1%
0.02 | 1 | < 0.1%
0.03 | 1 | < 0.1%
0.04 | 1 | < 0.1%
0.05 | 1 | < 0.1%
0.06 | 1 | < 0.1%
0.07 | 3 | < 0.1%
0.08 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
60000000 | 1 | < 0.1%
58200000 | 1 | < 0.1%
50000000 | 1 | < 0.1%
30000000 | 1 | < 0.1%
28000000 | 1 | < 0.1%
26000000 | 1 | < 0.1%
24300000 | 1 | < 0.1%
24000000 | 1 | < 0.1%
23000000 | 1 | < 0.1%
22000000 | 1 | < 0.1%

fall
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB
Found: 44609
Fell: 1117

Length

Max length: 5
Median length: 5
Mean length: 4.9755719
Min length: 4

Characters and Unicode

Total characters: 227513
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Fell
2nd row: Fell
3rd row: Fell
4th row: Fell
5th row: Fell

Common Values

Value | Count | Frequency (%)
Found | 44609 | 97.6%
Fell | 1117 | 2.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
found | 44609 | 97.6%
fell | 1117 | 2.4%

Most occurring characters

Value | Count | Frequency (%)
F | 45726 | 20.1%
o | 44609 | 19.6%
u | 44609 | 19.6%
n | 44609 | 19.6%
d | 44609 | 19.6%
l | 2234 | 1.0%
e | 1117 | 0.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 181787 | 79.9%
Uppercase Letter | 45726 | 20.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
o | 44609 | 24.5%
u | 44609 | 24.5%
n | 44609 | 24.5%
d | 44609 | 24.5%
l | 2234 | 1.2%
e | 1117 | 0.6%

Uppercase Letter
Value | Count | Frequency (%)
F | 45726 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 227513 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
F | 45726 | 20.1%
o | 44609 | 19.6%
u | 44609 | 19.6%
n | 44609 | 19.6%
d | 44609 | 19.6%
l | 2234 | 1.0%
e | 1117 | 0.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 227513 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
F | 45726 | 20.1%
o | 44609 | 19.6%
u | 44609 | 19.6%
n | 44609 | 19.6%
d | 44609 | 19.6%
l | 2234 | 1.0%
e | 1117 | 0.5%

year
Date

Distinct: 265
Distinct (%): 0.6%
Missing: 291
Missing (%): 0.6%
Memory size: 357.4 KiB
Minimum: 1970-01-01 00:00:00
Maximum: 1970-01-01 00:00:00.000002
Histogram with fixed size bins (bins=50)
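
The minimum and maximum above, and the year values in the sample rows (for example 1970-01-01 00:00:00.000001880 for Aachen), are consistent with integer years having been interpreted as nanoseconds since the Unix epoch. That reading is an assumption; the report itself does not state how the column was parsed. A standard-library sketch of the effect and of one way to read the year back, with hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ns_since_epoch_to_str(ns):
    """Render an integer count of nanoseconds since the Unix epoch."""
    seconds, frac_ns = divmod(ns, 1_000_000_000)
    moment = EPOCH + timedelta(seconds=seconds)
    return f"{moment:%Y-%m-%d %H:%M:%S}.{frac_ns:09d}"

def recover_year(timestamp_str):
    """Read the nine fractional-second digits back as an integer year."""
    _, _, fraction = timestamp_str.partition(".")
    return int(fraction) if fraction else 0

misparsed = ns_since_epoch_to_str(1880)   # the year 1880 treated as 1880 ns
print(misparsed)
print(recover_year(misparsed))
```

recover_year only makes sense for strings with a full nine-digit fraction, as in the sample rows; the truncated Minimum and Maximum shown above cannot be inverted this way.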

reclat
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 12738
Distinct (%): 33.2%
Missing: 7315
Missing (%): 16.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -39.107095
Minimum: -87.36667
Maximum: 81.16667
Zeros: 6438
Zeros (%): 14.1%
Negative: 23416
Negative (%): 51.2%
Memory size: 357.4 KiB

Quantile statistics

Minimum: -87.36667
5-th percentile: -84.35476
Q1: -76.71377
Median: -71.5
Q3: 0
95-th percentile: 34.494325
Maximum: 81.16667
Range: 168.53334
Interquartile range (IQR): 76.71377

Descriptive statistics

Standard deviation: 46.386011
Coefficient of variation (CV): -1.1861278
Kurtosis: -1.4768651
Mean: -39.107095
Median Absolute Deviation (MAD): 12.76459
Skewness: 0.49131573
Sum: -1502142.6
Variance: 2151.662
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 6438 | 14.1%
-71.5 | 4761 | 10.4%
-84 | 3040 | 6.6%
-72 | 1506 | 3.3%
-79.68333 | 1130 | 2.5%
-76.71667 | 680 | 1.5%
-76.18333 | 539 | 1.2%
-84.21667 | 263 | 0.6%
-86.36667 | 226 | 0.5%
-86.71667 | 217 | 0.5%
Other values (12728) | 19611 | 42.9%
(Missing) | 7315 | 16.0%

Minimum 10 values
Value | Count | Frequency (%)
-87.36667 | 4 | < 0.1%
-87.03333 | 3 | < 0.1%
-86.93333 | 3 | < 0.1%
-86.71667 | 217 | 0.5%
-86.56667 | 17 | < 0.1%
-86.54488 | 1 | < 0.1%
-86.5379 | 1 | < 0.1%
-86.53734 | 1 | < 0.1%
-86.53725 | 1 | < 0.1%
-86.53035 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
81.16667 | 1 | < 0.1%
76.53333 | 1 | < 0.1%
76.13333 | 1 | < 0.1%
72.88333 | 1 | < 0.1%
72.68333 | 1 | < 0.1%
70.73333 | 1 | < 0.1%
70 | 1 | < 0.1%
69.1 | 1 | < 0.1%
68 | 1 | < 0.1%
67.88333 | 1 | < 0.1%

reclong
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 14640
Distinct (%): 38.1%
Missing: 7315
Missing (%): 16.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.052594
Minimum: -165.43333
Maximum: 354.47333
Zeros: 6214
Zeros (%): 13.6%
Negative: 4057
Negative (%): 8.9%
Memory size: 357.4 KiB

Quantile statistics

Minimum: -165.43333
5-th percentile: -90.427
Q1: 0
Median: 35.66667
Q3: 157.16667
95-th percentile: 168
Maximum: 354.47333
Range: 519.90666
Interquartile range (IQR): 157.16667

Descriptive statistics

Standard deviation: 80.655258
Coefficient of variation (CV): 1.3210783
Kurtosis: -0.73139356
Mean: 61.052594
Median Absolute Deviation (MAD): 39.53972
Skewness: -0.17438133
Sum: 2345091.2
Variance: 6505.2706
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 6214 | 13.6%
35.66667 | 4985 | 10.9%
168 | 3040 | 6.6%
26 | 1506 | 3.3%
159.75 | 657 | 1.4%
159.66667 | 637 | 1.4%
157.16667 | 542 | 1.2%
155.75 | 473 | 1.0%
160.5 | 263 | 0.6%
-70 | 228 | 0.5%
Other values (14630) | 19866 | 43.4%
(Missing) | 7315 | 16.0%

Minimum 10 values
Value | Count | Frequency (%)
-165.43333 | 9 | < 0.1%
-165.11667 | 17 | < 0.1%
-163.16667 | 1 | < 0.1%
-162.55 | 1 | < 0.1%
-157.86667 | 1 | < 0.1%
-157.78333 | 1 | < 0.1%
-149.5 | 4 | < 0.1%
-148.55 | 2 | < 0.1%
-148 | 3 | < 0.1%
-146.26667 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
354.47333 | 1 | < 0.1%
178.2 | 1 | < 0.1%
178.08333 | 1 | < 0.1%
175.73028 | 1 | < 0.1%
175.13333 | 1 | < 0.1%
175 | 185 | 0.4%
174.50043 | 1 | < 0.1%
174.4 | 1 | < 0.1%
172.7 | 1 | < 0.1%
172.6 | 1 | < 0.1%
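
The largest reclong value, 354.47333, lies outside the conventional [-180, 180] longitude range, and both coordinate columns carry a large share of exact zeros. The sketch below shows two checks one might apply before mapping the data. Both rest on assumptions not confirmed by the report: that 354.47333 uses a 0 to 360 convention, and that a pair of exact zeros is a placeholder and not a real position. The function names are hypothetical.

```python
def wrap_longitude(lon):
    """Map a longitude in degrees onto the [-180, 180) range."""
    return ((lon + 180.0) % 360.0) - 180.0

def is_zero_placeholder(lat, lon):
    """Flag coordinate pairs that are exactly (0.0, 0.0)."""
    return lat == 0.0 and lon == 0.0

wrapped = wrap_longitude(354.47333)   # the reported maximum
print(round(wrapped, 5))
print(is_zero_placeholder(0.0, 0.0))
print(is_zero_placeholder(50.775, 6.08333))
```

Values already inside the range, such as 168.0 or -165.43333, pass through wrap_longitude unchanged.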

GeoLocation
Text

MISSING 

Distinct: 17100
Distinct (%): 44.5%
Missing: 7315
Missing (%): 16.0%
Memory size: 2.9 MiB

Length

Max length: 24
Median length: 22
Mean length: 17.304809
Min length: 10

Characters and Unicode

Total characters: 664695
Distinct characters: 16
Distinct categories: 6
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 16363
Unique (%): 42.6%

Sample

1st row: (50.775, 6.08333)
2nd row: (56.18333, 10.23333)
3rd row: (54.21667, -113.0)
4th row: (16.88333, -99.9)
5th row: (-33.16667, -64.95)
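
GeoLocation is stored as text, so the coordinates have to be parsed before they can be compared with reclat and reclong. A minimal sketch using only the standard library, assuming every non-missing value has the "(lat, lon)" shape shown in the sample rows; parse_geolocation is a hypothetical helper name.

```python
import ast

def parse_geolocation(text):
    """Parse a string such as "(50.775, 6.08333)" into a (lat, lon) tuple of floats."""
    lat, lon = ast.literal_eval(text)
    return float(lat), float(lon)

first = parse_geolocation("(50.775, 6.08333)")
third = parse_geolocation("(54.21667, -113.0)")
print(first)
print(third)
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for text read from a data file.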

Value | Count | Frequency (%)
0.0 | 12652 | 16.5%
35.66667 | 4991 | 6.5%
71.5 | 4761 | 6.2%
84.0 | 3041 | 4.0%
168.0 | 3040 | 4.0%
26.0 | 1512 | 2.0%
72.0 | 1506 | 2.0%
79.68333 | 1130 | 1.5%
76.71667 | 680 | 0.9%
159.75 | 657 | 0.9%
Other values (26608) | 42852 | 55.8%

Most occurring characters

Value | Count | Frequency (%)
. | 76822 | 11.6%
6 | 67560 | 10.2%
7 | 52499 | 7.9%
0 | 49033 | 7.4%
3 | 44771 | 6.7%
1 | 44476 | 6.7%
5 | 42757 | 6.4%
( | 38411 | 5.8%
, | 38411 | 5.8%
(space) | 38411 | 5.8%
Other values (6) | 171544 | 25.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 406756 | 61.2%
Other Punctuation | 115233 | 17.3%
Open Punctuation | 38411 | 5.8%
Space Separator | 38411 | 5.8%
Close Punctuation | 38411 | 5.8%
Dash Punctuation | 27473 | 4.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
6 | 67560 | 16.6%
7 | 52499 | 12.9%
0 | 49033 | 12.1%
3 | 44771 | 11.0%
1 | 44476 | 10.9%
5 | 42757 | 10.5%
8 | 32680 | 8.0%
2 | 29923 | 7.4%
4 | 23646 | 5.8%
9 | 19411 | 4.8%

Other Punctuation
Value | Count | Frequency (%)
. | 76822 | 66.7%
, | 38411 | 33.3%

Open Punctuation
Value | Count | Frequency (%)
( | 38411 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 38411 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 38411 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 27473 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 664695 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
. | 76822 | 11.6%
6 | 67560 | 10.2%
7 | 52499 | 7.9%
0 | 49033 | 7.4%
3 | 44771 | 6.7%
1 | 44476 | 6.7%
5 | 42757 | 6.4%
( | 38411 | 5.8%
, | 38411 | 5.8%
(space) | 38411 | 5.8%
Other values (6) | 171544 | 25.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 664695 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
. | 76822 | 11.6%
6 | 67560 | 10.2%
7 | 52499 | 7.9%
0 | 49033 | 7.4%
3 | 44771 | 6.7%
1 | 44476 | 6.7%
5 | 42757 | 6.4%
( | 38411 | 5.8%
, | 38411 | 5.8%
(space) | 38411 | 5.8%
Other values (6) | 171544 | 25.8%

source
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB
NASA: 45726

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 182904
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NASA
2nd row: NASA
3rd row: NASA
4th row: NASA
5th row: NASA

Common Values

Value | Count | Frequency (%)
NASA | 45726 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
nasa | 45726 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
A | 91452 | 50.0%
N | 45726 | 25.0%
S | 45726 | 25.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 182904 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 91452 | 50.0%
N | 45726 | 25.0%
S | 45726 | 25.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 182904 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 91452 | 50.0%
N | 45726 | 25.0%
S | 45726 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 182904 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
A | 91452 | 50.0%
N | 45726 | 25.0%
S | 45726 | 25.0%

boolean
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.8 KiB

Value | Count | Frequency (%)
True | 22934 | 50.2%
False | 22792 | 49.8%

mixed
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 MiB
A: 22889
1: 22837

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 45726
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: A
3rd row: 1
4th row: A
5th row: A

Common Values

Value | Count | Frequency (%)
A | 22889 | 50.1%
1 | 22837 | 49.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
a | 22889 | 50.1%
1 | 22837 | 49.9%

Most occurring characters

Value | Count | Frequency (%)
A | 22889 | 50.1%
1 | 22837 | 49.9%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 22889 | 50.1%
Decimal Number | 22837 | 49.9%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 22889 | 100.0%

Decimal Number
Value | Count | Frequency (%)
1 | 22837 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 22889 | 50.1%
Common | 22837 | 49.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 22889 | 100.0%

Common
Value | Count | Frequency (%)
1 | 22837 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 45726 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
A | 22889 | 50.1%
1 | 22837 | 49.9%

unhashable
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 2.4 MiB

reclat_city
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 38401
Distinct (%): > 99.9%
Missing: 7315
Missing (%): 16.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -39.153542
Minimum: -104.31717
Maximum: 77.749011
Zeros: 0
Zeros (%): 0.0%
Negative: 26603
Negative (%): 58.2%
Memory size: 357.4 KiB

Quantile statistics

Minimum: -104.31717
5-th percentile: -87.871058
Q1: -78.407752
Median: -68.975293
Q3: 4.7886449
95-th percentile: 35.42981
Maximum: 77.749011
Range: 182.06618
Interquartile range (IQR): 83.196397

Descriptive statistics

Standard deviation: 46.685687
Coefficient of variation (CV): -1.1923745
Kurtosis: -1.446385
Mean: -39.153542
Median Absolute Deviation (MAD): 17.255843
Skewness: 0.48160358
Sum: -1503926.7
Variance: 2179.5534
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
50.51806008 | 2 | < 0.1%
43.27957156 | 2 | < 0.1%
52.01104434 | 2 | < 0.1%
-32.5810219 | 2 | < 0.1%
49.60726921 | 2 | < 0.1%
-29.65152821 | 2 | < 0.1%
36.5165896 | 2 | < 0.1%
-23.28864666 | 2 | < 0.1%
23.16596589 | 2 | < 0.1%
52.70663547 | 2 | < 0.1%
Other values (38391) | 38391 | 84.0%
(Missing) | 7315 | 16.0%

Minimum 10 values
Value | Count | Frequency (%)
-104.3171665 | 1 | < 0.1%
-102.4312375 | 1 | < 0.1%
-102.0868253 | 1 | < 0.1%
-101.5556373 | 1 | < 0.1%
-101.3269284 | 1 | < 0.1%
-101.2084341 | 1 | < 0.1%
-101.0146935 | 1 | < 0.1%
-100.9191264 | 1 | < 0.1%
-100.7856947 | 1 | < 0.1%
-100.5751117 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
77.74901083 | 1 | < 0.1%
72.80622023 | 1 | < 0.1%
72.75730423 | 1 | < 0.1%
72.42607973 | 1 | < 0.1%
72.25809595 | 1 | < 0.1%
71.78938297 | 1 | < 0.1%
71.42543169 | 1 | < 0.1%
70.89755212 | 1 | < 0.1%
70.53373183 | 1 | < 0.1%
70.48523932 | 1 | < 0.1%

Interactions

[Figures: 25 pairwise interaction plots]

Correlations

variable | id | mass (g) | reclat | reclong | reclat_city | nametype | fall | boolean | mixed
id | 1.000 | -0.142 | 0.261 | -0.316 | 0.219 | 0.130 | 0.126 | 0.000 | 0.009
mass (g) | -0.142 | 1.000 | 0.409 | -0.281 | 0.424 | 0.000 | 0.012 | 0.000 | 0.003
reclat | 0.261 | 0.409 | 1.000 | -0.650 | 0.943 | 0.349 | 0.450 | 0.000 | 0.013
reclong | -0.316 | -0.281 | -0.650 | 1.000 | -0.618 | 0.044 | 0.195 | 0.007 | 0.000
reclat_city | 0.219 | 0.424 | 0.943 | -0.618 | 1.000 | 0.379 | 0.424 | 0.015 | 0.000
nametype | 0.130 | 0.000 | 0.349 | 0.044 | 0.379 | 1.000 | 0.000 | 0.000 | 0.000
fall | 0.126 | 0.012 | 0.450 | 0.195 | 0.424 | 0.000 | 1.000 | 0.000 | 0.000
boolean | 0.000 | 0.000 | 0.000 | 0.007 | 0.015 | 0.000 | 0.000 | 1.000 | 0.000
mixed | 0.009 | 0.003 | 0.013 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000
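
The three High correlation alerts in the overview can be read off this matrix. The sketch below scans the numeric part of the matrix for pairs whose absolute coefficient reaches a threshold. The 0.5 threshold is an assumption chosen because it reproduces the alerts, and high_pairs is a hypothetical helper name.

```python
# Numeric sub-matrix copied from the correlation table above.
names = ["id", "mass (g)", "reclat", "reclong", "reclat_city"]
matrix = [
    [1.000, -0.142, 0.261, -0.316, 0.219],
    [-0.142, 1.000, 0.409, -0.281, 0.424],
    [0.261, 0.409, 1.000, -0.650, 0.943],
    [-0.316, -0.281, -0.650, 1.000, -0.618],
    [0.219, 0.424, 0.943, -0.618, 1.000],
]

def high_pairs(names, matrix, threshold=0.5):
    """List each variable pair once if |coefficient| >= threshold (assumed cut-off)."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

pairs = high_pairs(names, matrix)
for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:+.3f}")
```

Each of reclat, reclong and reclat_city appears in two of the flagged pairs, which matches the alert wording that each is correlated with one named variable and one other field.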

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
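
Nullity correlation can be illustrated by correlating the present/missing indicators of two columns. The sketch below is a plain Pearson correlation on 0/1 indicators written with the standard library only; it uses made-up toy values, not the report's data, and the function name is hypothetical. A result of 1.0 means the two columns are always missing together, which is what identical missing counts (7315 each for reclat, reclong, GeoLocation and reclat_city) would suggest but do not prove.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'value is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a column is never (or always) missing
    return cov / math.sqrt(var_a * var_b)

# Toy columns for illustration only.
lat = [50.775, None, 54.2, None]
lon = [6.08, None, -113.0, None]
mass = [21.0, 720.0, None, 1914.0]

same_pattern = nullity_correlation(lat, lon)
different_pattern = nullity_correlation(lat, mass)
print(same_pattern, round(different_pattern, 4))
```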

Sample

First rows

(index) | name | id | nametype | recclass | mass (g) | fall | year | reclat | reclong | GeoLocation | source | boolean | mixed | unhashable | reclat_city
0 | Aachen | 1 | Valid | L5 | 21.0 | Fell | 1970-01-01 00:00:00.000001880 | 50.77500 | 6.08333 | (50.775, 6.08333) | NASA | True | 1 | [1] | 50.518060
1 | Aarhus | 2 | Valid | H6 | 720.0 | Fell | 1970-01-01 00:00:00.000001951 | 56.18333 | 10.23333 | (56.18333, 10.23333) | NASA | False | A | [1] | 52.011044
2 | Abee | 6 | Valid | EH4 | 107000.0 | Fell | 1970-01-01 00:00:00.000001952 | 54.21667 | -113.00000 | (54.21667, -113.0) | NASA | False | 1 | [1] | 52.706635
3 | Acapulco | 10 | Valid | Acapulcoite | 1914.0 | Fell | 1970-01-01 00:00:00.000001976 | 16.88333 | -99.90000 | (16.88333, -99.9) | NASA | False | A | [1] | 23.165966
4 | Achiras | 370 | Valid | L6 | 780.0 | Fell | 1970-01-01 00:00:00.000001902 | -33.16667 | -64.95000 | (-33.16667, -64.95) | NASA | False | A | [1] | -23.288647
5 | Adhi Kot | 379 | Valid | EH4 | 4239.0 | Fell | 1970-01-01 00:00:00.000001919 | 32.10000 | 71.80000 | (32.1, 71.8) | NASA | True | 1 | [1] | 36.516590
6 | Adzhi-Bogdo (stone) | 390 | Valid | LL3-6 | 910.0 | Fell | 1970-01-01 00:00:00.000001949 | 44.83333 | 95.16667 | (44.83333, 95.16667) | NASA | True | 1 | [1] | 43.279572
7 | Agen | 392 | Valid | H5 | 30000.0 | Fell | 1970-01-01 00:00:00.000001814 | 44.21667 | 0.61667 | (44.21667, 0.61667) | NASA | False | A | [1] | 49.607269
8 | Aguada | 398 | Valid | L6 | 1620.0 | Fell | 1970-01-01 00:00:00.000001930 | -31.60000 | -65.23333 | (-31.6, -65.23333) | NASA | False | 1 | [1] | -32.581022
9 | Aguila Blanca | 417 | Valid | L | 1440.0 | Fell | 1970-01-01 00:00:00.000001920 | -30.86667 | -64.55000 | (-30.86667, -64.55) | NASA | False | A | [1] | -29.651528

Last rows

(index) | name | id | nametype | recclass | mass (g) | fall | year | reclat | reclong | GeoLocation | source | boolean | mixed | unhashable | reclat_city
45716 | Aachen | 1 | Valid | L5 | 21.0 | Fell | 1970-01-01 00:00:00.000001880 | 50.77500 | 6.08333 | (50.775, 6.08333) | NASA | True | 1 | [1] | 50.518060
45717 | Aarhus | 2 | Valid | H6 | 720.0 | Fell | 1970-01-01 00:00:00.000001951 | 56.18333 | 10.23333 | (56.18333, 10.23333) | NASA | False | A | [1] | 52.011044
45718 | Abee | 6 | Valid | EH4 | 107000.0 | Fell | 1970-01-01 00:00:00.000001952 | 54.21667 | -113.00000 | (54.21667, -113.0) | NASA | False | 1 | [1] | 52.706635
45719 | Acapulco | 10 | Valid | Acapulcoite | 1914.0 | Fell | 1970-01-01 00:00:00.000001976 | 16.88333 | -99.90000 | (16.88333, -99.9) | NASA | False | A | [1] | 23.165966
45720 | Achiras | 370 | Valid | L6 | 780.0 | Fell | 1970-01-01 00:00:00.000001902 | -33.16667 | -64.95000 | (-33.16667, -64.95) | NASA | False | A | [1] | -23.288647
45721 | Adhi Kot | 379 | Valid | EH4 | 4239.0 | Fell | 1970-01-01 00:00:00.000001919 | 32.10000 | 71.80000 | (32.1, 71.8) | NASA | True | 1 | [1] | 36.516590
45722 | Adzhi-Bogdo (stone) | 390 | Valid | LL3-6 | 910.0 | Fell | 1970-01-01 00:00:00.000001949 | 44.83333 | 95.16667 | (44.83333, 95.16667) | NASA | True | 1 | [1] | 43.279572
45723 | Agen | 392 | Valid | H5 | 30000.0 | Fell | 1970-01-01 00:00:00.000001814 | 44.21667 | 0.61667 | (44.21667, 0.61667) | NASA | False | A | [1] | 49.607269
45724 | Aguada | 398 | Valid | L6 | 1620.0 | Fell | 1970-01-01 00:00:00.000001930 | -31.60000 | -65.23333 | (-31.6, -65.23333) | NASA | False | 1 | [1] | -32.581022
45725 | Aguila Blanca | 417 | Valid | L | 1440.0 | Fell | 1970-01-01 00:00:00.000001920 | -30.86667 | -64.55000 | (-30.86667, -64.55) | NASA | False | A | [1] | -29.651528

Duplicate rows

Most frequently occurring

(index) | name | id | nametype | recclass | mass (g) | fall | year | reclat | reclong | GeoLocation | source | boolean | mixed | reclat_city | # duplicates
0 | Aachen | 1 | Valid | L5 | 21.0 | Fell | 1970-01-01 00:00:00.000001880 | 50.77500 | 6.08333 | (50.775, 6.08333) | NASA | True | 1 | 50.518060 | 2
1 | Aarhus | 2 | Valid | H6 | 720.0 | Fell | 1970-01-01 00:00:00.000001951 | 56.18333 | 10.23333 | (56.18333, 10.23333) | NASA | False | A | 52.011044 | 2
2 | Abee | 6 | Valid | EH4 | 107000.0 | Fell | 1970-01-01 00:00:00.000001952 | 54.21667 | -113.00000 | (54.21667, -113.0) | NASA | False | 1 | 52.706635 | 2
3 | Acapulco | 10 | Valid | Acapulcoite | 1914.0 | Fell | 1970-01-01 00:00:00.000001976 | 16.88333 | -99.90000 | (16.88333, -99.9) | NASA | False | A | 23.165966 | 2
4 | Achiras | 370 | Valid | L6 | 780.0 | Fell | 1970-01-01 00:00:00.000001902 | -33.16667 | -64.95000 | (-33.16667, -64.95) | NASA | False | A | -23.288647 | 2
5 | Adhi Kot | 379 | Valid | EH4 | 4239.0 | Fell | 1970-01-01 00:00:00.000001919 | 32.10000 | 71.80000 | (32.1, 71.8) | NASA | True | 1 | 36.516590 | 2
6 | Adzhi-Bogdo (stone) | 390 | Valid | LL3-6 | 910.0 | Fell | 1970-01-01 00:00:00.000001949 | 44.83333 | 95.16667 | (44.83333, 95.16667) | NASA | True | 1 | 43.279572 | 2
7 | Agen | 392 | Valid | H5 | 30000.0 | Fell | 1970-01-01 00:00:00.000001814 | 44.21667 | 0.61667 | (44.21667, 0.61667) | NASA | False | A | 49.607269 | 2
8 | Aguada | 398 | Valid | L6 | 1620.0 | Fell | 1970-01-01 00:00:00.000001930 | -31.60000 | -65.23333 | (-31.6, -65.23333) | NASA | False | 1 | -32.581022 | 2
9 | Aguila Blanca | 417 | Valid | L | 1440.0 | Fell | 1970-01-01 00:00:00.000001920 | -30.86667 | -64.55000 | (-30.86667, -64.55) | NASA | False | A | -29.651528 | 2
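
The duplicate table leaves out the unhashable column, whose list values cannot be hashed directly. A minimal sketch of duplicate detection that also copes with such values, assuming it is acceptable to compare lists by converting them to tuples; the helper names freeze and duplicate_rows are hypothetical, and the rows are shortened toy records based on the sample above.

```python
from collections import Counter

def freeze(value):
    """Convert (nested) lists to tuples so the value can be hashed."""
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value

def duplicate_rows(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(tuple(freeze(v) for v in row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Shortened toy rows: name, id, recclass, unhashable.
rows = [
    ["Aachen", 1, "L5", [1]],
    ["Aarhus", 2, "H6", [1]],
    ["Aachen", 1, "L5", [1]],
]
dupes = duplicate_rows(rows)
print(dupes)
```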