Overview

Dataset statistics

Number of variables: 14
Number of observations: 32561
Missing cells: 4262
Missing cells (%): 0.9%
Duplicate rows: 24
Duplicate rows (%): 0.1%
Total size in memory: 18.1 MiB
Average record size in memory: 583.0 B
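
The percentages and the average record size follow from the raw counts. A minimal Python sketch, assuming that the missing-cell percentage is taken over all cells (observations times variables) and that 1 MiB is 1024 * 1024 bytes; the variable names are illustrative only:

```python
# Figures copied from the dataset statistics in this report.
n_variables = 14
n_observations = 32561
missing_cells = 4262
duplicate_rows = 24
total_memory_mib = 18.1

# Assumption: missing cells are measured against every cell in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)
duplicate_pct = 100 * duplicate_rows / n_observations
# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_memory_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.0f} B")
```

Under these assumptions the rounded results agree with the reported 0.9%, 0.1%, and 583 B.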

Variable types

Numeric: 6
Categorical: 8

Dataset

Description: Predict whether income exceeds $50K/yr based on census data. Also known as the "Census Income" dataset. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). The prediction task is to determine whether a person makes over 50K a year.
Creator: Barry Becker
Author: Ronny Kohavi and Barry Becker
URL: https://archive.ics.uci.edu/ml/datasets/adult
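
The extraction condition in the description is a conjunction of four thresholds. A minimal Python sketch of that filter, assuming records are dictionaries keyed by the original census field names (AAGE, AGI, AFNLWGT, HRSWK); the sample records are hypothetical and not taken from the census data:

```python
def is_clean_record(record):
    """Return True when a record meets all four extraction conditions."""
    return (
        record["AAGE"] > 16
        and record["AGI"] > 100
        and record["AFNLWGT"] > 1
        and record["HRSWK"] > 0
    )

# Hypothetical records for illustration only.
records = [
    {"AAGE": 39, "AGI": 40000, "AFNLWGT": 77516, "HRSWK": 40},
    {"AAGE": 15, "AGI": 500, "AFNLWGT": 1200, "HRSWK": 10},   # fails AAGE > 16
    {"AAGE": 52, "AGI": 30000, "AFNLWGT": 9000, "HRSWK": 0},  # fails HRSWK > 0
]
clean = [r for r in records if is_clean_record(r)]
print(len(clean), "of", len(records), "records kept")
```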

Variable descriptions

age: definition 0
workclass: definition 1
fnlwgt: definition 2
education: definition 3
education-num: definition 4
marital-status: definition 5
occupation: definition 6
relationship: definition 7
race: definition 8
sex: definition 9
capital-gain: definition 10
capital-loss: definition 11
hours-per-week: definition 12
native-country: definition 13

Alerts

Dataset has 24 (0.1%) duplicate rows (Duplicates)
education is highly overall correlated with education-num (High correlation)
education-num is highly overall correlated with education (High correlation)
relationship is highly overall correlated with sex (High correlation)
sex is highly overall correlated with relationship (High correlation)
workclass is highly imbalanced (52.8%) (Imbalance)
race is highly imbalanced (65.6%) (Imbalance)
native-country is highly imbalanced (84.5%) (Imbalance)
workclass has 1836 (5.6%) missing values (Missing)
occupation has 1843 (5.7%) missing values (Missing)
native-country has 583 (1.8%) missing values (Missing)
capital-gain has 29849 (91.7%) zeros (Zeros)
capital-loss has 31042 (95.3%) zeros (Zeros)
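
The report does not state how the imbalance percentage is computed. The reported figures are consistent with a normalized-entropy score, 1 - H / log(k), where H is the Shannon entropy of the category frequencies and k is the number of categories. A minimal Python sketch under that assumption, using the race counts listed later in this report (the function name is illustrative):

```python
import math

def imbalance_score(counts):
    """1 - (Shannon entropy / maximum entropy) for a list of category counts."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Counts for race: White, Black, Asian-Pac-Islander, Amer-Indian-Eskimo, Other.
race_counts = [27816, 3124, 1039, 311, 271]
score = imbalance_score(race_counts)
print(f"race imbalance: {score:.1%}")
```

A score of 0 means the categories are evenly spread; a score near 1 means a single category dominates.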

Reproduction

Analysis started: 2025-09-23 16:06:01.464008
Analysis finished: 2025-09-23 16:06:05.454836
Duration: 3.99 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
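
A report of this kind can be regenerated with ydata-profiling. The sketch below is a guess at a minimal setup, not the configuration actually used here (that is in config.json): the CSV path, output path, and report title are hypothetical, and the description is shortened to its first sentence. The third-party imports are deferred so that the metadata helper can be used on its own.

```python
def build_dataset_metadata():
    """Dataset metadata as shown in the Dataset section of this report."""
    return {
        # First sentence of the description only.
        "description": "Predict whether income exceeds $50K/yr based on census data.",
        "creator": "Barry Becker",
        "author": "Ronny Kohavi and Barry Becker",
        "url": "https://archive.ics.uci.edu/ml/datasets/adult",
    }

def make_report(csv_path, output_path="census_report.html"):
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so the metadata helper works without these packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(
        df,
        title="Census Dataset",  # hypothetical title
        dataset=build_dataset_metadata(),
    )
    report.to_file(output_path)
    return output_path
```

Calling `make_report("adult.csv")` would require pandas and ydata-profiling to be installed and a local copy of the data at that (assumed) path.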

Variables

age
Real number (ℝ)

Distinct: 73
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.581647
Minimum: 17
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 17
5th percentile: 19
Q1: 28
Median: 37
Q3: 48
95th percentile: 63
Maximum: 90
Range: 73
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 13.640433
Coefficient of variation (CV): 0.35354718
Kurtosis: -0.16612746
Mean: 38.581647
Median Absolute Deviation (MAD): 10
Skewness: 0.55874337
Sum: 1256257
Variance: 186.0614
Monotonicity: Not monotonic
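
Several of these statistics are derived from others in the same block. A short Python sketch, using the reported values for age, shows the relationships; it illustrates the standard definitions and is not the profiler's own code:

```python
# Reported values for age.
mean = 38.581647
std = 13.640433
q1, q3 = 28, 48
minimum, maximum = 17, 90

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the square of the standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(f"CV={cv:.6f} variance={variance:.4f} IQR={iqr} range={value_range}")
```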
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
36                 898    2.8%
31                 888    2.7%
34                 886    2.7%
23                 877    2.7%
35                 876    2.7%
33                 875    2.7%
28                 867    2.7%
30                 861    2.6%
37                 858    2.6%
25                 841    2.6%
Other values (63)  23834  73.2%

Minimum 10 values

Value  Count  Frequency (%)
17     395    1.2%
18     550    1.7%
19     712    2.2%
20     753    2.3%
21     720    2.2%
22     765    2.3%
23     877    2.7%
24     798    2.5%
25     841    2.6%
26     785    2.4%

Maximum 10 values

Value  Count  Frequency (%)
90     43     0.1%
88     3      < 0.1%
87     1      < 0.1%
86     1      < 0.1%
85     3      < 0.1%
84     10     < 0.1%
83     6      < 0.1%
82     12     < 0.1%
81     20     0.1%
80     22     0.1%

workclass
Categorical

Alerts: Imbalance, Missing

Distinct: 8
Distinct (%): < 0.1%
Missing: 1836
Missing (%): 5.6%
Memory size: 2.1 MiB

Most frequent values:
Private           22696
Self-emp-not-inc  2541
Local-gov         2093
State-gov         1298
Self-emp-inc      1116
Other values (3)  981

Length

Max length: 17
Median length: 8
Mean length: 9.2745972
Min length: 8

Characters and Unicode

Total characters: 284962
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: State-gov
2nd row: Self-emp-not-inc
3rd row: Private
4th row: Private
5th row: Private

Common Values

Value             Count  Frequency (%)
Private           22696  69.7%
Self-emp-not-inc  2541   7.8%
Local-gov         2093   6.4%
State-gov         1298   4.0%
Self-emp-inc      1116   3.4%
Federal-gov       960    2.9%
Without-pay       14     < 0.1%
Never-worked      7      < 0.1%
(Missing)         1836   5.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Lowercased values (percentages exclude missing values)

Value             Count  Frequency (%)
private           22696  73.9%
self-emp-not-inc  2541   8.3%
local-gov         2093   6.8%
state-gov         1298   4.2%
self-emp-inc      1116   3.6%
federal-gov       960    3.1%
without-pay       14     < 0.1%
never-worked      7      < 0.1%

Most occurring characters

Value              Count  Frequency (%)
e                  33249  11.7%
(space)            30725  10.8%
t                  27861  9.8%
a                  27061  9.5%
v                  27054  9.5%
i                  26367  9.3%
r                  23670  8.3%
P                  22696  8.0%
-                  14227  5.0%
o                  9006   3.2%
Other values (18)  43046  15.1%

Most occurring categories, scripts, and blocks

All 284962 characters (100.0%) fall under (unknown) for category, script, and block alike. The most frequent characters per category, per script, and per block are therefore identical to the table of most occurring characters.

fnlwgt
Real number (ℝ)

Distinct: 21648
Distinct (%): 66.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 189778.37
Minimum: 12285
Maximum: 1484705
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 12285
5th percentile: 39460
Q1: 117827
Median: 178356
Q3: 237051
95th percentile: 379682
Maximum: 1484705
Range: 1472420
Interquartile range (IQR): 119224

Descriptive statistics

Standard deviation: 105549.98
Coefficient of variation (CV): 0.55617497
Kurtosis: 6.218811
Mean: 189778.37
Median Absolute Deviation (MAD): 59894
Skewness: 1.4469801
Sum: 6.1793734 × 10^9
Variance: 1.1140798 × 10^10
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
164190                13     < 0.1%
203488                13     < 0.1%
123011                13     < 0.1%
113364                12     < 0.1%
121124                12     < 0.1%
148995                12     < 0.1%
126675                12     < 0.1%
188246                11     < 0.1%
155659                11     < 0.1%
102308                11     < 0.1%
Other values (21638)  32441  99.6%

Minimum 10 values

Value  Count  Frequency (%)
12285  1      < 0.1%
13769  1      < 0.1%
14878  1      < 0.1%
18827  1      < 0.1%
19214  1      < 0.1%
19302  5      < 0.1%
19395  2      < 0.1%
19410  1      < 0.1%
19491  1      < 0.1%
19520  1      < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
1484705  1      < 0.1%
1455435  1      < 0.1%
1366120  1      < 0.1%
1268339  1      < 0.1%
1226583  1      < 0.1%
1184622  1      < 0.1%
1161363  1      < 0.1%
1125613  1      < 0.1%
1097453  1      < 0.1%
1085515  1      < 0.1%

education
Categorical

Alerts: High correlation

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 MiB

Most frequent values:
HS-grad            10501
Some-college       7291
Bachelors          5355
Masters            1723
Assoc-voc          1382
Other values (11)  6309

Length

Max length: 13
Median length: 12
Mean length: 9.433709
Min length: 4

Characters and Unicode

Total characters: 307171
Distinct characters: 32
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Bachelors
2nd row: Bachelors
3rd row: HS-grad
4th row: 11th
5th row: Bachelors

Common Values

Value             Count  Frequency (%)
HS-grad           10501  32.3%
Some-college      7291   22.4%
Bachelors         5355   16.4%
Masters           1723   5.3%
Assoc-voc         1382   4.2%
11th              1175   3.6%
Assoc-acdm        1067   3.3%
10th              933    2.9%
7th-8th           646    2.0%
Prof-school       576    1.8%
Other values (6)  1912   5.9%

Length

[Figure: histogram of lengths of the category]

Lowercased values

Value             Count  Frequency (%)
hs-grad           10501  32.3%
some-college      7291   22.4%
bachelors         5355   16.4%
masters           1723   5.3%
assoc-voc         1382   4.2%
11th              1175   3.6%
assoc-acdm        1067   3.3%
10th              933    2.9%
7th-8th           646    2.0%
prof-school       576    1.8%
Other values (6)  1912   5.9%

Most occurring characters

Value              Count  Frequency (%)
(space)            32561  10.6%
e                  29415  9.6%
o                  26424  8.6%
-                  21964  7.2%
l                  20564  6.7%
a                  19059  6.2%
r                  18619  6.1%
c                  18584  6.1%
S                  17792  5.8%
g                  17792  5.8%
Other values (22)  84397  27.5%

Most occurring categories, scripts, and blocks

All 307171 characters (100.0%) fall under (unknown) for category, script, and block alike. The most frequent characters per category, per script, and per block are therefore identical to the table of most occurring characters.

education-num
Real number (ℝ)

Alerts: High correlation

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.080679
Minimum: 1
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 5
Q1: 9
Median: 10
Q3: 12
95th percentile: 14
Maximum: 16
Range: 15
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.5727203
Coefficient of variation (CV): 0.25521299
Kurtosis: 0.62344407
Mean: 10.080679
Median Absolute Deviation (MAD): 1
Skewness: -0.31167587
Sum: 328237
Variance: 6.6188899
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=16)]

Common values

Value             Count  Frequency (%)
9                 10501  32.3%
10                7291   22.4%
13                5355   16.4%
14                1723   5.3%
11                1382   4.2%
7                 1175   3.6%
12                1067   3.3%
6                 933    2.9%
4                 646    2.0%
15                576    1.8%
Other values (6)  1912   5.9%

Minimum 10 values

Value  Count  Frequency (%)
1      51     0.2%
2      168    0.5%
3      333    1.0%
4      646    2.0%
5      514    1.6%
6      933    2.9%
7      1175   3.6%
8      433    1.3%
9      10501  32.3%
10     7291   22.4%

Maximum 10 values

Value  Count  Frequency (%)
16     413    1.3%
15     576    1.8%
14     1723   5.3%
13     5355   16.4%
12     1067   3.3%
11     1382   4.2%
10     7291   22.4%
9      10501  32.3%
8      433    1.3%
7      1175   3.6%

marital-status
Categorical

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 MiB

Most frequent values:
Married-civ-spouse  14976
Never-married       10683
Divorced            4443
Separated           1025
Widowed             993
Other values (2)    441

Length

Max length: 22
Median length: 19
Mean length: 15.414054
Min length: 8

Characters and Unicode

Total characters: 501897
Distinct characters: 25
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Never-married
2nd row: Married-civ-spouse
3rd row: Divorced
4th row: Married-civ-spouse
5th row: Married-civ-spouse

Common Values

Value                  Count  Frequency (%)
Married-civ-spouse     14976  46.0%
Never-married          10683  32.8%
Divorced               4443   13.6%
Separated              1025   3.1%
Widowed                993    3.0%
Married-spouse-absent  418    1.3%
Married-AF-spouse      23     0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Lowercased values

Value                  Count  Frequency (%)
married-civ-spouse     14976  46.0%
never-married          10683  32.8%
divorced               4443   13.6%
separated              1025   3.1%
widowed                993    3.0%
married-spouse-absent  418    1.3%
married-af-spouse      23     0.1%

Most occurring characters

Value              Count  Frequency (%)
e                  70787  14.1%
r                  68351  13.6%
i                  46512  9.3%
-                  41517  8.3%
d                  33554  6.7%
(space)            32561  6.5%
s                  31252  6.2%
v                  30102  6.0%
a                  28568  5.7%
o                  20853  4.2%
Other values (15)  97840  19.5%

Most occurring categories, scripts, and blocks

All 501897 characters (100.0%) fall under (unknown) for category, script, and block alike. The most frequent characters per category, per script, and per block are therefore identical to the table of most occurring characters.

occupation
Categorical

Alerts: Missing

Distinct: 14
Distinct (%): < 0.1%
Missing: 1843
Missing (%): 5.7%
Memory size: 2.2 MiB

Most frequent values:
Prof-specialty    4140
Craft-repair      4099
Exec-managerial   4066
Adm-clerical      3770
Sales             3650
Other values (9)  10993

Length

Max length: 18
Median length: 16
Mean length: 13.873983
Min length: 6

Characters and Unicode

Total characters: 426181
Distinct characters: 32
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Adm-clerical
2nd row: Exec-managerial
3rd row: Handlers-cleaners
4th row: Handlers-cleaners
5th row: Prof-specialty

Common Values

Value              Count  Frequency (%)
Prof-specialty     4140   12.7%
Craft-repair       4099   12.6%
Exec-managerial    4066   12.5%
Adm-clerical       3770   11.6%
Sales              3650   11.2%
Other-service      3295   10.1%
Machine-op-inspct  2002   6.1%
Transport-moving   1597   4.9%
Handlers-cleaners  1370   4.2%
Farming-fishing    994    3.1%
Other values (4)   1735   5.3%
(Missing)          1843   5.7%

Length

[Figure: histogram of lengths of the category]

Lowercased values (percentages exclude missing values)

Value              Count  Frequency (%)
prof-specialty     4140   13.5%
craft-repair       4099   13.3%
exec-managerial    4066   13.2%
adm-clerical       3770   12.3%
sales              3650   11.9%
other-service      3295   10.7%
machine-op-inspct  2002   6.5%
transport-moving   1597   5.2%
handlers-cleaners  1370   4.5%
farming-fishing    994    3.2%
Other values (4)   1735   5.6%

Most occurring characters

Value              Count   Frequency (%)
e                  42979   10.1%
r                  40333   9.5%
a                  39289   9.2%
(space)            30718   7.2%
-                  29219   6.9%
i                  28751   6.7%
c                  26001   6.1%
l                  22136   5.2%
s                  20302   4.8%
t                  17359   4.1%
Other values (22)  129094  30.3%

Most occurring categories, scripts, and blocks

All 426181 characters (100.0%) fall under (unknown) for category, script, and block alike. The most frequent characters per category, per script, and per block are therefore identical to the table of most occurring characters.

relationship
Categorical

Alerts: High correlation

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 MiB

Most frequent values:
Husband        13193
Not-in-family  8305
Own-child      5068
Unmarried      3446
Wife           1568

Length

Max length: 15
Median length: 14
Mean length: 10.119744
Min length: 5

Characters and Unicode

Total characters: 329509
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not-in-family
2nd row: Husband
3rd row: Not-in-family
4th row: Husband
5th row: Wife

Common Values

Value           Count  Frequency (%)
Husband         13193  40.5%
Not-in-family   8305   25.5%
Own-child       5068   15.6%
Unmarried       3446   10.6%
Wife            1568   4.8%
Other-relative  981    3.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Lowercased values

Value           Count  Frequency (%)
husband         13193  40.5%
not-in-family   8305   25.5%
own-child       5068   15.6%
unmarried       3446   10.6%
wife            1568   4.8%
other-relative  981    3.0%

Most occurring characters

Value              Count   Frequency (%)
(space)            32561   9.9%
n                  30012   9.1%
i                  27673   8.4%
a                  25925   7.9%
-                  22659   6.9%
d                  21707   6.6%
l                  14354   4.4%
b                  13193   4.0%
H                  13193   4.0%
u                  13193   4.0%
Other values (16)  115039  34.9%

Most occurring categories, scripts, and blocks

All 329509 characters (100.0%) fall under (unknown) for category, script, and block alike. The most frequent characters per category, per script, and per block are therefore identical to the table of most occurring characters.

race
Categorical

Alerts: Imbalance

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Most frequent values:
White               27816
Black               3124
Asian-Pac-Islander  1039
Amer-Indian-Eskimo  311
Other               271

Length

Max length: 19
Median length: 6
Mean length: 6.5389884
Min length: 6

Characters and Unicode

Total characters: 212916
Distinct characters: 23
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: White
2nd row: White
3rd row: White
4th row: Black
5th row: Black

Common Values

Value               Count  Frequency (%)
White               27816  85.4%
Black               3124   9.6%
Asian-Pac-Islander  1039   3.2%
Amer-Indian-Eskimo  311    1.0%
Other               271    0.8%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Lowercased values

Value               Count  Frequency (%)
white               27816  85.4%
black               3124   9.6%
asian-pac-islander  1039   3.2%
amer-indian-eskimo  311    1.0%
other               271    0.8%

Most occurring characters

Value              Count  Frequency (%)
(space)            32561  15.3%
i                  29477  13.8%
e                  29437  13.8%
t                  28087  13.2%
h                  28087  13.2%
W                  27816  13.1%
a                  6552   3.1%
c                  4163   2.0%
l                  4163   2.0%
k                  3435   1.6%
Other values (13)  19138  9.0%

Most occurring categories, scripts, and blocks

All 212916 characters (100.0%) fall under (unknown) for category, script, and block alike. The most frequent characters per category, per script, and per block are therefore identical to the table of most occurring characters.

sex
Categorical

Alerts: High correlation

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Most frequent values:
Male    21790
Female  10771

Length

Max length: 7
Median length: 5
Mean length: 5.661589
Min length: 5

Characters and Unicode

Total characters: 184347
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Male
4th row: Male
5th row: Female

Common Values

Value   Count  Frequency (%)
Male    21790  66.9%
Female  10771  33.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Lowercased values

Value   Count  Frequency (%)
male    21790  66.9%
female  10771  33.1%

Most occurring characters

Value    Count  Frequency (%)
e        43332  23.5%
a        32561  17.7%
(space)  32561  17.7%
l        32561  17.7%
M        21790  11.8%
F        10771  5.8%
m        10771  5.8%

Most occurring categories, scripts, and blocks

All 184347 characters (100.0%) fall under (unknown) for category, script, and block alike. The most frequent characters per category, per script, and per block are therefore identical to the table of most occurring characters.

capital-gain
Real number (ℝ)

Alerts: Zeros

Distinct: 119
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1077.6488
Minimum: 0
Maximum: 99999
Zeros: 29849
Zeros (%): 91.7%
Negative: 0
Negative (%): 0.0%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 5013
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 7385.2921
Coefficient of variation (CV): 6.8531527
Kurtosis: 154.79944
Mean: 1077.6488
Median Absolute Deviation (MAD): 0
Skewness: 11.953848
Sum: 35089324
Variance: 54542539
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0                   29849  91.7%
15024               347    1.1%
7688                284    0.9%
7298                246    0.8%
99999               159    0.5%
3103                97     0.3%
5178                97     0.3%
4386                70     0.2%
5013                69     0.2%
8614                55     0.2%
Other values (109)  1288   4.0%

Minimum 10 values

Value  Count  Frequency (%)
0      29849  91.7%
114    6      < 0.1%
401    2      < 0.1%
594    34     0.1%
914    8      < 0.1%
991    5      < 0.1%
1055   25     0.1%
1086   4      < 0.1%
1111   1      < 0.1%
1151   8      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
99999  159    0.5%
41310  2      < 0.1%
34095  5      < 0.1%
27828  34     0.1%
25236  11     < 0.1%
25124  4      < 0.1%
22040  1      < 0.1%
20051  37     0.1%
18481  2      < 0.1%
15831  6      < 0.1%

capital-loss
Real number (ℝ)

Alerts: Zeros

Distinct: 92
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 87.30383
Minimum: 0
Maximum: 4356
Zeros: 31042
Zeros (%): 95.3%
Negative: 0
Negative (%): 0.0%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 4356
Range: 4356
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 402.96022
Coefficient of variation (CV): 4.6156076
Kurtosis: 20.376802
Mean: 87.30383
Median Absolute Deviation (MAD): 0
Skewness: 4.5946291
Sum: 2842700
Variance: 162376.94
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
0                  31042  95.3%
1902               202    0.6%
1977               168    0.5%
1887               159    0.5%
1485               51     0.2%
1848               51     0.2%
2415               49     0.2%
1602               47     0.1%
1740               42     0.1%
1590               40     0.1%
Other values (82)  710    2.2%

Minimum 10 values

Value  Count  Frequency (%)
0      31042  95.3%
155    1      < 0.1%
213    4      < 0.1%
323    3      < 0.1%
419    3      < 0.1%
625    12     < 0.1%
653    3      < 0.1%
810    2      < 0.1%
880    6      < 0.1%
974    2      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
4356   3      < 0.1%
3900   2      < 0.1%
3770   2      < 0.1%
3683   2      < 0.1%
3004   2      < 0.1%
2824   10     < 0.1%
2754   2      < 0.1%
2603   5      < 0.1%
2559   12     < 0.1%
2547   4      < 0.1%

hours-per-week
Real number (ℝ)

Distinct: 94
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.437456
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 18
Q1: 40
Median: 40
Q3: 45
95th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 12.347429
Coefficient of variation (CV): 0.30534633
Kurtosis: 2.9166868
Mean: 40.437456
Median Absolute Deviation (MAD): 3
Skewness: 0.22764254
Sum: 1316684
Variance: 152.459
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
40                 15217  46.7%
50                 2819   8.7%
45                 1824   5.6%
60                 1475   4.5%
35                 1297   4.0%
20                 1224   3.8%
30                 1149   3.5%
55                 694    2.1%
25                 674    2.1%
48                 517    1.6%
Other values (84)  5671   17.4%

Minimum 10 values

Value  Count  Frequency (%)
1      20     0.1%
2      32     0.1%
3      39     0.1%
4      54     0.2%
5      60     0.2%
6      64     0.2%
7      26     0.1%
8      145    0.4%
9      18     0.1%
10     278    0.9%

Maximum 10 values

Value  Count  Frequency (%)
99     85     0.3%
98     11     < 0.1%
97     2      < 0.1%
96     5      < 0.1%
95     2      < 0.1%
94     1      < 0.1%
92     1      < 0.1%
91     3      < 0.1%
90     29     0.1%
89     2      < 0.1%

native-country
Categorical

Alerts: Imbalance, Missing

Distinct: 41
Distinct (%): 0.1%
Missing: 583
Missing (%): 1.8%
Memory size: 2.2 MiB

Most frequent values:
United-States      29170
Mexico             643
Philippines        198
Germany            137
Canada             121
Other values (36)  1709

Length

Max length: 27
Median length: 14
Mean length: 13.49975
Min length: 5

Characters and Unicode

Total characters: 431695
Distinct characters: 45
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: United-States
2nd row: United-States
3rd row: United-States
4th row: United-States
5th row: Cuba

Common Values

Value              Count  Frequency (%)
United-States      29170  89.6%
Mexico             643    2.0%
Philippines        198    0.6%
Germany            137    0.4%
Canada             121    0.4%
Puerto-Rico        114    0.4%
El-Salvador        106    0.3%
India              100    0.3%
Cuba               95     0.3%
England            90     0.3%
Other values (31)  1204   3.7%
(Missing)          583    1.8%

Length

[Figure: histogram of lengths of the category]

Lowercased values (percentages exclude missing values)

Value              Count  Frequency (%)
united-states      29170  91.2%
mexico             643    2.0%
philippines        198    0.6%
germany            137    0.4%
canada             121    0.4%
puerto-rico        114    0.4%
el-salvador        106    0.3%
india              100    0.3%
cuba               95     0.3%
england            90     0.3%
Other values (31)  1204   3.8%

Most occurring characters

Value              Count  Frequency (%)
t                  88030  20.4%
e                  59820  13.9%
(space)            31978  7.4%
a                  31774  7.4%
i                  31372  7.3%
n                  30568  7.1%
d                  29801  6.9%
-                  29503  6.8%
s                  29416  6.8%
S                  29396  6.8%
Other values (35)  40037  9.3%

Most occurring categories, scripts, and blocks

All 431695 characters (100.0%) fall under (unknown) for category, script, and block alike. The most frequent characters per category, per script, and per block are therefore identical to the table of most occurring characters.

Interactions

[Figures: 36 interaction plots; images not reproduced]

Correlations

2025-09-23T16:06:07.616600image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
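variable | age | capital-gain | capital-loss | education | education-num | fnlwgt | hours-per-week | marital-status | native-country | occupation | race | relationship | sex | workclass
age | 1.000 | 0.125 | 0.058 | 0.111 | 0.066 | -0.078 | 0.143 | 0.282 | 0.030 | 0.096 | 0.027 | 0.273 | 0.123 | 0.092
capital-gain | 0.125 | 1.000 | -0.067 | 0.112 | 0.119 | -0.006 | 0.093 | 0.037 | 0.000 | 0.070 | 0.008 | 0.043 | 0.048 | 0.051
capital-loss | 0.058 | -0.067 | 1.000 | 0.042 | 0.075 | -0.007 | 0.060 | 0.059 | 0.000 | 0.033 | 0.011 | 0.064 | 0.071 | 0.023
education | 0.111 | 0.112 | 0.042 | 1.000 | 1.000 | 0.017 | 0.089 | 0.089 | 0.129 | 0.196 | 0.072 | 0.121 | 0.093 | 0.100
education-num | 0.066 | 0.119 | 0.075 | 1.000 | 1.000 | -0.036 | 0.167 | 0.077 | 0.142 | 0.225 | 0.069 | 0.108 | 0.072 | 0.092
fnlwgt | -0.078 | -0.006 | -0.007 | 0.017 | -0.036 | 1.000 | -0.022 | 0.023 | 0.055 | 0.019 | 0.066 | 0.017 | 0.028 | 0.023
hours-per-week | 0.143 | 0.093 | 0.060 | 0.089 | 0.167 | -0.022 | 1.000 | 0.118 | 0.029 | 0.131 | 0.059 | 0.161 | 0.240 | 0.097
marital-status | 0.282 | 0.037 | 0.059 | 0.089 | 0.077 | 0.023 | 0.118 | 1.000 | 0.064 | 0.130 | 0.083 | 0.488 | 0.462 | 0.076
native-country | 0.030 | 0.000 | 0.000 | 0.129 | 0.142 | 0.055 | 0.029 | 0.064 | 1.000 | 0.068 | 0.421 | 0.078 | 0.056 | 0.030
occupation | 0.096 | 0.070 | 0.033 | 0.196 | 0.225 | 0.019 | 0.131 | 0.130 | 0.068 | 1.000 | 0.080 | 0.177 | 0.434 | 0.215
race | 0.027 | 0.008 | 0.011 | 0.072 | 0.069 | 0.066 | 0.059 | 0.083 | 0.421 | 0.080 | 1.000 | 0.097 | 0.118 | 0.055
relationship | 0.273 | 0.043 | 0.064 | 0.121 | 0.108 | 0.017 | 0.161 | 0.488 | 0.078 | 0.177 | 0.097 | 1.000 | 0.649 | 0.089
sex | 0.123 | 0.048 | 0.071 | 0.093 | 0.072 | 0.028 | 0.240 | 0.462 | 0.056 | 0.434 | 0.118 | 0.649 | 1.000 | 0.143
workclass | 0.092 | 0.051 | 0.023 | 0.100 | 0.092 | 0.023 | 0.097 | 0.076 | 0.030 | 0.215 | 0.055 | 0.089 | 0.143 | 1.000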
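
The matrix above mixes numeric and categorical variables, and this section does not say which association measure fills each cell. As an illustration only, the sketch below computes plain (uncorrected) Cramér's V, a common measure for two categorical columns; the report's own measure may differ, for example by applying a bias correction. The function name `cramers_v` and the toy data are assumptions made for this example. It shows why a one-to-one pairing, such as the one suggested by the 1.000 entry between education and education-num, yields a value of 1.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Uncorrected Cramér's V between two categorical series (illustrative helper)."""
    # Contingency table of observed counts.
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts under independence: outer product of the marginals over n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    # Normalise by the smaller table dimension minus one.
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0


# Toy data (not taken from the report): education-num is a pure relabelling
# of education, while sex is balanced within every education level.
toy = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad", "Bachelors", "Masters"],
    "education-num": [13, 9, 14, 9, 13, 14],
    "sex": ["Male", "Female", "Male", "Male", "Female", "Female"],
})

v_edu = cramers_v(toy["education"], toy["education-num"])  # one-to-one mapping
v_sex = cramers_v(toy["education"], toy["sex"])            # no association in the toy data
print(round(v_edu, 3), round(v_sex, 3))
```

On the toy data the relabelled pair scores 1 and the balanced pair scores 0, which is the behaviour the High correlation alerts rely on when flagging redundant columns.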

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
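
Nullity correlation can be approximated with plain pandas: turn each column into a 0/1 missing-value indicator and correlate the indicators. The sketch below is an assumption about how such a heatmap can be derived, not the report's actual code, and the toy values are invented for illustration. The alerts list 1836 missing values for workclass and 1843 for occupation; counts that close suggest the two columns are often missing together, which is the kind of pattern this measure surfaces.

```python
import pandas as pd

# Toy data (invented): workclass and occupation are missing on the same rows,
# native-country is missing on different rows, age is complete.
toy = pd.DataFrame({
    "workclass": ["Private", None, "State-gov", None, "Private", "Private"],
    "occupation": ["Sales", None, "Adm-clerical", None, "Tech-support", "Sales"],
    "native-country": ["United-States", "Cuba", None, "Mexico", "United-States", None],
    "age": [39, 50, 38, 53, 28, 37],
})

nulls = toy.isna()
# Fully observed columns have a constant indicator and no defined correlation,
# so keep only columns with at least one missing value.
nulls = nulls.loc[:, nulls.any()]

# Pearson correlation between the 0/1 missingness indicators.
nullity_corr = nulls.astype(int).corr()
print(nullity_corr.round(2))
```

A value near 1 means two columns tend to be missing on the same rows; a negative value means one tends to be present when the other is missing.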

Sample

index | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country
0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States
1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States
2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States
3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States
4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba
5 | 37 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States
6 | 49 | Private | 160187 | 9th | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 16 | Jamaica
7 | 52 | Self-emp-not-inc | 209642 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 45 | United-States
8 | 31 | Private | 45781 | Masters | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 14084 | 0 | 50 | United-States
9 | 42 | Private | 159449 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 5178 | 0 | 40 | United-States

index | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country
32551 | 32 | Private | 34066 | 10th | 6 | Married-civ-spouse | Handlers-cleaners | Husband | Amer-Indian-Eskimo | Male | 0 | 0 | 40 | United-States
32552 | 43 | Private | 84661 | Assoc-voc | 11 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 45 | United-States
32553 | 32 | Private | 116138 | Masters | 14 | Never-married | Tech-support | Not-in-family | Asian-Pac-Islander | Male | 0 | 0 | 11 | Taiwan
32554 | 53 | Private | 321865 | Masters | 14 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 40 | United-States
32555 | 22 | Private | 310152 | Some-college | 10 | Never-married | Protective-serv | Not-in-family | White | Male | 0 | 0 | 40 | United-States
32556 | 27 | Private | 257302 | Assoc-acdm | 12 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States
32557 | 40 | Private | 154374 | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States
32558 | 58 | Private | 151910 | HS-grad | 9 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States
32559 | 22 | Private | 201490 | HS-grad | 9 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 20 | United-States
32560 | 52 | Self-emp-inc | 287927 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 15024 | 0 | 40 | United-States

Duplicate rows

Most frequently occurring

index | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | # duplicates
8 | 25 | Private | 195994 | 1st-4th | 2 | Never-married | Priv-house-serv | Not-in-family | White | Female | 0 | 0 | 40 | Guatemala | 3
0 | 19 | Private | 97261 | HS-grad | 9 | Never-married | Farming-fishing | Not-in-family | White | Male | 0 | 0 | 40 | United-States | 2
1 | 19 | Private | 138153 | Some-college | 10 | Never-married | Adm-clerical | Own-child | White | Female | 0 | 0 | 10 | United-States | 2
2 | 19 | Private | 146679 | Some-college | 10 | Never-married | Exec-managerial | Own-child | Black | Male | 0 | 0 | 30 | United-States | 2
3 | 19 | Private | 251579 | Some-college | 10 | Never-married | Other-service | Own-child | White | Male | 0 | 0 | 14 | United-States | 2
4 | 20 | Private | 107658 | Some-college | 10 | Never-married | Tech-support | Not-in-family | White | Female | 0 | 0 | 10 | United-States | 2
5 | 21 | Private | 243368 | Preschool | 1 | Never-married | Farming-fishing | Not-in-family | White | Male | 0 | 0 | 50 | Mexico | 2
6 | 21 | Private | 250051 | Some-college | 10 | Never-married | Prof-specialty | Own-child | White | Female | 0 | 0 | 10 | United-States | 2
7 | 23 | Private | 240137 | 5th-6th | 3 | Never-married | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 55 | Mexico | 2
9 | 25 | Private | 308144 | Bachelors | 13 | Never-married | Craft-repair | Not-in-family | White | Male | 0 | 0 | 40 | Mexico | 2
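
A table of this shape can be rebuilt with pandas by keeping every row that has an exact copy, grouping on all columns, and counting the group sizes. The sketch below is one way to do that under stated assumptions, not the report's own implementation; the toy frame uses only three of the fourteen columns and invented rows. Note that counting duplicated groups and counting extra copies give different totals, and this section does not show which definition the headline figure of 24 duplicate rows uses.

```python
import pandas as pd

# Toy data (invented): one row appears three times, another appears twice.
toy = pd.DataFrame({
    "age": [25, 19, 25, 25, 19, 42],
    "education": ["1st-4th", "HS-grad", "1st-4th", "1st-4th", "HS-grad", "Bachelors"],
    "native-country": ["Guatemala", "United-States", "Guatemala",
                       "Guatemala", "United-States", "United-States"],
})

dupes = (
    toy[toy.duplicated(keep=False)]                # every row that has an exact copy
    .groupby(list(toy.columns), dropna=False)      # one group per distinct duplicated row
    .size()
    .reset_index(name="# duplicates")
    .sort_values("# duplicates", ascending=False)  # most frequent first
    .reset_index(drop=True)
)

# Number of rows that would be dropped by toy.drop_duplicates().
n_extra = int(toy.duplicated().sum())
print(dupes)
print(n_extra)
```

Passing `dropna=False` keeps groups whose key contains a missing value, which matters for columns such as workclass and occupation that the alerts flag as partly missing.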