Overview

Dataset statistics

Number of variables: 3
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 23.6 KiB
Average record size in memory: 24.1 B
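The missing-cell and duplicate-row figures can be reproduced with a small standard-library sketch. It assumes rows are held as tuples and that `None` marks a missing cell; the four-row sample is hypothetical (one duplicate and one gap were added on purpose) and is not taken from this dataset.

```python
from typing import Optional, Sequence

Row = tuple[Optional[str], ...]

def dataset_stats(rows: Sequence[Row]) -> dict[str, float]:
    """Count missing cells and duplicate rows (sketch; None marks a missing cell)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    n_cells = n_rows * n_cols
    missing = sum(cell is None for row in rows for cell in row)
    # A row counts as a duplicate if an identical row appeared earlier.
    seen: set[Row] = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    return {
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_rows if n_rows else 0.0,
    }

sample = [
    ("и", "and, though", "conjunction"),
    ("в", "in, at", "preposition"),
    ("и", "and, though", "conjunction"),  # hypothetical duplicate row
    ("не", None, "particle"),              # hypothetical missing cell
]
print(dataset_stats(sample))
```

Counting only repeats after the first occurrence matches the default `keep="first"` behaviour of pandas' `DataFrame.duplicated`.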

Variable types

Text: 2
Categorical: 1

Reproduction

Analysis started: 2024-05-07 20:16:38.692930
Analysis finished: 2024-05-07 20:16:38.927853
Duration: 0.23 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
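The Duration value is the difference between the two timestamps above, rounded to two decimals. A quick check with the standard library:

```python
from datetime import datetime

started = datetime.fromisoformat("2024-05-07 20:16:38.692930")
finished = datetime.fromisoformat("2024-05-07 20:16:38.927853")

# Subtracting two datetimes gives a timedelta; total_seconds() returns a float.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```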

Variables

russian
Text

Distinct: 995
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Length

Max length: 19
Median length: 13
Mean length: 6.117
Min length: 1
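The length summary is taken over the character length of each value; the mean equals Total characters divided by observations (6117 / 1000 = 6.117). The sketch below computes the four figures with Python's `statistics` module on the first five sample values. It uses a plain median of per-value lengths; whether the report derives its Median length the same way is not stated here, so the two need not agree.

```python
import statistics

def length_summary(values: list[str]) -> dict[str, float]:
    """Max/median/mean/min of value lengths (len() counts code points)."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }

# First five values of the russian column, from the sample rows.
print(length_summary(["и", "в", "не", "он", "на"]))
```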

Characters and Unicode

Total characters: 6117
Distinct characters: 43
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
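As an illustration of those properties, Python's `unicodedata` module returns a character's general category directly. The script grouping in this sketch is an assumption for illustration only: for letters it takes the first word of the Unicode character name and files everything else under "COMMON", which only approximates the real Unicode Script property.

```python
import unicodedata
from collections import Counter

def unicode_profile(text: str) -> tuple[Counter, Counter]:
    """Count Unicode general categories and a rough script label per character."""
    categories: Counter = Counter()
    scripts: Counter = Counter()
    for ch in text:
        category = unicodedata.category(ch)  # e.g. 'Ll', 'Po', 'Zs'
        categories[category] += 1
        if category.startswith("L"):
            # Heuristic: first word of the name, e.g.
            # 'CYRILLIC SMALL LETTER O' -> 'CYRILLIC'. Not the real Script property.
            scripts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
        else:
            scripts["COMMON"] += 1  # punctuation, spaces, digits, etc.
    return categories, scripts

cats, scripts = unicode_profile("он, and")
print(dict(cats))
print(dict(scripts))
```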

Unique

Unique: 990
Unique (%): 99.0%
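Distinct counts how many different values occur; Unique counts values that occur exactly once. For the russian column, 995 distinct and 990 unique means the remaining 5 distinct values cover the other 10 rows (990 + 10 = 1000), so each of them appears exactly twice. A minimal sketch of both counts on a hypothetical seven-value column:

```python
from collections import Counter

def distinct_and_unique(values: list[str]) -> tuple[int, int]:
    """Distinct = number of different values; unique = values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical column: "знать" and "пора" repeat, the rest occur once.
column = ["знать", "и", "пора", "знать", "в", "пора", "не"]
print(distinct_and_unique(column))
```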

Sample

1st row: и
2nd row: в
3rd row: не
4th row: он
5th row: на

Most occurring words

Value                 Count  Frequency (%)
знать                 2      0.2%
много                 2      0.2%
что                   2      0.2%
мало                  2      0.2%
пора                  2      0.2%
тот                   1      0.1%
это                   1      0.1%
весь                  1      0.1%
а                     1      0.1%
с                     1      0.1%
Other values (987)    987    98.5%
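The word table counts tokens rather than whole values, and its percentages are relative to the total number of words, not to the 1000 rows: 5 × 2 + 5 × 1 + 987 = 1002 words, and 2 / 1002 ≈ 0.2%. A minimal sketch, assuming tokens are split on whitespace, lowercased, and stripped of surrounding ASCII punctuation (the report's exact tokenizer is not described here). It uses the English sample values because the Russian ones are single words.

```python
import string
from collections import Counter

def word_frequencies(values: list[str]) -> list[tuple[str, int, float]]:
    """Return (word, count, percent of all words), most frequent first."""
    words: Counter = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)  # drop e.g. trailing commas
            if token:
                words[token] += 1
    total = sum(words.values())
    return [(w, c, 100.0 * c / total) for w, c in words.most_common()]

# English values from the report's first five sample rows.
rows = ["and, though", "in, at", "not", "he", "on, it, at, to"]
for word, count, pct in word_frequencies(rows):
    print(f"{word:<8}{count:>3}{pct:>7.1f}%")
```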

Most occurring characters

Value                 Count  Frequency (%)
о                     645    10.5%
т                     526    8.6%
а                     484    7.9%
е                     395    6.5%
с                     364    6.0%
и                     345    5.6%
н                     339    5.5%
ь                     316    5.2%
р                     306    5.0%
в                     263    4.3%
Other values (33)     2134   34.9%
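Each frequency table keeps the ten most frequent items and folds the rest into one "Other values (k)" row, where k is the number of folded items and the count is their total. Here 10 + 33 = 43 distinct characters, and 6117 - 3983 = 2134 characters fall in the folded row. A sketch of that folding on a short hypothetical string; spaces are skipped only because the string joins words with them.

```python
from collections import Counter

def top_n_with_other(counts: Counter, n: int = 10) -> list[tuple[str, int, float]]:
    """Keep the n most common items; fold the rest into one 'Other values (k)' row."""
    total = sum(counts.values())
    ranked = counts.most_common()
    top, rest = ranked[:n], ranked[n:]
    rows = [(value, c, 100.0 * c / total) for value, c in top]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100.0 * other / total))
    return rows

text = "знать много что мало пора"
chars = Counter(ch for ch in text if not ch.isspace())
for value, count, pct in top_n_with_other(chars, n=3):
    print(f"{value:<20}{count:>4}{pct:>7.1f}%")
```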

Most occurring categories

Value                 Count  Frequency (%)
(unknown)             6117   100.0%

Most frequent character per category

(unknown): same ten characters and counts as Most occurring characters, since all 6117 characters fall into this one category.

Most occurring scripts

Value                 Count  Frequency (%)
(unknown)             6117   100.0%

Most frequent character per script

(unknown): same ten characters and counts as Most occurring characters, since all 6117 characters fall into this one script.

Most occurring blocks

Value                 Count  Frequency (%)
(unknown)             6117   100.0%

Most frequent character per block

(unknown): same ten characters and counts as Most occurring characters, since all 6117 characters fall into this one block.

english
Text

Distinct: 961
Distinct (%): 96.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Length

Max length: 130
Median length: 39
Mean length: 13.169
Min length: 1

Characters and Unicode

Total characters: 13169
Distinct characters: 75
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 924
Unique (%): 92.4%

Sample

1st row: and, though
2nd row: in, at
3rd row: not
4th row: he
5th row: on, it, at, to

Most occurring words

Value                 Count  Frequency (%)
to                    256    11.0%
see                   33     1.4%
be                    20     0.9%
in                    20     0.9%
as                    18     0.8%
for                   16     0.7%
come                  15     0.6%
of                    14     0.6%
the                   13     0.6%
by                    12     0.5%
Other values (1240)   1914   82.1%
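Because the word percentages are relative to all words, the total can be recovered from the table itself: the ten listed counts plus the 1914 other words give 2331 words across the 1000 rows, and 256 / 2331 ≈ 11.0%. A quick arithmetic check using only numbers from the table:

```python
# Counts copied from the english word table.
counts = {"to": 256, "see": 33, "be": 20, "in": 20, "as": 18,
          "for": 16, "come": 15, "of": 14, "the": 13, "by": 12}
other = 1914

total_words = sum(counts.values()) + other
print(total_words)  # 2331
for word, c in counts.items():
    print(f"{word:<6}{100 * c / total_words:5.1f}%")
print(f"Other {100 * other / total_words:5.1f}%")
```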

Most occurring characters

Value                 Count  Frequency (%)
e                     1370   10.4%
(space)               1334   10.1%
t                     1048   8.0%
o                     1021   7.8%
a                     788    6.0%
r                     730    5.5%
,                     673    5.1%
n                     656    5.0%
i                     616    4.7%
s                     611    4.6%
Other values (65)     4322   32.8%

Most occurring categories

Value                 Count  Frequency (%)
(unknown)             13169  100.0%

Most frequent character per category

(unknown): same ten characters and counts as Most occurring characters, since all 13169 characters fall into this one category.

Most occurring scripts

Value                 Count  Frequency (%)
(unknown)             13169  100.0%

Most frequent character per script

(unknown): same ten characters and counts as Most occurring characters, since all 13169 characters fall into this one script.

Most occurring blocks

Value                 Count  Frequency (%)
(unknown)             13169  100.0%

Most frequent character per block

(unknown): same ten characters and counts as Most occurring characters, since all 13169 characters fall into this one block.

part of speech
Categorical

Distinct: 37
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
noun                  374
verb                  232
adjective             127
adverb                112
preposition           37
Other values (32)     118

Length

Max length: 26
Median length: 4
Mean length: 5.885
Min length: 3

Characters and Unicode

Total characters: 5885
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 20
Unique (%): 2.0%

Sample

1st row: conjunction
2nd row: preposition
3rd row: particle
4th row: pronoun
5th row: preposition

Common Values

Value                 Count  Frequency (%)
noun                  374    37.4%
verb                  232    23.2%
adjective             127    12.7%
adverb                112    11.2%
preposition           37     3.7%
pronoun               36     3.6%
misc                  12     1.2%
conjunction           12     1.2%
cardinal number       11     1.1%
particle              7      0.7%
Other values (27)     40     4.0%

Length

[Figure: Histogram of lengths of the category]

Most occurring words

Value                 Count  Frequency (%)
noun                  378    36.0%
verb                  234    22.3%
adjective             129    12.3%
adverb                118    11.2%
pronoun               40     3.8%
preposition           39     3.7%
particle              19     1.8%
number                18     1.7%
cardinal              16     1.5%
conjunction           15     1.4%
Other values (15)     43     4.1%
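The Common Values table counts whole cell values, while the word table splits each value into words, so a label such as "cardinal number" adds to both "cardinal" and "number", and the word counts can exceed the value counts (noun: 374 values vs 378 words). A sketch of the two counts on labels from the sample rows, assuming whitespace splitting with trailing commas removed (the report's exact tokenizer is not described here):

```python
from collections import Counter

# Labels from the sample rows (row 6 omitted), used as a small hypothetical column.
labels = ["conjunction", "preposition", "particle", "pronoun", "preposition",
          "pronoun", "adjective, pronoun", "verb", "preposition"]

value_counts = Counter(labels)  # whole values, as in Common Values
# Assumption: split on whitespace and strip trailing commas, as in a word table.
word_counts = Counter(tok.strip(",") for label in labels for tok in label.split())

print(value_counts["pronoun"], word_counts["pronoun"])
print(value_counts["adjective"], word_counts["adjective"])
```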

Most occurring characters

Value                 Count  Frequency (%)
n                     984    16.7%
e                     698    11.9%
o                     588    10.0%
r                     497    8.4%
v                     481    8.2%
u                     456    7.7%
b                     373    6.3%
a                     309    5.3%
i                     283    4.8%
d                     268    4.6%
Other values (14)     948    16.1%

Most occurring categories

Value                 Count  Frequency (%)
(unknown)             5885   100.0%

Most frequent character per category

(unknown): same ten characters and counts as Most occurring characters, since all 5885 characters fall into this one category.

Most occurring scripts

Value                 Count  Frequency (%)
(unknown)             5885   100.0%

Most frequent character per script

(unknown): same ten characters and counts as Most occurring characters, since all 5885 characters fall into this one script.

Most occurring blocks

Value                 Count  Frequency (%)
(unknown)             5885   100.0%

Most frequent character per block

(unknown): same ten characters and counts as Most occurring characters, since all 5885 characters fall into this one block.

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
A nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
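In this report both views are uniform because no cell is missing. The idea behind them can be sketched as a text stand-in in plain Python (not the report's plotting code), assuming `None` marks a missing cell; the incomplete second row is hypothetical.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[str]]

def nonnull_counts(rows: Sequence[Row], columns: Sequence[str]) -> dict[str, int]:
    """Non-missing cells per column, the quantity a nullity bar chart plots."""
    return {name: sum(row[j] is not None for row in rows)
            for j, name in enumerate(columns)}

def nullity_matrix(rows: Sequence[Row]) -> list[str]:
    """One text line per row: '#' where a value is present, '.' where it is missing."""
    return ["".join("#" if cell is not None else "." for cell in row) for row in rows]

columns = ["russian", "english", "part of speech"]
rows = [
    ("и", "and, though", "conjunction"),
    ("не", None, "particle"),  # hypothetical missing english value
]
print(nonnull_counts(rows, columns))
print("\n".join(nullity_matrix(rows)))
```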

Sample

First rows

     russian        english                                     part of speech
0    и              and, though                                 conjunction
1    в              in, at                                      preposition
2    не             not                                         particle
3    он             he                                          pronoun
4    на             on, it, at, to                              preposition
5    я              I                                           pronoun
6    что            what, that, why                             сonjunction, pronoun
7    тот            that                                        adjective, pronoun
8    быть           to be                                       verb
9    с              with, and, from, of                         preposition

Last rows

     russian        english                                     part of speech
990  художник       painter, artist                             noun
991  знак           sign                                        noun
992  завод          factory                                     noun
993  кулак          fist                                        noun
994  использовать   to use, utilize, make use of                verb
995  стакан         glass                                       noun
996  пахнуть        to smell                                    verb
997  отсюда         from here                                   adverb
998  рот            mouth                                       noun
999  пора           it's time;at times, now and then(See #279)  misc