Dataset statistics
Number of variables | 3 |
---|---|
Number of observations | 1000 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 23.6 KiB |
Average record size in memory | 24.1 B |
Variable types
Text | 2 |
---|---|
Categorical | 1 |
Reproduction
Analysis started | 2023-09-12 08:35:01.032390 |
---|---|
Analysis finished | 2023-09-12 08:35:02.625348 |
Duration | 1.59 seconds |
Software version | ydata-profiling v0.0.dev0 |
Download configuration | config.json |
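The Duration row can be cross-checked from the two timestamps in the table above. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2023-09-12 08:35:01.032390")
finished = datetime.fromisoformat("2023-09-12 08:35:02.625348")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```

Rounded to two decimals, the difference agrees with the reported duration.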
russian
Text
Distinct | 995 |
---|---|
Distinct (%) | 99.5% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 7.9 KiB |
Value | Count | Frequency (%) |
знать | 2 | 0.2% |
много | 2 | 0.2% |
что | 2 | 0.2% |
мало | 2 | 0.2% |
пора | 2 | 0.2% |
тот | 1 | 0.1% |
это | 1 | 0.1% |
весь | 1 | 0.1% |
а | 1 | 0.1% |
с | 1 | 0.1% |
Other values (987) | 987 | 98.5% |
Most occurring characters
Value | Count | Frequency (%) |
о | 645 | 10.5% |
т | 526 | 8.6% |
а | 484 | 7.9% |
е | 395 | 6.5% |
с | 364 | 6.0% |
и | 345 | 5.6% |
н | 339 | 5.5% |
ь | 316 | 5.2% |
р | 306 | 5.0% |
в | 263 | 4.3% |
Other values (33) | 2134 | 34.9% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 6106 | 99.8% |
Decimal Number | 3 | < 0.1% |
Uppercase Letter | 3 | < 0.1% |
Space Separator | 2 | < 0.1% |
Open Punctuation | 1 | < 0.1% |
Other Punctuation | 1 | < 0.1% |
Close Punctuation | 1 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
о | 645 | 10.6% |
т | 526 | 8.6% |
а | 484 | 7.9% |
е | 395 | 6.5% |
с | 364 | 6.0% |
и | 345 | 5.7% |
н | 339 | 5.6% |
ь | 316 | 5.2% |
р | 306 | 5.0% |
в | 263 | 4.3% |
Other values (24) | 2123 | 34.8% |
Uppercase Letter
Value | Count | Frequency (%) |
S | 1 | 33.3% |
М | 1 | 33.3% |
Р | 1 | 33.3% |
Decimal Number
Value | Count | Frequency (%) |
6 | 2 | 66.7% |
3 | 1 | 33.3% |
Space Separator
Value | Count | Frequency (%) |
(space) | 2 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 1 | 100.0% |
Other Punctuation
Value | Count | Frequency (%) |
# | 1 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 1 | 100.0% |
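The category labels in these tables (Lowercase Letter, Decimal Number, Space Separator, and so on) match the names of Unicode general categories. The sketch below shows how such counts could be reproduced with the standard-library `unicodedata` module. It assumes every character of every cell is classified individually; the `CATEGORY_NAMES` mapping, the `count_categories` function, and the sample values are illustrative, not taken from the profiler's code.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Mn": "Nonspacing Mark",
}


def count_categories(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Illustrative input only, not the full column.
sample = ["что", "пора", "(See #279)"]
category_counts = count_categories(sample)
for name, count in category_counts.most_common():
    print(name, count)
```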
Most occurring scripts
Value | Count | Frequency (%) |
Cyrillic | 6106 | 99.8% |
Common | 8 | 0.1% |
Latin | 3 | < 0.1% |
Most frequent character per script
Cyrillic
Value | Count | Frequency (%) |
о | 645 | 10.6% |
т | 526 | 8.6% |
а | 484 | 7.9% |
е | 395 | 6.5% |
с | 364 | 6.0% |
и | 345 | 5.7% |
н | 339 | 5.6% |
ь | 316 | 5.2% |
р | 306 | 5.0% |
в | 263 | 4.3% |
Other values (25) | 2123 | 34.8% |
Common
Value | Count | Frequency (%) |
(space) | 2 | 25.0% |
6 | 2 | 25.0% |
( | 1 | 12.5% |
# | 1 | 12.5% |
3 | 1 | 12.5% |
) | 1 | 12.5% |
Latin
Value | Count | Frequency (%) |
e | 2 | 66.7% |
S | 1 | 33.3% |
Most occurring blocks
Value | Count | Frequency (%) |
Cyrillic | 6106 | 99.8% |
ASCII | 11 | 0.2% |
Most frequent character per block
Cyrillic
Value | Count | Frequency (%) |
о | 645 | 10.6% |
т | 526 | 8.6% |
а | 484 | 7.9% |
е | 395 | 6.5% |
с | 364 | 6.0% |
и | 345 | 5.7% |
н | 339 | 5.6% |
ь | 316 | 5.2% |
р | 306 | 5.0% |
в | 263 | 4.3% |
Other values (25) | 2123 | 34.8% |
ASCII
Value | Count | Frequency (%) |
(space) | 2 | 18.2% |
e | 2 | 18.2% |
6 | 2 | 18.2% |
( | 1 | 9.1% |
S | 1 | 9.1% |
# | 1 | 9.1% |
3 | 1 | 9.1% |
) | 1 | 9.1% |
english
Text
Distinct | 961 |
---|---|
Distinct (%) | 96.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 7.9 KiB |
Value | Count | Frequency (%) |
to | 256 | 11.0% |
see | 33 | 1.4% |
be | 20 | 0.9% |
in | 20 | 0.9% |
as | 18 | 0.8% |
for | 16 | 0.7% |
come | 15 | 0.6% |
of | 14 | 0.6% |
the | 13 | 0.6% |
by | 12 | 0.5% |
Other values (1240) | 1914 | 82.1% |
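The counts in the table above sum to 2331, more than the 1000 observations, which suggests the table counts individual words rather than whole cell values. The sketch below illustrates one way to build such a table. The tokenization rule (lowercase, split on anything that is not a word character or apostrophe) is an assumption and may differ from what the profiler does; `word_frequencies` and the sample values are illustrative.

```python
import re
from collections import Counter


def word_frequencies(values):
    """Return (word, count, percent) tuples, most frequent first."""
    words = []
    for value in values:
        # Assumed tokenization: split on runs of non-word characters,
        # keeping apostrophes so that "it's" stays one token.
        words.extend(w for w in re.split(r"[^\w']+", value.lower()) if w)
    counts = Counter(words)
    total = sum(counts.values())
    return [(w, c, 100 * c / total) for w, c in counts.most_common()]


# Illustrative input only: three cells from the english column.
sample = ["to be", "to use, utilize, make use of", "to smell"]
table = word_frequencies(sample)
for word, count, percent in table:
    print(f"{word} | {count} | {percent:.1f}% |")
```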
Most occurring characters
Value | Count | Frequency (%) |
e | 1370 | 10.4% |
(space) | 1334 | 10.1% |
t | 1048 | 8.0% |
o | 1021 | 7.8% |
a | 788 | 6.0% |
r | 730 | 5.5% |
, | 673 | 5.1% |
n | 656 | 5.0% |
i | 616 | 4.7% |
s | 611 | 4.6% |
Other values (65) | 4322 | 32.8% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 10588 | 80.4% |
Space Separator | 1334 | 10.1% |
Other Punctuation | 834 | 6.3% |
Decimal Number | 191 | 1.5% |
Close Punctuation | 75 | 0.6% |
Open Punctuation | 75 | 0.6% |
Uppercase Letter | 59 | 0.4% |
Dash Punctuation | 8 | 0.1% |
Nonspacing Mark | 5 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 1370 | 12.9% |
t | 1048 | 9.9% |
o | 1021 | 9.6% |
a | 788 | 7.4% |
r | 730 | 6.9% |
n | 656 | 6.2% |
i | 616 | 5.8% |
s | 611 | 5.8% |
l | 534 | 5.0% |
h | 350 | 3.3% |
Other values (34) | 2864 | 27.0% |
Other Punctuation
Value | Count | Frequency (%) |
, | 673 | 80.7% |
; | 67 | 8.0% |
# | 65 | 7.8% |
' | 11 | 1.3% |
" | 6 | 0.7% |
… | 4 | 0.5% |
. | 3 | 0.4% |
! | 2 | 0.2% |
? | 2 | 0.2% |
: | 1 | 0.1% |
Decimal Number
Value | Count | Frequency (%) |
3 | 28 | 14.7% |
1 | 24 | 12.6% |
9 | 24 | 12.6% |
6 | 20 | 10.5% |
4 | 20 | 10.5% |
7 | 20 | 10.5% |
5 | 18 | 9.4% |
8 | 15 | 7.9% |
2 | 13 | 6.8% |
0 | 9 | 4.7% |
Uppercase Letter
Value | Count | Frequency (%) |
S | 49 | 83.1% |
R | 3 | 5.1% |
M | 3 | 5.1% |
G | 2 | 3.4% |
A | 1 | 1.7% |
I | 1 | 1.7% |
Space Separator
Value | Count | Frequency (%) |
(space) | 1334 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 75 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 75 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 8 | 100.0% |
Nonspacing Mark
Value | Count | Frequency (%) |
́ | 5 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 10604 | 80.5% |
Common | 2517 | 19.1% |
Cyrillic | 43 | 0.3% |
Inherited | 5 | < 0.1% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 1370 | 12.9% |
t | 1048 | 9.9% |
o | 1021 | 9.6% |
a | 788 | 7.4% |
r | 730 | 6.9% |
n | 656 | 6.2% |
i | 616 | 5.8% |
s | 611 | 5.8% |
l | 534 | 5.0% |
h | 350 | 3.3% |
Other values (22) | 2880 | 27.2% |
Common
Value | Count | Frequency (%) |
(space) | 1334 | 53.0% |
, | 673 | 26.7% |
) | 75 | 3.0% |
( | 75 | 3.0% |
; | 67 | 2.7% |
# | 65 | 2.6% |
3 | 28 | 1.1% |
1 | 24 | 1.0% |
9 | 24 | 1.0% |
6 | 20 | 0.8% |
Other values (14) | 132 | 5.2% |
Cyrillic
Value | Count | Frequency (%) |
о | 6 | 14.0% |
к | 5 | 11.6% |
и | 5 | 11.6% |
н | 3 | 7.0% |
а | 3 | 7.0% |
м | 3 | 7.0% |
е | 3 | 7.0% |
в | 3 | 7.0% |
р | 2 | 4.7% |
ч | 2 | 4.7% |
Other values (8) | 8 | 18.6% |
Inherited
Value | Count | Frequency (%) |
́ | 5 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 13117 | 99.6% |
Cyrillic | 43 | 0.3% |
Diacriticals | 5 | < 0.1% |
Punctuation | 4 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 1370 | 10.4% |
(space) | 1334 | 10.2% |
t | 1048 | 8.0% |
o | 1021 | 7.8% |
a | 788 | 6.0% |
r | 730 | 5.6% |
, | 673 | 5.1% |
n | 656 | 5.0% |
i | 616 | 4.7% |
s | 611 | 4.7% |
Other values (45) | 4270 | 32.6% |
Cyrillic
Value | Count | Frequency (%) |
о | 6 | 14.0% |
к | 5 | 11.6% |
и | 5 | 11.6% |
н | 3 | 7.0% |
а | 3 | 7.0% |
м | 3 | 7.0% |
е | 3 | 7.0% |
в | 3 | 7.0% |
р | 2 | 4.7% |
ч | 2 | 4.7% |
Other values (8) | 8 | 18.6% |
Diacriticals
Value | Count | Frequency (%) |
́ | 5 | 100.0% |
Punctuation
Value | Count | Frequency (%) |
… | 4 | 100.0% |
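The block labels above (ASCII, Cyrillic, Diacriticals, Punctuation) look like shortened names of Unicode blocks. The sketch below classifies characters by code point range under that assumption. The ranges are the standard Basic Latin, Combining Diacritical Marks, Cyrillic, and General Punctuation blocks; the mapping of the report's labels onto them, and the names `BLOCKS`, `block_of`, and `count_blocks`, are illustrative guesses rather than the profiler's implementation.

```python
from collections import Counter

# (first code point, last code point, label used in this report)
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),         # Basic Latin
    (0x0300, 0x036F, "Diacriticals"),  # Combining Diacritical Marks
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2000, 0x206F, "Punctuation"),   # General Punctuation
]


def block_of(ch):
    """Return the block label for one character, or "Other" if unmapped."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def count_blocks(values):
    """Count characters per block across all strings."""
    return Counter(block_of(ch) for value in values for ch in value)


# Illustrative input only.
block_counts = count_blocks(["пора", "it's time…"])
print(block_counts)
```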
part of speech
Categorical
Distinct | 37 |
---|---|
Distinct (%) | 3.7% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 7.9 KiB |
noun | 374 |
---|---|
verb | 232 |
adjective | 127 |
adverb | 112 |
preposition | 37 |
Other values (32) | 118 |
Length
Max length | 26 |
---|---|
Median length | 4 |
Mean length | 5.885 |
Min length | 3 |
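The length statistics above describe the character length of each cell in the column. A minimal sketch of the computation with the standard-library `statistics` module follows; `length_summary` is an illustrative name, and the input is the first five values of the column rather than all 1000.

```python
from statistics import mean, median


def length_summary(values):
    """Summarize the character lengths of a list of strings."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# Illustrative input only: the first five part-of-speech values.
sample = ["conjunction", "preposition", "particle", "pronoun", "preposition"]
summary = length_summary(sample)
print(summary)
```

On the full column, the mean of 5.885 is consistent with the 5885 total characters reported for 1000 observations.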
Characters and Unicode
Total characters | 5885 |
---|---|
Distinct characters | 24 |
Distinct categories | 5 |
Distinct scripts | 3 |
Distinct blocks | 2 |
Unique
Unique | 20 |
---|---|
Unique (%) | 2.0% |
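The report lists both Distinct (37) and Unique (20) for this column. The sketch below assumes the usual reading: distinct counts the different values present, while unique counts the values that occur exactly once. The function name and the sample input are illustrative.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return distinct count, unique count, and both as percentages of rows."""
    counts = Counter(values)
    distinct = len(counts)
    # A value is "unique" here if it appears in exactly one row.
    unique = sum(1 for c in counts.values() if c == 1)
    n = len(values)
    return distinct, unique, 100 * distinct / n, 100 * unique / n


# Illustrative input only: the first five part-of-speech values.
sample = ["conjunction", "preposition", "particle", "pronoun", "preposition"]
distinct, unique, distinct_pct, unique_pct = distinct_and_unique(sample)
print(distinct, unique, distinct_pct, unique_pct)
```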
Sample
1st row | conjunction |
---|---|
2nd row | preposition |
3rd row | particle |
4th row | pronoun |
5th row | preposition |
Common Values
Value | Count | Frequency (%) |
noun | 374 | 37.4% |
verb | 232 | 23.2% |
adjective | 127 | 12.7% |
adverb | 112 | 11.2% |
preposition | 37 | 3.7% |
pronoun | 36 | 3.6% |
misc | 12 | 1.2% |
conjunction | 12 | 1.2% |
cardinal number | 11 | 1.1% |
particle | 7 | 0.7% |
Other values (27) | 40 | 4.0% |
Words
Value | Count | Frequency (%) |
noun | 378 | 36.0% |
verb | 234 | 22.3% |
adjective | 129 | 12.3% |
adverb | 118 | 11.2% |
pronoun | 40 | 3.8% |
preposition | 39 | 3.7% |
particle | 19 | 1.8% |
number | 18 | 1.7% |
cardinal | 16 | 1.5% |
conjunction | 15 | 1.4% |
Other values (15) | 43 | 4.1% |
Most occurring characters
Value | Count | Frequency (%) |
n | 984 | 16.7% |
e | 698 | 11.9% |
o | 588 | 10.0% |
r | 497 | 8.4% |
v | 481 | 8.2% |
u | 456 | 7.7% |
b | 373 | 6.3% |
a | 309 | 5.3% |
i | 283 | 4.8% |
d | 268 | 4.6% |
Other values (14) | 948 | 16.1% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 5806 | 98.7% |
Space Separator | 51 | 0.9% |
Other Punctuation | 26 | 0.4% |
Open Punctuation | 1 | < 0.1% |
Close Punctuation | 1 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
n | 984 | 16.9% |
e | 698 | 12.0% |
o | 588 | 10.1% |
r | 497 | 8.6% |
v | 481 | 8.3% |
u | 456 | 7.9% |
b | 373 | 6.4% |
a | 309 | 5.3% |
i | 283 | 4.9% |
d | 268 | 4.6% |
Other values (10) | 869 | 15.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 51 | 100.0% |
Other Punctuation
Value | Count | Frequency (%) |
, | 26 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 1 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 1 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 5805 | 98.6% |
Common | 79 | 1.3% |
Cyrillic | 1 | < 0.1% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
n | 984 | 17.0% |
e | 698 | 12.0% |
o | 588 | 10.1% |
r | 497 | 8.6% |
v | 481 | 8.3% |
u | 456 | 7.9% |
b | 373 | 6.4% |
a | 309 | 5.3% |
i | 283 | 4.9% |
d | 268 | 4.6% |
Other values (9) | 868 | 15.0% |
Common
Value | Count | Frequency (%) |
(space) | 51 | 64.6% |
, | 26 | 32.9% |
( | 1 | 1.3% |
) | 1 | 1.3% |
Cyrillic
Value | Count | Frequency (%) |
с | 1 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5884 | > 99.9% |
Cyrillic | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
n | 984 | 16.7% |
e | 698 | 11.9% |
o | 588 | 10.0% |
r | 497 | 8.4% |
v | 481 | 8.2% |
u | 456 | 7.7% |
b | 373 | 6.3% |
a | 309 | 5.3% |
i | 283 | 4.8% |
d | 268 | 4.6% |
Other values (13) | 947 | 16.1% |
Cyrillic
Value | Count | Frequency (%) |
с | 1 | 100.0% |
First rows
index | russian | english | part of speech |
---|---|---|---|
0 | и | and, though | conjunction |
1 | в | in, at | preposition |
2 | не | not | particle |
3 | он | he | pronoun |
4 | на | on, it, at, to | preposition |
5 | я | I | pronoun |
6 | что | what, that, why | сonjunction, pronoun |
7 | тот | that | adjective, pronoun |
8 | быть | to be | verb |
9 | с | with, and, from, of | preposition |
Last rows
index | russian | english | part of speech |
---|---|---|---|
990 | художник | painter, artist | noun |
991 | знак | sign | noun |
992 | завод | factory | noun |
993 | кулак | fist | noun |
994 | использовать | to use, utilize, make use of | verb |
995 | стакан | glass | noun |
996 | пахнуть | to smell | verb |
997 | отсюда | from here | adverb |
998 | рот | mouth | noun |
999 | пора | it's time;at times, now and then(See #279) | misc |