Dataset statistics
| Number of variables | 3 |
|---|---|
| Number of observations | 1000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 23.6 KiB |
| Average record size in memory | 24.1 B |
Variable types
| Text | 2 |
|---|---|
| Categorical | 1 |
Reproduction
| Analysis started | 2023-09-12 08:35:01.032390 |
|---|---|
| Analysis finished | 2023-09-12 08:35:02.625348 |
| Duration | 1.59 seconds |
| Software version | ydata-profiling v0.0.dev0 |
| Download configuration | config.json |
russian
Text
| Distinct | 995 |
|---|---|
| Distinct (%) | 99.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
Length
| Max length | 19 |
|---|---|
| Median length | 13 |
| Mean length | 6.117 |
| Min length | 1 |
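The length figures above can be recomputed from the raw column values. The following is a minimal sketch, assuming the values are available as a plain Python list; `length_stats` and `sample` are hypothetical names used for illustration and are not part of ydata-profiling.

```python
from statistics import mean, median

def length_stats(values):
    """Return max, median, mean and min string length (in code points)."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

# First five rows of the `russian` column, as listed in the Sample table.
sample = ["и", "в", "не", "он", "на"]
stats = length_stats(sample)
```

On the full column the mean is simply total characters divided by observations, which matches the report: 6117 / 1000 = 6.117.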
Characters and Unicode
| Total characters | 6117 |
|---|---|
| Distinct characters | 43 |
| Distinct categories | 7 |
| Distinct scripts | 3 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
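As an illustration of how such character properties can be tallied, here is a small sketch using Python's standard `unicodedata` module. It covers general categories only, because the standard library does not expose script or block properties. The helper name `category_counts` and the sample strings are assumptions made for this example, and the module returns two-letter codes (`Ll`, `Lu`, `Nd`, ...) rather than the long names shown in the report.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Hypothetical sample mixing Cyrillic, Latin, a digit and punctuation.
counts = category_counts(["пора", "S", "(#3)"])
```

Here `Ll` corresponds to "Lowercase Letter", `Lu` to "Uppercase Letter", `Nd` to "Decimal Number", and `Ps`, `Po`, `Pe` to open, other and close punctuation.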
Unique
| Unique | 990 |
|---|---|
| Unique (%) | 99.0% |
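Reading the figures above, "distinct" appears to count different values while "unique" counts values that occur exactly once (995 versus 990 is consistent with five values appearing twice). A minimal sketch of that interpretation follows; `distinct_and_unique` and the sample list are hypothetical and only illustrate the counting.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical sample: two values are repeated, two occur once.
sample = ["что", "тот", "что", "это", "пора", "пора"]
distinct, unique = distinct_and_unique(sample)
```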
Sample
| 1st row | и |
|---|---|
| 2nd row | в |
| 3rd row | не |
| 4th row | он |
| 5th row | на |
| Value | Count | Frequency (%) |
| знать | 2 | 0.2% |
| много | 2 | 0.2% |
| что | 2 | 0.2% |
| мало | 2 | 0.2% |
| пора | 2 | 0.2% |
| тот | 1 | 0.1% |
| это | 1 | 0.1% |
| весь | 1 | 0.1% |
| а | 1 | 0.1% |
| с | 1 | 0.1% |
| Other values (987) | 987 | 98.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| о | 645 | 10.5% |
| т | 526 | 8.6% |
| а | 484 | 7.9% |
| е | 395 | 6.5% |
| с | 364 | 6.0% |
| и | 345 | 5.6% |
| н | 339 | 5.5% |
| ь | 316 | 5.2% |
| р | 306 | 5.0% |
| в | 263 | 4.3% |
| Other values (33) | 2134 | 34.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 6106 | 99.8% |
| Decimal Number | 3 | < 0.1% |
| Uppercase Letter | 3 | < 0.1% |
| Space Separator | 2 | < 0.1% |
| Open Punctuation | 1 | < 0.1% |
| Other Punctuation | 1 | < 0.1% |
| Close Punctuation | 1 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| о | 645 | 10.6% |
| т | 526 | 8.6% |
| а | 484 | 7.9% |
| е | 395 | 6.5% |
| с | 364 | 6.0% |
| и | 345 | 5.7% |
| н | 339 | 5.6% |
| ь | 316 | 5.2% |
| р | 306 | 5.0% |
| в | 263 | 4.3% |
| Other values (24) | 2123 | 34.8% |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 1 | 33.3% |
| М | 1 | 33.3% |
| Р | 1 | 33.3% |
Decimal Number
| Value | Count | Frequency (%) |
| 6 | 2 | 66.7% |
| 3 | 1 | 33.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 1 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| # | 1 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Cyrillic | 6106 | 99.8% |
| Common | 8 | 0.1% |
| Latin | 3 | < 0.1% |
Most frequent character per script
Cyrillic
| Value | Count | Frequency (%) |
| о | 645 | 10.6% |
| т | 526 | 8.6% |
| а | 484 | 7.9% |
| е | 395 | 6.5% |
| с | 364 | 6.0% |
| и | 345 | 5.7% |
| н | 339 | 5.6% |
| ь | 316 | 5.2% |
| р | 306 | 5.0% |
| в | 263 | 4.3% |
| Other values (25) | 2123 | 34.8% |
Common
| Value | Count | Frequency (%) |
| (space) | 2 | 25.0% |
| 6 | 2 | 25.0% |
| ( | 1 | 12.5% |
| # | 1 | 12.5% |
| 3 | 1 | 12.5% |
| ) | 1 | 12.5% |
Latin
| Value | Count | Frequency (%) |
| e | 2 | 66.7% |
| S | 1 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| Cyrillic | 6106 | 99.8% |
| ASCII | 11 | 0.2% |
Most frequent character per block
Cyrillic
| Value | Count | Frequency (%) |
| о | 645 | 10.6% |
| т | 526 | 8.6% |
| а | 484 | 7.9% |
| е | 395 | 6.5% |
| с | 364 | 6.0% |
| и | 345 | 5.7% |
| н | 339 | 5.6% |
| ь | 316 | 5.2% |
| р | 306 | 5.0% |
| в | 263 | 4.3% |
| Other values (25) | 2123 | 34.8% |
ASCII
| Value | Count | Frequency (%) |
| (space) | 2 | 18.2% |
| e | 2 | 18.2% |
| 6 | 2 | 18.2% |
| ( | 1 | 9.1% |
| S | 1 | 9.1% |
| # | 1 | 9.1% |
| 3 | 1 | 9.1% |
| ) | 1 | 9.1% |
english
Text
| Distinct | 961 |
|---|---|
| Distinct (%) | 96.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
Length
| Max length | 130 |
|---|---|
| Median length | 39 |
| Mean length | 13.169 |
| Min length | 1 |
Characters and Unicode
| Total characters | 13169 |
|---|---|
| Distinct characters | 75 |
| Distinct categories | 9 |
| Distinct scripts | 4 |
| Distinct blocks | 4 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 924 |
|---|---|
| Unique (%) | 92.4% |
Sample
| 1st row | and, though |
|---|---|
| 2nd row | in, at |
| 3rd row | not |
| 4th row | he |
| 5th row | on, it, at, to |
| Value | Count | Frequency (%) |
| to | 256 | 11.0% |
| see | 33 | 1.4% |
| be | 20 | 0.9% |
| in | 20 | 0.9% |
| as | 18 | 0.8% |
| for | 16 | 0.7% |
| come | 15 | 0.6% |
| of | 14 | 0.6% |
| the | 13 | 0.6% |
| by | 12 | 0.5% |
| Other values (1240) | 1914 | 82.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 1370 | 10.4% |
| (space) | 1334 | 10.1% |
| t | 1048 | 8.0% |
| o | 1021 | 7.8% |
| a | 788 | 6.0% |
| r | 730 | 5.5% |
| , | 673 | 5.1% |
| n | 656 | 5.0% |
| i | 616 | 4.7% |
| s | 611 | 4.6% |
| Other values (65) | 4322 | 32.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 10588 | 80.4% |
| Space Separator | 1334 | 10.1% |
| Other Punctuation | 834 | 6.3% |
| Decimal Number | 191 | 1.5% |
| Close Punctuation | 75 | 0.6% |
| Open Punctuation | 75 | 0.6% |
| Uppercase Letter | 59 | 0.4% |
| Dash Punctuation | 8 | 0.1% |
| Nonspacing Mark | 5 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 1370 | 12.9% |
| t | 1048 | 9.9% |
| o | 1021 | 9.6% |
| a | 788 | 7.4% |
| r | 730 | 6.9% |
| n | 656 | 6.2% |
| i | 616 | 5.8% |
| s | 611 | 5.8% |
| l | 534 | 5.0% |
| h | 350 | 3.3% |
| Other values (34) | 2864 | 27.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 673 | 80.7% |
| ; | 67 | 8.0% |
| # | 65 | 7.8% |
| ' | 11 | 1.3% |
| " | 6 | 0.7% |
| … | 4 | 0.5% |
| . | 3 | 0.4% |
| ! | 2 | 0.2% |
| ? | 2 | 0.2% |
| : | 1 | 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 28 | 14.7% |
| 1 | 24 | 12.6% |
| 9 | 24 | 12.6% |
| 6 | 20 | 10.5% |
| 4 | 20 | 10.5% |
| 7 | 20 | 10.5% |
| 5 | 18 | 9.4% |
| 8 | 15 | 7.9% |
| 2 | 13 | 6.8% |
| 0 | 9 | 4.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 49 | 83.1% |
| R | 3 | 5.1% |
| M | 3 | 5.1% |
| G | 2 | 3.4% |
| A | 1 | 1.7% |
| I | 1 | 1.7% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1334 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 75 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 75 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 8 | 100.0% |
Nonspacing Mark
| Value | Count | Frequency (%) |
| ́ | 5 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 10604 | 80.5% |
| Common | 2517 | 19.1% |
| Cyrillic | 43 | 0.3% |
| Inherited | 5 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 1370 | 12.9% |
| t | 1048 | 9.9% |
| o | 1021 | 9.6% |
| a | 788 | 7.4% |
| r | 730 | 6.9% |
| n | 656 | 6.2% |
| i | 616 | 5.8% |
| s | 611 | 5.8% |
| l | 534 | 5.0% |
| h | 350 | 3.3% |
| Other values (22) | 2880 | 27.2% |
Common
| Value | Count | Frequency (%) |
| (space) | 1334 | 53.0% |
| , | 673 | 26.7% |
| ) | 75 | 3.0% |
| ( | 75 | 3.0% |
| ; | 67 | 2.7% |
| # | 65 | 2.6% |
| 3 | 28 | 1.1% |
| 1 | 24 | 1.0% |
| 9 | 24 | 1.0% |
| 6 | 20 | 0.8% |
| Other values (14) | 132 | 5.2% |
Cyrillic
| Value | Count | Frequency (%) |
| о | 6 | 14.0% |
| к | 5 | 11.6% |
| и | 5 | 11.6% |
| н | 3 | 7.0% |
| а | 3 | 7.0% |
| м | 3 | 7.0% |
| е | 3 | 7.0% |
| в | 3 | 7.0% |
| р | 2 | 4.7% |
| ч | 2 | 4.7% |
| Other values (8) | 8 | 18.6% |
Inherited
| Value | Count | Frequency (%) |
| ́ | 5 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13117 | 99.6% |
| Cyrillic | 43 | 0.3% |
| Diacriticals | 5 | < 0.1% |
| Punctuation | 4 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 1370 | 10.4% |
| (space) | 1334 | 10.2% |
| t | 1048 | 8.0% |
| o | 1021 | 7.8% |
| a | 788 | 6.0% |
| r | 730 | 5.6% |
| , | 673 | 5.1% |
| n | 656 | 5.0% |
| i | 616 | 4.7% |
| s | 611 | 4.7% |
| Other values (45) | 4270 | 32.6% |
Cyrillic
| Value | Count | Frequency (%) |
| о | 6 | 14.0% |
| к | 5 | 11.6% |
| и | 5 | 11.6% |
| н | 3 | 7.0% |
| а | 3 | 7.0% |
| м | 3 | 7.0% |
| е | 3 | 7.0% |
| в | 3 | 7.0% |
| р | 2 | 4.7% |
| ч | 2 | 4.7% |
| Other values (8) | 8 | 18.6% |
Diacriticals
| Value | Count | Frequency (%) |
| ́ | 5 | 100.0% |
Punctuation
| Value | Count | Frequency (%) |
| … | 4 | 100.0% |
part of speech
Categorical
| Distinct | 37 |
|---|---|
| Distinct (%) | 3.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
| noun | 374 |
|---|---|
| verb | 232 |
| adjective | 127 |
| adverb | 112 |
| preposition | 37 |
| Other values (32) | 118 |
Length
| Max length | 26 |
|---|---|
| Median length | 4 |
| Mean length | 5.885 |
| Min length | 3 |
Characters and Unicode
| Total characters | 5885 |
|---|---|
| Distinct characters | 24 |
| Distinct categories | 5 |
| Distinct scripts | 3 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 20 |
|---|---|
| Unique (%) | 2.0% |
Sample
| 1st row | conjunction |
|---|---|
| 2nd row | preposition |
| 3rd row | particle |
| 4th row | pronoun |
| 5th row | preposition |
Common Values
| Value | Count | Frequency (%) |
| noun | 374 | 37.4% |
| verb | 232 | 23.2% |
| adjective | 127 | 12.7% |
| adverb | 112 | 11.2% |
| preposition | 37 | 3.7% |
| pronoun | 36 | 3.6% |
| misc | 12 | 1.2% |
| conjunction | 12 | 1.2% |
| cardinal number | 11 | 1.1% |
| particle | 7 | 0.7% |
| Other values (27) | 40 | 4.0% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| noun | 378 | 36.0% |
| verb | 234 | 22.3% |
| adjective | 129 | 12.3% |
| adverb | 118 | 11.2% |
| pronoun | 40 | 3.8% |
| preposition | 39 | 3.7% |
| particle | 19 | 1.8% |
| number | 18 | 1.7% |
| cardinal | 16 | 1.5% |
| conjunction | 15 | 1.4% |
| Other values (15) | 43 | 4.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 984 | 16.7% |
| e | 698 | 11.9% |
| o | 588 | 10.0% |
| r | 497 | 8.4% |
| v | 481 | 8.2% |
| u | 456 | 7.7% |
| b | 373 | 6.3% |
| a | 309 | 5.3% |
| i | 283 | 4.8% |
| d | 268 | 4.6% |
| Other values (14) | 948 | 16.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 5806 | 98.7% |
| Space Separator | 51 | 0.9% |
| Other Punctuation | 26 | 0.4% |
| Open Punctuation | 1 | < 0.1% |
| Close Punctuation | 1 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 984 | 16.9% |
| e | 698 | 12.0% |
| o | 588 | 10.1% |
| r | 497 | 8.6% |
| v | 481 | 8.3% |
| u | 456 | 7.9% |
| b | 373 | 6.4% |
| a | 309 | 5.3% |
| i | 283 | 4.9% |
| d | 268 | 4.6% |
| Other values (10) | 869 | 15.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 51 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 26 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 1 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 5805 | 98.6% |
| Common | 79 | 1.3% |
| Cyrillic | 1 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 984 | 17.0% |
| e | 698 | 12.0% |
| o | 588 | 10.1% |
| r | 497 | 8.6% |
| v | 481 | 8.3% |
| u | 456 | 7.9% |
| b | 373 | 6.4% |
| a | 309 | 5.3% |
| i | 283 | 4.9% |
| d | 268 | 4.6% |
| Other values (9) | 868 | 15.0% |
Common
| Value | Count | Frequency (%) |
| (space) | 51 | 64.6% |
| , | 26 | 32.9% |
| ( | 1 | 1.3% |
| ) | 1 | 1.3% |
Cyrillic
| Value | Count | Frequency (%) |
| с | 1 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 5884 | > 99.9% |
| Cyrillic | 1 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| n | 984 | 16.7% |
| e | 698 | 11.9% |
| o | 588 | 10.0% |
| r | 497 | 8.4% |
| v | 481 | 8.2% |
| u | 456 | 7.7% |
| b | 373 | 6.3% |
| a | 309 | 5.3% |
| i | 283 | 4.8% |
| d | 268 | 4.6% |
| Other values (13) | 947 | 16.1% |
Cyrillic
| Value | Count | Frequency (%) |
| с | 1 | 100.0% |
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
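The counts behind a nullity display can be sketched in a few lines. This example assumes rows are held as dictionaries and treats `None` and the empty string as missing, which is an assumption of the sketch rather than the report's definition. `missing_per_column` is a hypothetical helper, and the third row has its `english` value blanked on purpose to show a gap (the profiled dataset itself reports 0 missing cells).

```python
def missing_per_column(rows, columns):
    """Count missing cells (None or empty string) for each column."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if row.get(col) in (None, ""):
                missing[col] += 1
    return missing

columns = ["russian", "english", "part of speech"]
rows = [
    {"russian": "и", "english": "and, though", "part of speech": "conjunction"},
    {"russian": "в", "english": "in, at", "part of speech": "preposition"},
    # Hypothetical gap, introduced only for illustration.
    {"russian": "не", "english": None, "part of speech": "particle"},
]
missing = missing_per_column(rows, columns)
```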
| | russian | english | part of speech |
|---|---|---|---|
| 0 | и | and, though | conjunction |
| 1 | в | in, at | preposition |
| 2 | не | not | particle |
| 3 | он | he | pronoun |
| 4 | на | on, it, at, to | preposition |
| 5 | я | I | pronoun |
| 6 | что | what, that, why | сonjunction, pronoun |
| 7 | тот | that | adjective, pronoun |
| 8 | быть | to be | verb |
| 9 | с | with, and, from, of | preposition |
| | russian | english | part of speech |
|---|---|---|---|
| 990 | художник | painter, artist | noun |
| 991 | знак | sign | noun |
| 992 | завод | factory | noun |
| 993 | кулак | fist | noun |
| 994 | использовать | to use, utilize, make use of | verb |
| 995 | стакан | glass | noun |
| 996 | пахнуть | to smell | verb |
| 997 | отсюда | from here | adverb |
| 998 | рот | mouth | noun |
| 999 | пора | it's time;at times, now and then(See #279) | misc |