Overview

Dataset statistics

Number of variables: 5
Number of observations: 189
Missing cells: 188
Missing cells (%): 19.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.5 KiB
Average record size in memory: 40.7 B
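
The missing-cell percentage can be checked from the counts above. This is a minimal sketch that assumes the percentage is missing cells divided by variables times observations; the variable names are illustrative and not taken from the profiling tool.

```python
# Figures copied from the dataset statistics above.
n_variables = 5
n_observations = 189
missing_cells = 188

# Assumed definition: every (row, column) pair counts as one cell.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

Under that assumption the result rounds to the 19.9% shown in the statistics.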

Variable types

URL: 1
Categorical: 2
DateTime: 1
Text: 1

Alerts

notes has constant value "" [Constant]
source is highly imbalanced (81.6%) [Imbalance]
notes has 188 (99.5%) missing values [Missing]
url has unique values [Unique]

Reproduction

Analysis started: 2023-09-12 08:35:44.234938
Analysis finished: 2023-09-12 08:35:45.846111
Duration: 1.61 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json

Variables

url
URL

UNIQUE 

Distinct: 189
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB

Value | Count | Frequency (%)
http://abrahadesta.wordpress.com/ | 1 | 0.5%
http://www.ocha-eth.org/ | 1 | 0.5%
http://www.medhin.org/ | 1 | 0.5%
http://www.mediaethiopia.com/ | 1 | 0.5%
http://www.mediaethiopia.com/blog/ | 1 | 0.5%
http://www.mereja.com/ | 1 | 0.5%
http://www.mesfinwoldemariam.org/ | 1 | 0.5%
http://www.meskelsquare.com/ | 1 | 0.5%
http://www.nazret.com/ | 1 | 0.5%
http://www.nazret.com/news/view_amharic.php?feed=5&how=paged&what=all | 1 | 0.5%
Other values (179) | 179 | 94.7%

Scheme

Value | Count | Frequency (%)
http | 173 | 91.5%
https | 16 | 8.5%

Netloc

Value | Count | Frequency (%)
nazret.com | 8 | 4.2%
www.cafpde.org | 3 | 1.6%
www.hrw.org | 3 | 1.6%
www.ethpress.gov.et | 2 | 1.1%
web.worldbank.org | 2 | 1.1%
www.tzta.ca | 2 | 1.1%
www.twitter.com | 2 | 1.1%
www.aeup.org | 2 | 1.1%
www.aigaforum.com | 2 | 1.1%
www.torproject.org | 2 | 1.1%
Other values (134) | 161 | 85.2%

Path

Value | Count | Frequency (%)
/ | 127 | 67.2%
/blog/index.php | 7 | 3.7%
/index.html | 2 | 1.1%
/index.htm | 2 | 1.1%
/tzta/english.htm | 1 | 0.5%
/doc | 1 | 0.5%
/ethiopia/ | 1 | 0.5%
/public/english/region/afpro/addisababa/ethiopia.htm | 1 | 0.5%
/external/country/ETH/index.htm | 1 | 0.5%
/research-publications/speaksafe-media-workers-toolkit-safer-online-and-mobile-practices | 1 | 0.5%
Other values (45) | 45 | 23.8%

Query

Value | Count | Frequency (%)
(empty) | 174 | 92.1%
blog=12 | 1 | 0.5%
blog=13 | 1 | 0.5%
blog=14 | 1 | 0.5%
blog=15 | 1 | 0.5%
blog=16 | 1 | 0.5%
blog=7 | 1 | 0.5%
blog=9 | 1 | 0.5%
c=ethiop&t=africa | 1 | 0.5%
feed=5&how=paged&what=all | 1 | 0.5%
Other values (6) | 6 | 3.2%

Fragment

Value | Count | Frequency (%)
(empty) | 188 | 99.5%
ethiopia | 1 | 0.5%
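
The breakdown above splits each URL into scheme, network location, path, query and fragment. The sketch below shows the same decomposition with Python's standard `urllib.parse.urlsplit`. Whether the profiling tool uses this exact function is an assumption, and the three URLs are simply values that appear elsewhere in this report.

```python
from collections import Counter
from urllib.parse import urlsplit

# A few URLs copied from this report; the full column has 189 of them.
urls = [
    "http://abrahadesta.wordpress.com/",
    "https://www.hrw.org/",
    "http://www.nazret.com/news/view_amharic.php?feed=5&how=paged&what=all",
]

# urlsplit returns a named tuple: scheme, netloc, path, query, fragment.
parts = urlsplit(urls[2])
print(parts.scheme)    # http
print(parts.netloc)    # www.nazret.com
print(parts.path)      # /news/view_amharic.php
print(parts.query)     # feed=5&how=paged&what=all
print(repr(parts.fragment))  # '' (no fragment in this URL)

# Tally one component across all URLs, as the per-component tables do.
scheme_counts = Counter(urlsplit(u).scheme for u in urls)
print(scheme_counts)
```

An empty query or fragment comes back as an empty string, which corresponds to the rows with no value in the Query and Fragment counts.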

category_code
Categorical

Distinct: 15
Distinct (%): 7.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB

Length

Max length: 5
Median length: 4
Mean length: 4
Min length: 3

Characters and Unicode

Total characters: 756
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 2.1%

Sample

1st row: CULTR
2nd row: NEWS
3rd row: MISC
4th row: MISC
5th row: NEWS

Common Values

Value | Count | Frequency (%)
NEWS | 65 | 34.4%
HUMR | 45 | 23.8%
POLR | 32 | 16.9%
ECON | 13 | 6.9%
ANON | 8 | 4.2%
CULTR | 7 | 3.7%
XED | 5 | 2.6%
MISC | 3 | 1.6%
HOST | 3 | 1.6%
MILX | 2 | 1.1%
Other values (5) | 6 | 3.2%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
news | 65 | 34.4%
humr | 45 | 23.8%
polr | 32 | 16.9%
econ | 13 | 6.9%
anon | 8 | 4.2%
cultr | 7 | 3.7%
xed | 5 | 2.6%
misc | 3 | 1.6%
host | 3 | 1.6%
milx | 2 | 1.1%
Other values (5) | 6 | 3.2%

Most occurring characters

Value | Count | Frequency (%)
N | 95 | 12.6%
R | 86 | 11.4%
E | 85 | 11.2%
S | 72 | 9.5%
W | 65 | 8.6%
O | 56 | 7.4%
U | 54 | 7.1%
H | 51 | 6.7%
M | 50 | 6.6%
L | 42 | 5.6%
Other values (11) | 100 | 13.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 756 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
N | 95 | 12.6%
R | 86 | 11.4%
E | 85 | 11.2%
S | 72 | 9.5%
W | 65 | 8.6%
O | 56 | 7.4%
U | 54 | 7.1%
H | 51 | 6.7%
M | 50 | 6.6%
L | 42 | 5.6%
Other values (11) | 100 | 13.2%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 756 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 95 | 12.6%
R | 86 | 11.4%
E | 85 | 11.2%
S | 72 | 9.5%
W | 65 | 8.6%
O | 56 | 7.4%
U | 54 | 7.1%
H | 51 | 6.7%
M | 50 | 6.6%
L | 42 | 5.6%
Other values (11) | 100 | 13.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 756 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
N | 95 | 12.6%
R | 86 | 11.4%
E | 85 | 11.2%
S | 72 | 9.5%
W | 65 | 8.6%
O | 56 | 7.4%
U | 54 | 7.1%
H | 51 | 6.7%
M | 50 | 6.6%
L | 42 | 5.6%
Other values (11) | 100 | 13.2%

date_added
DateTime

Distinct: 6
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
Minimum: 2014-04-15 00:00:00
Maximum: 2018-04-10 00:00:00
[Figure: Histogram with fixed size bins (bins=6)]

source
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB

Length

Max length: 15
Median length: 10
Mean length: 9.7195767
Min length: 3
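
The length statistics for source follow from the value counts reported for this variable. A minimal sketch, assuming length means the number of characters in each value; the dictionary below is transcribed by hand from the report's counts.

```python
from statistics import mean, median

# Value counts for `source`, transcribed from this report.
counts = {"citizenlab": 178, "CIPIT": 4, "OONI": 4, "BBC": 2, "defenddefenders": 1}

# Expand the counts into one length per observation (189 in total).
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

total_chars = sum(lengths)
print("max:", max(lengths))
print("median:", median(lengths))
print("mean:", round(mean(lengths), 7))
print("min:", min(lengths))
print("total characters:", total_chars)
```

The mean is 1837 / 189, which matches the reported mean length and the total character count for this variable.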

Characters and Unicode

Total characters: 1837
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.5%

Sample

1st row: citizenlab
2nd row: citizenlab
3rd row: citizenlab
4th row: citizenlab
5th row: citizenlab

Common Values

Value | Count | Frequency (%)
citizenlab | 178 | 94.2%
CIPIT | 4 | 2.1%
OONI | 4 | 2.1%
BBC | 2 | 1.1%
defenddefenders | 1 | 0.5%
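
The 81.6% imbalance reported for source can be reproduced from these counts with a normalized-entropy score: one minus the Shannon entropy divided by the maximum entropy for five classes. That this is the formula the profiling tool applies is an assumption; the sketch only shows that this definition yields the reported figure.

```python
import math

# Counts per class for `source`, from the Common Values table.
counts = [178, 4, 4, 2, 1]

total = sum(counts)
probs = [c / total for c in counts]

# Shannon entropy in bits; it is maximal (log2 of the class count)
# when all classes are equally frequent.
entropy = -sum(p * math.log2(p) for p in probs)
max_entropy = math.log2(len(counts))

# Assumed score: 0 for a perfectly balanced column, 1 for a single class.
imbalance = 1 - entropy / max_entropy
print(f"{imbalance:.1%}")
```

A score this close to 1 reflects that citizenlab accounts for 178 of the 189 rows.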

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values (Plot)]

Value | Count | Frequency (%)
citizenlab | 178 | 94.2%
cipit | 4 | 2.1%
ooni | 4 | 2.1%
bbc | 2 | 1.1%
defenddefenders | 1 | 0.5%

Most occurring characters

Value | Count | Frequency (%)
i | 356 | 19.4%
e | 183 | 10.0%
n | 180 | 9.8%
c | 178 | 9.7%
t | 178 | 9.7%
z | 178 | 9.7%
l | 178 | 9.7%
a | 178 | 9.7%
b | 178 | 9.7%
I | 12 | 0.7%
Other values (10) | 38 | 2.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1795 | 97.7%
Uppercase Letter | 42 | 2.3%

Most frequent character per category

Lowercase Letter

Value | Count | Frequency (%)
i | 356 | 19.8%
e | 183 | 10.2%
n | 180 | 10.0%
c | 178 | 9.9%
t | 178 | 9.9%
z | 178 | 9.9%
l | 178 | 9.9%
a | 178 | 9.9%
b | 178 | 9.9%
d | 4 | 0.2%
Other values (3) | 4 | 0.2%

Uppercase Letter

Value | Count | Frequency (%)
I | 12 | 28.6%
O | 8 | 19.0%
C | 6 | 14.3%
P | 4 | 9.5%
T | 4 | 9.5%
N | 4 | 9.5%
B | 4 | 9.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1837 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 356 | 19.4%
e | 183 | 10.0%
n | 180 | 9.8%
c | 178 | 9.7%
t | 178 | 9.7%
z | 178 | 9.7%
l | 178 | 9.7%
a | 178 | 9.7%
b | 178 | 9.7%
I | 12 | 0.7%
Other values (10) | 38 | 2.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1837 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
i | 356 | 19.4%
e | 183 | 10.0%
n | 180 | 9.8%
c | 178 | 9.7%
t | 178 | 9.7%
z | 178 | 9.7%
l | 178 | 9.7%
a | 178 | 9.7%
b | 178 | 9.7%
I | 12 | 0.7%
Other values (10) | 38 | 2.1%

notes
Text

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 100.0%
Missing: 188
Missing (%): 99.5%
Memory size: 1.6 KiB

Length

Max length: 18
Median length: 18
Mean length: 18
Min length: 18

Characters and Unicode

Total characters: 18
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 100.0%

Sample

1st row: Reportedly blocked

Value | Count | Frequency (%)
reportedly | 1 | 50.0%
blocked | 1 | 50.0%

Most occurring characters

Value | Count | Frequency (%)
e | 3 | 16.7%
o | 2 | 11.1%
d | 2 | 11.1%
l | 2 | 11.1%
R | 1 | 5.6%
p | 1 | 5.6%
r | 1 | 5.6%
t | 1 | 5.6%
y | 1 | 5.6%
(space) | 1 | 5.6%
Other values (3) | 3 | 16.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 16 | 88.9%
Uppercase Letter | 1 | 5.6%
Space Separator | 1 | 5.6%
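
The category counts above can be reproduced with Python's standard `unicodedata` module, which exposes the Unicode general category of each code point. This is a sketch rather than the profiling tool's own implementation. It prints the two-letter codes (`Ll`, `Lu`, `Zs`) instead of the long names used in this report, and the standard library has no equivalent lookup for scripts or blocks.

```python
import unicodedata
from collections import Counter

# The single non-missing value of `notes`.
text = "Reportedly blocked"

# unicodedata.category returns a two-letter general category code,
# e.g. "Ll" (lowercase letter), "Lu" (uppercase letter), "Zs" (space separator).
category_counts = Counter(unicodedata.category(ch) for ch in text)

for code, n in category_counts.most_common():
    print(code, n, f"{n / len(text):.1%}")
```

For this value the tally is 16 lowercase letters, 1 uppercase letter and 1 space separator out of 18 characters.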

Most frequent character per category

Lowercase Letter

Value | Count | Frequency (%)
e | 3 | 18.8%
o | 2 | 12.5%
d | 2 | 12.5%
l | 2 | 12.5%
p | 1 | 6.2%
r | 1 | 6.2%
t | 1 | 6.2%
y | 1 | 6.2%
b | 1 | 6.2%
c | 1 | 6.2%

Uppercase Letter

Value | Count | Frequency (%)
R | 1 | 100.0%

Space Separator

Value | Count | Frequency (%)
(space) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 17 | 94.4%
Common | 1 | 5.6%

Most frequent character per script

Latin

Value | Count | Frequency (%)
e | 3 | 17.6%
o | 2 | 11.8%
d | 2 | 11.8%
l | 2 | 11.8%
R | 1 | 5.9%
p | 1 | 5.9%
r | 1 | 5.9%
t | 1 | 5.9%
y | 1 | 5.9%
b | 1 | 5.9%
Other values (2) | 2 | 11.8%

Common

Value | Count | Frequency (%)
(space) | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 18 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 3 | 16.7%
o | 2 | 11.1%
d | 2 | 11.1%
l | 2 | 11.1%
R | 1 | 5.6%
p | 1 | 5.6%
r | 1 | 5.6%
t | 1 | 5.6%
y | 1 | 5.6%
(space) | 1 | 5.6%
Other values (3) | 3 | 16.7%

Correlations

[Figure: correlation heatmap]

 | category_code | source
category_code | 1.000 | 0.100
source | 0.100 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

index | url | category_code | date_added | source | notes
0 | http://abrahadesta.wordpress.com/ | CULTR | 2014-04-15 | citizenlab | NaN
1 | http://aljazeera.net/ | NEWS | 2014-04-15 | citizenlab | NaN
2 | http://am.wikipedia.org/ | MISC | 2014-04-15 | citizenlab | NaN
3 | http://am.wikipedia.org/wiki/%E1%8B%8B%E1%8A%93%E1%8B%8D_%E1%8C%88%E1%8C%BD | MISC | 2014-04-15 | citizenlab | NaN
4 | http://amharic.voanews.com/ | NEWS | 2014-04-15 | citizenlab | NaN
5 | http://ancientgebts.org/ | HUMR | 2014-04-15 | citizenlab | NaN
6 | http://carpediemethiopia.blogspot.com/ | POLR | 2014-04-15 | citizenlab | NaN
7 | http://citizenlab.org/ | NEWS | 2014-04-15 | citizenlab | NaN
8 | http://cpj.org/ | NEWS | 2014-04-15 | citizenlab | NaN
9 | http://egoportal.blogspot.com/ | POLR | 2014-04-15 | citizenlab | NaN

index | url | category_code | date_added | source | notes
179 | https://www.citizenlab.org/ | NEWS | 2014-04-15 | citizenlab | NaN
180 | https://www.dropbox.com/s/n65b3d67f82asn2/Leaked%20National%20Entrance%20Exam_English.pdf?dl=0 | FILE | 2016-05-30 | OONI | NaN
181 | https://www.facebook.com/Jawarmd | NEWS | 2016-05-30 | OONI | NaN
182 | https://www.facebook.com/pages/Addis-Neger/49967100821 | NEWS | 2014-04-15 | citizenlab | NaN
183 | https://www.hrw.org/ | HUMR | 2014-04-15 | citizenlab | NaN
184 | https://www.mereja.com/ | NEWS | 2016-09-09 | CIPIT | NaN
185 | https://www.oromiamedia.org/ | NEWS | 2016-05-30 | OONI | NaN
186 | https://www.privacyinternational.org/ | HUMR | 2014-04-15 | citizenlab | NaN
187 | https://www.torproject.org/ | NEWS | 2014-04-15 | citizenlab | NaN
188 | https://www.twitter.com/ | HOST | 2014-04-15 | citizenlab | NaN