Subtlex - Arabic Lexical Database - Upload Word List
Beta
Upload Word List
Using Database: Subtlex
Using Unicode: Latin
86658 types, 117508475 tokens
Enter a word list as text input or upload a file (maximum: 10,000 rows).
Word Frequency
Count (cnt)
cnt is the number of times the word occurs in the corpus (its token count).
Current minimum value: 50
Current maximum value: 2436614
Current average value: 1356.0183
Log Count (log_cnt)
log_cnt is log10(cnt + 1). It is the best value to use if you want to match words on word frequency.
Current minimum value: 1.47
Current maximum value: 6.39
Current average value: 2.3379
Word frequency per million (frq)
frq is the word frequency per million words, a standard measure that is independent of corpus size. It is the number of times the word appears in the Subtlex corpus, divided by the total number of words in the corpus, multiplied by one million.
Current minimum value: 0.23
Current maximum value: 19705.2
Current average value: 10.9713
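The definition above can be sketched as a minimal Python function. It divides a word's count by the corpus token total and scales the result by one million. The sketch uses the 117508475-token total shown at the top of this page as the denominator. The page does not say which total the database actually uses: 2436614 / 117508475 * 10^6 is about 20736, slightly above the listed maximum of 19705.2. For that reason the total is a parameter. The function name is illustrative.

```python
# Token total reported at the top of this page; whether the database
# divides by exactly this number is an assumption.
CORPUS_TOKENS = 117_508_475


def frequency_per_million(cnt: int, corpus_tokens: int = CORPUS_TOKENS) -> float:
    """Return cnt / corpus_tokens * 1,000,000 (the frq measure)."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return cnt / corpus_tokens * 1_000_000


# Frequency per million for a word seen 1356 times (the page's average cnt).
print(round(frequency_per_million(1356), 4))
```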
Log Frequency (log_frq)
log_frq is log10(frq + 1).
Current minimum value: 0.09
Current maximum value: 4.3
Current average value: 0.507
Log Frequency N (log_frqN)
log_frqN is log10(frq + 1/N), where N is the number of words in the corpus, expressed in millions.
Current minimum value: -0.63
Current maximum value: 4.3
Current average value: 0.2437
Zipf Scale (zipf)
zipf is log10(frq) + 3. This logarithmic scale yields frequency values that support cross-linguistic comparisons and straightforward frequency ranking.
Current minimum value: 2.37
Current maximum value: 7.3
Current average value: 3.2447
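Each of the four logarithmic transforms above (log_cnt, log_frq, log_frqN, zipf) reduces to a one-line function. This sketch uses Python's math.log10. It assumes N is the page's token total expressed in millions (about 117.5).

- In log_frqN, the added 1/N is the per-million frequency of a single occurrence, so a frequency of 0 still gets a finite log.
- zipf is undefined at frq = 0.

```python
import math

# N for log_frqN: corpus size in millions of tokens. Taken from the token
# total shown on this page (an assumption about the database's value).
N_MILLIONS = 117_508_475 / 1_000_000


def log_cnt(cnt: int) -> float:
    """log10(cnt + 1); the +1 keeps a count of 0 finite."""
    return math.log10(cnt + 1)


def log_frq(frq: float) -> float:
    """log10(frq + 1)."""
    return math.log10(frq + 1)


def log_frq_n(frq: float, n_millions: float = N_MILLIONS) -> float:
    """log10(frq + 1/N); 1/N is the per-million frequency of one occurrence."""
    return math.log10(frq + 1 / n_millions)


def zipf(frq: float) -> float:
    """log10(frq) + 3, i.e. log10 of the frequency per billion words (frq > 0)."""
    return math.log10(frq) + 3


# Transforms of the page's listed maximum frq.
print(round(log_frq(19705.2), 2), round(zipf(19705.2), 2))
```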
Lemma Information
Diacritic form (diac)
diac is the token written with the diacritics that indicate its vowels.
POS Tag (pos)
pos is the part-of-speech tag of diac.
Lemma (lemma)
lemma is the base (dictionary) form of the word token.
Count of all words that have the same lemma (all_lem_cnt)
all_lem_cnt is the sum of cnt over all words that share a lemma with any of this word's lemmas.
Current minimum value: 0
Current maximum value: 201
Current average value: 19.4957
Frequency per million of all_lem_cnt (all_lem_frq)
all_lem_frq is all_lem_cnt expressed as a frequency per million words.
Current minimum value: 0
Current maximum value: 1.64
Current average value: 0.1631
Log of all_lem_cnt (all_lem_log_cnt)
all_lem_log_cnt is log10(all_lem_cnt).
Current minimum value: 0
Current maximum value: 2.31
Current average value: 0.914
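The three lemma-level measures can be sketched together. The sketch takes a hypothetical input that maps each word to its set of lemmas and its cnt. It reads "all words that have the same lemma as any of the lemmas of this word" as the union of words sharing at least one lemma. Under that reading, the word itself is included and each related word is counted once. The page lists 0 as the minimum of all_lem_log_cnt, so the sketch also assumes a zero total is reported as 0 rather than log10(0).

```python
import math
from collections import defaultdict

# Assumed denominator for all_lem_frq: the token total shown on this page.
CORPUS_TOKENS = 117_508_475


def lemma_measures(
    entries: dict[str, tuple[set[str], int]],
) -> dict[str, tuple[int, float, float]]:
    """Return {word: (all_lem_cnt, all_lem_frq, all_lem_log_cnt)}.

    `entries` maps each word to (its lemmas, its cnt): a hypothetical layout.
    A word counts toward another's total if they share at least one lemma;
    the word itself is included, and every word is counted once.
    """
    # Index: lemma -> words carrying that lemma.
    words_by_lemma: dict[str, set[str]] = defaultdict(set)
    for word, (lemmas, _cnt) in entries.items():
        for lemma in lemmas:
            words_by_lemma[lemma].add(word)

    result = {}
    for word, (lemmas, _cnt) in entries.items():
        # Union of the word sets for all of this word's lemmas.
        related = set().union(*(words_by_lemma[lemma] for lemma in lemmas))
        total = sum(entries[w][1] for w in related)
        frq = total / CORPUS_TOKENS * 1_000_000
        # Assumption: a zero total is reported as 0 instead of log10(0).
        log_total = math.log10(total) if total > 0 else 0.0
        result[word] = (total, frq, log_total)
    return result


# Toy data with hypothetical words and lemmas.
measures = lemma_measures({
    "a": ({"L1"}, 100),
    "b": ({"L1", "L2"}, 30),
    "c": ({"L2"}, 5),
})
print(measures["a"][0], measures["b"][0], measures["c"][0])  # 130 135 35
```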
Orthographic Structure
Number of letters (num_letters)
num_letters is the number of characters in the word.
Repeated letters (rep_letter)
rep_letter is a yes/no value (TRUE/FALSE in the database) indicating whether any letter occurs more than once in the word.
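Both orthographic-structure measures follow directly from the word string. This sketch assumes each word is stored as a plain Python string. Note that len counts Unicode code points, so any diacritics stored as separate combining characters would be counted as letters too. The function names are illustrative.

```python
def num_letters(word: str) -> int:
    """Number of characters (Unicode code points) in the word."""
    return len(word)


def rep_letter(word: str) -> bool:
    """True if any character occurs more than once in the word."""
    return len(set(word)) < len(word)


# Hypothetical Latin-script inputs.
for w in ("kitAb", "madrasa"):
    print(w, num_letters(w), rep_letter(w))
```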
Orthographic Neighborhoods
Number of substitution neighbors (n)
n is the number of words in the lexicon that differ from this word by the substitution of exactly one letter.
Current minimum value: 0
Current maximum value: 55
Current average value: 5.4763
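Comparing every pair of words to find substitution neighbors is quadratic in the size of the lexicon (86658 types here). A common alternative is to index every word under wildcard patterns that each mask one position. Two words are substitution neighbors exactly when they share a pattern. This is an assumption, not the database's documented method. The function names are hypothetical, and a NUL character serves as the wildcard so it cannot clash with a real letter.

```python
from collections import defaultdict

WILDCARD = "\0"  # assumed never to occur in a real word


def build_substitution_index(lexicon: set[str]) -> dict[str, set[str]]:
    """Map each one-position wildcard pattern to the lexicon words matching it."""
    index: dict[str, set[str]] = defaultdict(set)
    for word in lexicon:
        for i in range(len(word)):
            index[word[:i] + WILDCARD + word[i + 1:]].add(word)
    return index


def substitution_neighbors(word: str, index: dict[str, set[str]]) -> set[str]:
    """Words of the same length that differ from `word` at exactly one position."""
    neighbors: set[str] = set()
    for i in range(len(word)):
        neighbors |= index.get(word[:i] + WILDCARD + word[i + 1:], set())
    neighbors.discard(word)  # the word matches its own patterns
    return neighbors


lexicon = {"cat", "bat", "cot", "cut", "dog", "cast"}
index = build_substitution_index(lexicon)
print(sorted(substitution_neighbors("cat", index)))  # ['bat', 'cot', 'cut']
```

The index is built once for the whole lexicon. After that, each lookup costs only one dictionary access per letter position, and n is the size of the returned set.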
Number of higher-frequency substitution neighbors (nhf)
nhf is the number of substitution neighbors with a higher frq than this word.
Current minimum value: 0
Current maximum value: 43
Current average value: 2.7296
Frequency of the highest-frequency substitution neighbor (frq_hf_s)
frq_hf_s is the highest frq among this word's substitution neighbors.
Current minimum value: 0
Current maximum value: 19705.2
Current average value: 106.3392
Highest-frequency substitution neighbor (hf_s)
hf_s is the substitution neighbor of this word that has the highest frequency.
List of higher-frequency substitution neighbors (hf_s_list)
hf_s_list is a comma-separated list of substitution neighbors, ordered by descending frequency, with the position of this word in that ordering marked by 'XXX'.
Number of positions with higher-frequency substitution neighbors (hpf)
hpf is the number of letter positions in the word at which a substitution produces a neighbor with a higher frequency (frq) than the word itself.
Current minimum value: 0
Current maximum value: 6
Current average value: 1.3167
Average frequency of substitution neighbors (avg_frq_ns)
avg_frq_ns is the average frequency (frq) of this word's substitution neighbors.
Current minimum value: 0
Current maximum value: 19705.2
Current average value: 30.6694
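Given a word's substitution neighbors and a frequency lookup, the remaining substitution measures can be derived with the hypothetical helper below. The page does not specify the following behaviors, so the helper assumes them:

- Frequency ties are broken alphabetically.
- A word with no neighbors gets 0 or None.
- In hf_s_list, 'XXX' goes after all strictly higher-frequency neighbors.

```python
from statistics import mean


def substitution_stats(word: str, neighbors: set[str], frq: dict[str, float]) -> dict:
    """Derive nhf, frq_hf_s, hf_s, hf_s_list, hpf and avg_frq_ns for one word.

    `neighbors` are the word's substitution neighbors; `frq` maps words
    (including `word`) to their frequency per million. Hypothetical helper.
    """
    own = frq[word]
    # Descending frequency, alphabetical on ties (assumption).
    ordered = sorted(neighbors, key=lambda w: (-frq[w], w))
    higher = [w for w in ordered if frq[w] > own]
    not_higher = [w for w in ordered if frq[w] <= own]

    def diff_position(other: str) -> int:
        # Substitution neighbors differ from `word` at exactly one index.
        return next(i for i, (a, b) in enumerate(zip(word, other)) if a != b)

    return {
        "nhf": len(higher),
        "frq_hf_s": frq[ordered[0]] if ordered else 0.0,
        "hf_s": ordered[0] if ordered else None,
        # Assumption: 'XXX' marks where this word falls in the ordering.
        "hf_s_list": ",".join(higher + ["XXX"] + not_higher),
        # Distinct positions that yield a higher-frequency neighbor.
        "hpf": len({diff_position(w) for w in higher}),
        "avg_frq_ns": mean(frq[w] for w in neighbors) if neighbors else 0.0,
    }


frq = {"cat": 50.0, "bat": 80.0, "cot": 10.0, "cut": 90.0}
print(substitution_stats("cat", {"bat", "cot", "cut"}, frq))
```

In this toy example, cat (50) has the neighbors bat (80), cot (10) and cut (90). The resulting hf_s_list is cut,bat,XXX,cot. hpf is 2, because the two higher-frequency neighbors differ from cat at different positions.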
Number of transposed-letter neighbors (n_tl)
n_tl is the number of words in the lexicon formed by transposing two adjacent letters of this word.
Current minimum value: 0
Current maximum value: 7
Current average value: 0.3976
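Transposed-letter neighbors can be generated directly: swap each adjacent pair of letters and keep the results that are in the lexicon. Below is a minimal sketch with a hypothetical function name. It skips swaps of two identical letters because they reproduce the word itself.

```python
def transposed_letter_neighbors(word: str, lexicon: set[str]) -> set[str]:
    """Lexicon words formed by swapping two adjacent letters of `word`."""
    found = set()
    for i in range(len(word) - 1):
        if word[i] == word[i + 1]:
            continue  # swapping identical letters gives the word back
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped in lexicon:
            found.add(swapped)
    return found


print(sorted(transposed_letter_neighbors("form", {"form", "from", "fomr", "rofm"})))
```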
Frequency of the highest-frequency transposed-letter neighbor (frq_hf_tl)
frq_hf_tl is the frequency (frq) of the highest-frequency transposed-letter neighbor.
Current minimum value: 0
Current maximum value: 19705.2
Current average value: 6.9563
Highest-frequency transposed-letter neighbor (hf_tl)
hf_tl is the transposed-letter neighbor of this word that has the highest frequency.
List of higher-frequency transposed-letter neighbors (hf_tl_list)
hf_tl_list is a comma-separated list of transposed-letter neighbors, ordered by descending frequency, with the position of this word in that ordering marked by 'XXX'.
Number of addition-letter neighbors (n_a)
n_a is the number of words in the lexicon formed by adding one letter to this word.
Current minimum value: 0
Current maximum value: 47
Current average value: 1.4097
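Trying every possible added letter would require knowing the alphabet. Instead, this sketch indexes each lexicon word under every string obtained by deleting one of its letters. The addition neighbors of a word are then the entries stored under that word. The function names are hypothetical.

```python
from collections import defaultdict


def build_deletion_index(lexicon: set[str]) -> dict[str, set[str]]:
    """Map every string made by deleting one letter of a lexicon word to that word."""
    index: dict[str, set[str]] = defaultdict(set)
    for word in lexicon:
        for i in range(len(word)):
            index[word[:i] + word[i + 1:]].add(word)
    return index


def addition_neighbors(word: str, index: dict[str, set[str]]) -> set[str]:
    """Lexicon words formed by adding exactly one letter to `word`."""
    return set(index.get(word, ()))


lexicon = {"cat", "cart", "coat", "cast", "at", "ca"}
index = build_deletion_index(lexicon)
print(sorted(addition_neighbors("cat", index)))  # ['cart', 'cast', 'coat']
```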
Frequency of the highest-frequency addition-letter neighbor (frq_hf_a)
frq_hf_a is the frequency (frq) of the highest-frequency addition-letter neighbor.
Current minimum value: 0
Current maximum value: 19705.2
Current average value: 11.6146
Highest-frequency addition-letter neighbor (hf_a)
hf_a is the addition-letter neighbor of this word that has the highest frequency.
List of higher-frequency addition-letter neighbors (hf_a_list)
hf_a_list is a comma-separated list of addition-letter neighbors, ordered by descending frequency, with the position of this word in that ordering marked by 'XXX'.
Number of deletion-letter neighbors (n_d)
n_d is the number of words in the lexicon formed by deleting one letter from this word.
Current minimum value: 0
Current maximum value: 5
Current average value: 1.4097
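Deletion neighbors are the mirror image of addition neighbors: delete each letter in turn and keep the results found in the lexicon. w2 is an addition neighbor of w1 exactly when w1 is a deletion neighbor of w2. As a result, the totals of n_a and n_d over the same lexicon are equal. This is consistent with the identical averages (1.4097) listed for the two measures. Below is a minimal sketch with a hypothetical function name and toy lexicon.

```python
def deletion_neighbors(word: str, lexicon: set[str]) -> set[str]:
    """Lexicon words obtained by deleting exactly one letter of `word`."""
    return {word[:i] + word[i + 1:] for i in range(len(word))} & lexicon


lexicon = {"cat", "cart", "coat", "cast", "at", "ca"}
n_d = {w: len(deletion_neighbors(w, lexicon)) for w in lexicon}
# n_a computed from the same relation read in the other direction.
n_a = {w: sum(w in deletion_neighbors(v, lexicon) for v in lexicon) for w in lexicon}
print(sum(n_a.values()) == sum(n_d.values()))  # True
```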
Frequency of the highest-frequency deletion-letter neighbor (frq_hf_d)
frq_hf_d is the frequency (frq) of the highest-frequency deletion-letter neighbor.
Current minimum value: 0
Current maximum value: 19705.2
Current average value: 107.1428
Highest-frequency deletion-letter neighbor (hf_d)
hf_d is the deletion-letter neighbor of this word that has the highest frequency.
List of higher-frequency deletion-letter neighbors (hf_d_list)
hf_d_list is a comma-separated list of deletion-letter neighbors, ordered by descending frequency, with the position of this word in that ordering marked by 'XXX'.
Levenshtein distance (Lev_N)
Lev_N measures orthographic similarity as the number of deletions, insertions, or substitutions required to transform one word into another. Each operation costs 1, and a transposition of two letters costs 2. The metric is the mean distance from the word to its 20 nearest neighbors (OLD20).
Current minimum value: 0
Current maximum value: 2
Current average value: 0.9623
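The sketch below computes the distance and the OLD20 average. It uses the standard dynamic-programming Levenshtein algorithm with unit costs and no transposition operation, so swapping two adjacent letters costs 2, as stated above. The brute-force scan over the lexicon is for illustration only; with 86658 types, a production version would likely need pruning or indexing. The function names and the k parameter are illustrative.

```python
import heapq


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions.

    There is no transposition operation, so swapping two letters costs 2.
    """
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def old20(word: str, lexicon: set[str], k: int = 20) -> float:
    """Mean distance from `word` to its k closest other words (OLD20 when k = 20)."""
    distances = heapq.nsmallest(k, (levenshtein(word, w) for w in lexicon if w != word))
    return sum(distances) / len(distances) if distances else 0.0


print(levenshtein("form", "from"))  # 2
```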
Bigram frequency (BOF) options
Sum of bigram frequency (sbof)
Current minimum value: 0
Current maximum value: 458092
Current average value: 71619.944
Mean bigram frequency (mbof)
Current minimum value: 0
Current maximum value: 143930
Current average value: 14069.3515
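The page does not define sbof and mbof. This sketch assumes a common reading: sbof sums the frequencies of the word's successive letter bigrams, and mbof divides that sum by the number of bigrams. The bigram frequency table is a hypothetical, position-independent input. Under the same assumption, the same functions with n = 3 give stof and mtof.

```python
def ngrams(word: str, n: int = 2) -> list[str]:
    """Successive letter n-grams; the first starts at the word's first letter."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def sum_and_mean_ngram_frequency(
    word: str, ngram_frq: dict[str, float], n: int = 2
) -> tuple[float, float]:
    """Return (sum, mean) of the word's n-gram frequencies.

    `ngram_frq` is a hypothetical n-gram -> frequency lookup; n-grams
    missing from it count as 0. Words shorter than n get (0.0, 0.0).
    """
    grams = ngrams(word, n)
    if not grams:
        return 0.0, 0.0
    total = sum(ngram_frq.get(g, 0.0) for g in grams)
    return total, total / len(grams)


bigram_frq = {"ca": 10.0, "at": 30.0}
print(sum_and_mean_ngram_frequency("cat", bigram_frq))  # (40.0, 20.0)
```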
Token bigrams by position (Token Bigram K is stored in bgramK and its token bigram frequency in bgram_frqK):
Token Bigram 1: bgram1; token bigram frequency: bgram_frq1 (min: 0, max: 143930)
Token Bigram 2: bgram2; token bigram frequency: bgram_frq2 (min: 0, max: 143930)
Token Bigram 3: bgram3; token bigram frequency: bgram_frq3 (min: 0, max: 143930)
Token Bigram 4: bgram4; token bigram frequency: bgram_frq4 (min: 0, max: 143930)
Token Bigram 5: bgram5; token bigram frequency: bgram_frq5 (min: 0, max: 143930)
Token Bigram 6: bgram6; token bigram frequency: bgram_frq6 (min: 0, max: 143930)
Token Bigram 7: bgram7; token bigram frequency: bgram_frq7 (min: 0, max: 143930)
Token Bigram 8: bgram8; token bigram frequency: bgram_frq8 (min: 0, max: 143930)
Token Bigram 9: bgram9; token bigram frequency: bgram_frq9 (min: 0, max: 143930)
Token Bigram 10: bgram10; token bigram frequency: bgram_frq10 (min: 0, max: 46400.9)
Token Bigram 11: bgram11; token bigram frequency: bgram_frq11 (min: 0, max: 46400.9)
Token Bigram 12: bgram12; token bigram frequency: bgram_frq12 (min: 0, max: 40163.6)
Token Bigram 13: bgram13; token bigram frequency: bgram_frq13 (min: 0, max: 22859.2)
Token Bigram 14: bgram14; token bigram frequency: bgram_frq14 (min: 0, max: 0)
Trigram frequency (TOF) options
Sum of trigram frequency (stof)
Current minimum value: 0
Current maximum value: 52799.6
Current average value: 3710.1582
Mean trigram frequency (mtof)
Current minimum value: 0
Current maximum value: 15073.9
Current average value: 874.8941
Token trigrams by position (Token Trigram K is stored in trigramK and its token trigram frequency in tgram_frqK):
Token Trigram 1: trigram1; token trigram frequency: tgram_frq1 (min: 0, max: 19539.8)
Token Trigram 2: trigram2; token trigram frequency: tgram_frq2 (min: 0, max: 19539.8)
Token Trigram 3: trigram3; token trigram frequency: tgram_frq3 (min: 0, max: 19539.8)
Token Trigram 4: trigram4; token trigram frequency: tgram_frq4 (min: 0, max: 19539.8)
Token Trigram 5: trigram5; token trigram frequency: tgram_frq5 (min: 0, max: 19539.8)
Token Trigram 6: trigram6; token trigram frequency: tgram_frq6 (min: 0, max: 19539.8)
Token Trigram 7: trigram7; token trigram frequency: tgram_frq7 (min: 0, max: 10795.4)
Token Trigram 8: trigram8; token trigram frequency: tgram_frq8 (min: 0, max: 5993.37)
Token Trigram 9: trigram9; token trigram frequency: tgram_frq9 (min: 0, max: 3447.48)
Token Trigram 10: trigram10; token trigram frequency: tgram_frq10 (min: 0, max: 1592)
Token Trigram 11: trigram11; token trigram frequency: tgram_frq11 (min: 0, max: 1934.89)
Token Trigram 12: trigram12; token trigram frequency: tgram_frq12 (min: 0, max: 107.71)
Token Trigram 13: trigram13; token trigram frequency: tgram_frq13 (min: 0, max: 0)
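The per-position fields can be laid out as one row per word, as in this sketch. For bigrams these are bgram1 to bgram14 with bgram_frq1 to bgram_frq14; for trigrams, trigram1 to trigram13 with tgram_frq1 to tgram_frq13. The column prefixes come from this page. The sketch assumes positions past the end of the word are left empty with frequency 0. That assumption is consistent with the maximum of 0 listed for the last bigram and trigram positions. The function name and the frequency lookup are hypothetical.

```python
def positional_ngram_columns(
    word: str,
    ngram_frq: dict[str, float],
    n: int,
    name_prefix: str,
    frq_prefix: str,
    slots: int,
) -> dict[str, object]:
    """Fill per-position columns such as bgram1..bgram14 and bgram_frq1..bgram_frq14.

    Position K holds the n-gram starting at letter K. Unused positions get
    "" and 0.0 (an assumption about how the database stores them).
    """
    row: dict[str, object] = {}
    for k in range(1, slots + 1):
        gram = word[k - 1:k - 1 + n]
        present = len(gram) == n  # False once the n-gram would run past the word
        row[f"{name_prefix}{k}"] = gram if present else ""
        row[f"{frq_prefix}{k}"] = ngram_frq.get(gram, 0.0) if present else 0.0
    return row


bigram_row = positional_ngram_columns("cats", {"ca": 1.0, "at": 2.0, "ts": 3.0}, 2, "bgram", "bgram_frq", 14)
print(bigram_row["bgram1"], bigram_row["bgram_frq3"], bigram_row["bgram4"] == "")  # ca 3.0 True
```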