Subtlex - Arabic Lexical Database - Set Constraints
Beta
Set Constraints
Using Database: Subtlex
Using Unicode: Latin
86,658 types; 117,508,475 tokens
Word frequency
Word frequency per million (frq)
frq is the word frequency per million words, a standard measure that is independent of corpus size. It is the number of times the word occurs in the Subtlex corpus, divided by the total number of words in the corpus, multiplied by one million.
Current minimum value: 0.23
Current maximum value: 19705.2
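The per-million calculation is simple enough to sketch directly. The function name `frq_per_million` and the figures below are illustrative, not taken from the database; multiplying before dividing keeps the result exact when the inputs divide evenly.

```python
def frq_per_million(count: int, corpus_tokens: int) -> float:
    """Occurrences of a word per million corpus tokens (the frq definition above)."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    # Multiply first so that evenly divisible inputs give an exact float.
    return count * 1_000_000 / corpus_tokens


# Hypothetical example: 250 occurrences in a 5-million-token corpus.
print(frq_per_million(250, 5_000_000))  # 50.0
```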
Log Frequency (log_frq)
log_frq is log10(frq + 1).
Current minimum value: 0.09
Current maximum value: 4.3
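A quick check of the formula against the range listed for log_frq, as a sketch (`log_frq` here is just an illustrative function name). Adding 1 before taking the logarithm keeps the value defined and non-negative for every frq of 0 or more.

```python
import math


def log_frq(frq: float) -> float:
    # The +1 offset keeps log10 defined and non-negative for frq >= 0.
    return math.log10(frq + 1)


print(round(log_frq(0.23), 2))     # 0.09, the listed minimum
print(round(log_frq(19705.2), 1))  # 4.3, the listed maximum
```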
Count (cnt)
cnt is the number of tokens of the word in the corpus, that is, how many times the word occurs.
Current minimum value: 50
Current maximum value: 2436614
Log Count (log_cnt)
log_cnt is log10(cnt + 1). It is the best measure to use when matching words on frequency.
Current minimum value: 1.47
Current maximum value: 6.39
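Since log_cnt is recommended for frequency matching, here is a minimal sketch of that use: keep candidates whose log_cnt lies within a tolerance of a target word's. The words, counts, function names and the 0.1 tolerance are hypothetical choices, not values from Subtlex.

```python
import math


def log_cnt(cnt: int) -> float:
    return math.log10(cnt + 1)


def frequency_matched(target: str, counts: dict[str, int], tolerance: float = 0.1) -> list[str]:
    """Words (other than target) whose log_cnt is within `tolerance` of the target's."""
    t = log_cnt(counts[target])
    return sorted(
        w for w, c in counts.items()
        if w != target and abs(log_cnt(c) - t) <= tolerance
    )


# Hypothetical token counts.
counts = {"ktb": 1200, "drs": 1350, "qlm": 900, "bAb": 15000}
print(round(log_cnt(2436614), 2))        # 6.39, the listed maximum
print(frequency_matched("ktb", counts))  # ['drs']
```

Working on the log scale makes the tolerance proportional: 0.1 admits counts within roughly a factor of 1.26 of each other.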
Log Frequency N (log_frqN)
log_frqN is log10(frq + 1/N), where N is the number of words in the corpus, expressed in millions.
Current minimum value: -0.63
Current maximum value: 4.3
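Because frq is count / N with N in millions, the 1/N offset is the frq of a single occurrence. It keeps the logarithm defined at frq = 0 and, unlike log_frq, makes the floor depend on corpus size (an unseen word scores -log10(N)). A sketch with an illustrative function name and a hypothetical corpus size:

```python
import math


def log_frq_n(frq: float, corpus_millions: float) -> float:
    # 1 / corpus_millions equals the frq of a word seen exactly once,
    # so the log stays defined for frq == 0.
    return math.log10(frq + 1 / corpus_millions)


# Hypothetical 100-million-word corpus: an unseen word scores log10(0.01).
print(log_frq_n(0.0, 100.0))  # approximately -2.0
```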
Zipf Scale (zipf)
zipf is log10(frq) + 3. It is a logarithmic frequency scale that supports cross-linguistic comparison and straightforward frequency ranking.
Current minimum value: 2.37
Current maximum value: 7.3
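Since frq is per million words, adding 3 to its log10 gives log10 of the frequency per billion words, so a word occurring once per million words has a Zipf value of 3. A sketch (the function name is illustrative), checked against the listed maximum:

```python
import math


def zipf(frq: float) -> float:
    """log10 of the frequency per billion words, given frq per million."""
    if frq <= 0:
        raise ValueError("zipf is undefined for frq <= 0")
    return math.log10(frq) + 3


print(zipf(1.0))                # 3.0
print(round(zipf(19705.2), 1))  # 7.3, the listed maximum
```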
Lemma Information
Diacritic form (diac)
diac is the token written with the diacritics that indicate its vowels.
Constraint: Latin-script search pattern
POS tag (pos)
pos is the part-of-speech tag of diac.
Constraint: search pattern
Lemma (lemma)
lemma is the base (dictionary) form of the word token.
Constraint: Latin-script search pattern
Count of all words sharing a lemma with this word (all_lem_cnt)
all_lem_cnt is the sum of cnt over all words that have the same lemma as any of the lemmas of this word.
Current minimum value: 0
Current maximum value: 201
Frequency per million of all_lem_cnt (all_lem_frq)
all_lem_frq is all_lem_cnt expressed as a frequency per million words.
Current minimum value: 0
Current maximum value: 1.64
log10 of all_lem_cnt (all_lem_log_cnt)
all_lem_log_cnt is log10(all_lem_cnt).
Current minimum value: 0
Current maximum value: 2.31
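The three lemma-family measures can be sketched as follows, reading the help text literally. The toy Latin-script words, their counts, the word-to-lemmas mapping and the function names are all hypothetical. Whether a word's own cnt is included in its family sum is not stated; this sketch includes it. The page lists 0 as the minimum of all_lem_log_cnt although log10(0) is undefined, so the sketch returns 0 for an empty family, which is an assumption.

```python
import math
from collections import defaultdict


def all_lem_cnt(word_cnt: dict[str, int], word_lemmas: dict[str, list[str]]) -> dict[str, int]:
    """Sum of cnt over all words sharing at least one lemma with each word."""
    words_by_lemma: dict[str, set[str]] = defaultdict(set)
    for word, lemmas in word_lemmas.items():
        for lemma in lemmas:
            words_by_lemma[lemma].add(word)
    result = {}
    for word, lemmas in word_lemmas.items():
        family: set[str] = set()
        for lemma in lemmas:
            family |= words_by_lemma[lemma]  # the word itself is included
        result[word] = sum(word_cnt[w] for w in family)
    return result


def all_lem_frq(lem_cnt: int, corpus_tokens: int) -> float:
    return lem_cnt * 1_000_000 / corpus_tokens


def all_lem_log_cnt(lem_cnt: int) -> float:
    # Assumption: report 0 rather than fail when the family count is 0.
    return math.log10(lem_cnt) if lem_cnt > 0 else 0.0


word_cnt = {"ktb": 100, "yktb": 50, "ktAb": 30, "qlm": 10}
word_lemmas = {"ktb": ["katab"], "yktb": ["katab"], "ktAb": ["kitAb"], "qlm": ["qalam"]}
print(all_lem_cnt(word_cnt, word_lemmas))
# {'ktb': 150, 'yktb': 150, 'ktAb': 30, 'qlm': 10}
```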
Orthographic Structure
Number of letters (num_letters)
num_letters is the number of characters in the word.
Current minimum value: 1
Current maximum value: 14
Repeated letter (rep_letter)
rep_letter is a yes/no value (stored as 1/0 in the database) indicating whether any of the characters composing the word is repeated.
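The help text does not say whether only adjacent repeats count, so here are two hypothetical readings side by side; which one Subtlex uses is not stated, and both function names are illustrative.

```python
def rep_letter_any(word: str) -> int:
    """1 if some character occurs more than once anywhere in the word, else 0."""
    return int(len(set(word)) < len(word))


def rep_letter_adjacent(word: str) -> int:
    """1 if some character is immediately followed by itself, else 0."""
    return int(any(a == b for a, b in zip(word, word[1:])))


for w in ["ktb", "mdd", "ktbk"]:
    print(w, rep_letter_any(w), rep_letter_adjacent(w))
# ktb 0 0
# mdd 1 1
# ktbk 1 0
```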
Orthographic Neighborhoods
Number of substitution neighbors (n)
n is the number of words in the lexicon that are orthographically identical to this word except for one letter.
Current minimum value: 0
Current maximum value: 55
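A substitution neighbor has the same length as the word and differs from it at exactly one position. A minimal linear-scan sketch over a toy lexicon (the lexicon and the function name are hypothetical):

```python
def substitution_neighbors(word: str, lexicon: set[str]) -> list[str]:
    """Lexicon words of the same length that differ from `word` at exactly one position."""
    return sorted(
        w for w in lexicon
        if len(w) == len(word) and sum(a != b for a, b in zip(w, word)) == 1
    )


lexicon = {"ktb", "ktf", "kdb", "qtb", "ktAb", "drs"}  # hypothetical
print(substitution_neighbors("ktb", lexicon))  # ['kdb', 'ktf', 'qtb']
print(len(substitution_neighbors("ktb", lexicon)))  # 3, the value n would take
```

A linear scan is fine for illustration; for a lexicon of tens of thousands of types, indexing words under keys with one slot blanked (such as "k*b") avoids comparing every pair.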
Number of higher-frequency substitution neighbors (nhf)
nhf is the number of substitution neighbors with a higher frq than this word.
Current minimum value: 0
Current maximum value: 43
Frequency of the highest-frequency substitution neighbor (frq_hf_s)
frq_hf_s is the highest frq among the substitution neighbors of this word.
Current minimum value: 0
Current maximum value: 19705.2
List of higher-frequency substitution neighbors (hf_s_list)
hf_s_list is a comma-separated list of substitution neighbors, ordered by descending frequency, with 'xxx' marking the position of this word.
To match a whole word in the list, follow your search string with a comma (",") and then "%". Example Latin search pattern: ktb,%
Constraint: Latin-script search pattern
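The list format and the suggested search pattern can be sketched together. Two assumptions are made, both labeled in the code: the list holds all substitution neighbors with 'xxx' inserted at the word's own rank (the word goes ahead of neighbors with equal frq), and the search box follows SQL LIKE semantics, where % matches any run of characters. Under plain LIKE semantics, ktb,% matches only lists that begin with "ktb,"; whether the search box adds a leading wildcard is not stated. The function names and data are hypothetical.

```python
import re


def neighbor_list(neighbor_frq: dict[str, float], word_frq: float) -> str:
    """Neighbors in descending frq with 'xxx' at the word's own rank.

    Assumption: the list holds all neighbors, and the word goes ahead of
    neighbors whose frq equals its own.
    """
    parts: list[str] = []
    placed = False
    for w, f in sorted(neighbor_frq.items(), key=lambda kv: kv[1], reverse=True):
        if not placed and word_frq >= f:
            parts.append("xxx")
            placed = True
        parts.append(w)
    if not placed:
        parts.append("xxx")
    return ",".join(parts)


def like(pattern: str, text: str) -> bool:
    """Assumed SQL LIKE semantics: '%' = any run of characters, '_' = one character."""
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


hf_s_list = neighbor_list({"ktf": 5.0, "kdb": 40.0, "qtb": 12.0}, word_frq=20.0)
print(hf_s_list)                     # kdb,xxx,qtb,ktf
print(like("ktb,%", "ktb,xxx,qtb"))  # True
print(like("ktb,%", "ktbA,xxx"))     # False
print(like("ktb%", "ktbA,xxx"))      # True: without the comma, longer words match too
```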
Number of positions with higher-frequency substitution neighbors (hpf)
hpf is the number of letter positions in the word at which a substitution neighbor with a higher frq than the word itself can be formed.
Current minimum value: 0
Current maximum value: 6
Average frequency of substitution neighbors (avg_frq_ns)
avg_frq_ns is the mean frq of the substitution neighbors.
Current minimum value: 0
Current maximum value: 19705.2
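The four substitution-neighbor summaries (nhf, frq_hf_s, hpf, avg_frq_ns) can all be derived from one neighbor scan. A sketch over a hypothetical word-to-frq lexicon; the function name is illustrative, and returning 0 for a word with no neighbors mirrors the listed minimums of 0 but is an assumption about how the database fills those cells.

```python
def substitution_stats(word: str, frq: dict[str, float]) -> dict[str, float]:
    """nhf, frq_hf_s, hpf and avg_frq_ns for `word`, given a lexicon mapping word to frq."""

    def diff_positions(a: str, b: str) -> list[int]:
        return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

    # Same length, exactly one differing position.
    neighbors = [w for w in frq if len(w) == len(word) and len(diff_positions(w, word)) == 1]
    higher = [w for w in neighbors if frq[w] > frq[word]]
    # Distinct positions at which a higher-frequency neighbor can be formed.
    positions = {diff_positions(w, word)[0] for w in higher}
    return {
        "nhf": len(higher),
        "frq_hf_s": max((frq[w] for w in neighbors), default=0.0),
        "hpf": len(positions),
        "avg_frq_ns": sum(frq[w] for w in neighbors) / len(neighbors) if neighbors else 0.0,
    }


frq = {"ktb": 20.0, "kdb": 40.0, "qtb": 70.0, "ktf": 5.0, "mtb": 60.0}  # hypothetical
print(substitution_stats("ktb", frq))
# {'nhf': 3, 'frq_hf_s': 70.0, 'hpf': 2, 'avg_frq_ns': 43.75}
```

Here nhf (3) exceeds hpf (2) because qtb and mtb both differ from ktb at the first letter.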
Number of transposed-letter neighbors (n_tl)
n_tl is the number of neighbors formed by transposing two adjacent letters.
Current minimum value: 0
Current maximum value: 7
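A sketch of transposed-letter neighbor search: swap each pair of adjacent letters and keep the results that are lexicon words. Swapping two identical adjacent letters leaves the word unchanged, so that case is skipped. The lexicon and the function name are hypothetical.

```python
def transposition_neighbors(word: str, lexicon: set[str]) -> list[str]:
    """Lexicon words formed by swapping two adjacent letters of `word`."""
    found = set()
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word and swapped in lexicon:
            found.add(swapped)
    return sorted(found)


lexicon = {"ktb", "tkb", "kbt", "btk", "mdd"}  # hypothetical
print(transposition_neighbors("ktb", lexicon))  # ['kbt', 'tkb']
print(transposition_neighbors("mdd", lexicon))  # []
```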
Frequency of the highest-frequency transposed-letter neighbor (frq_hf_tl)
frq_hf_tl is the frq of the highest-frequency transposed-letter neighbor.
Current minimum value: 0
Current maximum value: 19705.2
List of higher-frequency transposed-letter neighbors (hf_tl_list)
hf_tl_list is a comma-separated list of transposed-letter neighbors, ordered by descending frequency, with 'xxx' marking the position of this word.
Constraint: Latin-script search pattern
Number of addition-letter neighbors (n_a)
n_a is the number of words that can be formed by adding one letter to this word.
Current minimum value: 0
Current maximum value: 47
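Rather than inserting every possible letter at every position (which would require fixing an alphabet), this sketch tests each lexicon word that is one letter longer for whether deleting one of its letters yields the target. The lexicon and the function name are hypothetical.

```python
def addition_neighbors(word: str, lexicon: set[str]) -> list[str]:
    """Lexicon words one letter longer than `word` that reduce to it by one deletion."""
    return sorted(
        w for w in lexicon
        if len(w) == len(word) + 1
        and any(w[:i] + w[i + 1:] == word for i in range(len(w)))
    )


lexicon = {"ktb", "mktb", "ktbt", "drs", "kt"}  # hypothetical
print(addition_neighbors("ktb", lexicon))  # ['ktbt', 'mktb']
```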
Frequency of the highest-frequency addition-letter neighbor (frq_hf_a)
frq_hf_a is the frq of the highest-frequency addition-letter neighbor.
Current minimum value: 0
Current maximum value: 19705.2
List of higher-frequency addition-letter neighbors (hf_a_list)
hf_a_list is a comma-separated list of addition-letter neighbors, ordered by descending frequency, with 'xxx' marking the position of this word.
Constraint: Latin-script search pattern
Number of deletion-letter neighbors (n_d)
n_d is the number of words that can be formed by deleting one letter from this word.
Current minimum value: 0
Current maximum value: 5
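A sketch of deletion-letter neighbor search: generate every one-letter deletion and keep those that are lexicon words. Collecting candidates in a set means a word with a doubled letter contributes each distinct deletion only once; whether Subtlex counts it the same way is an assumption. The lexicon and the function name are hypothetical.

```python
def deletion_neighbors(word: str, lexicon: set[str]) -> list[str]:
    """Lexicon words obtained by deleting exactly one letter of `word`."""
    # A set collapses duplicates, e.g. deleting either 'd' of "mdd" gives "md" once.
    candidates = {word[:i] + word[i + 1:] for i in range(len(word))}
    return sorted(candidates & lexicon)


lexicon = {"ktb", "ktbt", "ktt", "kbt", "md"}  # hypothetical
print(deletion_neighbors("ktbt", lexicon))  # ['kbt', 'ktb', 'ktt']
print(deletion_neighbors("mdd", lexicon))   # ['md']
```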
Frequency of the highest-frequency deletion-letter neighbor (frq_hf_d)
frq_hf_d is the frq of the highest-frequency deletion-letter neighbor.
Current minimum value: 0
Current maximum value: 19705.2
List of higher-frequency deletion-letter neighbors (hf_d_list)
hf_d_list is a comma-separated list of deletion-letter neighbors, ordered by descending frequency, with 'xxx' marking the position of this word.
Constraint: Latin-script search pattern
Levenshtein Distance (Lev_N)
Lev_N measures orthographic similarity as the number of deletions, insertions, or substitutions required to transform one word into another. Each operation costs 1, so transposing two letters costs 2. The value is the mean distance to the 20 nearest neighbors (OLD20), given to two decimal places.
Current minimum value: 0
Current maximum value: 2
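Lev_N can be sketched as a standard Levenshtein distance (unit-cost insertion, deletion and substitution, so an adjacent transposition costs 2, as the help text states) averaged over the k closest other words, with k = 20 for OLD20. The function name `old_k`, the toy lexicon and the small k in the example are hypothetical; averaging over fewer than k words when the lexicon is small is also an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with cost 1 per insertion, deletion or substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def old_k(word: str, lexicon: set[str], k: int = 20) -> float:
    """Mean distance to the k nearest other words (OLD20 for k=20), rounded to 2 places."""
    distances = sorted(levenshtein(word, w) for w in lexicon if w != word)
    nearest = distances[:k]
    return round(sum(nearest) / len(nearest), 2) if nearest else 0.0


lexicon = {"ktb", "ktf", "kdb", "tkb", "drs"}  # hypothetical
print(levenshtein("ktb", "tkb"))   # 2: an adjacent transposition costs two operations
print(old_k("ktb", lexicon, k=3))  # 1.33
```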
Options for frequency of bigrams
Sum of bigram frequency (sbof)
Current minimum value: 0
Current maximum value: 458092
Mean bigram frequency (mbof)
Current minimum value: 0
Current maximum value: 143930
Token bigram frequency by bigram position (bgram_frq1 to bgram_frq14)
Bigram  Field        Current min  Current max
1       bgram_frq1   0            143930
2       bgram_frq2   0            143930
3       bgram_frq3   0            143930
4       bgram_frq4   0            143930
5       bgram_frq5   0            143930
6       bgram_frq6   0            143930
7       bgram_frq7   0            143930
8       bgram_frq8   0            143930
9       bgram_frq9   0            143930
10      bgram_frq10  0            46400.9
11      bgram_frq11  0            46400.9
12      bgram_frq12  0            40163.6
13      bgram_frq13  0            22859.2
14      bgram_frq14  0            0
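The page gives no formulas for sbof, mbof or the positional bgram_frqK fields. A plausible reading, assumed here and not confirmed by the source, is that bgram_frqK is the frequency of the word's K-th letter bigram, sbof is the sum of those values and mbof is their mean. The bigram frequency table and function names are hypothetical, and scoring an unseen bigram as 0 is an assumption. A word of n letters has n - 1 bigrams, so even the longest words (num_letters up to 14) have at most 13, which is consistent with the maximum of 0 listed for bgram_frq14.

```python
def ngrams(word: str, n: int) -> list[str]:
    """Contiguous letter n-grams in left-to-right order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def bigram_measures(word: str, bigram_frq: dict[str, float]) -> tuple[list[float], float, float]:
    """Assumed bgram_frq1..K values, their sum (sbof) and their mean (mbof)."""
    # Assumption: a bigram missing from the table contributes 0.
    per_position = [bigram_frq.get(bg, 0.0) for bg in ngrams(word, 2)]
    total = sum(per_position)
    mean = total / len(per_position) if per_position else 0.0
    return per_position, total, mean


bigram_frq = {"kt": 300.0, "tb": 100.0}  # hypothetical frequencies
print(bigram_measures("ktb", bigram_frq))  # ([300.0, 100.0], 400.0, 200.0)
print(len(ngrams("a" * 14, 2)))            # 13
```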
Options for frequency of trigrams
Sum of trigram frequency (stof)
Current minimum value: 0
Current maximum value: 52799.6
Mean trigram frequency (mtof)
Current minimum value: 0
Current maximum value: 15073.9
Token trigram frequency by trigram position (tgram_frq1 to tgram_frq13)
Trigram  Field        Current min  Current max
1        tgram_frq1   0            19539.8
2        tgram_frq2   0            19539.8
3        tgram_frq3   0            19539.8
4        tgram_frq4   0            19539.8
5        tgram_frq5   0            19539.8
6        tgram_frq6   0            19539.8
7        tgram_frq7   0            10795.4
8        tgram_frq8   0            5993.37
9        tgram_frq9   0            3447.48
10       tgram_frq10  0            1592
11       tgram_frq11  0            1934.89
12       tgram_frq12  0            107.71
13       tgram_frq13  0            0
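As with sbof and mbof, the page gives no formulas for stof, mtof or the positional tgram_frqK fields. This sketch assumes, without confirmation from the source, that tgram_frqK is the frequency of the word's K-th letter trigram, stof is their sum and mtof their mean; the trigram table and function names are hypothetical, and an unseen trigram is scored as 0 by assumption. A word of n letters has n - 2 trigrams, so a 14-letter word has 12, which is consistent with the maximum of 0 listed for tgram_frq13.

```python
def ngrams(word: str, n: int) -> list[str]:
    """Contiguous letter n-grams in left-to-right order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def trigram_measures(word: str, trigram_frq: dict[str, float]) -> tuple[list[float], float, float]:
    """Assumed tgram_frq1..K values, their sum (stof) and their mean (mtof)."""
    # Assumption: a trigram missing from the table contributes 0.
    per_position = [trigram_frq.get(tg, 0.0) for tg in ngrams(word, 3)]
    total = sum(per_position)
    mean = total / len(per_position) if per_position else 0.0
    return per_position, total, mean


trigram_frq = {"ktA": 30.0, "tAb": 10.0}  # hypothetical frequencies
print(trigram_measures("ktAb", trigram_frq))  # ([30.0, 10.0], 40.0, 20.0)
print(len(ngrams("a" * 14, 3)))               # 12
```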