There are currently two corpora: (1) a large collection of written data from the Web, government sources, newspapers, and literature, and (2) subtitles kindly provided by
The first public version of Subtlex was released in August 2023. We are always working to improve it, so please note the date of the database you are using when you report results or issues with the data.
You can email us at if you have any suggestions or questions, or if you find problems with the data. We are always working to make it better and more accurate!
Clicking the "Collapse/Expand All" button on any search page opens all of the sections so you can see every variable. To read a description of a variable, click its "Help" button.
Two separate databases are available: the written database and the subtitle database, using token frequencies. You must choose one of them; all results are based on the frequencies in the selected source.
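To illustrate what a token frequency is and why the chosen source matters, here is a minimal Python sketch. A token frequency counts every occurrence of a word, so the same word can have very different counts in written text and in subtitles. The two sample texts, the helpers `token_frequencies` and `frequency`, and the simple letters-and-apostrophes tokenization rule are all hypothetical; they are not Subtlex data and do not describe how Subtlex was actually built.

```python
import re
from collections import Counter

# Hypothetical toy texts standing in for the two sources (NOT Subtlex data).
WRITTEN_SAMPLE = "The committee approved the report. The report was published."
SUBTITLE_SAMPLE = "Okay, okay. Let's go. Go, go, go!"


def token_frequencies(text: str) -> Counter:
    """Count every occurrence (token) of each lowercased word.

    Assumption for illustration only: a word is a run of letters and
    apostrophes. Subtlex's real tokenization rules may differ.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def frequency(word: str, source: str) -> int:
    """Return a word's token count in the chosen (hypothetical) source."""
    corpora = {"written": WRITTEN_SAMPLE, "subtitle": SUBTITLE_SAMPLE}
    # Counter returns 0 for words that never occur in the source.
    return token_frequencies(corpora[source])[word.lower()]


print(frequency("go", "written"))   # → 0
print(frequency("go", "subtitle"))  # → 4
print(frequency("the", "written"))  # → 3
```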