TAALES: Tool for the Automatic Analysis of Lexical Sophistication

Lexical sophistication has proven difficult to define and assess, but it is widely recognized as important in explaining comprehension, proficiency, and quality in both written and spoken tasks. However, few freely available tools can assess lexical sophistication automatically.

The Tool for the Automatic Analysis of Lexical Sophistication (TAALES) was developed to automatically calculate over 130 classic and newly developed indices of the lexical features of a text. The tool is free and fast, has a user-friendly interface, and can be downloaded to the user’s hard drive. It also supports batch processing and writes its output to an easy-to-read comma-separated values (CSV) spreadsheet file. It was designed to be a convenient and reliable tool for any researcher or educator interested in analyzing the lexical sophistication of a text.

TAALES is available here: Tool for the Automatic Analysis of Lexical Sophistication

Measures reported by TAALES

  1. Word Frequency indices reflect how often the words in a text occur in a reference corpus. They are calculated by dividing the sum of the frequency scores for the tokens in a text by the number of tokens in that text that received a frequency score.
  2. Range indices measure how widely a word or word family is used, usually as a count of the number of documents in a reference corpus in which that word occurs.
  3. N-gram indices measure bigram frequency, bigram proportions, and bigram accuracy. Such indices have been shown to predict human judgments of essay quality.
  4. Academic List indices are based on two academic lists: the Academic Word List (Coxhead, 2000) and the Academic Formulas List (Simpson-Vlach & Ellis, 2010), which contains academic n-grams. They are calculated by counting the number of tokens in a text that occur in an academic list and dividing by the number of words in the text.
  5. Word Information indices assess the psycholinguistic properties of words, including concreteness, familiarity, imageability, meaningfulness, and age of acquisition. They are calculated by summing the word information scores for the tokens in a text that have such a score and dividing by the number of those tokens.
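
The frequency, range, academic list, and word information calculations described above share a small arithmetic core, sketched below in Python. This is an illustration under stated assumptions, not TAALES's own code: the function names are hypothetical, tokens come from a naive lowercase regular expression, all frequency values and word lists are toy data (the word list is not the actual Academic Word List), range indices are assumed to be the average per-word document count of a text's tokens, and bigram frequency is assumed to be averaged the same way as word frequency.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set


def tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: lowercase word tokens from a simple regex."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens: List[str]) -> List[str]:
    """Adjacent token pairs joined with a space, e.g. 'of the'."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def mean_score(items: List[str], scores: Dict[str, float]) -> float:
    """Sum of scores for the items that have one, divided by how many have one.

    Items missing from `scores` are left out of both the sum and the count,
    matching the frequency and word information descriptions.
    """
    scored = [scores[item] for item in items if item in scores]
    return sum(scored) / len(scored) if scored else 0.0


def list_proportion(tokens: List[str], word_list: Set[str]) -> float:
    """Tokens found in `word_list`, divided by all tokens in the text."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in word_list) / len(tokens)


def range_scores(corpus: Iterable[List[str]]) -> Dict[str, int]:
    """Number of documents in a reference corpus that contain each word."""
    counts: Counter = Counter()
    for doc in corpus:
        counts.update(set(doc))  # count each word at most once per document
    return dict(counts)


# Toy inputs for illustration only; they are not TAALES reference data.
text = "The analysis of the data was complex"
tokens = tokenize(text)
word_freq = {"the": 100.0, "of": 60.0, "data": 10.0, "analysis": 5.0}
bigram_freq = {"of the": 40.0, "the data": 8.0}
academic_words = {"analysis", "data", "complex"}
reference_docs = [
    ["the", "data", "was", "new"],
    ["the", "analysis"],
    ["the", "complex", "data"],
]

print(mean_score(tokens, word_freq))  # 55.0
print(mean_score(bigrams(tokens), bigram_freq))  # 24.0
print(round(list_proportion(tokens, academic_words), 3))  # 0.429
print(round(mean_score(tokens, range_scores(reference_docs)), 3))  # 1.833
```

The key design difference is the denominator: `mean_score` divides by the number of tokens that received a score, as the frequency and word information descriptions specify, whereas `list_proportion` divides by all tokens in the text, as the academic list description specifies.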

Validation of TAALES

Although TAALES is a relatively new tool, several studies have tested its validity as a measure of lexical sophistication. Kyle and Crossley (2014) compared the unstructured writing of English language learners with that of native English speakers to examine holistic lexical proficiency, and found significant correlations between the proficiency scores and the tool’s indices. In another study, TAALES was used to analyze the vocabulary of students writing essays in the Writing Pal intelligent tutoring system; 45 indices were significantly correlated with students’ vocabulary knowledge (Allen & McNamara, in press). TAALES has also been used to model differences between responses to different spoken assessment tasks in the TOEFL iBT (Kyle, Crossley, & McNamara, 2015). Together, these studies provide evidence that TAALES can be used to assess lexical sophistication reliably.

References/further reading

Allen, L. K., & McNamara, D. S. (in press). You are your words: Modeling students’ vocabulary knowledge with natural language processing. In O. Santos, J. Boticario, C. Romero, M. Pechenizkiy, A. Merceron, P. Mitros, J. M. Luna, C. Mihaescu, P. Moreno, A. Hershkovitz, S. Ventura, & M. Desmarais (Eds.), Proceedings of the 8th International Conference on Educational Data Mining (EDM 2015). Madrid, Spain.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213–238. doi:10.2307/3587951

Kyle, K., & Crossley, S. A. (2014). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49, 757–786.

Kyle, K., Crossley, S. A., & McNamara, D. S. (2015). Construct validity in TOEFL iBT speaking tasks: Insights from natural language processing. Language Testing, 33, 319–340.

Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology research. Applied Linguistics, 31, 487–512. doi:10.1093/applin/amp058