The CEFR (Common European Framework of Reference for Languages: Learning, Teaching, Assessment) aims to provide a comprehensive framework for learning, teaching, and assessment that can be used for all European languages. By indicating the level of foreign-language learners in Europe and beyond, the CEFR makes it easier to assess a person's language proficiency.

By now, most people are familiar with the six reference levels (A1, A2, B1, B2, C1, and C2) used for this purpose. The EDIA CEFR tagger measures the readability of texts on the CEFR scale at a more granular level, which is why we use a 9-point CEFR scale (A1, A2, A2+, B1, B1+, B2, B2+, C1, C2).

Why does the CEFR matter?

CEFR levels are the foundation for a communicative approach to (foreign) language acquisition, teaching, and certification. Although the CEFR levels represent a widely supported approach, the availability and quality of educational content labelled with CEFR levels are limited. That's because the highly laborious, error-prone labelling process is performed manually (save for some exceptions). This results in several practical obstacles regarding publishing, teaching, and learning:

Content creators (publishers, authors, and teachers) struggle to use consistent criteria for checking a text's difficulty level.
Schools and teachers have trouble finding and/or creating appropriate texts for their students.

Frequently asked questions

What makes the EDIA CEFR solution unique?

Most CEFR taggers look at the individual words in a text to assess its readability. These taggers use a dictionary in which each word is rated from A1 to C2, without any context. The readability of the text is then derived by combining the ratings of the individual words. This method detects the right readability level in approximately 60% of cases. In essence, this approach is simply counting words. A text is so much more than the sum of its individual words, and individual words do not make the text.
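As an illustration only, the word-counting approach described above could be sketched as follows. The word ratings, the default level for unknown words, and the averaging rule are all assumptions made for this example; they do not describe the workings of any particular tagger.

```python
# Hypothetical word ratings; a real dictionary would hold many thousands of entries.
WORD_LEVELS = {
    "the": "A1", "cat": "A1", "sat": "A1", "on": "A1",
    "mat": "A2", "notwithstanding": "C1",
}
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def dictionary_level(text, default="B1"):
    """Rate a text by averaging context-free word ratings (assumed rule)."""
    # Split on whitespace, strip surrounding punctuation, and lowercase.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return default
    # Unknown words fall back to the default level (an assumption).
    scores = [LEVELS.index(WORD_LEVELS.get(w, default)) for w in words]
    mean_score = sum(scores) / len(scores)
    return LEVELS[round(mean_score)]


print(dictionary_level("The cat sat on the mat."))
```

Note that every word gets the same rating wherever it appears, so sentence structure and word combinations play no role in the result.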

How does the EDIA tagger work?

Our algorithm looks at many aspects of the text, such as parts of words, the words themselves, word combinations, sentence structure, and the structure of the entire text. It does so using deep learning and pre-trained language models. We believe that advanced neural networks are superior to a mathematical formula that calculates a level from a dictionary using single definitions of words. Such algorithms are capable of taking into account every aspect of the text, not just the words. Our algorithm is effective in more than 90% of the cases.

How is the EDIA tagger made?

Our CEFR tagger is trained on texts from a variety of sources and reading levels. Each text is evaluated by multiple language experts, and the combined evaluations form our dataset. The CEFR tagger is trained in such a way that 20% of the texts remain unknown to our algorithm. This 20% is used to check the quality of our algorithm. We repeated this with a different 20% of the total texts to make sure the results are not a fluke (i.e. the tagger is good at predicting CEFR for this set of examples, but does not generalize to other texts).
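The idea of holding out a different 20% of the texts each time can be sketched as below. This is a minimal illustration, not our actual training pipeline: the function name, the fixed random seed, and the choice to rotate through all five possible 20% slices are assumptions made for the example.

```python
import random


def five_fold_splits(n_texts, seed=0):
    """Yield (train, held_out) index lists, holding out a different 20% each time."""
    indices = list(range(n_texts))
    random.Random(seed).shuffle(indices)  # shuffle so each slice mixes sources and levels
    folds = [indices[i::5] for i in range(5)]  # five disjoint slices of about 20% each
    for k in range(5):
        held_out = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, held_out


for train, held_out in five_fold_splits(100):
    # A model would be trained on `train` and scored on `held_out` here.
    print(len(train), len(held_out))
```

If the scores on the held-out slices are similar from one round to the next, the result is unlikely to depend on one lucky selection of texts.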

How can your CEFR ‘experts’ be right?

The texts are evaluated by experts, but how do we know these so-called experts are right? The key is that the experts agree: science builds upon the model of scientific consensus. That means that if a large majority of independent experts believe that something is true, it is considered to be true. We work with the same principle. Furthermore, our machine learning is capable of reproducing this classification of texts to a high level. Validation of results is a crucial component of the scientific method.

How can the EDIA CEFR tagger be better than humans?

Our algorithm is more accurate and consistent than human experts. How can this be? Let’s consider an example. Radiologists assess your health by analysing MRI scans. In some cases, machine learning has been shown to be better at this task. That is because the machine learning is trained on the knowledge of many experts, not just one. It is trained using a huge dataset, one that is impossible for humans to digest. Machine learning doesn’t need a coffee break and doesn’t have a bad night’s sleep. As you can see, there are many similarities between this example and our technology, which is trained by experts on a large, validated dataset and outperforms humans in CEFR classification.

With which software is the EDIA CEFR tagger compatible?

The EDIA CEFR tagger is available in a range of content management systems (CMS) and editors, including Microsoft Word, Google Docs, Alfresco, PublishOne, FontoXML, and EDIA Papyrus. In principle, EDIA’s CEFR tagger can be integrated into any system that your organization uses.

How can you activate EDIA CEFR tagger in your workflow?

The EDIA CEFR tagger is available in the form of an API that you can easily activate in one of the software solutions mentioned above. All you need to get started is a valid license key, which can be requested through the EDIA sales team.

How much time can I save by using EDIA's CEFR tagger?

On average, it takes a human expert 5 minutes to label one piece of content. Classifying 1,000 content items manually would therefore take more than 10 workdays (excluding the hours required to find the grading experts). With the EDIA automated CEFR tagger, it would take approximately 10 minutes, a time saving of more than 90% for your organization.
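The figures above can be checked with a short calculation. The 8-hour workday is an assumption made for this example; the other numbers come from the text.

```python
MINUTES_PER_ITEM = 5      # manual labelling time per content item (from the text)
ITEMS = 1_000             # number of content items (from the text)
AUTOMATED_MINUTES = 10    # approximate automated tagging time (from the text)
WORKDAY_HOURS = 8         # assumed length of a workday

manual_minutes = MINUTES_PER_ITEM * ITEMS
manual_workdays = manual_minutes / 60 / WORKDAY_HOURS
saving = 1 - AUTOMATED_MINUTES / manual_minutes

print(f"Manual tagging: {manual_minutes} minutes, about {manual_workdays:.1f} workdays")
print(f"Reduction in tagging time: {saving:.1%}")
```

On these figures alone, the tagging time itself shrinks by well over 90%, so the 90% figure leaves room for the time spent on setup and on reviewing results.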

What is CEFR classification?