Content metadata: why keyword extraction requires automated labelling


Choosing keywords is an art rather than a science. There is no such thing as 'the right keyword': in the broadest sense, a keyword captures a core concept that a piece of content deals with. The text doesn't even need to contain the keyword itself. For example, if the term 'European Union' is used several times, 'European Commission' may be a suitable keyword even though the writer never uses that term.

Despite this fluid definition, keywords should still make sense to the people who use them to search for content. That's where automated labelling comes in.

Why should you use automated labelling?

When teachers and students use keywords to find specific materials on the internet or in a learning repository, a full-text search doesn't always suffice. If content is labelled at a very granular level, however, keywords can do the trick. At that level, labelling is only feasible when it's automated: you're not simply attributing keywords to a book, but labelling individual paragraphs so your target audience can search the content easily. Consistency is impossible to guarantee when such a detailed, fine-grained task is done manually.
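To make paragraph-level labelling concrete, here is a minimal Python (3.9+) sketch under simplifying assumptions. The labeller `toy_labeller` and its three-entry `TOY_TAXONOMY` are hypothetical stand-ins for a trained model; what matters is the structure: every paragraph gets its own labels, and an index maps each keyword back to the exact paragraphs it describes.

```python
import re
from collections import defaultdict
from typing import Callable

# Hypothetical mini-taxonomy: surface term -> concept label.
TOY_TAXONOMY = {
    "photosynthesis": "Photosynthesis",
    "chlorophyll": "Photosynthesis",
    "fraction": "Fractions",
}

def toy_labeller(paragraph: str) -> set[str]:
    """Hypothetical stand-in for a trained model: look up each word in the toy taxonomy."""
    words = set(re.findall(r"[a-z]+", paragraph.lower()))
    return {concept for term, concept in TOY_TAXONOMY.items() if term in words}

def split_paragraphs(text: str) -> list[str]:
    """Split a document on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def build_index(
    documents: dict[str, str], label_fn: Callable[[str], set[str]]
) -> dict[str, set[tuple[str, int]]]:
    """Map each keyword to the (document id, paragraph number) pairs labelled with it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for number, paragraph in enumerate(split_paragraphs(text)):
            for keyword in label_fn(paragraph):
                index[keyword].add((doc_id, number))
    return dict(index)

docs = {
    "biology-ch1": "Plants use photosynthesis to make food.\n\nChlorophyll absorbs light.",
    "maths-ch2": "A fraction has a numerator and a denominator.",
}
index = build_index(docs, toy_labeller)
print(sorted(index["Photosynthesis"]))  # [('biology-ch1', 0), ('biology-ch1', 1)]
print(sorted(index["Fractions"]))       # [('maths-ch2', 0)]
```

Note that the second biology paragraph is found under 'Photosynthesis' even though it never uses that word, which is exactly what a full-text search would miss.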

Keyword extraction and automated labels

For CEFR levels, you can rely on data that experts have annotated in the past. Keyword extraction, being an art, calls for a different and less strictly scientific approach: you use an existing taxonomy to train a machine learning model, so that it ultimately learns to recognise a wide range of concepts and terms, which it can then extract from any text.
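The post doesn't say which model or taxonomy format is used, so the following is only an illustration. As a rule-based stand-in for the trained model, this Python (3.9+) sketch matches a hypothetical taxonomy (preferred labels, synonyms and 'related' links, loosely in the style of SKOS) against a text. The `Concept` class, the `min_mentions` threshold and the related-concept rule are all assumptions. It also shows how 'European Commission' can surface as a keyword without ever appearing in the text. In practice, matches like these might serve as training labels for a model rather than as the final output.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A hypothetical taxonomy entry: preferred label, synonyms and related concepts."""
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)  # pref_labels of related concepts

def count_mentions(text: str, taxonomy: list[Concept]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each concept's labels."""
    lowered = text.lower()
    counts: Counter = Counter()
    for concept in taxonomy:
        for label in [concept.pref_label, *concept.alt_labels]:
            pattern = r"\b" + re.escape(label.lower()) + r"\b"
            counts[concept.pref_label] += len(re.findall(pattern, lowered))
    return +counts  # unary plus drops concepts with a zero count

def suggest_keywords(text: str, taxonomy: list[Concept], min_mentions: int = 2) -> set[str]:
    """Keep concepts mentioned at least `min_mentions` times, plus their related concepts."""
    by_label = {c.pref_label: c for c in taxonomy}
    counts = count_mentions(text, taxonomy)
    keywords = {label for label, n in counts.items() if n >= min_mentions}
    # Hypothetical rule: a frequently mentioned concept also suggests its
    # related concepts, even if the text never names them.
    for label in list(keywords):
        keywords.update(by_label[label].related)
    return keywords

TAXONOMY = [
    Concept("European Union", alt_labels=["EU"], related=["European Commission"]),
    Concept("European Commission"),
    Concept("Photosynthesis"),
]
text = "The European Union sets common rules. Every EU country applies them."
print(sorted(suggest_keywords(text, TAXONOMY)))  # ['European Commission', 'European Union']
```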

Sometimes you'll be dealing with words that have several meanings. In such cases, the model has to learn which meaning applies in a given context, a task known as 'disambiguation'. Achieving solid results here requires state-of-the-art AI technology.
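To illustrate what disambiguation means (not how a state-of-the-art model does it), here is the classic simplified Lesk heuristic in Python: pick the sense whose short definition shares the most words with the surrounding text. The sense inventory `BANK_SENSES` and the stopword list are made up for this example. Modern systems typically rely on contextual language models instead, because plain word overlap breaks down quickly on real texts.

```python
import re

# Small, hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "in", "on", "at", "her", "his", "she", "he"}

def content_words(text: str) -> set[str]:
    """Lower-case word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def disambiguate(context: str, senses: dict[str, str]) -> str:
    """Simplified Lesk: choose the sense whose gloss overlaps most with the context.

    Ties are broken by the order of `senses`.
    """
    context_set = content_words(context)
    return max(senses, key=lambda sense: len(context_set & content_words(senses[sense])))

BANK_SENSES = {  # hypothetical sense inventory with short glosses
    "bank (finance)": "an institution that accepts deposits of money and lends money",
    "bank (river)": "sloping land beside a river or stream of water",
}
print(disambiguate("Students measured the water level at the river bank.", BANK_SENSES))
# bank (river)
print(disambiguate("She opened an account at the bank to save money.", BANK_SENSES))
# bank (finance)
```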

Benefits of automation

Once the AI model has been validated, it becomes an almost objective measure: it applies the same criteria to every piece of content. That consistency is extremely valuable when dealing with a concept as elusive as keywords. And your target audience will be better able to find content, which is, of course, what it's all about.
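The post doesn't describe how validation is done. One common approach, sketched here as an assumption, is to compare the model's keywords with expert labels on a held-out set of paragraphs and compute micro-averaged precision, recall and F1. The paragraph ids and labels below are invented for illustration.

```python
def evaluate(predicted: dict[str, set[str]], gold: dict[str, set[str]]) -> dict[str, float]:
    """Micro-averaged precision, recall and F1 over all gold-labelled items.

    Items that appear only in `predicted` are ignored.
    """
    tp = fp = fn = 0
    for item_id, gold_labels in gold.items():
        pred_labels = predicted.get(item_id, set())
        tp += len(pred_labels & gold_labels)  # keywords both model and experts assigned
        fp += len(pred_labels - gold_labels)  # keywords the experts did not assign
        fn += len(gold_labels - pred_labels)  # expert keywords the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

gold = {"p1": {"Fractions", "Decimals"}, "p2": {"Photosynthesis"}}
predicted = {"p1": {"Fractions"}, "p2": {"Photosynthesis", "Chlorophyll"}}
scores = evaluate(predicted, gold)
print({k: round(v, 3) for k, v in scores.items()})
# {'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```

Micro-averaging pools all paragraphs before computing the ratios, so paragraphs with many expert keywords weigh more heavily; a per-paragraph (macro) average is a reasonable alternative if every paragraph should count equally.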

Want to know what other labels you can use for educational purposes? In our next blog post, we will discuss topic classification.
