Ashley Sheridan.co.uk

How Readable is Your Content?

The readability of your content is one of the most important accessibility issues you will encounter, but it is often one of the least considered. Readability affects everyone who browses the Web, and can mean the difference between retaining your site visitors and scaring them away.

What is Readability?

Readability is how easily your content can be understood. One of the most useful measures of this is the Flesch reading-ease test (one of the Flesch-Kincaid readability tests), a formula that scores English text. The score can then be mapped to a reading age.

The exact formula is:

$$206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)$$
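As a worked illustration, take a hypothetical sample (the numbers are chosen for easy arithmetic, not measured from any real text) of 100 words in 5 sentences, with 130 syllables in total. That is an average of 20 words per sentence and 1.3 syllables per word:

$$
\begin{aligned}
206.835 - 1.015\left(\frac{100}{5}\right) - 84.6\left(\frac{130}{100}\right) &= 206.835 - 20.3 - 109.98 \\
&= 76.555
\end{aligned}
$$

A score of about 76.6 falls in the 80 > n ≥ 70 band of the table below, "Fairly easy to read".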

The higher the score, the easier the content is to read; lower scores indicate more difficult content. To put the scores in context, the result of the equation can be looked up in this table (taken from the Wikipedia article on the Flesch reading-ease score (FRES), which uses American school grades for comparison):

| Score | School level | Notes |
|---|---|---|
| n ≥ 90 | 5th grade | Very easy to read. Easily understood by an average 11-year-old student. |
| 90 > n ≥ 80 | 6th grade | Easy to read. Conversational English for consumers. |
| 80 > n ≥ 70 | 7th grade | Fairly easy to read. |
| 70 > n ≥ 60 | 8th & 9th grade | Plain English. Easily understood by 13- to 15-year-old students. |
| 60 > n ≥ 50 | 10th to 12th grade | Fairly difficult to read. |
| 50 > n ≥ 30 | College | Difficult to read. |
| n < 30 | College graduate | Very difficult to read. Best understood by university graduates. |
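Looking a score up in the table is a matter of checking the bands from easiest to hardest and taking the first one whose lower bound the score meets. The sketch below assumes exactly that; get_school_level and READING_LEVELS are hypothetical names, not part of the original checker:

```javascript
// Hypothetical lookup table mirroring the bands above, easiest first.
// Each "min" is an inclusive lower bound.
const READING_LEVELS = [
    { min: 90, level: "5th grade", notes: "Very easy to read." },
    { min: 80, level: "6th grade", notes: "Easy to read." },
    { min: 70, level: "7th grade", notes: "Fairly easy to read." },
    { min: 60, level: "8th & 9th grade", notes: "Plain English." },
    { min: 50, level: "10th to 12th grade", notes: "Fairly difficult to read." },
    { min: 30, level: "College", notes: "Difficult to read." },
    { min: -Infinity, level: "College graduate", notes: "Very difficult to read." },
];

// Hypothetical helper: return the first band whose lower bound the score meets.
// The last band has no lower bound, so any non-NaN score finds a match.
function get_school_level(score) {
    return READING_LEVELS.find((band) => score >= band.min);
}

console.log(get_school_level(76.555).level); // 7th grade
console.log(get_school_level(30).level);     // College
console.log(get_school_level(12).level);     // College graduate
```

Ordering the bands from easiest to hardest means each entry only needs its lower bound; the upper bound is implied by the entry before it.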

Why is it so Important?

Hopefully, this is obvious. Beyond a reader's actual age, there are plenty of other reasons why somebody might find content difficult to read, and they don't all relate to conditions typically considered disabilities under the law. Conditions such as dyslexia or Down's syndrome can affect a person's ability to learn, and thus their reading. But what if English isn't their first language? Even tiredness can affect how easily a person reads.

If your visitors can't understand your content, they're more likely to leave your website. If you rely on this digital footfall, you want to keep those people, and making sure they understand you is essential to retaining them.

How Readable is Your Content?

I've put together a simple checker in JavaScript that calculates the score and, from that, selects the corresponding reading age from the table above.

This isn't perfect, and the syllable counting code could probably do with a little extra work, but it seems to be accurate to a fair degree.

The Counting Functions

At the heart of this is the code to produce the counts for sentences, words, and syllables:

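function get_sentence_count(sample_text) {
    // A sentence ending is any character followed by ".", "?" or "!",
    // then whitespace or the end of the text.
    let sentences = sample_text.match(/.[.?!](\s|$)/gm);

    if (sentences) return sentences.length;

    // Assume there is always at least one sentence.
    return 1;
}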

This counts the sentences in a piece of text with a regular expression that identifies the end of a sentence by a period, exclamation mark, or question mark. If I were really going to flesh this out, I'd look at other sentence endings too, like the infrequently used but beautiful interrobang "‽". The function ends by returning a count of 1 if the regular expression doesn't match anything, which assumes there's always at least one sentence. A simple check of the sample string length before attempting any text metrics will prevent false positives, such as an empty string being scored as one sentence.
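Both of those ideas can be sketched together. This is a minimal, hypothetical variant (get_sentence_count_extended is not part of the original checker) that adds "‽" to the set of sentence endings and returns 0 for blank input, otherwise behaving like the function above:

```javascript
// Hypothetical variant of get_sentence_count: also treats the interrobang
// "‽" as a sentence ending, and reports 0 sentences for blank input.
function get_sentence_count_extended(sample_text) {
    // Length check first, so empty or whitespace-only text isn't scored.
    if (sample_text.trim().length === 0) return 0;

    // Same pattern as before, with "‽" added to the character class.
    const sentences = sample_text.match(/.[.?!‽](\s|$)/gm);

    // Fall back to one sentence when there is text but no terminator.
    return sentences ? sentences.length : 1;
}

console.log(get_sentence_count_extended("Is this really it‽ It is. Wow!")); // 3
console.log(get_sentence_count_extended("   "));                            // 0
```

Returning 0 for blank text means the caller must check for it before dividing by the sentence count, which is exactly the point at which the rest of the metrics should be skipped.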

Word count is done a little more simply. I trim whitespace, replace multiple occurrences of whitespace within the sample text with a single space, and then break the string into an array of "words" using the single space as a delimiter. It's crude, but should do the job:

// Trim the ends, collapse runs of whitespace, then split on single spaces.
let all_words = sample_text.trim().replace(/\s+/g, ' ').split(' ');
let word_count = all_words.length;

This isn't in a function of its own, as I needed the list of words to loop through and count the syllables:

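function get_syllables_in_text(word) {
    word = word.toLowerCase();

    // Words of three letters or fewer count as one syllable, unless they
    // start with a vowel and end in "y" (e.g. "any").
    if (word.length <= 3 && !word.match(/^[aeiou].y$/)) return 1;

    // Each match approximates one vowel sound (see the list below).
    let vowel_count = word.match(/[aiouy]+e*|e(?!d$|ly).|[td]ed|le$/g);

    // Every word has at least one syllable.
    return (vowel_count) ? vowel_count.length : 1;
}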

First, I check whether the word is three letters or fewer. If it is, it counts as a single syllable, unless it begins with a vowel and ends in a "y" (which acts as a vowel in some circumstances), as in "any".

After that, I find all matches for a regular expression that looks for any of the following:

  • One or more of the letters "a", "i", "o", "u" or "y", followed by zero or more "e"s
  • The letter "e" followed by one more character, provided what follows isn't a "d" at the end of the word or "ly" (so the "e" in endings like "-ed" and "-ely" isn't counted, and a silent final "e" has no following character to match)
  • Either the letter "t" or "d" followed by "ed"
  • Or, finally, "le" at the end of the word

If no matches can be found, I return 1, as all words should have at least one syllable.
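The loop that adds up the syllables for the whole text isn't shown above. A minimal sketch, assuming it simply sums the per-word counts over all_words (get_syllable_count is a hypothetical name; get_syllables_in_text is copied unchanged so the block runs on its own):

```javascript
// Copied from get_syllables_in_text above so this sketch runs on its own.
function get_syllables_in_text(word) {
    word = word.toLowerCase();
    if (word.length <= 3 && !word.match(/^[aeiou].y$/)) return 1;
    let vowel_count = word.match(/[aiouy]+e*|e(?!d$|ly).|[td]ed|le$/g);
    return (vowel_count) ? vowel_count.length : 1;
}

// Hypothetical helper: total syllables across a list of words.
function get_syllable_count(all_words) {
    let syllable_count = 0;
    for (const word of all_words) {
        syllable_count += get_syllables_in_text(word);
    }
    return syllable_count;
}

// "The" and "is" hit the short-word rule; "table" and "simple" each get
// one vowel match plus the final "le" match.
let all_words = "The table is simple".trim().replace(/\s+/g, ' ').split(' ');
console.log(get_syllable_count(all_words)); // 6
```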

After that, it's just about putting it all together in a single formula:

let reading_level = (206.835 - 1.015 * (word_count / sentence_count) - 84.6 * (syllable_count / word_count) );
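Putting the pieces together, here is a minimal end-to-end sketch. get_reading_ease is a hypothetical wrapper rather than the checker's actual code, the length check is the simple guard against empty input suggested earlier, and the counting functions are copied from above so the block runs on its own:

```javascript
// Copied from above: count sentence endings, assuming at least one sentence.
function get_sentence_count(sample_text) {
    let sentences = sample_text.match(/.[.?!](\s|$)/gm);
    if (sentences) return sentences.length;
    return 1;
}

// Copied from above: heuristic syllable count for a single word.
function get_syllables_in_text(word) {
    word = word.toLowerCase();
    if (word.length <= 3 && !word.match(/^[aeiou].y$/)) return 1;
    let vowel_count = word.match(/[aiouy]+e*|e(?!d$|ly).|[td]ed|le$/g);
    return (vowel_count) ? vowel_count.length : 1;
}

// Hypothetical wrapper: returns the reading-ease score, or null when there
// is no text to measure.
function get_reading_ease(sample_text) {
    if (sample_text.trim().length === 0) return null;

    let sentence_count = get_sentence_count(sample_text);
    let all_words = sample_text.trim().replace(/\s+/g, ' ').split(' ');
    let word_count = all_words.length;

    let syllable_count = 0;
    for (const word of all_words) {
        syllable_count += get_syllables_in_text(word);
    }

    return 206.835 - 1.015 * (word_count / sentence_count)
        - 84.6 * (syllable_count / word_count);
}

console.log(get_reading_ease("The cat sat on the mat. It was happy.")); // roughly 108.27
console.log(get_reading_ease("   ")); // null
```

The sample sentence gives 9 words, 2 sentences and 10 syllables. Very short, simple sentences like these can push the score above 100, which still lands in the easiest band of the table.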

How Can You Use This Information?

So, you've run the test and you have your website content's reading level. Now what? It's important to think about your target audience and the subjects you're covering.

For example, most of the articles on this site are technical and aimed at other developers. One side effect of this subject matter is a lot of acronyms and technical terms, which push the Flesch-Kincaid score towards the difficult end of the spectrum. As the subject requires a higher level of technical knowledge and understanding, I fully expect the content to reflect this, so I accept that the reading age will be higher than average.

However, a website aimed at home bakers that offers recipes would likely attract a much more varied audience, so it would be prudent to aim the content at a lower reading level and make it accessible to as wide an audience as possible.

The Flesch-Kincaid Test and Non-English Languages

The Flesch-Kincaid algorithm was created specifically for English, so it doesn't really work for other languages. The main problem is that the average number of syllables per word differs from language to language; this is usually expressed as an average per 100 words of text.

This test should work for other Romance languages with some tweaking of the formula's seed values. However, depending on the specific language, there might be better choices created specifically for it, such as the Fry Readability Graph for Spanish.
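One way to allow for that tweaking is to pass the seed values in rather than hard-coding them. This is a sketch only: get_reading_ease_with and ENGLISH_SEEDS are hypothetical names, the defaults are the English values from the formula above, and values for any other language would need to come from a published adaptation for that language, not from this code:

```javascript
// English seed values from the formula above.
const ENGLISH_SEEDS = { base: 206.835, sentence_weight: 1.015, syllable_weight: 84.6 };

// Hypothetical generalisation: the constant and both weights are parameters,
// so a language-specific set of values could be passed in instead.
function get_reading_ease_with(word_count, sentence_count, syllable_count, seeds = ENGLISH_SEEDS) {
    return seeds.base
        - seeds.sentence_weight * (word_count / sentence_count)
        - seeds.syllable_weight * (syllable_count / word_count);
}

// English defaults: 100 words, 5 sentences, 130 syllables.
console.log(get_reading_ease_with(100, 5, 130)); // about 76.555
```

Keeping the counting separate from the seed values means only the three numbers change per language; the word, sentence and syllable counts still have to be suitable for that language, which is the harder part.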

Outside the Romance languages, there are languages in which syllable count doesn't make much sense as a metric. The AARI, for example, looks at the word length of Arabic content after removing the definite article ال from the beginning of words.
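As a rough illustration of that preprocessing step only (not the full AARI formula, whose other inputs aren't covered here), the sketch below strips a leading ال from each word and averages the remaining word lengths. average_arabic_word_length is a hypothetical name, and the sketch assumes words are separated by whitespace and written without diacritics, since those would add to the character count:

```javascript
// Hypothetical helper: average word length (in characters) of Arabic text
// after removing the definite article "ال" from the start of each word.
function average_arabic_word_length(sample_text) {
    const words = sample_text.trim().split(/\s+/).filter((w) => w.length > 0);
    if (words.length === 0) return 0;

    let total_letters = 0;
    for (const word of words) {
        // Only strip the prefix when something is left after it.
        const stem = word.startsWith("ال") && word.length > 2 ? word.slice(2) : word;
        // Array.from counts code points rather than UTF-16 units.
        total_letters += Array.from(stem).length;
    }
    return total_letters / words.length;
}

// "الكتاب" becomes "كتاب" (4 letters); "جديد" has 4 letters.
console.log(average_arabic_word_length("الكتاب جديد")); // 4
```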
