new drinking laws in the UK. The article from "The Sun" is titled "Sup all night". The article from "Daily Mail" is titled "Police braced for the great British binge" (see references for more detail).
The research consists of the following steps. First, I select a sample of 100 words from each article. I count the length of each word and the frequency of each word length, put the results into a summary table and analyze the findings. Then I repeat the procedure for 200 words and 400 words. I split my analysis into these three consecutive steps in order to see any possible changes in my statistical indicators (mean, median and mode). They should not vary drastically for each article when moving to a larger sample, but they should become more accurate, because random differences tend to smooth out in larger samples.
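The counting step can be sketched in Python as follows. This is only an illustration, not the exact procedure behind my tables: the function names `word_lengths` and `length_table` are hypothetical, the rule that only alphabetic characters count as letters is an assumption, and the text used is just the Daily Mail headline standing in for a full article.

```python
import re
from collections import Counter

def word_lengths(text, sample_size):
    """Return the letter counts of the first `sample_size` words in `text`."""
    # Assumption: a word is a run of letters, possibly containing apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text)
    # Count letters only, so "don't" counts as 4 letters.
    lengths = [sum(ch.isalpha() for ch in token) for token in tokens]
    # Drop stray apostrophes that contain no letters at all.
    lengths = [n for n in lengths if n > 0]
    return lengths[:sample_size]

def length_table(lengths):
    """Map each word length to the number of words with that length."""
    return dict(sorted(Counter(lengths).items()))

# Placeholder text: only the headline, not the article body.
text = "Police braced for the great British binge"

# In the actual study the sample sizes are 100, 200 and 400 words.
for size in (3, 7):
    print(size, length_table(word_lengths(text, size)))
```

Because each larger sample starts from the same first word, the 200-word sample contains the 100-word sample, which makes the three steps directly comparable.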
As noted above, for each sample size I calculate the mean, median and mode. The mean shows the average word length in the sample and is found simply by dividing the total number of letters in the sample by the total number of words. It can therefore be any decimal number, such as 4.53. It doesn't describe the length of any actual word (as there are no words with 4.53 letters), but it gives a good summary of the typical word length in the sample.
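The mean calculation can be illustrated with a short Python sketch. The function name `mean_word_length` is hypothetical, and the table below is not data from the study: it is the word-length table of the seven-word Daily Mail headline ("Police braced for the great British binge"), used only to show the arithmetic.

```python
def mean_word_length(table):
    """Mean word length from a {word length: number of words} table."""
    # Total letters = sum of (length x how many words have that length).
    total_letters = sum(length * count for length, count in table.items())
    # Total words = sum of all the frequencies.
    total_words = sum(table.values())
    return total_letters / total_words

# Illustrative table: two 3-letter, two 5-letter, two 6-letter, one 7-letter word.
headline_table = {3: 2, 5: 2, 6: 2, 7: 1}

print(mean_word_length(headline_table))  # 35 letters / 7 words = 5.0
```

Working from the summary table gives the same result as adding up the letters word by word, since each row of the table contributes its length multiplied by its frequency.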
However, the mean could yield somewhat misleading results if the data distribution is skewed to the left or right. ...