Statistics Coursework Essay, Research Paper
Investigation into some of the statistical differences between The Times and The Telegraph on a specific day
Design and Planning
The aim of this project is to compare two daily broadsheet newspapers, The Times and The Telegraph, both purchased on the same day. A lot of data can easily be collected from a newspaper, ranging from average word length to the area devoted to adverts on each page.
The project will attempt to answer three specific questions. In answering them, a range of sampling methods, forms of data presentation and statistical calculations will be used to interpret and evaluate the data and to reach a valid conclusion that draws all the data together.
Each question is set out below, together with the statistical methods that will be used to draw conclusions about it.
Question 1:
+ How does the font size of the headline text affect the length of the article?
This involves comparing two sets of data:
+ Font size of headline text: A reference sheet showing a range of font sizes in Times New Roman, the standard font for both papers, was printed from Microsoft Word and used as a guide when recording all the data.
+ Length of column of each article: The Times and The Telegraph both use a standard column width, so measuring the total vertical length of all the columns in an article gives a suitably accurate indication of the article's length.
To make any calculations accurate enough to draw a valid conclusion, at least twenty sets of data will need to be collected from each paper. Each page has approximately three articles and both newspapers have roughly thirty pages. A systematic sample of every fourth page will therefore select seven or eight pages, giving roughly twenty-one to twenty-four articles. This is enough data to support a conclusion.
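The page selection described above can be sketched in Python. This is a minimal illustration. It assumes a 30-page paper and a random starting page between 1 and 4, as is usual for a systematic sample. The function name `systematic_pages` is hypothetical and not part of the original method.

```python
import random


def systematic_pages(total_pages, interval, start=None):
    """Return every `interval`-th page number, beginning at `start`.

    If no start is given, one is chosen at random from 1..interval,
    which is the usual way to begin a systematic sample.
    """
    if start is None:
        start = random.randint(1, interval)
    return list(range(start, total_pages + 1, interval))


# Assumed figures from the plan: about 30 pages, about 3 articles per page.
for first_page in range(1, 5):
    pages = systematic_pages(30, 4, start=first_page)
    print(first_page, pages, "expected articles:", len(pages) * 3)
```

Whichever starting page is chosen, seven or eight pages are selected. At three articles per page this gives 21 to 24 articles, just above the target of twenty.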
The best way to find out whether the size of the headline text affects the length of the article is twofold. First, draw a scatter diagram with a line of best fit. Second, calculate Spearman's rank correlation coefficient.
Question 2:
+ What is the most common type of advertisement, and how much space is given to each type? How does this differ between the two newspapers?
This involves collecting two sets of data:
+ Number of times each pre-defined type of advert occurs: This will be recorded simply by looking through the paper and making a tally chart.
+ Area devoted to each pre-defined advert type: While making the tally chart, the area of each advert will also be recorded in square centimetres (cm²). These areas can then be added up to give the total area devoted to each advert type.
To make any calculations accurate enough to draw a valid conclusion, at least twenty sets of data will need to be collected from each paper. The only fair way to do this is to collect data from the whole of both papers. This gives a much better picture of how much advert space there is, and it will provide at least twenty sets of data from each paper.
The best way to compare the data collected is to draw two sets of comparative pie charts. One set will compare the frequency of each type of advert, and the other will compare the area devoted to each type.
Question 3:
+ What are the averages and the dispersion of the number of words in each article, and how do they differ between the two newspapers?
This involves collecting one set of data:
+ The number of words: This will be found by counting the words in the first sentence of each article, as this usually gives a good indication of the depth of the article. The data will be recorded in a grouped frequency table.
To make any calculations accurate enough to draw a valid conclusion, at least twenty sets of data will need to be collected from each paper. Therefore fifty samples in total should be taken across the two papers as a stratified random sample, with the number of samples divided proportionally between the two papers. For each sample, a page number should be generated at random and the first article on that page sampled.
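A sketch of the proportional split and the random page choice follows. It assumes the papers are weighted by page count, because the plan does not say what the proportions are based on. The page counts in the example are hypothetical placeholders, not measurements. The function names are illustrative.

```python
import random


def proportional_allocation(total_samples, stratum_sizes):
    """Split total_samples between strata in proportion to their sizes.

    Largest-remainder rounding makes the parts add up to total_samples.
    """
    whole = sum(stratum_sizes)
    exact = [total_samples * size / whole for size in stratum_sizes]
    alloc = [int(share) for share in exact]
    leftover = total_samples - sum(alloc)
    # Give any leftover samples to the strata with the largest fractional parts.
    by_remainder = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


def random_pages(n_samples, n_pages, rng=random):
    """Choose n_samples different page numbers at random from 1..n_pages."""
    return sorted(rng.sample(range(1, n_pages + 1), n_samples))


# Hypothetical page counts for the two papers (placeholders only).
page_counts = {"The Times": 28, "The Telegraph": 36}
shares = proportional_allocation(50, list(page_counts.values()))
for (paper, n_pages), n in zip(page_counts.items(), shares):
    print(paper, n, random_pages(n, n_pages))
```

With these placeholder counts the split is 22 and 28, both above the minimum of twenty. Sampling pages without replacement means no page, and so no first article, is counted twice. It does, however, require each paper to have at least as many pages as the samples allocated to it.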
The best way to compare these two sets of data will be to use several measures together. These are the averages (mean, median and mode), the interquartile range, the standard deviation and the mean deviation, together with histograms and box-and-whisker diagrams.
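The summary measures listed above can be computed with Python's standard `statistics` module. This sketch works on raw word counts rather than a grouped frequency table. The example values are made up purely to show the output; they are not data from either newspaper. Quartile conventions vary between textbooks: `statistics.quantiles` uses the (n + 1) position method by default.

```python
import statistics


def summarise(word_counts):
    """Averages and measures of spread for a list of word counts."""
    data = sorted(word_counts)
    mean = statistics.mean(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # lower and upper quartiles
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.multimode(data),  # every most common value
        "interquartile range": q3 - q1,
        "standard deviation": statistics.pstdev(data),  # divides by n
        # Mean deviation: average distance of each value from the mean.
        "mean deviation": sum(abs(x - mean) for x in data) / len(data),
    }


example = [2, 3, 3, 5, 7]  # illustrative values only
for name, value in summarise(example).items():
    print(name, value)
```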
Collection, selection, presentation, analysis, interpretation and evaluation of data
Question 1:
To make the calculations accurate enough to draw a valid conclusion, twenty sets of data were collected from each paper. Each page carries approximately three articles and both newspapers have roughly thirty pages, so a systematic sample of every fourth page was used to provide enough data to support a conclusion. As with all measurements of continuous data, each column length is only accurate to within a certain measuring error, so small errors in the data are possible. However, these errors are not expected to noticeably affect any of the statistical calculations.
The Times                              The Telegraph
Font size (pt)  Column length (cm)     Font size (pt)  Column length (cm)
72              57                     90              45
36              11                     48              20
24              16                     36              17
48              59                     48              20
72              68                     72              46
36              20                     20              5
28              20                     36              30
72              34                     72              75
36              14                     30              24
80              36                     28              16
72              49                     48              42
36              21                     24              6
28              20                     36              22
48              35                     72              38
36              18                     28              14
90              83                     72              34
24              8                      90              80
60              46                     28              19
36              18                     90              104
72              34                     72              67
Not every page carried three articles, so fewer samples were collected than had been hoped. Nevertheless, twenty samples were still obtained from each paper.
Scatter diagrams
Firstly, a scatter diagram was drawn for each newspaper. This involves placing one measure along each axis of the grid and then considering each item in turn. The two measures for an item act as an ordered pair, and thus as the coordinates of a point on the grid, so each item is linked to exactly one plotted point. From the resulting scatter of points a pattern can be identified, and a line of best fit summarises this trend. To help draw this line, the mean point (mean font size, mean column length) was plotted first, since a line of best fit should pass through it. The two diagrams were then compared.
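The mean point used to anchor each line of best fit can be computed directly from the data table. The essay draws the line by eye through this point. As an objective alternative, the sketch below also fits a least-squares regression line, which always passes through the mean point. The least-squares step is an addition for illustration, not the method used in the essay, and the function names are hypothetical.

```python
def mean_point(xs, ys):
    """(mean of x, mean of y): the point a line of best fit should pass through."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


def least_squares_line(xs, ys):
    """Gradient and intercept of the least-squares line of y on x."""
    mx, my = mean_point(xs, ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    gradient = sxy / sxx
    return gradient, my - gradient * mx


# Headline font size (pt) and column length (cm) from the data table.
times_font = [72, 36, 24, 48, 72, 36, 28, 72, 36, 80, 72, 36, 28, 48, 36, 90, 24, 60, 36, 72]
times_length = [57, 11, 16, 59, 68, 20, 20, 34, 14, 36, 49, 21, 20, 35, 18, 83, 8, 46, 18, 34]
telegraph_font = [90, 48, 36, 48, 72, 20, 36, 72, 30, 28, 48, 24, 36, 72, 28, 72, 90, 28, 90, 72]
telegraph_length = [45, 20, 17, 20, 46, 5, 30, 75, 24, 16, 42, 6, 22, 38, 14, 34, 80, 19, 104, 67]

for paper, xs, ys in [("The Times", times_font, times_length),
                      ("The Telegraph", telegraph_font, telegraph_length)]:
    m, c = least_squares_line(xs, ys)
    print(paper, "mean point:", mean_point(xs, ys), "line: y = %.2fx + %.2f" % (m, c))
```

The mean points come out at (50.3, 33.35) for The Times and (52.0, 36.2) for The Telegraph.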
Spearman's Rank Correlation Coefficient
Another method of measuring the relationship between two sets of data is Spearman's rank correlation coefficient. Each distribution must first be put into an order of merit. Each item being considered then has two ranks allocated to it, and the difference between these two ranks can be found. If the symbol d is used to represent this difference, then the coefficient of rank correlation can be written as:
$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
where n is the number of items in the distribution.
If two or more measures in one distribution are equal, it is convenient, though not strictly exact, to allocate each of them the average of the ranks they would have occupied had they been different. With tied ranks the formula above becomes an approximation. For example, if the third and fourth measures in a distribution are equal, they are both allocated the rank 3.5. If the fifth, sixth and seventh are equal, they are each allocated the rank 6.
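The ranking rule and formula above can be checked with a short Python sketch. It ranks each list with 1 for the largest value and gives tied values the average of the ranks they occupy. It then applies $r_s = 1 - 6\sum d^2 / (n(n^2 - 1))$. The function names are illustrative. Because there are ties, this shortcut formula is an approximation. Libraries such as SciPy (`scipy.stats.spearmanr`) instead compute the Pearson correlation of the ranks, which can give a slightly different value.

```python
def average_ranks(values):
    """Rank values (1 = largest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block of tied values while the next value is equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Return (sum of d squared, Spearman's r_s) using the shortcut formula."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Headline font size (pt) and column length (cm) from the data table.
times_font = [72, 36, 24, 48, 72, 36, 28, 72, 36, 80, 72, 36, 28, 48, 36, 90, 24, 60, 36, 72]
times_length = [57, 11, 16, 59, 68, 20, 20, 34, 14, 36, 49, 21, 20, 35, 18, 83, 8, 46, 18, 34]
telegraph_font = [90, 48, 36, 48, 72, 20, 36, 72, 30, 28, 48, 24, 36, 72, 28, 72, 90, 28, 90, 72]
telegraph_length = [45, 20, 17, 20, 46, 5, 30, 75, 24, 16, 42, 6, 22, 38, 14, 34, 80, 19, 104, 67]

for paper, xs, ys in [("The Times", times_font, times_length),
                      ("The Telegraph", telegraph_font, telegraph_length)]:
    sum_d2, r_s = spearman(xs, ys)
    print(paper, "sum of d^2 =", sum_d2, "r_s = %.3f" % r_s)
```

On the collected data this gives Σd² = 236 and r_s ≈ 0.82 for The Times, and Σd² = 117.5 and r_s ≈ 0.91 for The Telegraph.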
The easiest way to present this data and to calculate the correlation is to put the data in tabular form.
The Times
Font size (pt)  Column length (cm)  Rank (font)  Rank (length)  d      d²
72              57                  5            4              1      1
36              11                  13.5         19             -5.5   30.25
24              16                  19.5         17             2.5    6.25
48              59                  9.5          3              6.5    42.25
72              68                  5            2              3      9
36              20                  13.5         13             0.5    0.25
28              20                  17.5         13             4.5    20.25
72              34                  5            9.5            -4.5   20.25
36              14                  13.5         18             -4.5   20.25
80              36                  2            7              -5     25
72              49                  5            5              0      0
36              21                  13.5         11             2.5    6.25
28              20                  17.5         13             4.5    20.25
48              35                  9.5          8              1.5    2.25
36              18                  13.5         15.5           -2     4
90              83                  1            1              0      0
24              8                   19.5         20             -0.5   0.25
60              46                  8            6              2      4
36              18                  13.5         15.5           -2     4
72              34                  5            9.5            -4.5   20.25
n = 20, Σd² = 236
By applying the above equation, the following calculation gives a measure of the relationship between the two distributions:
$$ r_s = 1 - \frac{6 \times 236}{20(20^2 - 1)} = 1 - \frac{1416}{7980} \approx 0.82 $$
The Telegraph
Font size (pt)  Column length (cm)  Rank (font)  Rank (length)  d      d²
90              45                  2            6              -4     16
48              20                  10           13.5           -3.5   12.25
36              17                  13           16             -3     9
48              20                  10           13.5           -3.5   12.25
72              46                  6            5              1      1
20              5                   20           20             0      0
36              30                  13           10             3      9
72              75                  6            3              3      9
30              24                  15           11             4      16
28              16                  17           17             0      0
48              42                  10           7              3      9
24              6                   19           19             0      0
36              22                  13           12             1      1
72              38                  6            8              -2     4
28              14                  17           18             -1     1
72              34                  6            9              -3     9
90              80                  2            2              0      0
28              19                  17           15             2      4
90              104                 2            1              1      1
72              67                  6            4              2      4
n = 20, Σd² = 117.5
By applying the above equation, the following calculation gives a measure of the relationship between the two distributions:
$$ r_s = 1 - \frac{6 \times 117.5}{20(20^2 - 1)} = 1 - \frac{705}{7980} \approx 0.91 $$
Interpretation and evaluation:
Scatter diagrams:
The distribution of the points plotted on the scatter diagrams gives an indication of the relationship between the two characteristics being measured. On both diagrams the points lie close to a straight line of best fit with a positive gradient. This shows a positive correlation: larger headlines tend to go with longer articles. (A straight line on its own does not show that the measures are directly proportional; that would also require the line to pass through the origin.) The lines on both diagrams have similar gradients, at roughly 45° on the scales used. This suggests that article length increases with headline size at a similar rate in both newspapers. The strength of the relationship is shown by how closely the points cluster around the line, and in both diagrams they cluster closely. Despite the good lines of best fit, the points deviate further from the line as headline size and article length increase. This may show that the relationship is weaker for large values than for small ones, even though article length does increase with headline size. As one value increases so does the other, but not always in the same proportion.
Spearman's Rank Correlation Coefficient:
Although both diagrams reveal a strong correlation, they give no easy way of telling which correlation is the stronger. Nor do they measure how closely the points approximate the lines of best fit. The measure used for this purpose is called the coefficient of correlation, and it is assessed on a scale running from +1 through zero to -1. A coefficient of +1 means that the two distributions match each other perfectly. This would correspond to a scatter diagram in which all the plotted points lie along the leading diagonal of the grid. A coefficient of -1 corresponds to a pair of distributions whose measures are in completely the opposite order; that is, the first in one distribution is last in the other, and so on.
As with the scatter diagrams, Spearman's rank correlation coefficient shows that both newspapers have a strong relationship between headline size and article length. It also reveals that this relationship is stronger in The Telegraph than in The Times. However, Spearman's coefficient can be deceptive. It considers only the ranks of the values, not the actual values, which the scatter diagram does show.
To conclude, both measures show a strong relationship between the font size of the headline and the length of the article. This makes logical sense and would reasonably be expected in both newspapers. However, there is no particular reason why The Telegraph should show a stronger relationship than The Times. The difference may simply reflect the particular editions bought on that day.
Question 2:
To make any calculations accurate enough to draw a valid conclusion, at least twenty sets of data were needed from each paper. The only fair way to obtain them was to collect data from the whole of both papers. This gives a much better picture of how much advert space there is and provides at least twenty sets of data from each paper.
The Times                         The Telegraph
Advert type           Area (cm²)  Advert type             Area (cm²)
Holiday               442         Holiday                 170
Computer              468         Holiday                 400
Alcohol               493         Holiday                 425
Car                   2088        Phone                   672
Computer              775         Bank/Insurance/Money    250
Bank/Insurance/Money  408         Holiday                 250
Holiday               12          Computer                858
Bank/Insurance/Money  170         Car                     2088
Car                   988         Bank/Insurance/Money    1015
Holiday               544         Electrical Appliances   2088
Fashion               918         Car                     950
Electrical Appliances 825         Education               160
Phone                 116         Furniture               2088
Car                   2088        Computer                2052
Electrical Appliances 825         Car                     540
Car                   900         Holiday                 168
Computer              2088        Computer                832
Car                   412.5       Holiday                 544
Car                   928         Car                     2088
Books                 400         Bank/Insurance/Money    450
Computer              400         Bank/Insurance/Money    450
Holiday               425         Computer                881
Cinema                912.5       Education               425
Bank/Insurance/Money  280         Phone                   180
-                     -           Car                     240
-                     -           Computer                693
-                     -           Bank/Insurance/Money    476
-                     -           Bank/Insurance/Money    425
Both newspapers had more than enough adverts to support valid conclusions. The Telegraph contains four more adverts than The Times (28 against 24). This does not prevent a fair comparison, because the comparative pie charts take the different totals into account.
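The frequency and area tables can be built from the individual adverts listed above. This is a minimal sketch: the records are transcribed from that table, and the function name `tally` is illustrative.

```python
from collections import defaultdict

# (advert type, area in cm²) transcribed from the table of individual adverts.
TIMES = [
    ("Holiday", 442), ("Computer", 468), ("Alcohol", 493), ("Car", 2088),
    ("Computer", 775), ("Bank/Insurance/Money", 408), ("Holiday", 12),
    ("Bank/Insurance/Money", 170), ("Car", 988), ("Holiday", 544),
    ("Fashion", 918), ("Electrical Appliances", 825), ("Phone", 116),
    ("Car", 2088), ("Electrical Appliances", 825), ("Car", 900),
    ("Computer", 2088), ("Car", 412.5), ("Car", 928), ("Books", 400),
    ("Computer", 400), ("Holiday", 425), ("Cinema", 912.5),
    ("Bank/Insurance/Money", 280),
]
TELEGRAPH = [
    ("Holiday", 170), ("Holiday", 400), ("Holiday", 425), ("Phone", 672),
    ("Bank/Insurance/Money", 250), ("Holiday", 250), ("Computer", 858),
    ("Car", 2088), ("Bank/Insurance/Money", 1015), ("Electrical Appliances", 2088),
    ("Car", 950), ("Education", 160), ("Furniture", 2088), ("Computer", 2052),
    ("Car", 540), ("Holiday", 168), ("Computer", 832), ("Holiday", 544),
    ("Car", 2088), ("Bank/Insurance/Money", 450), ("Bank/Insurance/Money", 450),
    ("Computer", 881), ("Education", 425), ("Phone", 180), ("Car", 240),
    ("Computer", 693), ("Bank/Insurance/Money", 476), ("Bank/Insurance/Money", 425),
]


def tally(adverts):
    """Count the adverts of each type and add up the area devoted to each."""
    counts = defaultdict(int)
    areas = defaultdict(float)
    for advert_type, area in adverts:
        counts[advert_type] += 1
        areas[advert_type] += area
    return dict(counts), dict(areas)


for paper, adverts in [("The Times", TIMES), ("The Telegraph", TELEGRAPH)]:
    counts, areas = tally(adverts)
    print(paper, "adverts:", sum(counts.values()), "total area:", sum(areas.values()))
    for advert_type in sorted(counts):
        print("  %-22s %2d  %8.1f" % (advert_type, counts[advert_type], areas[advert_type]))
```

Summing the individual entries gives 24 adverts covering 17906 cm² in The Times, with car adverts alone accounting for 7404.5 cm². It gives 28 adverts covering 21858 cm² in The Telegraph.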
Firstly, two tables were drawn up: one showing the frequency of each type of advert and the other showing the area devoted to each type. From these tables, two sets of comparative pie charts were drawn. One set compares the frequency of each advert type in The Times and The Telegraph, and the other compares the area devoted to each type in the two papers.
Comparative pie charts allow you to compare not only the percentage components but also the totals. To achieve this, the areas of the pie charts must be proportional to the totals. Since area is proportional to radius squared, the radii must be proportional to the square roots of the totals.
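Type of Advert
Advert type             Times (frequency)  Telegraph (frequency)
Holiday                 4                  6
Computer                4                  5
Car                     6                  5
Bank/Insurance/Money    3                  6
Phone                   1                  2
Alcohol                 1                  0
Fashion                 1                  0
Electrical Appliances   2                  1
Book                    1                  0
Cinema                  1                  0
Education               0                  2
Furniture               0                  1
Total                   24                 28
Let $r_1$ and $r_2$ be the radii of the pie charts representing The Times and The Telegraph. The areas $\pi r_1^2$ and $\pi r_2^2$ must then be in the ratio 24 : 28. If $r_1$ equals 4 cm then:
$$ r_2 = r_1\sqrt{\frac{28}{24}} = 4\sqrt{\frac{28}{24}} \approx 4.32\ \text{cm} $$
The angle in the pie chart that represents each type of advert can be calculated as follows. Divide the frequency of that type by the total number of adverts in the paper, then multiply by 360°.
e.g. Holiday in The Times: $\frac{4}{24} \times 360^\circ = 60^\circ$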
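The radius and angle calculations can be written as two small functions. This sketch assumes the chart for The Times is drawn with a 4 cm radius. The function names are illustrative.

```python
import math


def comparative_radius(base_radius, base_total, other_total):
    """Radius that makes the second pie's area proportional to its total.

    Area is proportional to radius squared, so radii scale with the
    square root of the ratio of the totals.
    """
    return base_radius * math.sqrt(other_total / base_total)


def sector_angle(part, total):
    """Angle in degrees for a sector representing `part` out of `total`."""
    return part * 360 / total


# Advert frequencies: 24 in The Times, 28 in The Telegraph.
print(round(comparative_radius(4, 24, 28), 2))  # 4.32 (cm, The Telegraph)
print(sector_angle(4, 24))  # 60.0 (degrees, holiday adverts in The Times)
print(round(sector_angle(5906, 21858), 1))  # 97.3 (degrees, car adverts by area, The Telegraph)
```

The same two functions serve both sets of charts; only the totals passed in change, from frequencies to areas.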
Area devoted to each advert type
Advert type             Times (cm²)  Telegraph (cm²)
Holiday                 1423         1957
Computer                3731         5316
Car                     7404.5       5906
Bank/Insurance/Money    858          3066
Phone                   116          852
Alcohol                 493          0
Fashion                 918          0
Electrical Appliances   1650         2088
Book                    400          0
Cinema                  912.5        0
Education               0            585
Furniture               0            2088
Total                   17906        21858
Let $r_1$ and $r_2$ be the radii of the pie charts representing The Times and The Telegraph. The areas must then be in the ratio 17906 : 21858. If $r_1$ equals 4 cm then:
$$ r_2 = r_1\sqrt{\frac{21858}{17906}} = 4\sqrt{\frac{21858}{17906}} \approx 4.42\ \text{cm} $$
The angle in the pie chart that represents each type of advert can be calculated as follows. Divide the area devoted to that type by the total advert area in the paper, then multiply by 360°.
e.g. Car in The Telegraph: $\frac{5906}{21858} \times 360^\circ \approx 97.3^\circ$
Interpretation and evaluation:
Type
It can clearly be seen from the data that The Telegraph has four more adverts than The Times. In The Times, car adverts are the most frequent, whereas in The Telegraph holiday and bank/insurance/money adverts are the most common. Holiday, computer, car and bank/insurance/money adverts are the four most common types in both papers, with each of the other categories occurring only once or twice. This could be expected, considering the type of newspapers being sampled. The Times and The Telegraph have a particular readership, and these adverts are presumably aimed at those readers. Also, the four most common adverts promote products and services that involve the most money. It is therefore plausible that these adverts are the most profitable for the papers to carry, since competition between advertisers raises the price of advertising space.
The comparative pie charts allow you to compare the percentage of the total adverts that each advert type represents. Because The Times carries fewer adverts in total (24 against 28), an advert type with the same frequency in both papers forms a higher percentage of the total in The Times than in The Telegraph. For example, the six car adverts in The Times make up 25% of its adverts. By contrast, the six holiday adverts and the six bank/insurance/money adverts in The Telegraph each make up only about 21.4% of its adverts, despite having the same frequency. It is also worth noting that The Times has a wider range of adverts than The Telegraph (ten types against eight). In both papers the four most common advert types take up roughly three quarters of the chart. This again reflects the readers the papers are aimed at and suggests that the market for these adverts is larger than for the rest.
The less frequent adverts can be affected by the contents of the newspapers on that day, which may explain why some adverts in one paper do not occur in the other. It is also possible that these adverts do not have such a large market among the readers, or that large amounts of such advertising are not economically viable.
Area
The comparative pie charts for area reveal that advert types which occur frequently do not necessarily cover a large area. This is shown by the holiday and electrical appliances categories in The Telegraph. Holiday adverts make up 21.4% of the adverts but cover only 9% of the total area devoted to adverts. Electrical appliances, on the other hand, make up only 3.6% of the adverts but cover 9.6% of the total area. This may be because certain types of advert occur infrequently but require more space, while some frequent adverts do not need much space. The car category covers the largest area in both The Times