Statistics Definition and Formulas

 Statistics 

Statistics is concerned with data collected for a specific purpose. We make decisions by analysing and interpreting the data. Representing data in graphical and tabular forms brings out their essential properties and characteristics. In this chapter we also study the measures of central tendency: the mean (arithmetic mean), the median and the mode. A measure of central tendency tells us where the data are concentrated, but for a proper analysis we must also know how much the data are scattered or spread around that measure and how they were collected.

Measures of central tendency

Mean

The sum of the observations divided by the number of observations is called the mean. It is denoted by \(\overline{x}\).

\(Mean (\overline{x}) = \frac{The \space sum \space of \space observations}{The \space number \space of \space observations}\)

Mean of ungrouped frequency distribution-

Let's understand by example,

Ex- Find the mean for the following frequency distribution?

xi 20 40 60 80 100
fi 2 12 14 8 4


xi fi fixi
20 2 40
40 12 480
60 14 840
80 8 640
100 4 400
\(\sum f_i = 40\) \(\sum f_ix_i = 2400\)

\(Mean (\overline{x}) = \frac{\sum f_ix_i}{\sum f_i}\)

\(Mean (\overline{x}) = \frac{2400}{40}\)

= 60
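
The calculation above can be sketched in Python as follows. This is only a minimal illustration; the function name `frequency_mean` is an assumption made for this example and does not come from any library.

```python
def frequency_mean(values, freqs):
    """Mean of an ungrouped frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    total_freq = sum(freqs)                                    # sum of f_i
    weighted_sum = sum(f * x for x, f in zip(values, freqs))   # sum of f_i * x_i
    return weighted_sum / total_freq


x = [20, 40, 60, 80, 100]
f = [2, 12, 14, 8, 4]
print(frequency_mean(x, f))  # 60.0
```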

Mean of Grouped frequency distribution-

Ex- Find the mean for the following frequency distribution?

Intervals 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50
fi 3 28 42 20 7


Intervals fi xi fixi
0 - 10 3 \(\frac {0+10}{2}\) = 5 15
10 - 20 28 \(\frac {10+20}{2}\) = 15 420
20 - 30 42 \(\frac {20+30}{2}\) = 25 1050
30 - 40 20 \(\frac {30+40}{2}\) = 35 700
40 - 50 7 \(\frac {40+50}{2}\) = 45 315
\(\sum f_i = 100\) \(\sum f_ix_i = 2500\)

\(Mean (\overline{x}) = \frac{\sum f_ix_i}{\sum f_i}\)

\(Mean (\overline{x}) = \frac{2500}{100}\)

= 25
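
For grouped data the same idea applies once each class is replaced by its mid-point. A small sketch, assuming every class is given as a (lower, upper) pair; the name `grouped_mean` is hypothetical.

```python
def grouped_mean(intervals, freqs):
    """Mean of a grouped distribution, using the class mid-points as x_i."""
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    weighted_sum = sum(f * m for m, f in zip(midpoints, freqs))
    return weighted_sum / sum(freqs)


intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 28, 42, 20, 7]
print(grouped_mean(intervals, freqs))  # 25.0
```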

Median

To find the median, first arrange the data in ascending or descending order. Then apply the following rules-

(I). If 'n' is odd-

\(Median (M) = (\frac {n+1}{2})^{th} \space term \)

(II). If 'n' is even-

\(Median (M) = \frac {(\frac {n}{2})^{th} \space term + (\frac {n}{2} + 1)^{th} \space term}{2} \)

Where 'n' is the number of observations.

ex- Median of 5, 2, 10, 15, 20, 25, 3?

solution- In ascending order- 2, 3, 5, 10, 15, 20, 25

n = 7 (odd)

\(Median (M) = (\frac {n+1}{2})^{th} \space term \)

\(Median (M) = (\frac {7+1}{2})^{th} \space term \)

\(Median (M) = (\frac {8}{2})^{th} \space term \)

\(Median (M) = 4^{th} \space term \)

M = 10
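
The two rules can be sketched in Python as below. The function name `median_of` is chosen only for this illustration. For an even number of observations the sketch averages the two middle terms of the sorted data.

```python
def median_of(data):
    """Median of raw data: the middle term, or the average of the two middle terms."""
    ordered = sorted(data)           # ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]          # the ((n + 1) / 2)-th term, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([5, 2, 10, 15, 20, 25, 3]))  # 10
```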

Median of ungrouped frequency distribution-

Let's understand by an example,

Ex- Find the median for the following frequency distribution?

xi 5 7 9 10 12 15
fi 8 6 2 2 2 6


xi fi Cifi(cumulative frequency)
5 8 8
7 6 8+6 = 14
9 2 14+2 = 16
10 2 16+2 = 18
12 2 18+2 = 20
15 6 20+6 = 26

N = \(\sum f_i \) = 26

\(\frac {N}{2} = \frac {26}{2} = 13 \) (the cumulative frequency just greater than 13 is 14, which corresponds to \(x_i = 7\))

Median(M) = 7
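
A possible Python sketch of this lookup is given below. It finds the middle observation(s) through the cumulative frequencies, which gives the same answer as the rule used above for this example. The name `frequency_median` is an assumption, and the values are assumed to be listed in ascending order.

```python
from itertools import accumulate


def frequency_median(values, freqs):
    """Median of an ungrouped frequency distribution (values in ascending order)."""
    cumulative = list(accumulate(freqs))   # running totals of f_i
    n = cumulative[-1]

    def term(k):
        # k-th observation (counting from 1): first value whose
        # cumulative frequency reaches k
        for x, c in zip(values, cumulative):
            if c >= k:
                return x

    if n % 2 == 1:
        return term((n + 1) // 2)
    return (term(n // 2) + term(n // 2 + 1)) / 2


print(frequency_median([5, 7, 9, 10, 12, 15], [8, 6, 2, 2, 2, 6]))  # 7.0
```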

Median of Grouped frequency distribution-

\(M = l + ( \frac {\frac {N}{2} - C}{f})\times h \)

l = Lower limit of the median class

N = \(\sum f_i \)

C = The cumulative frequency of the class preceding the median class

f = The frequency of the median class

h = Width of the median class

Ex- Find the median for the following frequency distribution?

Interval 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50
fi 8 30 40 12 10


Interval fi Cifi
0 - 10 8 8
10 - 20 30 38
20 - 30 40 78
30 - 40 12 90
40 - 50 10 100

N = \(\sum f_i \) = 100

\(\frac {N}{2} = \frac {100}{2} = 50 \) (the cumulative frequency just greater than 50 is 78)

Median class = (20 - 30)

h = 30-20 = 10

\(M = l + ( \frac {\frac {N}{2} - C}{f})\times h \)

\(M = 20 + ( \frac {50 - 38}{40})\times 10 \)

\(M = 20 + ( \frac {12}{4}) \)

M = 23
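
The steps of this example can be sketched in Python as follows. The function name `grouped_median` is hypothetical, and the classes are assumed to be continuous (lower, upper) pairs in ascending order.

```python
from itertools import accumulate


def grouped_median(intervals, freqs):
    """Median of a grouped distribution: M = l + ((N/2 - C) / f) * h."""
    cumulative = list(accumulate(freqs))
    half = cumulative[-1] / 2                                      # N / 2
    # median class: first class whose cumulative frequency exceeds N / 2
    idx = next(k for k, c in enumerate(cumulative) if c > half)
    lower, upper = intervals[idx]
    c_prev = cumulative[idx - 1] if idx > 0 else 0                 # C
    return lower + (half - c_prev) * (upper - lower) / freqs[idx]


intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
print(grouped_median(intervals, [8, 30, 40, 12, 10]))  # 23.0
```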

Mode

The observation with the greatest frequency in the data is called the mode. It is represented by 'z'.

ex- Mode of 5, 2, 2, 5, 20, 5, 3?

solution- The frequency of 5 is three, which is the highest, so

z = 5

Mode of ungrouped frequency distribution-

xi 20 40 60 80 100
fi 2 12 14 8 4

z = 60 (highest frequency 14)
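
Both cases can be sketched in Python as below. The function names are assumptions made for this illustration. If several values share the highest frequency, each function simply returns the first one it finds.

```python
from collections import Counter


def mode_of_values(data):
    """Most frequent observation in raw data."""
    return Counter(data).most_common(1)[0][0]


def mode_of_table(values, freqs):
    """Value with the highest frequency in a frequency table."""
    best = max(range(len(freqs)), key=lambda i: freqs[i])
    return values[best]


print(mode_of_values([5, 2, 2, 5, 20, 5, 3]))                        # 5
print(mode_of_table([20, 40, 60, 80, 100], [2, 12, 14, 8, 4]))       # 60
```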

Mode of grouped frequency distribution-

\(Z = l + ( \frac {f_1 - f_0}{2f_1 - f_0 - f_2})\times h \)

where, l = Lower limit of mode class

f0 = The frequency of the class preceding the mode class

f1 = Frequency of mode class

f2 = The frequency of the class following the mode class

h = Width of the mode class

Ex- Find the mode for the following frequency distribution?

Interval 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50
fi 4 7 13 9 3

Mode class = 20 - 30 (highest frequency)

\(Z = l + ( \frac {f_1 - f_0}{2f_1 - f_0 - f_2})\times h \)

\(Z = 20 + ( \frac {13 - 7}{2 \times 13 - 7 - 9})\times 10 \)

Z = 20 + 6

Z = 26
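
A minimal Python sketch of the formula follows. The name `grouped_mode` is hypothetical. When the mode class is the first or the last class, the sketch takes the missing neighbouring frequency as 0; this is an assumption of the sketch, not something stated in the text. It also does not handle the case where the denominator \(2f_1 - f_0 - f_2\) is zero.

```python
def grouped_mode(intervals, freqs):
    """Mode of a grouped distribution: Z = l + ((f1 - f0) / (2*f1 - f0 - f2)) * h."""
    idx = max(range(len(freqs)), key=lambda i: freqs[i])       # mode class
    f1 = freqs[idx]
    f0 = freqs[idx - 1] if idx > 0 else 0                      # preceding class (assumed 0 if none)
    f2 = freqs[idx + 1] if idx < len(freqs) - 1 else 0         # following class (assumed 0 if none)
    lower, upper = intervals[idx]
    return lower + (f1 - f0) * (upper - lower) / (2 * f1 - f0 - f2)


intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
print(grouped_mode(intervals, [4, 7, 13, 9, 3]))  # 26.0
```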

Relation between mean, median and mode

Mode = 3 Median - 2 Mean 

Z = 3M - 2\(\overline{x}\)

Some other results from the above relation

Z - \(\overline{x}\) = 3(M - \(\overline{x}\))

Z - M = 2(M - \(\overline{x}\))

Quartiles

\(Q_i = l + (\frac {i \frac {N}{4} - C}{f}) \times h \)

Where, l = Lower limit of quartile class

f = frequency of quartile class

C = The cumulative frequency of the class preceding the quartile class

N = \(\sum f_i \)

h = Width of the quartile class

i = 1, 2, 3 (possible value of 'i')

Deciles 

\(D_i = l + (\frac {i \frac {N}{10} - C}{f}) \times h \)

Where, l = Lower limit of decile class

f = frequency of decile class

C = The cumulative frequency of the class preceding the decile class

i = 1, 2, 3, ......... 9 (possible value of 'i')

Percentiles

\(P_i = l + (\frac {i \frac {N}{100} - C}{f}) \times h \)

Where, l = Lower limit of percentile class

f = frequency of percentile class

C = The cumulative frequency of the class preceding the percentile class

i = 1, 2, 3, ............. 99 (possible value of 'i')
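
The quartile, decile and percentile formulas differ only in the number of parts (4, 10 or 100), so one function can cover all three. The sketch below is an illustration under that observation; the name `partition_value` and its parameters `i` and `parts` are assumptions. The class is located as the first one whose cumulative frequency exceeds \(i \frac{N}{parts}\).

```python
from itertools import accumulate


def partition_value(intervals, freqs, i, parts):
    """i-th partition value: l + ((i*N/parts - C) / f) * h.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    cumulative = list(accumulate(freqs))
    target = i * cumulative[-1] / parts                          # i * N / parts
    # class containing the target position
    idx = next(k for k, c in enumerate(cumulative) if c > target)
    lower, upper = intervals[idx]
    c_prev = cumulative[idx - 1] if idx > 0 else 0               # C
    return lower + (target - c_prev) * (upper - lower) / freqs[idx]


intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [8, 30, 40, 12, 10]
print(partition_value(intervals, freqs, 2, 4))     # Q2  = 23.0
print(partition_value(intervals, freqs, 5, 10))    # D5  = 23.0
print(partition_value(intervals, freqs, 50, 100))  # P50 = 23.0
print(partition_value(intervals, freqs, 3, 4))     # Q3  = 29.25
```

Here \(Q_2\), \(D_5\) and \(P_{50}\) all equal the median (23) found earlier for the same distribution.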

Ex- Find Q1, D1 and P1 for the given distribution?

Interval 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50
fi 8 30 40 12 10


Interval fi Cifi
0 - 10 8 8
10 - 20 30 38
20 - 30 40 78
30 - 40 12 90
40 - 50 10 100

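\( N = \sum f_i = 100\)

For \(Q_1\): \( \frac {N}{4} = 25 \). The cumulative frequency just greater than 25 is 38, so the quartile class is (10 - 20), with l = 10, C = 8, f = 30 and h = 10.

\(Q_i = l + (\frac {i \frac {N}{4} - C}{f}) \times h \)

\(Q_1 = 10 + (\frac {1 \times \frac {100}{4} - 8}{30}) \times 10 \)

\(Q_1 = 10 + \frac {17}{3} \)

\(Q_1 = \frac {47}{3} \approx 15.67 \)

For \(D_1\): \( \frac {N}{10} = 10 \). The cumulative frequency just greater than 10 is 38, so the decile class is also (10 - 20), with l = 10, C = 8, f = 30 and h = 10.

\(D_i = l + (\frac {i \frac {N}{10} - C}{f}) \times h \)

\(D_1 = 10 + (\frac {1 \times \frac {100}{10} - 8}{30}) \times 10 \)

\(D_1 = 10 + \frac {2}{3} \)

\(D_1 = \frac {32}{3} \approx 10.67 \)

For \(P_1\): \( \frac {N}{100} = 1 \). The cumulative frequency just greater than 1 is 8, so the percentile class is (0 - 10), with l = 0, C = 0, f = 8 and h = 10.

\(P_i = l + (\frac {i \frac {N}{100} - C}{f}) \times h \)

\(P_1 = 0 + (\frac {1 \times \frac {100}{100} - 0}{8}) \times 10 \)

\(P_1 = \frac {10}{8} \)

\(P_1 = \frac {5}{4} = 1.25 \)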

Measures of Dispersion

The spread of the values of a series around a central value, such as the mean, is called dispersion.

i. Range

The difference between the maximum and minimum value is called Range.

ex- Range of 6, 4, 2, 3, 8, 4, 7?

Range = 8 - 2 = 6

Range coefficient

\(Range \space coefficient = \frac {maximum \space value - minimum \space value}{maximum \space value + minimum \space value } \)

\(Range \space coefficient = \frac {8 - 2}{8 + 2} \)

 = 0.6
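
Both quantities are easy to compute in Python. The sketch below uses hypothetical function names and assumes that the maximum and minimum do not add up to zero.

```python
def value_range(data):
    """Range = maximum value - minimum value."""
    return max(data) - min(data)


def range_coefficient(data):
    """Range coefficient = (max - min) / (max + min)."""
    highest, lowest = max(data), min(data)
    return (highest - lowest) / (highest + lowest)


data = [6, 4, 2, 3, 8, 4, 7]
print(value_range(data))        # 6
print(range_coefficient(data))  # 0.6
```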

ii. Quartile Deviation

\(Quartile \space Deviation = \frac {Q_3 - Q_1}{2} \)

Coefficient of Quartile Deviation

\(Quartile \space Deviation \space Coefficient = \frac {Q_3 - Q_1}{Q_3 + Q_1} \)
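
A short sketch of both formulas is given below. The quartile values Q1 = 10 and Q3 = 30 are made up purely for illustration and do not come from any distribution in this chapter; the function names are also assumptions.

```python
def quartile_deviation(q1, q3):
    """Quartile deviation = (Q3 - Q1) / 2."""
    return (q3 - q1) / 2


def quartile_deviation_coefficient(q1, q3):
    """Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


# Illustrative values only (not taken from the text)
print(quartile_deviation(10, 30))              # 10.0
print(quartile_deviation_coefficient(10, 30))  # 0.5
```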

iii. Mean Deviation

I. Mean deviation with respect to the Mean

Ex- Find the mean deviation with respect to the mean for each of the following series.

a. Individual series

qus- 4, 7, 8, 9, 10, 12, 13, 17 

Solution- 

\( \overline{x} = \frac{4 + 7 + 8 + 9 + 10 + 12 + 13 + 17}{8} = \frac{80}{8} = 10 \)

xi \(|x_i - \overline {x}|\)
4 6
7 3
8 2
9 1
10 0
12 2
13 3
17 7

\(\sum |x_i - \overline{x}| = 24\)

\( Mean \space Deviation = \frac {\sum |x_i - \overline{x}|}{n} \)

\( Mean \space Deviation = \frac {24}{8} \)

  = 3
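
The same steps can be written as a short Python sketch; the function name `mean_deviation` is an assumption made for this example.

```python
def mean_deviation(data):
    """Mean deviation about the mean for an individual series."""
    n = len(data)
    mean = sum(data) / n
    # average of the absolute deviations from the mean
    return sum(abs(x - mean) for x in data) / n


print(mean_deviation([4, 7, 8, 9, 10, 12, 13, 17]))  # 3.0
```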

b. Ungrouped frequency distribution-

xi 20 40 60 80 100
fi 2 12 14 8 4

we know that,

\(Mean( \overline {x}) = 60 \)

xi fi \(|x_i - \overline {x}|\) \(f_i|x_i - \overline {x}|\)
20 2 40 80
40 12 20 240
60 14 0 0
80 8 20 160
100 4 40 160
\(N = \sum f_i = 40\) \(\sum f_i|x_i - \overline{x}| = 640\)

\( Mean \space Deviation = \frac {\sum f_i |x_i - \overline{x}|}{N} \)

\( Mean \space Deviation = \frac {640}{40} \)

  = 16

c. Grouped frequency distribution-

Interval 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50
fi 3 28 42 20 7

we know that,

\(Mean (\overline {x}) = 25 \)

Interval xi fi \(|x_i - \overline {x}|\) \(f_i|x_i - \overline {x}|\)
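0 - 10 \(\frac {0+10}{2}\) = 5 3 20 60
10 - 20 \(\frac {10+20}{2}\) = 15 28 10 280
20 - 30 \(\frac {20+30}{2}\) = 25 42 0 0
30 - 40 \(\frac {30+40}{2}\) = 35 20 10 200
40 - 50 \(\frac {40+50}{2}\) = 45 7 20 140
\(N = \sum f_i = 100\) \(\sum f_i|x_i - \overline{x}| = 680\)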

\( Mean \space Deviation = \frac {\sum f_i |x_i - \overline{x}|}{N} \)

\( Mean \space Deviation = \frac {680}{100} \)

  = 6.8
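
For frequency distributions the deviations are weighted by the frequencies. The sketch below uses the hypothetical name `frequency_mean_deviation`. For grouped data it assumes the class mid-points are passed in as the values.

```python
def frequency_mean_deviation(values, freqs):
    """Mean deviation about the mean: sum(f_i * |x_i - mean|) / N."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n


# Ungrouped frequency distribution
print(frequency_mean_deviation([20, 40, 60, 80, 100], [2, 12, 14, 8, 4]))  # 16.0

# Grouped frequency distribution, using the class mid-points
print(frequency_mean_deviation([5, 15, 25, 35, 45], [3, 28, 42, 20, 7]))   # 6.8
```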

II. Mean Deviation formula with respect to median

a. Individual series

\( Mean \space Deviation = \frac {\sum |x_i - M|}{n} \)

b. frequency distribution-

\( Mean \space Deviation = \frac {\sum f_i |x_i - M|}{N} \)

III. Mean Deviation formula with respect to Mode

a. Individual series

\( Mean \space Deviation = \frac {\sum |x_i - Z|}{n} \)

b. frequency distribution-

\( Mean \space Deviation = \frac {\sum f_i |x_i - Z|}{N} \)
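
The formulas with respect to the median and the mode have the same shape; only the central value changes. A small sketch for an individual series is given below, with the hypothetical name `mean_deviation_about`. The data are the series 5, 2, 2, 5, 20, 5, 3 from the mode example, for which the median and the mode are both 5.

```python
from statistics import median


def mean_deviation_about(data, center):
    """Mean deviation of an individual series about any central value."""
    return sum(abs(x - center) for x in data) / len(data)


data = [5, 2, 2, 5, 20, 5, 3]
m = median(data)                      # 5
print(mean_deviation_about(data, m))  # 23/7, about 3.29
```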

Standard Deviation

The square root of the arithmetic mean of the squared deviations of the values from their arithmetic mean is called the standard deviation. It is denoted by \(\sigma\).

\(\sigma = \sqrt {\frac {\sum (x_i - \overline {x})^2}{n}}\)

Variance

Variance = \((Standard \space Deviation)^2\)

\( V = \sigma^2 \)

Coefficient of Standard Deviation

\( Coefficient \space of \space standard \space deviation = \frac {\sigma}{\overline{x}} = \frac {Standard \space Deviation}{Mean} \)

Coefficient of variation

\( Coefficient \space of \space variation = \frac {\sigma}{\overline{x}} \times 100 \)
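
The formulas of this section can be tried out with a short Python sketch. The function names are assumptions, the divisor is n as in the formula for \(\sigma\) above, and the data are the individual series 4, 7, 8, 9, 10, 12, 13, 17 used earlier (mean 10).

```python
from math import sqrt


def variance(data):
    """Variance: mean of the squared deviations from the mean (divisor n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


def standard_deviation(data):
    """Standard deviation: square root of the variance."""
    return sqrt(variance(data))


data = [4, 7, 8, 9, 10, 12, 13, 17]
mean = sum(data) / len(data)
sigma = standard_deviation(data)

print(variance(data))       # 14.0
print(round(sigma, 4))      # 3.7417
print(sigma / mean)         # coefficient of standard deviation, about 0.374
print(sigma / mean * 100)   # last formula above (sigma / mean * 100), about 37.4
```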