MCom I Semester Statistical Analysis Variance Study Material Notes

Topics covered: Meaning and Definition of Analysis of Variance; Assumptions in Analysis of Variance; The Technique of Analysis of Variance; One-Way Analysis of Variance; ANOVA Table: Short-cut Method; ANOVA Table: Coding Method; Analysis of Variance in Two-Way Classification; Calculation Procedure of Two-Way Analysis of Variance; Important Points to Be Remembered in Relation to Two-Way Analysis of Variance.

Analysis of Variance

In the previous two chapters (Test of Significance: Large Samples and Test of Significance: Small Samples), we discussed methods of determining whether two samples have come from the same universe or from two universes that differ significantly from each other. The difference between two sample means can be studied through the standard error of the difference of the means or through Student's t-test, but difficulty arises when we have to examine the significance of the difference between more than two sample means at one and the same time. Suppose there are three types of fertilizers and each type is applied to four plots. We may be interested in finding out whether the effects of these fertilizers on the yields are significantly different or, in other words, whether the samples have come from the same universe. The answer to this problem is provided by the technique of analysis of variance.

The present chapter explains the technique of analysis of variance, an elegant and versatile statistical technique.

The analysis of variance technique was first introduced by R.A. Fisher in 1923. Later, Prof. Snedecor and several others also contributed to its development. It was at first used mainly in agricultural research, which is why its language is loaded with agricultural terminology such as blocks (referring to land) and treatments (referring to populations or samples), which are differentiated in terms of seed, fertilizers or cultivation methods. Today, analysis of variance finds application in a large number of experimental designs in the natural sciences as well as the social sciences, and it has acquired a place of great prominence in statistical analysis.

Meaning and Definition of Analysis of Variance

The analysis of variance, frequently referred to by the contraction ANOVA, is a statistical technique specially designed to test whether the means of several samples differ significantly. In other words, it tests whether the samples can be considered as having been drawn from the same population or, more precisely, from populations having the same mean. The test of significance for two sample means, discussed in an earlier chapter, is the simplest case of analysis of variance; the technique is a direct extension of that test to several sample means. The name derives from the fact that the analysis is based on a comparison of variances estimated from various sources.

The analysis of variance is essentially a procedure for testing the differences between different groups of data for homogeneity. It is a method of analyzing the variance to which a response is subject into its various components, corresponding to the various sources of variation. There may be variation between the samples and/or variation among the items within a sample. Some important definitions are as follows:

(1) "… the separation of the variance ascribable to one group of causes from the variance ascribable to other groups." - Ronald A. Fisher

(2) "The analysis of variance is essentially a procedure for testing the differences between different groups of data for homogeneity." - Yule and Kendall

(3) "The analysis of variance is essentially a method of analyzing the variance to which a response is subject, into its various components corresponding to the sources of variation which can be identified." - Owen L. Davies

Conclusion: Thus, analysis of variance is a statistical technique with the help of which the total variation is partitioned into the variation caused by each set of independent factors, and the homogeneity of several sample means is tested. It should be remembered that the analysis of variance test discussed here is not intended to test the significance of the difference between two sample variances; rather, its purpose is to test the significance of the differences among sample means. It does this via the mechanism of the F-test. The basis of this test is the ratio of two variances: (i) between samples, and (ii) within samples. The two together account for the total variation. The purpose of considering these two types of variance is to find out the influence of the different forces working on them. The differences between the sample means are due to the influence of the factor experimented with as well as to sampling variation between the samples.

Examples of Applications of Analysis of Variance: Analysis of variance furnishes a technique for testing simultaneously the significance of the differences among several means. An agronomist may like to know whether there is any significant difference in the average yield per acre of soybeans when three kinds of fertilizers are used on three identical plots of land, or whether the yield per acre will be the same when four different varieties of wheat are sown on identical plots. A dairy farm may like to test whether there is a significant difference in the quality and quantity of milk yield between herds of different breeds. A business manager may like to find out whether there is any difference in the average service life of several kinds of light bulbs. For example, when the analysis of variance technique is used to test whether four kinds of fertilizers, used on four identical plots of land, give the same average yield, the null hypothesis will be that "the mean yields of the four plots are the same". To test this hypothesis, the four kinds of fertilizers are used on different plots of land. This is called the 'treatment', and the difference between the mean yields is called the 'effect'. This difference can be due to experimental error as well as to differences in the quality of the fertilizers. The analysis of variance is a technique for segregating the effects caused by different factors.

Assumptions in Analysis of Variance

The following are the underlying assumptions in the use of the analysis of variance technique:

(1) Normality: The populations from which the various samples are selected are normally distributed. However, if samples are large, the assumption of normality may not be considered necessary.

(2) Independence: Each of the samples is a simple random sample and independent of other samples.

(3) Homogeneity: The populations from which the samples are drawn have the same means ($\mu_1 = \mu_2 = \mu_3 = \dots = \mu_n$) and the same variances ($\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_n^2$).

The Technique of Analysis of Variance

For the sake of clarity, the technique of analysis of variance has been discussed separately for (1) One-way classification, and (2) Two-way classification.

(1) One-way Analysis of Variance

or

The technique of Analysis of Variance in One-way Classification

The simplest situation in which the analysis of variance technique can be applied is the one-way analysis of variance. Under one-way classification, the influence of any one factor is considered. For example, the influence of the application of one or more types of fertilizers may be considered on several pieces of land.

The method to be employed utilizes a comparison between variances computed in two different ways: one is computed as the variance between the samples and the other as the variance within the samples. The theory of the test is that if all the samples came from the same universe, these two estimates should be relatively close in value; if the samples came from different universes, the estimates should be relatively far apart. Thus, the total variance can be divided into two components, viz.,

Total variance = Variance between Samples + Variance within Samples

For this purpose, the variance ratio (F) is worked out. The following three methods are generally used for one-way analysis of variance: (a) direct method, (b) short-cut method, (c) coding method. These methods are explained one by one.

(a) Direct Method of One-way Analysis of Variance

The steps involved in carrying out the analysis of variance by the direct method are as follows:

(1) Null Hypothesis: The null hypothesis used in the analysis of variance is that the arithmetic means of populations from which the k samples were randomly drawn are equal to one another. Thus

$$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$$

(2) Computation of Variance between the Samples: The variance between samples (groups) measures the differences between the sample means of the groups. It takes into account the random variations from observation to observation, and it also measures the differences from one group to another. For computing the variance between the samples, we take the total of the squares of the deviations of the means of the various samples from the grand mean and divide this total by the degrees of freedom. Thus, the steps in calculating the variance between samples are:

(a) Calculate the mean of each sample, i.e., $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k$.

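(b) Calculate the grand mean (the mean of the sample means), $\bar{\bar{X}}$, pronounced 'X double bar'. Its value is obtained as follows:

$$\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \dots + \bar{X}_k}{k}$$

The above formula is used only when the size of each sample is equal. If the sample sizes are not equal, the grand mean is computed as follows:

$$\bar{\bar{X}} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \dots + n_k\bar{X}_k}{n_1 + n_2 + \dots + n_k}$$

where $n_1, n_2, \dots, n_k$ are the numbers of items in the respective samples.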

(c) Take the deviations of the means of the various samples from the grand mean. Square each deviation, multiply it by the number of items in the corresponding sample, and add up the results. This total is the sum of the squares of deviations for variance between the samples. Symbolically, this can be written:

Sum of Squares of Deviations for Variance between Samples

$$SSB = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \dots + n_k(\bar{X}_k - \bar{\bar{X}})^2$$

This total has been called 'Squariance' by Pitman and 'Deviance' by Kendall.

Note: The sum of squares between samples is denoted by SSB or SSC.

(d) Computation of Mean Squares (MSB or MSC): Divide the total obtained in step (c) by the number of degrees of freedom (v). The degrees of freedom will be one less than the number of samples (v = k - 1). If there are 6 samples, the degrees of freedom will be 6 - 1 = 5.

$$MSB = \frac{SSB}{k - 1}$$
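Steps (a) to (d) can be sketched in a few lines of Python. This is only an illustration: the yields are made-up figures, and the function name `between_samples` is a hypothetical helper, not something defined in these notes. The grand mean is taken as the total of all items divided by the number of items, which also covers samples of unequal size.

```python
def mean(values):
    return sum(values) / len(values)

def between_samples(samples):
    """Return (SSB, degrees of freedom, MSB) for a list of samples."""
    n_total = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n_total   # grand mean, any sample sizes
    # Squared deviation of each sample mean from the grand mean,
    # multiplied by the number of items in that sample.
    ssb = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    df = len(samples) - 1                            # v = k - 1
    return ssb, df, ssb / df

# Hypothetical yields: three samples of four plots each.
samples = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]
ssb, df_between, msb = between_samples(samples)
print(ssb, df_between, msb)  # 8.0 2 4.0
```

Here the sample means are 6, 5 and 4 and the grand mean is 5, so SSB = 4(1) + 4(0) + 4(1) = 8 and MSB = 8 / 2 = 4.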

(3) Computation of Variance Within Samples: The variance (or sum of squares) within samples measures those variations within the samples which are due to chance only. It is denoted by SSW or SSE. The variance within samples (groups) measures variability around the mean of each group. Since this variability is not affected by group differences, it can be considered a measure of the random variation of values within a group. For computing the variance within the samples, we take the total of the squares of the deviations of the various items from the mean of their respective samples and divide this total by the degrees of freedom. In short, the steps in calculating the variance within the samples are:

(a) Calculate the mean value of each sample, i.e., $\bar{X}_1, \bar{X}_2, \bar{X}_3$, etc. (These means have already been computed for the variance between the samples, so there is no need to calculate them again.)

(b) Obtain the deviations of the values of the sample items from the mean of the corresponding sample, for all the samples. Square these deviations and add them up; the total is the sum of squares within the samples. Symbolically, we can write

$$SSW = \sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 + \dots + \sum (X_k - \bar{X}_k)^2$$

Note: The sum of squares within the samples is denoted by SSW or SSE.

(c) Computation of Mean Squares (MSW): Divide the total obtained in step (b) by the number of degrees of freedom. The degrees of freedom are equal to the total number of items in all the samples minus the number of samples, i.e., d.f. = N - k, where N = total number of items in all the samples and k = number of samples.

$$MSW = \frac{SSW}{N - k}$$
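The within-sample computation can be sketched the same way. The data are again hypothetical, and `within_samples` is an illustrative name rather than anything from the notes.

```python
def within_samples(samples):
    """Return (SSW, degrees of freedom, MSW) for a list of samples."""
    ssw = 0.0
    for s in samples:
        m = sum(s) / len(s)                    # mean of this sample
        ssw += sum((x - m) ** 2 for x in s)    # squared deviations from it
    n_total = sum(len(s) for s in samples)
    df = n_total - len(samples)                # d.f. = N - k
    return ssw, df, ssw / df

# Hypothetical yields: three samples of four plots each.
samples = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]
ssw, df_within, msw = within_samples(samples)
print(ssw, df_within, round(msw, 4))  # 24.0 9 2.6667
```

The three samples contribute 14, 8 and 2 respectively, giving SSW = 24 on 12 - 3 = 9 degrees of freedom.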

(4) Calculation of F-Ratio: The F-coefficient is used to judge whether the difference between the two variances (i.e., the between-sample and within-sample variances) is significant or just due to fluctuations of sampling. The F-coefficient, or Variance Ratio, is the ratio which the greater variance bears to the smaller variance. In other words, this ratio is worked out as under:

$$F = \frac{\text{Greater Variance}}{\text{Smaller Variance}}$$

Generally, the variance between the samples is greater than the variance within the samples, so the variance between the samples is put in the numerator. If in any case the variance within the samples is greater, it is used as the numerator instead, so that under this convention the value of F is never less than unity. It should be noted that some scholars see no need to distinguish the greater variance from the smaller: they always divide MSB (variance between the samples) by MSW or MSE (variance within the samples), and if the value of F so obtained is less than 1, there is no need to compare it with the table value of F, because in such a case the difference is always insignificant. Either concept may be used, but we have adopted the first, i.e., the greater variance is put in the numerator.
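The two conventions described here differ only in which mean square goes on top. A minimal sketch, with a hypothetical function name and made-up mean squares (both assumed to be positive):

```python
def f_ratio(msb, msw, greater_on_top=True):
    """Variance ratio F under either convention.

    greater_on_top=True  : greater variance / smaller variance (adopted in these notes)
    greater_on_top=False : always MSB / MSW (the alternative some scholars suggest)
    """
    if greater_on_top:
        return max(msb, msw) / min(msb, msw)
    return msb / msw

print(f_ratio(4.0, 2.0))                        # 2.0
print(f_ratio(2.0, 4.0))                        # 2.0, MSW is greater so it goes on top
print(f_ratio(2.0, 4.0, greater_on_top=False))  # 0.5, below 1: treated as not significant
```

Under the first convention the degrees of freedom swap along with the mean squares, which matters when the table value is looked up.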

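(5) Setting up Analysis of Variance Table: For the sake of convenience, the information obtained through the various steps stated above can be put in a table called the analysis of variance table, generally abbreviated as the ANOVA table. A specimen of the ANOVA table is given below:

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio
Between samples | SSB (SSC) | k - 1 | MSB = SSB / (k - 1) | F = MSB / MSW
Within samples | SSW (SSE) | N - k | MSW = SSW / (N - k) |
Total | SST | N - 1 | |

Under the convention adopted in step (4), the greater of the two mean squares is placed in the numerator of F.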

For a check, the sum of squares of deviations for total variance (SST) can also be worked out by adding the squares of the deviations of the individual items in all the samples from the grand mean. This total should be equal to the total of SSC and SSE.

SST = SSC (SSB) + SSE (SSW)

The degrees of freedom for total variance will be equal to the number of items in all the samples minus one, i.e., d.f. = N - 1.
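The check described above can be carried out numerically. The sketch below assembles the rows of the table for hypothetical data and works out SST independently of SSB and SSW, so that the identity SST = SSB + SSW and the matching identity for the degrees of freedom can be verified; the function name and the dictionary layout are illustrative choices only.

```python
def anova_one_way(samples):
    """One-way ANOVA by the direct method; returns the table rows as a dict."""
    items = [x for s in samples for x in s]
    n, k = len(items), len(samples)
    grand = sum(items) / n
    means = [sum(s) / len(s) for s in samples]
    ssb = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    sst = sum((x - grand) ** 2 for x in items)   # worked out separately, as a check
    return {
        "between": (ssb, k - 1, ssb / (k - 1)),  # (sum of squares, d.f., mean square)
        "within": (ssw, n - k, ssw / (n - k)),
        "total": (sst, n - 1),
    }

# Hypothetical yields: three samples of four plots each.
samples = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]
table = anova_one_way(samples)
for source, row in table.items():
    print(source, row)
```

For these figures SSB = 8 and SSW = 24, which add up to SST = 32, and the degrees of freedom 2 and 9 add up to 11 = N - 1.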

(6) Comparison of the Calculated Value of F with the Table Value of F: R.A. Fisher and George W. Snedecor have worked out the limits of chance sample occurrences for various combinations of degrees of freedom. These ratios (the table values of F) at various levels of significance are available in tables prepared by them. By comparing the observed value of F with the table value, we can conclude whether the difference between the samples could have arisen due to chance fluctuations. Generally, we take the 5% level of significance.

If the calculated value of F is greater than the table value, the difference is significant and the null hypothesis is rejected. If the calculated value of F is less than the table value of F, the difference is not significant, i.e., the differences have arisen due to fluctuations of sampling.

Points to Remember for the F-Table: The F-table can be consulted at the 1% or the 5% level of significance. In the absence of any clear information, we take the 5% level. The second main problem is to decide $v_1$ and $v_2$. In this regard, the d.f. related to the greater variance is always treated as $v_1$ and the d.f. related to the smaller variance as $v_2$.
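The decision rule, together with the choice of $v_1$ and $v_2$, can be sketched as below. The table value is passed in by the reader after looking it up; nothing here computes it. The function name and the mean squares are hypothetical, while 4.26 is the 5% table value for $v_1 = 2$, $v_2 = 9$ quoted in Illustration 1.

```python
def decide(msb, df_between, msw, df_within, f_table):
    """Greater-over-smaller F, its (v1, v2), and whether it exceeds the table value.

    f_table must be looked up for the returned (v1, v2) at the chosen level
    of significance; it is an input, not something this function derives.
    """
    if msb >= msw:
        f_calc, v1, v2 = msb / msw, df_between, df_within
    else:
        f_calc, v1, v2 = msw / msb, df_within, df_between
    significant = f_calc > f_table   # True means the null hypothesis is rejected
    return f_calc, v1, v2, significant

# Hypothetical mean squares with 2 and 9 degrees of freedom.
f_calc, v1, v2, significant = decide(4.0, 2, 24 / 9, 9, f_table=4.26)
print(round(f_calc, 3), v1, v2, significant)  # 1.5 2 9 False
```

When the within-sample mean square is the greater one, the function returns its degrees of freedom as $v_1$, so the table must be entered with the degrees of freedom in that swapped order.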

Illustration 1.

The table below gives the yields in quintals of four plots each of three varieties of wheat. Is there a significant difference between the mean yields of the three varieties? ($F_{0.05}$ for $v_1 = 2$, $v_2 = 9$ is 4.26)

 

Conclusion: The table value of F at the 5% level of significance for $v_1 = 9$ and $v_2 = 2$ is 19.38, while the calculated value of F is 2.555. The calculated value of F is less than the table value. Hence, the null hypothesis is accepted, i.e., there is no significant difference between the yields of the three varieties of wheat.

Note: Generally, the value of MSB is greater than that of MSW, but in this question MSW is greater. Hence, to find F we divide MSW by MSB. If, on the other hand, we divide MSB by MSW, the value of F is 0.39. As noted earlier, when the calculated value of F is less than 1 there is no need to compare it with the table value of F; in such a case it is taken that there is no significant difference between the sample means.
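The two values quoted in this note are reciprocals of each other, which gives a quick arithmetic check using only the calculated value 2.555 from the conclusion:

```latex
F = \frac{MSW}{MSB} = 2.555
\quad\Longrightarrow\quad
\frac{MSB}{MSW} = \frac{1}{2.555} \approx 0.39
```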

(b) Short-cut Method

The above method of calculating the sums of squares for variance between samples and variance within samples is a very long one. There is also a short-cut method which is generally used for this purpose; it is particularly convenient when the means of the samples and/or the grand mean happen to be fractions rather than whole numbers. This method considerably reduces the calculation work. The various steps involved in the short-cut method are as under:

(i) Obtain the total of the items of each sample ($\sum X_1, \sum X_2, \dots, \sum X_k$) and the sum of the squares of the items of each sample ($\sum X_1^2, \sum X_2^2, \dots, \sum X_k^2$).

(ii) Calculate the correction factor:

$$\text{C.F.} = \frac{T^2}{N}$$

where $T^2 = (\sum X_1 + \sum X_2 + \dots + \sum X_k)^2$ and N = total number of items.

(iii) Find the square of each item value and take the total of these squares. Subtract the correction factor from this total; the result is the sum of squares of deviations for total variance. Symbolically, we can write

$$\text{Total S.S. (SST)} = \left[\sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \dots + \sum X_k^2\right] - \text{C.F.}$$

(iv) Square each sample total, divide each squared total by the number of items in the sample concerned, and add up the results. Subtract the correction factor from this total; the result is the sum of squares of deviations for variance between the samples. Symbolically, we can write

$$SSB = \left[\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots + \frac{(\sum X_k)^2}{n_k}\right] - \text{C.F.}$$

where $n_1, n_2, \dots, n_k$ are the numbers of items in the respective samples.
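Steps (i) to (iv) can be sketched as follows, using hypothetical data and an illustrative function name. The within-sample sum of squares is obtained here by subtraction; that step is an assumption based on the identity SST = SSB + SSW stated earlier, since the remaining steps of the method are not shown in this text.

```python
def shortcut_sums_of_squares(samples):
    """Short-cut method: sums of squares from raw totals, without deviations."""
    n = sum(len(s) for s in samples)              # N, total number of items
    grand_total = sum(sum(s) for s in samples)    # T, total of all items
    cf = grand_total ** 2 / n                     # correction factor T^2 / N
    sst = sum(x * x for s in samples for x in s) - cf
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    ssw = sst - ssb   # assumption: by subtraction, from SST = SSB + SSW
    return cf, sst, ssb, ssw

# Hypothetical yields: three samples of four plots each.
samples = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]
print(shortcut_sums_of_squares(samples))  # (300.0, 32.0, 8.0, 24.0)
```

With T = 60 and N = 12 the correction factor is 3600 / 12 = 300; the squares of the items total 332, so SST = 32, and the squared sample totals give 144 + 100 + 64 - 300 = 8 for SSB.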

Conclusion: The table value of F for $v_1 = 2$ and $v_2 = 11$ at the 5% level is 3.98. The calculated value of F, i.e., 1.624, is less than the table value of 3.98 at the 5% significance level. Hence, we accept the null hypothesis ($H_0$) and conclude that the average lifetimes of the three brands of tires are equal.

Analysis of Variance in Two-way Classification

In one-way classification, we studied the influence of one factor on different sample groups. Under two-way classification, we discuss the influence of two factors. When it is believed that two independent factors might have an effect on the response variable of interest, it is possible to design the test so that an analysis of variance can be used to test for the effects of the two factors simultaneously. Such a test is called a two-factor analysis of variance. With the help of two-factor analysis of variance, we can test two sets of hypotheses with the same data at the same time. In such a case, the data are classified according to the two different factors. For example, a tea company can analyse its sales on the basis of four salesmen and three seasons (winter, summer and rainy season), or an educationist can analyse three methods of teaching a subject in four different ways. This sort of classification is based on two criteria, and its analysis of variance is called two-way analysis of variance.

The total variance in a two-way classification is split into three parts, viz., variation between the column means (factor A), variation between the row means (factor B), and residual variation. Residual variation is the sampling variation remaining apart from the two factors considered. In fact, residual variation is the total variation minus the variation among the means of the rows and the means of the columns. Symbolically:

Total Sum of Squares (SST) = Sum of Squares Between Columns (SSC) + Sum of Squares Between Rows (SSR) + Sum of Squares as the Residual (SSE)
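A small numerical sketch of this three-way split is given below. The table of sales is hypothetical (3 seasons as rows, 4 salesmen as columns, one figure per cell), the function name is illustrative, and the residual is obtained by subtraction exactly as described above.

```python
# Hypothetical sales table: 3 rows (seasons) x 4 columns (salesmen).
table = [
    [6, 7, 3, 8],
    [5, 5, 3, 7],
    [5, 4, 3, 4],
]

def two_way_partition(table):
    """Split the total sum of squares into column, row and residual parts."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(col) / r for col in zip(*table)]
    sst = sum((x - grand) ** 2 for row in table for x in row)
    ssc = r * sum((m - grand) ** 2 for m in col_means)   # between columns
    ssr = c * sum((m - grand) ** 2 for m in row_means)   # between rows
    sse = sst - ssc - ssr                                # residual, by subtraction
    return sst, ssc, ssr, sse

sst, ssc, ssr, sse = two_way_partition(table)
print([round(v, 6) for v in (sst, ssc, ssr, sse)])  # [32.0, 18.0, 8.0, 6.0]
```

Each column mean is based on 3 items and each row mean on 4, which is why the squared deviations of the means are multiplied by the number of rows and the number of columns respectively.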

Calculation Procedure of Two-Way Analysis of Variance

The computation procedure of two-way analysis of variance is somewhat different from the one followed in problems of one-way classification. The various steps involved in preparing the analysis of variance table in such a case are as under:

(1) Use the coding device, if the same simplifies the task;

(2) Take the total of the values of the individual items (or their coded values, as the case may be) in all the samples and call it T;

(3) Work out the correction factor as under:

$$\text{C.F.} = \frac{T^2}{N}$$

where N is the total number of items;

(4) Computation of Total Sum of Squares: Find the square of each item value (or its coded value, as the case may be) and take the total of these squares. Subtract the correction factor from this total to obtain the sum of squares of deviations for total variance. Symbolically, we can write this as

$$\text{S.S.T.} = \left[\sum X_1^2 + \sum X_2^2 + \dots + \sum X_k^2\right] - \text{C.F.}$$

(5) Computation of Sum of Squares between Column (SSC): Take the total of the different columns and then obtain the square of the total of each column. After that the square of the total of each column is divided by the number of items
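Steps (1) to (4) of this procedure can be sketched as below. The sketch assumes that the coding device means subtracting a common constant from every item, which leaves the sum of squares of deviations unchanged; the coding method itself is not described in this text, so that reading is an assumption. The data, the function name and the `origin` parameter are all hypothetical.

```python
def total_ss_shortcut(table, origin=0):
    """Total sum of squares via the correction factor, after optional coding.

    Coding here means subtracting `origin` from every item (an assumption
    about the coding device); the result for SST is not affected by it.
    """
    coded = [x - origin for row in table for x in row]
    n = len(coded)                       # N, total number of items
    t = sum(coded)                       # T, total of all (coded) items
    cf = t ** 2 / n                      # correction factor T^2 / N
    return t, cf, sum(x * x for x in coded) - cf

# Hypothetical table: 3 rows x 4 columns.
table = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]
print(total_ss_shortcut(table))            # (60, 300.0, 32.0)
print(total_ss_shortcut(table, origin=5))  # (0, 0.0, 32.0)
```

Subtracting 5 from every item makes T and the correction factor vanish for these figures, yet the total sum of squares is 32 in both cases, which is the sense in which coding simplifies the task.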

 

 

chetansati

Admin

https://gurujionlinestudy.com
