Factorial validity

Factorial validity is concerned with the internal structure of tests. It is most frequently used to examine the validity of questionnaires. Many questionnaires are multidimensional; that is, they tap a number of different but related dimensions of the property being measured. For example, the Exercise Motivations Inventory - 2 comprises 14 dimensions, each one a different motive for exercising (weight management, stress management, strength and endurance, and so on). Each dimension has a number of items. The statistical technique for assessing factorial validity is called, not surprisingly, factor analysis. What follows is a very basic introduction to factor analysis in which I try to present the technique from a conceptual perspective and avoid its mathematics (you will be pleased to hear).

Factor analysis is an advanced and sophisticated correlational technique (or rather, set of techniques) used to determine how the items making up a multidimensional instrument are actually related to the different dimensions being measured. The dimensions are referred to as factors or latent variables and the questionnaire items as manifest or observed variables. The dimensions are called latent because they are not directly observable; they are hypothetical constructs, just as we discussed in the previous section. For instance, we cannot directly observe someone's self-confidence. What we can observe, though, are their responses to a set of items, and from those we can infer their level of confidence.

We will examine factor analysis using the Subjective Exercise Experiences Scale (SEES) as an example. This is a three-factor questionnaire designed by McAuley and Courneya (1994) to measure exercise-induced feeling states, the three factors being positive well-being, psychological distress and fatigue. Typically, we call each dimension of a questionnaire a subscale.
Each subscale in the SEES has four items, and respondents are asked to indicate on a 7-point scale the extent to which they are experiencing each feeling at that point in time. More recently, together with two former students (Markland, Emberton & Tallon, 1997), I modified and validated the SEES for use with children, and we'll use data from that study for the example.

Before we go on, though, I should point out that factor analysis is not restricted to the analysis of questionnaire-type data. For example, some of the earliest studies in sport and exercise science used factor analytical techniques to examine the different dimensions of physical fitness.

The factor structure of an instrument can be expressed graphically, with circles or ellipses representing factors, squares or oblongs representing observed items, arrows going from the factors to the items to show which items belong to which factor, and curved double-headed arrows to represent correlations among factors. Here is the intended structure of the SEES:
This is a model of the structure of the SEES. It indicates that the first four items are meant to tap positive well-being, the second four psychological distress and the final four fatigue (of course, they don't come in that order in the questionnaire itself; they are all mixed up). Equally importantly, though, the picture shows that the first four items are not meant to tap psychological distress or fatigue, the second four are not meant to tap positive well-being or fatigue, and the final four are not meant to tap positive well-being or psychological distress. Notice that this is an example of convergent and discriminant validity: the items should converge on their intended factor and be discriminated from their non-intended factors. So the ideal we aim for with a multidimensional instrument like this is that it has what is called a simple factor structure; that is, the items only measure the dimension they are supposed to measure. Finally, the curved arrows represent the correlations among the factors. Although the diagram does not indicate this, in this case well-being is meant to be negatively correlated with distress and fatigue, while distress and fatigue are meant to be positively correlated.

Now, imagine that we had a load of people fill in the SEES after an exercise bout, so that we obtained scores from them on each item. Imagine too that we were to correlate all the scores on the items with one another. If the instrument does have a simple factor structure, we'd hope that items would be strongly related (correlated) with other items from the same subscale but only weakly correlated or, even better, uncorrelated with items from the other subscales. So the correlation matrix should look something like this:
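As a rough illustration of that pattern, here is a minimal Python sketch that simulates item scores and correlates them. It is not the SEES data: the sample size, the loading of 0.8, the random seed and the function names are all assumptions made for the example, and the three simulated factors are deliberately left uncorrelated so that the ideal case (big correlations within a subscale, near-zero correlations between subscales) is easy to see.

```python
import math
import random

# Labels only: the scores below are simulated, not real SEES responses.
FACTORS = ["well-being", "distress", "fatigue"]
ITEMS_PER_FACTOR = 4


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_items(n_people=500, loading=0.8, seed=1):
    """Return 12 lists of item scores, four items driven by each latent factor.

    Assumption: factors are independent standard normal variables and every
    item is loading * factor + random noise (a simple factor structure).
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(1 - loading ** 2)
    n_items = len(FACTORS) * ITEMS_PER_FACTOR
    items = [[] for _ in range(n_items)]
    for _ in range(n_people):
        latent = [rng.gauss(0, 1) for _ in FACTORS]
        for i in range(n_items):
            factor_score = latent[i // ITEMS_PER_FACTOR]
            items[i].append(loading * factor_score + noise_sd * rng.gauss(0, 1))
    return items


items = simulate_items()
corr = [[pearson(a, b) for b in items] for a in items]

# Print the 12 x 12 correlation matrix, one row per item.
for row in corr:
    print(" ".join(f"{r:5.2f}" for r in row))
```

Printed as a grid, the matrix should show three blocks of sizeable correlations along the diagonal, one block per subscale, with values close to zero everywhere else.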
We ought to have big correlations among items tapping the same factor but small or ideally zero correlations between items from different factors. This is the basis of factor analysis. However, with this technique we go a step further. Factor analysis assumes that the correlations take this pattern because the items are observed indicators of a smaller set of latent variables (in this case, three). Factor analysis, therefore, seeks to determine what this underlying structure is and which items are related to which latent variables. If our model is correct, the factor structure should show that the first four items are related to one factor (which we call positive well-being), the second four to a second factor, and so on.

Factor analysis gives us a factor matrix which, somewhat like the correlation matrix above, should show big relationships for items with their intended factors and weak or zero relationships with their non-intended factors. Here is the factor matrix for some of the data from Markland, Emberton and Tallon (1997). In this case the SEES was completed by 13-14 year old boys and girls following a game of rounders during a PE lesson.
The numbers are called factor loadings and can be interpreted just like correlation coefficients. You can see that the SEES, with this sample at least, does have a reasonably simple factor structure, with moderate to strong loadings for items on their intended factors and much weaker loadings on their non-intended factors. So it seems that the factor structure of the modified SEES holds up quite well in 13-14 year old kids.

Notice, though, that the items great and positive from the well-being scale do load to a certain extent on their non-intended distress scale. They load negatively, but we would expect that (high scores on these items indicate less distress). Structurally, it's the absolute size of the loadings that matters, not so much the direction. Items that load with similar strengths on more than one factor are referred to as ambiguous items and are problematic because they undermine the ideal of a simple factor structure. Normally one eliminates such ambiguous items for the sake of clarity. In this case we retained the items for reasons explained below.

Exploratory versus confirmatory factor analysis

Traditional factor analytical approaches are described as exploratory. This is because they literally involve an exploration of the data. The researcher, or rather in recent years his or her computer software, explores the correlations among items to reveal the underlying latent variables. For example, the technique was originally devised to try to determine the factors underlying intelligence. Researchers would have large numbers of people complete a battery of intelligence test items and then perform exploratory factor analyses to try to find out what underpinned intelligence. In this way they found that intelligence has different components, such as verbal reasoning and visuo-spatial abilities.
To begin with, these researchers did not know what they were going to find, and the latent variables were given names after the analysis, based on an interpretation of what the items that loaded on them seemed to represent.

In most situations these days, though, you already know what the factor structure of a test should be, either from previous research, as in our case with the SEES, or from theory. In recent years techniques have been developed, and are becoming increasingly widely used, to test the predicted factor structure of an instrument directly to see how well the hypothesised model fits the data. This approach is called confirmatory factor analysis (CFA), because you set out to confirm that the structure of an instrument is as you expect it to be. Sport and exercise psychologists have been among the first to apply this approach on a regular basis, so you are quite likely to come across it in the research papers you read. CFA is also complex and, being fairly new and still under development, is the subject of a great deal of debate about how best to apply it; it is frequently misused. For these reasons, I'll give you some general guidelines to help you evaluate CFA studies should you come across them. Don't worry if you find this difficult. It is!

The biggest single issue in CFA, and the one most widely misunderstood and abused, is the assessment of how well a model fits the data. The principal means of doing this is to examine the chi square goodness of fit statistic. To put it as simply as possible, this tests whether the correlations (or more correctly, the covariances) among the items that you would expect to see if the model is correct are significantly different from the observed correlations (covariances) in the data collected. Since you want no difference, you are looking for a non-significant chi square. This probably rather turns on its head the way you've looked at statistics so far.
Whereas you normally want to reject the null hypothesis, in CFA you want to retain it (strictly speaking, to fail to reject it). The problem is that, for various reasons, it is usually difficult to get a non-significant chi square even if the model is good. Consequently, a number of other approaches to assessing model fit have been developed.

One popular but rather controversial method is to examine the ratio of chi square to the degrees of freedom for the model. As a general rule, ratios close to 1.0 indicate a very good fit. Personally, I would normally view a ratio of greater than 2.0 as indicating a poor fit.

Another way to assess fit is to examine the Goodness of Fit Index (GFI). This tells us the proportion of variance in the data explained by the model (it's a bit like R square in regression analysis). It can range from 0 to 1.0, and values less than .90 indicate a poor fit. Ideally, I like to see a GFI of .95 or more.

Another class of indices is referred to as incremental fit indices. There are a number of these, the most popular being the Normed Fit Index (NFI), the Non-normed Fit Index (NNFI) and the Comparative Fit Index (CFI). Like the GFI, they can range from 0 to 1.0 and should normally be at least .95 to indicate a good fit.

Yet another popular index is the Root Mean Square Error of Approximation (RMSEA). This is a rather more complicated index to evaluate, but basically it should be less than .05. Anything greater than .08 is poor.

Finally, a commonly reported statistic is the Standardised Root Mean Square Residual (SRMR). This is the average of the differences between the covariances expected if the model is correct and the covariances in the data, and it should be .06 or less to indicate a good fit.

Generally speaking, in order to indicate a good fit, all the indices should be at acceptable levels. Therefore, researchers should report a range of fit assessments and should always include chi square and the degrees of freedom.
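Two of the indices above can be worked out by hand from chi square, its degrees of freedom and the sample size. The following is a minimal Python sketch, not taken from any particular CFA package: the function names are hypothetical, the example numbers are invented purely for illustration, and the RMSEA formula used is one common formulation (some software divides by N rather than N - 1).

```python
import math


def chi_square_ratio(chi_square, df):
    """Ratio of chi square to its degrees of freedom (close to 1.0 is very good)."""
    return chi_square / df


def rmsea(chi_square, df, n_people):
    """Root Mean Square Error of Approximation.

    Assumed formulation: sqrt(max(chi_square - df, 0) / (df * (N - 1))).
    A chi square at or below its degrees of freedom gives an RMSEA of 0.
    """
    excess = max(chi_square - df, 0.0)
    return math.sqrt(excess / (df * (n_people - 1)))


# Hypothetical values for illustration only (not from any study).
example = {"chi_square": 60.0, "df": 40, "n_people": 201}

ratio = chi_square_ratio(example["chi_square"], example["df"])
error = rmsea(example["chi_square"], example["df"], example["n_people"])

print(f"chi square / df = {ratio:.2f}")
print(f"RMSEA           = {error:.3f}")
```

Because the sample size appears in the denominator, the same chi square and degrees of freedom give a smaller RMSEA in a larger sample, which is one reason the index is harder to evaluate than a simple ratio.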
Be wary of authors who only report one or two indices and claim a good fit for their models; it probably means that the other indices indicated a poor fit so they ignored them!

The fit of the SEES model shown above (13-14 year olds following a PE lesson) was very good. The fit statistics were: Chi square = 44.27, df = 32, p > .05; GFI = .92; NNFI = .95; CFI = .97; SRMR = .06.

If you should ever read the paper describing this study, and you are really sharp, you'll notice that the factor loadings shown above don't match the factor loadings reported in the paper. That's because the loadings shown above were from an exploratory factor analysis that I did for illustrative purposes for this lesson. Those in the paper are from a CFA, where you don't get loadings on non-intended factors because you specify that they are zero anyway. That's not cheating, as it might sound to you. It is in fact what you are testing in a CFA: a set of simultaneous null hypotheses. It turned out that, using the CFA approach, those two apparently problematic item loadings (great and positive on distress) could be assumed to be not significantly different from zero, so they are not ambiguous after all. This illustrates that CFA is a more powerful approach than exploratory factor analysis.
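The reported statistics can be checked against the rules of thumb given earlier. The sketch below is a hedged illustration in plain Python: the function and variable names are my own, the cut-offs are simply the ones quoted in this lesson, and the p value is computed with a closed-form expression that only holds when the degrees of freedom are even (which is the case here, df = 32).

```python
import math


def chi_square_p_even_df(chi_square, df):
    """Upper-tail p value for a chi square statistic with EVEN degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, which is
    valid only for positive even df; odd df would need a statistics library.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only valid for positive even df")
    half = chi_square / 2.0
    term = 1.0   # current term (x/2)^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Fit statistics as reported in the text for the modified SEES.
reported = {"chi_square": 44.27, "df": 32,
            "GFI": 0.92, "NNFI": 0.95, "CFI": 0.97, "SRMR": 0.06}

p_value = chi_square_p_even_df(reported["chi_square"], reported["df"])
ratio = reported["chi_square"] / reported["df"]

# Cut-offs quoted in this lesson (rules of thumb, not universal standards).
checks = {
    "chi square non-significant (p > .05)": p_value > 0.05,
    "chi square / df not above 2.0": ratio <= 2.0,
    "GFI at least .90": reported["GFI"] >= 0.90,
    "NNFI at least .95": reported["NNFI"] >= 0.95,
    "CFI at least .95": reported["CFI"] >= 0.95,
    "SRMR .06 or less": reported["SRMR"] <= 0.06,
}

print(f"p = {p_value:.3f}, chi square / df = {ratio:.2f}")
for label, passed in checks.items():
    print(f"{'ok  ' if passed else 'POOR'}  {label}")
```

Note that the GFI check here uses the .90 floor; the reported .92 clears that but falls short of the stricter .95 described above as ideal.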