Nonparametric Tests
Contents
- What Are Nonparametrics?
- Power-Efficiency
- Sign Test for Location
- Wilcoxon Signed-Ranks Test for Location
- Wilcoxon-Mann-Whitney Test
- Two-sample Median Test
- Two-sample Sign Test
- Two-Sample Wilcoxon Signed Ranks Test
- The Spearman Rank Correlation
What Are Nonparametrics?
Statistical inference involves drawing samples from populations and making decisions about those populations from the sample data. Many statistical techniques, known as parametric methods, make a number of assumptions about the nature of the sampled populations. The term “parametric” implies assumptions about parameters. Typical assumptions are that the population is normally distributed, or that two populations have the same variance. An additional assumption is that the data are measured on at least an interval scale.
Unfortunately, these assumptions are not always appropriate and cannot always be supported or validated. The good news is that a field of statistical techniques known as nonparametrics has evolved. Nonparametric methods do not have the same restrictions placed on them as their parametric counterparts. Some people use the term "distribution-free" tests to describe nonparametrics, although this is not a completely accurate description: there are underlying distributional requirements, but they are much less restrictive.
No statistical test is assumption free. For example, all statistical tests require random samples drawn from the target population and independence of the sampled units, and all tests make assumptions about the level of measurement.
Nonparametric methods are useful for analysis of nominal or ordinal data. They are also useful whenever questions arise concerning the underlying assumptions of a counterpart parametric procedure for interval or ratio data. In general, parametric procedures have nonparametric counterparts, although the hypothesis tested will not always be exactly the same. For example, a parametric two-sample test for differences in means may have as its nonparametric counterpart a two-sample test for differences in medians.
Ordinal data are often analyzed with nonparametric methods using ranks of the data as opposed to the raw scores themselves. This provides a level of freedom from the underlying concern of how numerical assignment is made. It also requires some thought as to what the analyses are really telling you. When the data from two samples are combined, the raw scores are converted to ranks (1 to n1 + n2), and the two groups are then separated, what does a difference in mean ranks tell you? What inferences can be made about the sampled populations? These are topics of discussion needed as we enter the world of nonparametric statistical methods.
Power-Efficiency
The researcher's goal is to use the statistical test that provides the most power for the hypothesis tested with a given sample size. This presumes that the assumptions of the test can be met with the data in question. When parametric tests are compared to nonparametric tests at the same sample size, the power of the tests will not be the same. When their assumptions are met, parametric tests generally have more power than their nonparametric counterparts, although this is not always the case.
Assume test B is to be compared against test A, and that test A is the most powerful test for the given application. Power-efficiency is concerned with how much larger the sample needs to be for test B to be as powerful as test A when the significance level and the sample size of A are held constant. If N_A is the sample size used with test A and N_B is the sample size test B needs to reach the same power at the same significance level, the power-efficiency of test B is

Power-efficiency of test B = (N_A / N_B) × 100 percent

For example, if test B requires a sample of size 25 when test A requires a sample of size 20, for the same level of power at the same significance level, then the power-efficiency is calculated as follows.

Power-efficiency of test B = (20 / 25) × 100 percent = 80 percent
This power-efficiency measure is useful for comparing the power of two tests at a given sample size and significance level. Another useful measure for comparing tests is known as the asymptotic relative efficiency. It has an advantage over the power-efficiency calculation in that it is generally independent of the level of significance. The disadvantage is that it is based on the limit as n increases toward infinity, although many tests come close to their asymptotic relative efficiency at moderate sample sizes. Asymptotic relative efficiency is the limiting value of the same sample-size ratio:

Asymptotic relative efficiency of test B = limit, as N_A increases toward infinity, of (N_A / N_B) × 100 percent
For example, if the Mann-Whitney U test were to be used for data that might be analyzed with the two-sample t-test for means, the asymptotic relative efficiency is 3/π, or approximately 95.5%. If a sample of size 95 were used to analyze data with a t-test, a sample of approximately 100 would be needed for the Mann-Whitney U to have equivalent power. The high power-efficiency suggests that the Mann-Whitney U is an excellent alternative to the t-test, especially since it does not require the restrictive underlying assumptions of the t-test.
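The arithmetic in this section can be checked with a few lines of Python. This is only a sketch: the function names power_efficiency and equivalent_sample_size are made up for illustration, and the 3/π figure is the efficiency of the Mann-Whitney U relative to the t-test quoted in the text.

```python
import math


def power_efficiency(n_a: int, n_b: int) -> float:
    """Percent power-efficiency of test B relative to test A.

    n_a and n_b are the sample sizes the two tests need to reach the
    same power at the same significance level.
    """
    return 100.0 * n_a / n_b


def equivalent_sample_size(n_a: int, efficiency: float) -> int:
    """Sample size test B needs to match test A, rounded up.

    efficiency is a fraction (for example 0.955), not a percent.
    """
    return math.ceil(n_a / efficiency)


# Worked example from the text: B needs 25 observations, A needs 20.
print(power_efficiency(20, 25))  # 80.0

# Mann-Whitney U against the t-test: efficiency 3/pi, about 0.955.
are_mann_whitney = 3 / math.pi
print(equivalent_sample_size(95, are_mann_whitney))  # 100
```

Rounding up is a design choice here: a fractional observation cannot be collected, so the next whole sample size is reported.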
Sign Test for Location
Purpose
The sign test for location is employed in those cases in which we wish to test a hypothesis related to the median of a particular variable. The sign test for location tests the same hypothesis as the Wilcoxon Signed-Ranks test (discussed next), but is much less restrictive. Generally, all that is required to run the sign test is that the data must have been measured on at least an ordinal scale, although only the direction above or below the hypothesized median need be recorded.
Assumptions
- The sample has been randomly drawn from the population (critical).
- The measurement is at least ordinal (critical).
- The underlying property studied is continuous.
- The probability of values falling on the hypothesized median is low (critical for non-directional tests).
Methods
Using values falling above or below the hypothesized median, the test is essentially a one-sample binomial test. The expected proportion falling above or below the tested median is 0.50. Both exact and approximate tests may be performed. The exact test uses exact probabilities generated from the binomial distribution. The approximate test uses the normal approximation of the binomial. The exact test is preferred when available.
Values falling on the hypothesized median are ignored with nondirectional hypothesis tests. With directional hypothesis tests, values falling on the hypothesized median may be used.
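The exact test described above can be sketched in plain Python using the binomial distribution directly. The function name sign_test and the data are hypothetical, chosen only to illustrate a nondirectional test in which values equal to the hypothesized median are dropped.

```python
from math import comb


def sign_test(values, median0):
    """Exact two-sided sign test of H0: population median == median0."""
    above = sum(1 for v in values if v > median0)
    below = sum(1 for v in values if v < median0)
    n = above + below          # values equal to median0 are ignored
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5); doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2 * tail)


# Illustrative data only: 7 values above 10, 2 below, 1 equal to 10.
sample = [11, 12, 9, 13, 14, 10, 15, 16, 8, 17]
above, below, p_value = sign_test(sample, median0=10)
print(above, below, p_value)  # 7 2 0.1796875
```

Because the binomial distribution is discrete, the attainable p-values move in steps, which is why an exact preferred significance level may not be available.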
Other Notes
- Given the discrete nature of the binomial distribution, type I and II error levels may not be available at an exact and preferred level.
- Care must be taken when lack of resolution generates a large number of values that fall on the median. For nondirectional tests, having a large number of values fall on the median may produce misleading results. In these situations, the number of values falling above and below the median may not be equal.
Wilcoxon Signed-Ranks Test for Location
Purpose
The Wilcoxon Signed-Ranks Test for location, like the sign test for location, is employed in those cases in which we wish to test a hypothesis related to the median of a particular variable. This test uses information on the deviations from the median, not just the signs. Because of this, greater power will be obtained from the Wilcoxon test.
Assumptions
- The sample has been randomly drawn from the population (critical).
- The measurement is at least ordinal (critical).
- The underlying property studied is continuous.
- The probability of values falling on the median is low (critical).
- The distribution is symmetrical about the population median (critical).
Methods
The Wilcoxon Signed-Ranks Test for location takes the deviations from the hypothesized median and ranks them by absolute magnitude, keeping the signs. If the null hypothesis is true, the sum of the ranks above the hypothesized median (positive ranks) should be approximately equal to the sum of the ranks below the hypothesized median (negative ranks). The test compares the sums of the positive and negative ranks to determine significance.
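The ranking procedure can be sketched as follows. The helper names (midranks, signed_rank_sums, exact_two_sided_p) and the data are assumptions made for illustration. The exact p-value routine counts subsets of the ranks 1 to n, so it is only valid when there are no tied absolute deviations.

```python
def midranks(xs):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(values, median0):
    """Sums of the positive and negative signed ranks."""
    devs = [v - median0 for v in values if v != median0]
    ranks = midranks([abs(d) for d in devs])
    w_plus = sum(r for r, d in zip(ranks, devs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, devs) if d < 0)
    return w_plus, w_minus


def exact_two_sided_p(w_plus, w_minus, n):
    """Exact two-sided p-value; assumes no ties, so ranks are 1..n."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # counts[s]: rank subsets summing to s
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    t = int(min(w_plus, w_minus))
    return min(1.0, 2 * sum(counts[: t + 1]) / 2 ** n)


# Illustrative data only, tested against a hypothesized median of 50.
sample = [52, 47, 55, 58, 49, 56, 60, 54]
w_plus, w_minus = signed_rank_sums(sample, 50)
p_value = exact_two_sided_p(w_plus, w_minus, n=len(sample))
print(w_plus, w_minus, p_value)  # 32.0 4.0 0.0546875
```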
Other Notes
- Care must be taken when lack of resolution generates a large number of values that fall on the hypothesized median. Having a large number of values fall on the median may produce misleading results.
- The positive and negative scales must be symmetrical for this test to be valid. This means that a score of "2" must be of the same magnitude as a "-2" in absolute difference.
Wilcoxon-Mann-Whitney Test
Purpose
The Wilcoxon-Mann-Whitney test, also known as the Mann-Whitney U test, allows us to test the hypothesis that two samples were drawn from identical populations, with particular sensitivity to differences in the central tendencies (medians) of the two groups. This test compares two independent groups measured on at least an ordinal scale.
Assumptions
- The samples have been randomly drawn from two independent populations (critical).
- The data are measured on at least an ordinal scale (critical).
- The underlying property is continuous.
- When testing for differences in medians, the distributions have the same shape and spread.
Methods
The combined data from two groups are ranked from low to high. When ties are obtained, the average rank is used. Differences in mean ranks are then evaluated for statistical significance.
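A minimal sketch of this procedure is shown below. The function names and data are hypothetical. The p-value uses the large-sample normal approximation without a tie or continuity correction, which is an assumption of this sketch and not something stated in the text; exact tables are preferable for small samples.

```python
import math


def midranks(xs):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return U for each group and the mean rank of each group."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))   # rank the combined data
    r1 = sum(ranks[:n1])                  # rank sum of group 1
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2, r1 / n1, (sum(ranks) - r1) / n2


def normal_approx_p(u, n1, n2):
    """Two-sided p-value from the normal approximation (no corrections)."""
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Illustrative data only; the three 8s share the average rank 4.
group1 = [3, 5, 8, 8, 10]
group2 = [8, 11, 12, 14, 15, 17]
u1, u2, mean_rank1, mean_rank2 = mann_whitney_u(group1, group2)
p_value = normal_approx_p(u1, len(group1), len(group2))
print(u1, u2, mean_rank1)  # 2.0 28.0 3.4
```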
Other Notes
- The power-efficiency approaches 3/π, or 95.5 percent, as n increases, and is close to 95 percent for moderate sample sizes. This is an excellent alternative to the independent two-sample t-test. There are some situations in which the power-efficiency is greater than 1 (100 percent), meaning the test is more powerful than the t-test in those situations.
- This test is also useful when no measurement gauge is available, yet items may be sorted in order of magnitude of the property studied. An analysis may be conducted by simply using the rank values as the data.
- The test is most sensitive to differences in medians, yet does have sensitivity to shape and spread as well.
Two-sample Median Test
Purpose
The two-sample median test allows us to test the hypothesis that two independent samples were drawn from populations with equal medians. This test compares two groups measured on at least an ordinal scale.
Assumptions
- The samples have been randomly drawn from two independent populations (critical).
- The data are measured on at least an ordinal scale (critical).
- The underlying property is continuous.
Methods
The median is generated from the combined data of the two groups. If the null hypothesis is true, the proportion of group one that is above the combined median should equal the proportion of group two that is above the combined median. This reduces the test to a two-sample test of proportions, and the significance may be found with the Fisher Exact test.
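The steps above can be sketched in plain Python. The function names and data are hypothetical. Two conventions here are assumptions of the sketch: values equal to the combined median are counted as "not above," and the two-sided Fisher p-value sums the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb
from statistics import median


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k group-one values above the median.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def median_test(x, y):
    """Two-sample median test via a 2x2 table and the Fisher exact test."""
    grand = median(list(x) + list(y))
    a = sum(1 for v in x if v > grand)   # group 1 above the combined median
    c = sum(1 for v in y if v > grand)   # group 2 above the combined median
    table = (a, len(x) - a, c, len(y) - c)
    return grand, table, fisher_exact_two_sided(*table)


# Illustrative data only.
group1 = [3, 5, 8, 8, 10]
group2 = [8, 11, 12, 14, 15, 17]
grand_median, table, p_value = median_test(group1, group2)
print(grand_median, table)  # 10 (0, 5, 5, 1)
```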
Other Notes
- Should some of the data be truncated or "off the scale," there is no alternative to the median test, and it should be used in these situations even if the data are measured on an interval scale.
- The median test has a power-efficiency of about 95 percent with small sample sizes when compared against the two-sample t-test for interval-level data. As sample size increases, the asymptotic efficiency approaches 2/π, or approximately 63.7 percent.
Two-sample Sign Test
Purpose
The sign test for matched pairs is used to compare two dependent groups for differences in location. The test looks only at differences within each pair of observations and determines which item in the pair is greater in some underlying property. It is, therefore, useful when only the direction of the differences can be or is recorded. This test has very few underlying assumptions. This test compares two dependent groups measured on at least an ordinal scale.
Assumptions
1. The samples have been randomly drawn from two dependent populations either through matching or repeated measures (critical).
2. The underlying distribution is continuous.
3. The proportions of positive and negative signs follow a binomial distribution (critical).
Methods
As with the sign test for location, this test is essentially a one-sample binomial test in which the expected proportion of positive differences is 0.50. Ties are ignored in the analysis for nondirectional tests. Exact binomial tests may be performed.
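Because only the direction of each difference is needed, the test can be run on recorded signs alone. The sketch below assumes each pair has been recorded as "+", "-", or "0" for a tie; the function name and the data are made up for illustration.

```python
from math import comb


def sign_test_from_signs(signs):
    """Exact two-sided sign test for matched pairs from recorded signs."""
    plus = signs.count("+")
    minus = signs.count("-")
    n = plus + minus           # ties ("0") are ignored
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 0.5); doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)


# Illustrative data only: direction of change for 11 matched pairs.
recorded = list("++-+++0+-++")
plus, minus, p_value = sign_test_from_signs(recorded)
print(plus, minus, p_value)  # 8 2 0.109375
```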
Other Notes
- This test can be performed by matching or taking repeated measures.
Two-Sample Wilcoxon Signed Ranks Test
Purpose
The Wilcoxon Signed-Ranks test is used to assess the difference between two dependent populations. The samples may be generated through repeated measures or matching. This test is similar to the Wilcoxon test previously covered for single sample data, and may be considered a more powerful test for paired data than the Sign test. This is because the test uses the ranks of the differences, not just the direction of the differences in its calculations. This test compares two dependent groups measured on at least an ordinal scale.
Assumptions
- The samples have been randomly drawn from two dependent populations either through matching or repeated measures (critical).
- The data are measured on at least an ordinal scale (critical).
- The underlying distribution is continuous.
Methods
The Wilcoxon Signed-Ranks test generates the differences in each pair. The absolute values of these differences are then ranked, and the original sign of the differences is kept. The test uses the sum of the ranks of the lower frequency sign as a test statistic.
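A sketch of the paired procedure follows. The function names and data are hypothetical. Following the description above, the statistic is the rank sum of the less frequent sign; the p-value uses a large-sample normal approximation with no tie correction, which is an assumption of this sketch, and exact tables are preferable for small samples.

```python
import math


def midranks(xs):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def paired_signed_rank(first, second):
    """Return (T, z, two-sided p) for paired observations."""
    diffs = [b - a for a, b in zip(first, second) if b != a]  # drop zeros
    ranks = midranks([abs(d) for d in diffs])
    pos = [r for r, d in zip(ranks, diffs) if d > 0]
    neg = [r for r, d in zip(ranks, diffs) if d < 0]
    # Rank sum of the less frequent sign.
    t = sum(pos) if len(pos) <= len(neg) else sum(neg)
    n = len(diffs)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    return t, z, math.erfc(abs(z) / math.sqrt(2))


# Illustrative data only: ten pairs, one with a zero difference.
before = [10, 12, 9, 14, 11, 13, 15, 8, 12, 10]
after = [13, 11, 14, 18, 11, 19, 17, 15, 10, 18]
t_stat, z, p_value = paired_signed_rank(before, after)
print(t_stat)  # 3.5
```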
Other Notes
- The Wilcoxon Signed-Ranks test has an asymptotic efficiency of 3/π, or 95.5 percent, when compared with the paired t-test. For small sample sizes, the power-efficiency is near 95 percent.
The Spearman Rank Correlation
Purpose
The Spearman rank-order correlation coefficient, rs, is a measure of association between two variables measured on at least an ordinal scale. The Spearman correlation coefficient may be used in place of the Pearson correlation for interval data when questions arise concerning the underlying assumptions.
Assumptions
- The sample has been randomly drawn from a bivariate (two-dimensional) population (critical).
- The data are measured on at least an ordinal scale (critical).
- The underlying variables are continuous.
Methods
The correlation coefficient is generated by ranking each of the two variables separately and then calculating the standard Pearson correlation on the ranks. The significance may be evaluated with the standard Pearson significance test. Exact tables are also available for small sample sizes.
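The calculation can be sketched directly from this description: rank each variable, then apply the Pearson formula to the ranks. The function names and data below are hypothetical and chosen only for illustration.

```python
import math


def midranks(xs):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(u, v):
    """Standard Pearson product-moment correlation."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(ss_u * ss_v)


def spearman(x, y):
    """Spearman rs: the Pearson correlation of the separately ranked data."""
    return pearson(midranks(x), midranks(y))


# Illustrative data only: y = x cubed is monotone but not linear.
x = [1, 2, 3, 4, 5]
y_cubed = [1, 8, 27, 64, 125]
rs_monotone = spearman(x, y_cubed)     # 1, up to floating-point rounding
r_raw = pearson(x, y_cubed)            # below 1 on the raw scores

rs_mixed = spearman(x, [2, 1, 4, 3, 5])  # 0.8, up to rounding
```

The first example shows why rs can be useful for nonlinear but monotone relationships: the ranks agree perfectly even though the raw scores do not fall on a straight line.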
Other Notes
- The Spearman correlation coefficient may be useful in situations in which the underlying relationship is nonlinear.
- As an option, the data may be collected by simply recording the rank order of the observations.
- The null hypothesis for testing the significance of the Spearman rs may be stated as "There is no association between the factors." It should not be stated as the population parameter being equal to zero, because the population parameter can equal zero even when the factors are not independent.
- The Kendall rank-order correlation coefficient (Kendall's tau) may be used in the same situations. Kendall's tau is calculated differently from the Spearman rs, although both are based on ranks: Kendall's tau uses the agreements and disagreements in the ordering of the ranks. The two measures are similar in their power to detect association.
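The "agreements and disagreements" idea behind Kendall's tau can be sketched by counting concordant and discordant pairs. The version below is the simple tau-a form, which makes no adjustment for ties; that choice, the function name, and the data are assumptions made for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1    # the pair is ordered the same way on x and y
        elif s < 0:
            discordant += 1    # the pair is ordered in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Illustrative data only: 8 concordant and 2 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
print(tau)  # 0.6
```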