Statistical inference involves drawing samples from populations and making decisions about those populations from the sample data. Parametric statistical techniques make a number of assumptions about the nature of the sampled populations; the term "parametric" implies assumptions about population parameters. Typical assumptions are that the population is normally distributed or that two populations have the same variance. An additional assumption is that the data are measured on at least an interval scale.
Unfortunately, these assumptions are not always appropriate and cannot always be supported or validated. The good news is that a field of statistical techniques known as nonparametric statistics has evolved. Nonparametric methods do not carry the same restrictions as their parametric counterparts. Some people use the term "distribution-free" to describe nonparametric tests, although this is not a completely accurate description: there are underlying distributional requirements, but they are much less restrictive.
No statistical test is assumption free. For example, all statistical tests require random samples drawn from the target population and independence of the sampled units, and all tests make assumptions about the level of measurement.
Nonparametric methods are useful for analysis of nominal or ordinal data. They are also useful whenever questions arise concerning the underlying assumptions of a counterpart parametric procedure for interval or ratio data. In general, parametric procedures have nonparametric counterparts, although the hypothesis tested will not always be exactly the same. For example, a parametric two-sample test for differences in means may have a nonparametric counterpart that is a two-sample test for differences in medians.
Ordinal data are often analyzed with nonparametric methods using ranks of the data rather than the raw scores themselves. This provides a level of freedom from the underlying concern of how numerical assignment is made. It also requires some thought as to what the analyses are really telling you. When the data from two samples are combined, the raw scores are converted to ranks (1 to n1 + n2), and the two groups are then separated, what does a difference in mean ranks tell you? What inferences can be made about the sampled populations? These are topics of discussion needed as we enter the world of nonparametric statistical methods.
The researcher's goal is to use the statistical test that provides the most power for the hypothesis tested with a given sample size. This assumes that the assumptions of the test can be met by the data in question. When parametric tests are compared to nonparametric tests at the same sample size, the power of the tests will not be the same. Generally, the assumptions made by parametric tests give them more power than their nonparametric counterparts for the same sample size, although this is not always the case.
Assume test B is to be compared against test A for a given application, and that test A is the most powerful test for that application. Power-efficiency concerns how much larger a sample test B needs in order to be as powerful as test A, when the significance level and the sample size of test A are held constant. Power-efficiency may be calculated as follows:

power-efficiency of test B = 100 × (nA / nB) percent

where nA and nB are the sample sizes tests A and B require for the same power.
For example, if test B requires a sample of size 25 when test A requires a sample of size 20, for the same level of power at the same significance level, then the power-efficiency of test B is 100 × (20/25) = 80 percent.
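The calculation above is simple enough to capture in a short Python sketch (the sample sizes are the hypothetical ones from the example):

```python
def power_efficiency(n_a, n_b):
    """Power-efficiency of test B relative to test A, as a percentage:
    the ratio of the sample size test A requires to the sample size
    test B requires for the same power at the same significance level."""
    return 100.0 * n_a / n_b

# Test A needs n = 20 where test B needs n = 25 for the same power:
print(f"power-efficiency of B = {power_efficiency(20, 25):.0f}%")
```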
This power-efficiency measure is useful for comparing power at a given sample size and significance level. Another useful measure for comparing tests is known as the asymptotic relative efficiency. It has an advantage over the power-efficiency calculation in that it is generally independent of the level of significance. The disadvantage is that it is based on the limit as n increases toward infinity, although many tests reach their asymptotic relative efficiency at moderate sample sizes. Asymptotic relative efficiency may be calculated as follows:

asymptotic relative efficiency of test B = the limit, as the sample sizes increase toward infinity, of nA / nB.
For example, if the Mann-Whitney U test were to be used for data that might be analyzed with the two-sample t-test for means, the asymptotic relative efficiency is 3/π, or approximately 95.5%. If a sample of size 95 were used to analyze data with a t-test, a sample of approximately 100 would be needed for the Mann-Whitney U test to have equivalent power. This high efficiency suggests that the Mann-Whitney U test is an excellent alternative to the t-test, especially since it does not require the t-test's restrictive underlying assumptions.
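The arithmetic behind these figures can be checked directly (3/π is the ARE of the Mann-Whitney U test relative to the t-test under normality; the sample size of 95 is the hypothetical one from the example):

```python
import math

# ARE of the Mann-Whitney U test relative to the two-sample t-test
# (under normality) is 3/pi.
are = 3 / math.pi
print(f"ARE = {are:.4f}")          # approximately 0.9549

# Sample size the Mann-Whitney U test needs to match the power of a
# t-test that uses n = 95:
n_t = 95
n_mw = n_t / are
print(f"n needed = {n_mw:.1f}")    # approximately 100
```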
The sign test for location is employed in those cases in which we wish to test a hypothesis related to the median of a particular variable. The sign test for location tests the same hypothesis as the Wilcoxon Signed-Ranks test (discussed next), but is much less restrictive. Generally, all that is required to run the sign test is that the data must have been measured on at least an ordinal scale, although only the direction above or below the hypothesized median need be recorded.
Using values falling above or below the hypothesized median, the test is essentially a one-sample binomial test. The expected proportion falling above or below the tested median is 0.50. Both exact and approximate tests may be performed. The exact test uses exact probabilities generated from the binomial distribution. The approximate test uses the normal approximation of the binomial. The exact test is preferred when available.
Values falling on the hypothesized median are ignored with nondirectional hypothesis tests. With directional hypothesis tests, values falling on the hypothesized median may be used.
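The mechanics of the exact sign test can be sketched in a few lines of Python. The data and the hypothesized median of 50 are hypothetical; the two-sided p-value is computed from the binomial distribution with p = 0.50, dropping values that fall on the hypothesized median:

```python
from math import comb

# Hypothetical sample; H0: population median = 50.
data = [62, 55, 48, 71, 53, 50, 58, 44, 66, 59]
hypothesized_median = 50

above = sum(1 for x in data if x > hypothesized_median)
below = sum(1 for x in data if x < hypothesized_median)
n = above + below              # ties with the median dropped (nondirectional)

k = max(above, below)          # the larger of the two counts
# Exact two-sided p-value: probability of a split at least this extreme
# under Binomial(n, 0.5).
p_value = 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"{above} above, {below} below, exact p = {p_value:.4f}")
```

This sketch assumes the split is not exactly even; a production routine would also guard against k = n/2, where doubling the tail can exceed 1.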
The Wilcoxon Signed-Ranks Test for location, like the sign test for location, is employed in those cases in which we wish to test a hypothesis related to the median of a particular variable. This test uses information on the deviations from the median, not just the signs. Because of this, greater power will be obtained from the Wilcoxon test.
The Wilcoxon Signed-Ranks test for location takes the deviations from the hypothesized median and ranks their absolute values, keeping the signs. If the null hypothesis is true, the sum of the ranks above the hypothesized median (positive ranks) should be about equal to the sum of the ranks below the hypothesized median (negative ranks). The test compares the sums of the positive and negative ranks to determine significance.
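A minimal sketch of the one-sample case using `scipy.stats.wilcoxon`, assuming a hypothetical sample and a hypothesized median of 50 (a value exactly equal to the hypothesized median would produce a zero deviation and be dropped by the default settings):

```python
from scipy.stats import wilcoxon

# Hypothetical sample; H0: population median = 50.
data = [62, 55, 48, 71, 53, 58, 44, 66, 59]
diffs = [x - 50 for x in data]   # deviations from the hypothesized median

# scipy ranks |diffs|, keeps the signs, and compares the positive and
# negative rank sums; with this small, tie-free sample it uses the
# exact distribution.
stat, p = wilcoxon(diffs)
print(f"W = {stat}, p = {p:.4f}")
```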
The Mann-Whitney U test allows us to test the hypothesis that two independent samples were drawn from identical populations, with particular sensitivity to differences in the central tendencies (medians) of the two groups. This test compares two groups measured on at least an ordinal scale.
The combined data from two groups are ranked from low to high. When ties are obtained, the average rank is used. Differences in mean ranks are then evaluated for statistical significance.
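The ranking procedure described above is handled by `scipy.stats.mannwhitneyu`; a sketch with hypothetical scores from two independent groups:

```python
from scipy.stats import mannwhitneyu

# Hypothetical ordinal scores from two independent groups.
group1 = [12, 15, 18, 21, 24, 27]
group2 = [10, 11, 14, 16, 17, 19]

# The combined data are ranked from low to high (average ranks for
# ties) and the U statistic is computed from the rank sums; for small,
# tie-free samples scipy uses the exact distribution.
u, p = mannwhitneyu(group1, group2, alternative='two-sided')
print(f"U = {u}, p = {p:.4f}")
```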
The two-sample median test allows us to test the hypothesis that two independent samples were drawn from populations with equal medians. This test compares two groups measured on at least an ordinal scale.
The median is generated from the combined data from the two groups. If the null hypothesis is true, the proportion of group one falling above the combined median should equal the proportion of group two falling above it. This reduces the test to a two-sample proportion test, and significance may be found with the Fisher exact test for proportions.
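The reduction to a 2×2 table and a Fisher exact test can be sketched as follows, using hypothetical data for the two groups:

```python
from statistics import median
from scipy.stats import fisher_exact

# Hypothetical scores from two independent groups.
group1 = [35, 42, 48, 51, 55, 60]
group2 = [28, 30, 33, 38, 41, 45]

grand = median(group1 + group2)   # median of the combined data

# 2x2 table: counts above / not above the grand median in each group.
table = [
    [sum(x > grand for x in group1), sum(x <= grand for x in group1)],
    [sum(x > grand for x in group2), sum(x <= grand for x in group2)],
]
odds, p = fisher_exact(table)
print(f"combined median = {grand}, table = {table}, p = {p:.4f}")
```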
The sign test for matched pairs is used to compare two dependent groups for differences in location. The test looks only at differences within each pair of observations and determines which item in the pair is greater in some underlying property. It is, therefore, useful when only the direction of the differences can be or is recorded. This test has very few underlying assumptions. This test compares two dependent groups measured on at least an ordinal scale.
1. The samples have been randomly drawn from two dependent populations either through matching or repeated measures (critical).
2. The underlying distribution is continuous.
3. The proportions of positive and negative signs follow a binomial distribution (critical).
As with the sign test for location, this test is essentially a one-sample binomial test in which the expected proportion of positive differences is 0.50. Ties are ignored in the analysis for nondirectional tests. Exact binomial tests may be performed.
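A sketch of the paired version, with hypothetical before/after scores for matched pairs; tied pairs are dropped and the exact two-sided binomial p-value is computed as before:

```python
from math import comb

# Hypothetical before/after scores for matched pairs.
before = [10, 14, 9, 12, 16, 11, 13, 15, 10, 12]
after  = [12, 17, 9, 15, 18, 14, 12, 19, 13, 16]

diffs = [y - x for x, y in zip(before, after)]
pos = sum(1 for d in diffs if d > 0)   # "after" member of the pair higher
neg = sum(1 for d in diffs if d < 0)
n = pos + neg                          # tied pairs ignored (nondirectional)

k = max(pos, neg)
# Exact two-sided p-value under Binomial(n, 0.5).
p_value = 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"{pos} positive, {neg} negative, exact p = {p_value:.4f}")
```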
The Wilcoxon Signed-Ranks test is used to assess the difference between two dependent populations. The samples may be generated through repeated measures or matching. This test is similar to the Wilcoxon test previously covered for single sample data, and may be considered a more powerful test for paired data than the Sign test. This is because the test uses the ranks of the differences, not just the direction of the differences in its calculations. This test compares two dependent groups measured on at least an ordinal scale.
The Wilcoxon Signed-Ranks test computes the difference within each pair. The absolute values of these differences are then ranked, and the original signs of the differences are kept. The test uses the sum of the ranks of the less frequent sign as its test statistic.
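The paired procedure is available directly as `scipy.stats.wilcoxon` with two samples; a sketch with hypothetical matched-pairs data (chosen without ties or zero differences, so the exact distribution applies):

```python
from scipy.stats import wilcoxon

# Hypothetical before/after scores for matched pairs.
before = [10, 14, 9, 12, 16, 11, 13, 15, 10, 12]
after  = [12, 19, 8, 15, 23, 15, 7, 23, 19, 22]

# scipy computes the pairwise differences, ranks their absolute
# values keeping the signs, and tests the signed-rank sums.
stat, p = wilcoxon(before, after)
print(f"W = {stat}, p = {p:.4f}")
```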
The Spearman rank-order correlation coefficient, rs, is a measure of association between two variables measured on at least an ordinal scale. The Spearman correlation coefficient may be used in place of the Pearson correlation for interval data when questions arise concerning the underlying assumptions.
The correlation coefficient is generated by separately ranking two dependent variables and then calculating the standard Pearson correlation on the ranks. The significance may be evaluated with the standard Pearson significance test. Exact tables are also available for small sample sizes.
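The equivalence between the Spearman coefficient and a Pearson correlation computed on ranks can be verified directly with scipy (the data are hypothetical):

```python
from scipy.stats import spearmanr, pearsonr, rankdata

# Hypothetical paired observations.
x = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]
y = [2, 20, 28, 27, 50, 29, 7, 17, 6, 12]

rho, p = spearmanr(x, y)

# The same coefficient obtained by ranking each variable separately
# and applying the standard Pearson correlation to the ranks.
r_check, _ = pearsonr(rankdata(x), rankdata(y))
print(f"rho = {rho:.4f} (Pearson on ranks: {r_check:.4f}), p = {p:.4f}")
```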