- Sheldon Cooper: "You know, it just occurred to me: if there are an infinite number of parallel universes, in one of them there's probably a Sheldon who doesn't believe parallel universes exist."
TL;DR: No time to read all the background? Jump to the conclusion below.
To understand the results of a regression analysis - indeed, to understand scientific statistical evaluations at all - the concept of hypothesis testing must be internalized. Hypothesis testing is a topic that is sometimes not well understood even at an advanced level. Hypothesis tests are used to check whether specific properties found in a single sample (can) also hold for the population. A property (also called a parameter) could be, for example, a mean value or a regression coefficient.
Introduction to Hypothesis Tests
The overall aim of statistics is to draw conclusions about a population with the help of a sample from this population. We therefore
want to make statements about a population, but are limited in reality as we only ever have a fraction of the relevant data.
1) Of course, you could combine all the samples into a single large sample, but even then you could draw yet another sample from the population and would face the same problem again.
One example: We want to investigate how the age of working people influences their income.
Our theory: the older a person is, the higher their income, e.g. due to experience. So if we now go out to collect our data, we will
never be able to ask all working people in practice, but will have to make do with a (representative) sample.
The problem: To what extent does the result from our analysis also apply to the population, i.e. all existing professionals? Perhaps,
by chance, we have particularly successful people in the sample who are already earning a lot of money at a young age.
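This sampling problem can be made concrete with a small simulation sketch (all numbers below are invented for illustration, not from the article): two random samples drawn from the same hypothetical "population" of working people yield two different estimated slopes for income on age.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_sample(n):
    """Draw a hypothetical sample of (age, income) pairs."""
    age = rng.uniform(20, 65, n)
    # assumed "true" relationship: income = 20000 + 500 * age + noise
    income = 20000 + 500 * age + rng.normal(0, 8000, n)
    return age, income

slopes = []
for i in (1, 2):
    age, income = draw_sample(100)
    slope = np.polyfit(age, income, 1)[0]  # estimated regression slope
    slopes.append(slope)
    print(f"sample {i}: estimated slope = {slope:.1f}")
```

Both slopes scatter around the assumed "true" value of 500, but neither sample reproduces it exactly - which is exactly the problem hypothesis tests address.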
Which estimated regression line is now the "correct" one? Which parameters $\beta_0$ and $\beta_1$ are the "correct" coefficients? Or, vice versa, given calculated coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$, how much can we trust these results? Perhaps the sample was a statistical outlier.
Notes
2) If we continue to increase the sample by looking at more and more data, so that the sample gets closer and closer to the population, then our measured coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ will approach the "true" coefficients $\beta_0$ and $\beta_1$ (keyword: unbiasedness). If the sample corresponded to the population, then $\hat{\beta}_0 = \beta_0$ and $\hat{\beta}_1 = \beta_1$.
We use hypothesis tests to answer these questions.
The Nature of Chance (or: Confidence Intervals and Standard Errors)
First, we should be clear that chance has an influence on every data sample. If you collect certain data points in a first sample, those data points will look completely different in a second sample. Accordingly, chance also plays a role in the observed parameters of the sample, e.g. calculated regression coefficients or mean values. In the case of a regression analysis, a different value of e.g. $\hat{\beta}_1$ (from the population of all possible $\hat{\beta}_1$ values) is realized depending on the sample.
By the central limit theorem, the distribution of the calculated $\hat{\beta}_1$ values would look something like this:
The dispersion of $\hat{\beta}_1$ is also called the standard error (SE) - this will become important later. In general, the standard error refers to the standard deviation of a parameter, whereas the standard deviation itself refers to the spread of the actual data points.
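This distinction can be checked numerically (the setup below is hypothetical): if we repeatedly draw samples and estimate the slope, the standard deviation of those slope estimates - the standard error - matches the analytic formula $\sigma_{\hat{u}} / (\sqrt{n}\,\sigma_X)$ for the standard error of a regression slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma_x, sigma_u = 200, 1.5, 3.0  # sample size, sd of X, sd of residuals

slopes = []
for _ in range(5000):
    x = rng.normal(0, sigma_x, n)                  # regressor X
    y = 1.0 + 2.0 * x + rng.normal(0, sigma_u, n)  # true slope = 2.0
    slopes.append(np.polyfit(x, y, 1)[0])

empirical_se = np.std(slopes)                   # spread of the slope estimates
analytic_se = sigma_u / (np.sqrt(n) * sigma_x)  # sigma_u / (sqrt(n) * sigma_X)
print(f"empirical SE = {empirical_se:.4f}, analytic SE = {analytic_se:.4f}")
```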
Mathematically, we write the confidence interval for the parameter $\beta_1$ like this:

$$\hat{\beta}_1 \pm t \cdot SE(\hat{\beta}_1)$$

- $SE(\hat{\beta}_1)$: the standard error of $\hat{\beta}_1$. It is defined as $SE(\hat{\beta}_1) = \frac{\sigma_{\hat{u}}}{\sqrt{n}\,\sigma_X}$, with $n$ as the sample size, $\sigma_X$ as the standard deviation of the regressor X and $\sigma_{\hat{u}}$ as the standard deviation of the residuals from the regression.
- $t$: a critical value of the t-distribution - more on this below.

Here, t is to be interpreted as a factor that scales the standard error. The larger t, the larger the confidence interval for a constant standard error SE.
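As a short sketch in code (the data and seed are invented for illustration): `scipy.stats.linregress` returns the slope and its standard error, and `stats.t.ppf` supplies the critical value t for the desired coverage.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 50
x = rng.normal(0, 2, n)
y = 1.0 + 0.8 * x + rng.normal(0, 1, n)  # hypothetical sample

res = stats.linregress(x, y)             # slope and its standard error
t_crit = stats.t.ppf(0.975, df=n - 2)    # critical value for a 95% interval
lower = res.slope - t_crit * res.stderr
upper = res.slope + t_crit * res.stderr
print(f"slope = {res.slope:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```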
Hypotheses
- Lloyd Christmas: "So you're telling me there's a chance?"
Hypothesis testing starts with the hypotheses. There are several types of hypothesis tests, but we will initially limit ourselves to the so-called two-tailed hypothesis test. A distinction is made between a null hypothesis $H_0$ and an alternative hypothesis $H_1$. In the null hypothesis, we assume the opposite of the relationship that we want to prove.
Important: The aim of a statistical regression analysis is therefore to reject the null hypothesis. We can do this if a measured $\hat{\beta}_1$ is statistically significant - i.e. lies outside a confidence interval. In order to show the influence of a variable X on Y in a regression analysis, we must demonstrate that the effect of X on Y is real and not the result of mere chance.
So we cannot say that the $\beta_1$ from $H_0$ is "true" or "correct", but we can say that a calculated $\hat{\beta}_1$ deviates so extremely from the $\beta_1$ of $H_0$ that the null hypothesis seems very unlikely*. As a result, we reject our null hypothesis $H_0$. There then appears to be a real effect of X on Y in the form of $\hat{\beta}_1$ that is more than mere chance. This is the case if $\hat{\beta}_1$ lies outside the confidence interval around $Z$, where $Z$ corresponds to the value from $H_0$: $\beta_1 = Z$.
*Why only very unlikely? Attention: because we cannot say with absolute certainty that our null hypothesis $H_0$ is "wrong", as extreme $\hat{\beta}_1$ values can also arise by chance! We even accept a certain probability of being wrong, i.e. of wrongly rejecting $H_0$ (type I error).
Significance Level Alpha
- Vegeta: "It's over 9000!"
Extreme values of $\hat{\beta}_1$ that lie outside a confidence interval can occur by chance. However, they are so rare that we accept a self-defined probability of being wrong and falsely rejecting $H_0$. This is also called the alpha error or type I error: the conclusion is drawn that there is a statistically significant correlation or effect, although this is not actually the case.
The golden question now: When do we (wrongly) reject $H_0$, and what probability of error do we accept for this?
Short answer: We reject $H_0$ with a 5% probability of error if our calculated parameter $\hat{\beta}_1$ lies outside a 95% confidence interval.
Long answer: It depends...
Up to now, we have always talked about the 95% confidence interval, for which we accept a 5% probability of error.
However, we can choose for ourselves the error probability alpha with which we are willing to falsely reject $H_0$. Alpha defines the probability with which we falsely reject $H_0$; this is also called a "type I error".
If we want to accept a lower probability of error, e.g. alpha = 1%, this means that we will falsely reject $H_0$ in only 1% of all cases - sounds good, right? However, this has a crucial consequence: it becomes "more difficult" to reject $H_0$, because the confidence interval becomes larger - 99%! This becomes apparent when looking at the equation of the confidence interval: the value t changes. The lower alpha, the more difficult it is to reject $H_0$, because even more extreme $\hat{\beta}_1$ values are then needed to say that an effect of X on Y is real and not just a coincidence.
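The effect of alpha on t can be seen directly with `scipy.stats.t.ppf` (the degrees of freedom below are arbitrary, chosen large so the values approach the familiar normal quantiles):

```python
from scipy import stats

df = 1000  # large df: the t-distribution is close to the standard normal
crits = []
for alpha in (0.05, 0.01, 0.001):
    # two-tailed test: the critical value is the (1 - alpha/2) quantile
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    crits.append(t_crit)
    print(f"alpha = {alpha:>5}: critical value t = {t_crit:.3f}")
```

The smaller alpha, the larger t - and with it the confidence interval.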
The Test Statistic T and Critical Values t
So far, we have only considered the critical value t in the formula for confidence intervals, as a factor in front of the standard error, where t widens or narrows the bounds of the confidence interval. Since t implicitly determines the limits of the symmetrical confidence interval, we can also consider t on its own in order to draw conclusions about the confidence interval.
While confidence intervals give us an idea of where the true parameter lies, we can also use the so-called test statistic T to test a specific hypothesis (e.g. whether $\beta_1 = 0$). T is an aggregate value computed from our data that forms the basis for our decision: it assesses how closely our sample data matches $H_0$. In the context of linear regression, T is defined as:

$$T = \frac{\hat{\beta}_1 - \beta_{1,H_0}}{SE(\hat{\beta}_1)}$$
- $\hat{\beta}_1$: the calculated regression coefficient; it represents the observed slope of the linear regression line from the present sample.
- $\beta_{1,H_0}$: the assumption from the null hypothesis; most of the time $\beta_{1,H_0} = 0$.
- $SE(\hat{\beta}_1)$: the standard error of $\hat{\beta}_1$. It is defined as $SE(\hat{\beta}_1) = \frac{\sigma_{\hat{u}}}{\sqrt{n}\,\sigma_X}$, with $n$ as the sample size, $\sigma_X$ as the standard deviation of the regressor X and $\sigma_{\hat{u}}$ as the standard deviation of the residuals from the regression.
Since T depends on the parameters of the sample, T is also subject to random fluctuations and follows the so-called t-distribution:
What are "critical values"? Critical values t are certain points in the t-distribution that define the limits for significance levels. For example, the critical values t correspond to the 2.5% quantile and the 97.5% quantile for a 5% significance level and a 95% confidence interval. Critical values t are therefore quantiles of the t-distribution and define the range in which the test statistic T will fall with a probability of (1-alpha), here corresponding to (1 - alpha) = 95%. Common significance levels, their associated confidence intervals and critical values look like this:
| Significance level | Critical value \|t\| | Confidence interval |
|---|---|---|
| 5% | 1.96 | 95% |
| 1% | 2.576 | 99% |
| 0.1% | 3.291 | 99.9% |
Calculating the test statistic T and comparing it with the critical values t basically checks where T lies within the t-distribution. If T lies outside the range defined by the critical values t, this indicates an unlikely result under the assumption of . As a result, we reject the null hypothesis (with an error probability ≤ alpha, because extreme test statistics of T can also occur by chance).
The limits of the confidence interval are directly related to the test statistic T. They correspond to the range for which the test statistic T is within the critical values t of the t-distribution. Visually this is the blue striped area under the bell curve in the plot above. Therefore, the critical values from the table above can be inserted into the limits of the confidence interval to obtain the corresponding (1-alpha) confidence interval.
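Putting T and t together in a short sketch (the data are invented; the null hypothesis tested is $\beta_1 = 0$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 40
x = rng.normal(0, 1, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)  # hypothetical sample

res = stats.linregress(x, y)
T = (res.slope - 0) / res.stderr         # test statistic under H0: beta1 = 0
t_crit = stats.t.ppf(0.975, df=n - 2)    # critical value, 5% significance

print(f"T = {T:.2f}, critical value t = {t_crit:.2f}")
print("reject H0" if abs(T) > t_crit else "fail to reject H0")
```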
P-values and the Significance Level Alpha (or: Where is the star?)
In addition to confidence intervals and test statistics, there is a third way of testing null hypotheses and statistical significance, namely using p-values. A p-value is the probability of obtaining a test statistic T' that is even more extreme than the test statistic T calculated from the actual sample. The p-value is therefore the error probability at which $H_0$ is just barely rejected:

$$p = P(|T'| > |T| \mid H_0)$$
In scientific regression tables, in addition to regression coefficients, you also occasionally see asterisks: \*, \*\*, \*\*\*.
These stand for the p-values:
- \* means p ≤ 0.05 or p ≤ 5%
- \*\* means p ≤ 0.01 or p ≤ 1%
- \*\*\* means p ≤ 0.001 or p ≤ 0.1%
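A small helper sketch (the function name is made up; the star thresholds follow the common \*/\*\*/\*\*\* convention):

```python
from scipy import stats

def p_value_and_stars(T, df):
    """Two-tailed p-value for a test statistic T, plus significance stars."""
    p = 2 * stats.t.sf(abs(T), df)  # probability of an even more extreme T'
    stars = "***" if p <= 0.001 else "**" if p <= 0.01 else "*" if p <= 0.05 else ""
    return p, stars

for T in (1.5, 2.1, 3.0, 4.0):
    p, stars = p_value_and_stars(T, df=100)
    print(f"T = {T}: p = {p:.4f} {stars}")
```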
Conclusion
- Chance plays a decisive role in the characteristics of a sample.
- Accordingly, chance also influences calculated parameters such as regression coefficients $\hat{\beta}_1$ or mean values, which are based on the sample.
- The null hypothesis $H_0$ defines the assumption that we want to reject and states that a measured effect is random.
- The alternative hypothesis $H_1$ is the opposite of $H_0$ and states that there is a real effect in the data.
- $H_0$ can be rejected with a 5% error probability if |T| > 1.96, or $\hat{\beta}_1$ lies outside the 95% confidence interval, or the p-value is less than 5% (0.05). The parameter is then statistically significant.
- Alpha is the error probability that we accept for falsely rejecting $H_0$. It is the probability of a type I error.
- The standard error SE is the dispersion of a parameter, e.g. of the slope of the regression line.
- A confidence interval specifies a range in which the true value of a parameter lies with a certain probability (e.g. 95%).
- The test statistic T is a value that results from the sample data and is therefore also influenced by chance; this is described by the t-distribution. T is the basis for our decision on whether to reject $H_0$ or not.
- The critical value t is a quantile of the distribution of the test statistic T. A calculated test statistic T that is more extreme than a critical value t occurs only with a probability ≤ alpha. As a result, $H_0$ is rejected with error probability alpha.
- The p-value is the probability of a T' that is even more extreme than the measured T from a given sample.