- Sheldon Cooper: "You know, it just occurred to me: if there are an infinite number of parallel universes, in one of them there's probably a Sheldon who doesn't believe parallel universes exist."
TL;DR: No time to read all the background? Jump to the conclusion below.
To understand the results of a regression analysis - indeed, to understand scientific statistical evaluations at all - the concept of hypothesis testing must be internalized. Hypothesis testing is a topic that is sometimes not well understood even at an advanced level. Hypothesis tests are used to check whether specific properties found in a single sample (can) also hold for the population. A property (also called a parameter) could be, for example, a mean value or a regression coefficient.
Introduction to Hypothesis Tests
The overall aim of statistics is to draw conclusions about a population with the help of a sample from this population. We therefore
want to make statements about a population, but are limited in reality as we only ever have a fraction of the relevant data.
1) Of course, you could combine all the samples into a single large sample, but even then you could draw yet another sample from the population and would face the same problem again.
One example: We want to investigate how the age of working people influences their income.
Our theory: the older a person is, the higher their income, e.g. due to experience. So if we now go out to collect our data, we will
never be able to ask all working people in practice, but will have to make do with a (representative) sample.
The problem: To what extent does the result from our analysis also apply to the population, i.e. all existing professionals? Perhaps,
by chance, we have particularly successful people in the sample who are already earning a lot of money at a young age.
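This sampling problem can be made concrete with a small simulation sketch (all numbers below are invented for illustration, not from the article): two random samples drawn from the same hypothetical "population" of working people yield two different estimated slopes for income on age.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_sample(n):
    """Draw a hypothetical sample of (age, income) pairs."""
    age = rng.uniform(20, 65, n)
    # assumed "true" relationship: income = 20000 + 500 * age + noise
    income = 20000 + 500 * age + rng.normal(0, 8000, n)
    return age, income

slopes = []
for i in (1, 2):
    age, income = draw_sample(100)
    slope = np.polyfit(age, income, 1)[0]  # estimated regression slope
    slopes.append(slope)
    print(f"sample {i}: estimated slope = {slope:.1f}")
```

Both slopes scatter around the assumed "true" value of 500, but neither sample reproduces it exactly - which is exactly the problem hypothesis tests address.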
Which estimated regression line is now the "correct" one? Which parameters $\beta_0$ and $\beta_1$ are the "correct" coefficients? Or, vice versa, given calculated coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$, how much can we trust these results? Perhaps the sample was a statistical outlier.
Notes
2) If we continue to increase the sample by looking at more and more data, so that the sample gets closer and closer to the population, then our measured coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ will approach the "true" coefficients $\beta_0$ and $\beta_1$ (keyword: unbiasedness). If the sample corresponded to the population, then $\hat{\beta}_0 = \beta_0$ and $\hat{\beta}_1 = \beta_1$.
We use hypothesis tests to answer these questions.
The Nature of Chance (or: Confidence Intervals and Standard Errors)
First, we should be clear that chance has an influence on every data sample. If you collect certain data points in a first sample, those data points will look completely different in a second sample. Accordingly, chance also plays a role in the observed parameters of the sample, e.g. calculated regression coefficients or mean values. In the case of a regression analysis, a different value of e.g. $\hat{\beta}_1$ (from the population of all possible $\hat{\beta}_1$ values) is realized depending on the sample.
By the central limit theorem, the distribution of the calculated $\hat{\beta}_1$ values would look something like this:
The dispersion of $\hat{\beta}_1$ is also called the standard error (SE) - this will become important later. In general, the standard error refers to the standard deviation of a parameter, whereas the standard deviation itself refers to the spread of the actual data points.
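This distinction can be checked numerically (the setup below is hypothetical): if we repeatedly draw samples and estimate the slope, the standard deviation of those slope estimates - the standard error - matches the analytic formula $\sigma_{\hat{u}} / (\sqrt{n}\,\sigma_X)$ for the standard error of a regression slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma_x, sigma_u = 200, 1.5, 3.0  # sample size, sd of X, sd of residuals

slopes = []
for _ in range(5000):
    x = rng.normal(0, sigma_x, n)                  # regressor X
    y = 1.0 + 2.0 * x + rng.normal(0, sigma_u, n)  # true slope = 2.0
    slopes.append(np.polyfit(x, y, 1)[0])

empirical_se = np.std(slopes)                   # spread of the slope estimates
analytic_se = sigma_u / (np.sqrt(n) * sigma_x)  # sigma_u / (sqrt(n) * sigma_X)
print(f"empirical SE = {empirical_se:.4f}, analytic SE = {analytic_se:.4f}")
```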
Mathematically, we write the confidence interval for the parameter $\beta_1$ like this:

$$\hat{\beta}_1 \pm t \cdot SE(\hat{\beta}_1)$$

- $SE(\hat{\beta}_1)$: the standard error of $\hat{\beta}_1$. It is defined as $SE(\hat{\beta}_1) = \frac{\sigma_{\hat{u}}}{\sqrt{n}\,\sigma_X}$, with $n$ as the sample size, $\sigma_X$ as the standard deviation of the regressor X and $\sigma_{\hat{u}}$ as the standard deviation of the residuals from the regression.
- $t$: a critical value of the t-distribution - more on this below.

Here, t is to be interpreted as a factor that scales the standard error. The larger t, the larger the confidence interval for a constant standard error SE.
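As a short sketch in code (the data and seed are invented for illustration): `scipy.stats.linregress` returns the slope and its standard error, and `stats.t.ppf` supplies the critical value t for the desired coverage.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 50
x = rng.normal(0, 2, n)
y = 1.0 + 0.8 * x + rng.normal(0, 1, n)  # hypothetical sample

res = stats.linregress(x, y)             # slope and its standard error
t_crit = stats.t.ppf(0.975, df=n - 2)    # critical value for a 95% interval
lower = res.slope - t_crit * res.stderr
upper = res.slope + t_crit * res.stderr
print(f"slope = {res.slope:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```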
Hypotheses
- Lloyd Christmas: "So you're telling me there's a chance?"
Hypothesis testing starts with the hypotheses. There are several types of hypothesis tests, but we will initially limit ourselves to the so-called two-tailed hypothesis test. A distinction is made between a null hypothesis $H_0$ and an alternative hypothesis $H_1$. In the null hypothesis, we assume the opposite of the relationship that we want to prove.
Important: The aim of a statistical regression analysis is therefore to reject the null hypothesis. We can do this if a measured $\hat{\beta}_1$ is statistically significant - i.e. lies outside a confidence interval. In order to show the influence of a variable X on Y in a regression analysis, we must demonstrate that the effect of X on Y is real and not the result of mere chance.
So we cannot say that the $\beta_1$ from $H_0$ is "true" or "correct", but we can say that a calculated $\hat{\beta}_1$ deviates so extremely from the $\beta_1$ of $H_0$ that the null hypothesis seems very unlikely*. As a result, we reject our null hypothesis $H_0$. There then appears to be a real effect of X on Y in the form of $\hat{\beta}_1$ that is more than mere chance. This is the case if $\hat{\beta}_1$ lies outside the confidence interval around $Z$, where $Z$ corresponds to the value from $H_0$: $\beta_1 = Z$.
*Why only very unlikely? Attention: because we cannot say with absolute certainty that our null hypothesis $H_0$ is "wrong", as extreme $\hat{\beta}_1$ values can also arise by chance! We even accept a certain probability of being wrong, i.e. of wrongly rejecting $H_0$ (type I error).
Significance Level Alpha
- Vegeta: "It's over 9000!"
Extreme values of $\hat{\beta}_1$ that lie outside a confidence interval can occur by chance. However, they are so rare that we accept a self-defined probability of being wrong and falsely rejecting $H_0$. This is also called the alpha error or type I error: the conclusion is drawn that there is a statistically significant correlation or effect, although this is not actually the case.
The golden question now: When do we (wrongly) reject $H_0$, and what probability of error do we accept for this?
Short answer: We reject $H_0$ with a 5% probability of error if our calculated parameter $\hat{\beta}_1$ lies outside a 95% confidence interval.
Long answer: It depends...
Up to now, we have always talked about the 95% confidence interval, for which we accept a 5% probability of error.
However, we can choose for ourselves the error probability alpha with which we are willing to falsely reject $H_0$. Alpha defines the probability with which we falsely reject $H_0$; this is also called a "type I error".
If we want to accept a lower probability of error, e.g. alpha = 1%, this means that we will falsely reject $H_0$ in only 1% of all cases - sounds good, right? However, this has a crucial consequence: it becomes "more difficult" to reject $H_0$, because the confidence interval becomes larger - 99%! This becomes apparent when looking at the equation of the confidence interval: the value t changes. The lower alpha, the more difficult it is to reject $H_0$, because even more extreme $\hat{\beta}_1$ values are then needed to say that an effect of X on Y is real and not just a coincidence.
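The effect of alpha on t can be seen directly with `scipy.stats.t.ppf` (the degrees of freedom below are arbitrary, chosen large so the values approach the familiar normal quantiles):

```python
from scipy import stats

df = 1000  # large df: the t-distribution is close to the standard normal
crits = []
for alpha in (0.05, 0.01, 0.001):
    # two-tailed test: the critical value is the (1 - alpha/2) quantile
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    crits.append(t_crit)
    print(f"alpha = {alpha:>5}: critical value t = {t_crit:.3f}")
```

The smaller alpha, the larger t - and with it the confidence interval.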
The Test Statistic T and Critical Values t
So far, we have only considered the critical value t in the formula for confidence intervals, as a factor in front of the standard error, where t widens or narrows the bounds of the confidence interval. Since t implicitly determines the limits of the symmetrical confidence interval, we can also consider t on its own in order to draw conclusions about the confidence interval.
While confidence intervals give us an idea of where the true parameter lies, we can also use the so-called test statistic T to test a specific hypothesis (e.g. whether $\beta_1 = 0$). T is an aggregate value computed from our data that forms the basis for our decision: it assesses how closely our sample data matches $H_0$. In the context of linear regression, T is defined as:

$$T = \frac{\hat{\beta}_1 - \beta_{1,H_0}}{SE(\hat{\beta}_1)}$$
- $\hat{\beta}_1$: the calculated regression coefficient; it represents the observed slope of the linear regression line from the present sample.
- $\beta_{1,H_0}$: the assumption from the null hypothesis; most of the time $\beta_{1,H_0} = 0$.
- $SE(\hat{\beta}_1)$: the standard error of $\hat{\beta}_1$. It is defined as $SE(\hat{\beta}_1) = \frac{\sigma_{\hat{u}}}{\sqrt{n}\,\sigma_X}$, with $n$ as the sample size, $\sigma_X$ as the standard deviation of the regressor X and $\sigma_{\hat{u}}$ as the standard deviation of the residuals from the regression.
Since T depends on the parameters of the sample, T is also subject to random fluctuations and follows the so-called t-distribution:
What are "critical values"? Critical values t are certain points in the t-distribution that define the limits for significance levels. For example, the critical values t correspond to the 2.5% quantile and the 97.5% quantile for a 5% significance level and a 95% confidence interval. Critical values t are therefore quantiles of the t-distribution and define the range in which the test statistic T will fall with a probability of (1-alpha), here corresponding to (1 - alpha) = 95%. Common significance levels, their associated confidence intervals and critical values look like this:
| Significance level | Critical value \|t\| | Confidence interval |
|---|---|---|
| 5% | 1.96 | 95% |
| 1% | 2.576 | 99% |
| 0.1% | 3.291 | 99.9% |
Calculating the test statistic T and comparing it with the critical values t basically checks where T lies within the t-distribution. If T lies outside the range defined by the critical values t, this indicates an unlikely result under the assumption of . As a result, we reject the null hypothesis (with an error probability ≤ alpha, because extreme test statistics of T can also occur by chance).
The limits of the confidence interval are directly related to the test statistic T. They correspond to the range for which the test statistic T is within the critical values t of the t-distribution. Visually this is the blue striped area under the bell curve in the plot above. Therefore, the critical values from the table above can be inserted into the limits of the confidence interval to obtain the corresponding (1-alpha) confidence interval.
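Putting T and t together in a short sketch (the data are invented; the null hypothesis tested is $\beta_1 = 0$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 40
x = rng.normal(0, 1, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)  # hypothetical sample

res = stats.linregress(x, y)
T = (res.slope - 0) / res.stderr         # test statistic under H0: beta1 = 0
t_crit = stats.t.ppf(0.975, df=n - 2)    # critical value, 5% significance

print(f"T = {T:.2f}, critical value t = {t_crit:.2f}")
print("reject H0" if abs(T) > t_crit else "fail to reject H0")
```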
P-values and the Significance Level Alpha (or: Where is the star?)
In addition to confidence intervals and test statistics, there is a third way of testing null hypotheses and statistical significance, namely using p-values. A p-value is the probability of obtaining a test statistic T' that is even more extreme than the test statistic T calculated from the actual sample. The p-value is therefore the error probability at which $H_0$ is just barely rejected:

$$p = P(|T'| > |T| \mid H_0)$$
In scientific regression tables, in addition to regression coefficients, you also occasionally see asterisks: \*, \*\*, \*\*\*.
These stand for the p-values:
- \* means p ≤ 0.05 or p ≤ 5%
- \*\* means p ≤ 0.01 or p ≤ 1%
- \*\*\* means p ≤ 0.001 or p ≤ 0.1%
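A small helper sketch (the function name is made up; the star thresholds follow the common \*/\*\*/\*\*\* convention):

```python
from scipy import stats

def p_value_and_stars(T, df):
    """Two-tailed p-value for a test statistic T, plus significance stars."""
    p = 2 * stats.t.sf(abs(T), df)  # probability of an even more extreme T'
    stars = "***" if p <= 0.001 else "**" if p <= 0.01 else "*" if p <= 0.05 else ""
    return p, stars

for T in (1.5, 2.1, 3.0, 4.0):
    p, stars = p_value_and_stars(T, df=100)
    print(f"T = {T}: p = {p:.4f} {stars}")
```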
Conclusion
- Chance plays a decisive role in the characteristics of a sample.
- Accordingly, chance also influences calculated parameters such as regression coefficients $\hat{\beta}_1$ or mean values, which are based on the sample.
- The null hypothesis $H_0$ defines the assumption that we want to reject and states that a measured effect is random.
- The alternative hypothesis $H_1$ is the opposite of $H_0$ and states that there is a real effect in the data.
- $H_0$ can be rejected with a 5% error probability if |T| > 1.96, or $\hat{\beta}_1$ lies outside the 95% confidence interval, or the p-value is less than 5% (0.05). The parameter is then statistically significant.
- Alpha is the error probability that we accept for falsely rejecting $H_0$. It is the probability of a type I error.
- The standard error SE is the dispersion of a parameter, e.g. of the slope of the regression line.
- A confidence interval specifies a range in which the true value of a parameter lies with a certain probability (e.g. 95%).
- The test statistic T is a value that results from the sample data and is therefore also influenced by chance; this is described by the t-distribution. T is the basis for our decision on whether to reject $H_0$ or not.
- The critical value t is a quantile of the distribution of the test statistic T. A calculated test statistic T that is more extreme than a critical value t occurs only with a probability ≤ alpha. As a result, $H_0$ is rejected with error probability alpha.
- The p-value is the probability of a T' that is even more extreme than the measured T from a given sample.