Hypothesis Tests and Linear Regression


- Sheldon Cooper: "You know, it just occurred to me: if there are an infinite number of parallel universes, in one of them there's probably a Sheldon who doesn't believe parallel universes exist."

The Big Bang Theory, season 4, episode 5


TL;DR: No time to read all the background? Jump straight to the conclusion.

In order to understand the results of a regression analysis - and, more generally, scientific statistical evaluations - the concept of hypothesis testing must be internalized. Hypothesis testing is a topic that is sometimes not well understood even at an advanced level. Hypothesis tests are used to check whether a specific property observed in a single sample (probably) also applies to the population. A property (also called a parameter) could be, for example, a mean value or a regression coefficient.

Introduction to Hypothesis Tests

The overall aim of statistics is to draw conclusions about a population with the help of a sample from this population. We therefore want to make statements about a population, but are limited in reality as we only ever have a fraction of the relevant data.
One example: We want to investigate how the age of working people influences their income.
Our theory: the older a person is, the higher their income, e.g. due to experience. So if we now go out to collect our data, we will never be able to ask all working people in practice, but will have to make do with a (representative) sample.
The problem: To what extent does the result from our analysis also apply to the population, i.e. all existing professionals? Perhaps, by chance, we have particularly successful people in the sample who are already earning a lot of money at a young age.

linear regression on different samples
Which estimated regression line is now the "correct" one? Which parameters β^0 and β^1 are the "correct" coefficients? Or, vice versa, given calculated coefficients β^0 and β^1, how much can we trust these results? Perhaps the sample was a statistical outlier.

Notes

1) Of course, you could combine all the samples into a single large sample, but even then you could form another sample from the population and then you would be faced with the same problem again.
2) If we continue to increase the sample by looking at more and more data, so that the sample gets closer and closer to the population, then our estimated coefficients β^0 and β^1 will approach the "true" coefficients β0 and β1 (keyword: consistency). If the sample corresponded to the population, then β^0 = β0 and β^1 = β1.
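The following minimal sketch (Python with numpy; not part of the article's original example data) illustrates the point: every fresh sample from the same population yields a different estimated regression line. All numbers - the "true" coefficients, the noise level and the sample size - are invented for illustration.

```python
# A minimal sketch, assuming Python with numpy is available.
# All numbers (true coefficients, noise, sample size) are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
beta0_true, beta1_true = 20000.0, 500.0   # "true" population coefficients

def ols_fit(x, y):
    # OLS estimates: beta1_hat = cov(x, y) / var(x), beta0_hat = mean(y) - beta1_hat * mean(x)
    beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    beta0_hat = y.mean() - beta1_hat * x.mean()
    return beta0_hat, beta1_hat

for i in range(5):
    age = rng.uniform(20, 65, size=50)                                     # regressor X
    income = beta0_true + beta1_true * age + rng.normal(0, 8000, size=50)  # Y with random noise
    b0, b1 = ols_fit(age, income)
    print(f"sample {i + 1}: beta0_hat = {b0:8.1f}, beta1_hat = {b1:6.1f}")
# Every sample yields a different estimated line, even though the population is the same.
```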


We use hypothesis tests to answer these questions.

The Nature of Chance (or: Confidence Intervals and Standard Errors)

First, let us be clear that chance has an influence on every data sample. The data points collected in a first sample will look different in a second sample. Accordingly, chance also plays a role in the observed parameters of the sample, e.g. calculated regression coefficients or mean values. In the case of a regression analysis, different values of β^1 are realized depending on the sample (drawn, so to speak, from the distribution of all possible estimates). According to the central limit theorem, the distribution of the calculated β^1 values would look something like this:

Distribution of regression coefficients
β1 corresponds to the true parameter from the population, while β^1 is calculated from a sample.
All possible values of the estimated regression coefficient β^1, i.e. 100%, lie under the curve in the figure above. Confidence intervals are used to express how "normal" or "expected" a given parameter is. In our case, this parameter is a regression coefficient such as β1. We now say: the so-called 95% confidence interval contains 95% of all "normal" β^1-values around a certain reference value of β1. Let us simply call this reference value Z. So if we repeatedly draw samples and calculate a linear regression with the respective data points, in 95% of cases the estimated regression coefficients β^1 lie in this interval around Z (the specific regression coefficient β1).

Hypothesis test with significance level
The dispersion of β^1 is also called the standard error (SE) - this will become important later. In general, the standard error refers to the standard deviation of a parameter. The standard deviation itself refers to the spread of the actual data points.
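A small simulation makes this distinction tangible: the standard error is (approximately) the standard deviation of β^1 across many hypothetical samples. The sketch below assumes Python with numpy; all numbers are invented for illustration.

```python
# Sketch: the standard error as the spread of beta1_hat across many samples.
# Invented numbers; compares the empirical spread with the formula
# SE(beta1_hat) = sigma_eps / (sigma_x * sqrt(n)).
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma_eps, n = 20000.0, 500.0, 8000.0, 50

slopes = []
for _ in range(5000):
    x = rng.uniform(20, 65, size=n)
    y = beta0 + beta1 * x + rng.normal(0, sigma_eps, size=n)
    slopes.append(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))

sigma_x = (65 - 20) / np.sqrt(12)   # std dev of a Uniform(20, 65) regressor
print("empirical SE of beta1_hat:", np.std(slopes, ddof=1))
print("formula SE               :", sigma_eps / (sigma_x * np.sqrt(n)))
# The two values should be close (they differ slightly in finite samples).
```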
Mathematically, we write confidence intervals for the parameter β^1 like this:

[ Z - t_(α/2) · SE(β^1) ,  Z + t_(α/2) · SE(β^1) ]
  • SE(β^1) :   is the standard error of β^1.
    It is defined as SE(β^1) = σ_ε / (σ_x · √n), with n as the sample size, σ_x the standard deviation of the regressor X and σ_ε the standard deviation of the residuals from the regression.
  • t_(α/2) :   is a critical value of the t-distribution - more on this below.
    Here, t is to be interpreted as a factor that scales the standard error. The larger t, the larger the confidence interval for a constant standard error SE.

The lower bound of the symmetrical confidence interval is β^1_min = Z - t_(α/2) · SE(β^1). The upper bound corresponds to β^1_max = Z + t_(α/2) · SE(β^1).
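As a minimal numerical sketch (invented values for Z, SE and n; Python with scipy assumed), the bounds can be computed like this:

```python
# Sketch: confidence interval bounds from the formulas above (invented numbers).
from scipy import stats

Z = 500.0          # reference value of the coefficient (center of the interval)
se_beta1 = 120.0   # standard error SE(beta1_hat)
n = 50             # sample size
alpha = 0.05       # 5% significance level -> 95% confidence interval

# critical value t_(alpha/2) from the t-distribution with n - 2 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

lower = Z - t_crit * se_beta1   # beta1_min
upper = Z + t_crit * se_beta1   # beta1_max
print(f"t_(alpha/2) = {t_crit:.3f}, 95% CI: [{lower:.1f}, {upper:.1f}]")
```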

significance levels for regression coefficients
Example with 95% confidence interval, t = 1.96, α = 5%
Back to the actual problem: To express how "normal" or "expected" a calculated β^1 is, we look at its position - does it lie within or outside the confidence interval around Z? If β^1 is outside the CI, then we speak of a statistically significant result. In the context of hypothesis testing, the value Z that we have been talking about corresponds to the value assumed in the null hypothesis.

Hypotheses

- Lloyd Christmas: "So you're telling me there's a chance?"

Dumb and Dumber

Hypothesis testing starts with the hypotheses. There are several types of hypothesis tests, but we will initially limit ourselves to the so-called two-tailed hypothesis test. A distinction is made between a null hypothesis H0 and an alternative hypothesis H1 . In the null hypothesis, we assume the opposite of the relationship that we want to prove.
Important: The aim of a statistical regression analysis is therefore to reject the null hypothesis. We can do this if a measured β^1 is statistically significant - i.e. lies outside a confidence interval. In order to show the influence of a variable X on Y in a regression analysis, we must prove that the effect of X on Y is real and not the result of mere chance.

two-tailed significance test
Under H0, β1 is set to 0 because we want to show that the variable has an effect. This is only the case if the assumption β1 = 0 is rejected.
If we can reject the null hypothesis, it means that there is very likely* a real effect of age on income. The aim of a regression analysis is to show this. Important: null hypotheses are never confirmed or accepted, they are either rejected or not! If we cannot reject the null hypothesis, this means that the effect we have found in our data (i.e. β^1) cannot be distinguished from mere chance.
So we cannot say that the β1 from H0 is "true" or "correct", but we can say that a calculated β^1 deviates so extremely from the β1 of H0 that the null hypothesis seems very unlikely. As a result, we reject our null hypothesis H0. There then appears to be a real effect of X on Y in the form of β1 that is more than mere chance. This is the case if β^1 is outside the confidence interval around Z, where Z now corresponds to the value from H0:

β^1 < β^1_min = β1,H0 - t_(α/2) · SE(β^1)   or   β^1 > β^1_max = β1,H0 + t_(α/2) · SE(β^1)
*Why only very likely?
Attention: we can never say with absolute certainty that our null hypothesis H0 is "wrong", because extreme β^1-values can also arise by chance! We even accept a certain probability of being wrong, i.e. of wrongly rejecting H0 (type I error).

Significance Level Alpha

- Vegeta: "It's over 9000!"

Dragon Ball Z

Extreme values of β^1, which lie outside a confidence interval, can occur by chance. However, they are so rare that we accept a self-defined probability of being wrong and falsely rejecting H0. This is also called the alpha error or type I error. In that case, the conclusion is drawn that there is a statistically significant correlation or effect, although this is not actually the case.
The golden question now: When do we (wrongly) reject H0 and what probability of error do we accept for this?

Short answer: We reject H0 with a 5% probability of error if our calculated parameter β^1 lies outside the 95% confidence interval.

Long answer: It depends...
Up to now, we have always talked about the 95% confidence interval, for which we accept a 5% probability of error. However, we can choose for ourselves the error probability alpha with which we are willing to falsely reject H0. Alpha defines the probability of falsely rejecting H0 - the so-called type I error.
If we want to accept a lower probability of error, e.g. alpha = 1%, then this means that we will only incorrectly reject H0 in 1% of all cases - sounds good, right? However, this has a crucial consequence: it becomes "more difficult" to reject H0, because the confidence interval becomes wider - 99%! This becomes apparent when looking at the equation of the confidence interval: the critical value t becomes larger. The lower alpha, the more difficult it is to reject H0, because even more extreme β^1-values are then needed to say that an effect of X on Y is real and not just a coincidence.
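A short sketch (Python with scipy; the degrees of freedom are chosen arbitrarily) shows how the critical value grows as alpha shrinks:

```python
# Sketch: the critical value t_(alpha/2) for different significance levels alpha.
# Degrees of freedom chosen arbitrarily (n = 50, two estimated coefficients -> df = 48).
from scipy import stats

df = 48
for alpha in (0.05, 0.01, 0.001):
    t_crit = stats.t.ppf(1 - alpha / 2, df=df)
    print(f"alpha = {alpha:>5}: t_(alpha/2) = {t_crit:.3f}")
# Smaller alpha -> larger critical value -> wider confidence interval -> harder to reject H0.
```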

The Test Statistic T and Critical Values t

So far, we have only considered the critical value t in the formula for confidence intervals as a factor in front of the standard error. There, t increases or decreases the bounds of the confidence interval. Since t implicitly determines the limits of the symmetrical confidence interval, we can also consider t on its own in order to draw conclusions about the confidence interval. Now, while confidence intervals give us an idea of where the true parameter lies, we can also use the so-called test statistic T to test a specific hypothesis H0 (e.g. whether β1 = 0). T is an aggregate value from our data that forms the basis for our decision by measuring how closely our sample data matches H0. In the context of linear regression, T is defined as: T = (β^1 - β1,H0) / SE(β^1)

  • β^1 :   is the calculated regression coefficient and represents the observed slope of the linear regression line from the present sample.
  • β1,H0 :   is the value assumed under the null hypothesis H0, usually 0.
  • SE(β^1) :   is the standard error of β^1.
    It is defined as SE(β^1) = σ_ε / (σ_x · √n), with n as the sample size, σ_x the standard deviation of the regressor X and σ_ε the standard deviation of the residuals from the regression.
The test statistic T is a value that indicates how far a calculated parameter is from a hypothetical value, measured in standard error units. A calculated parameter is, for example, the regression coefficient β^1, and the hypothetical reference value comes from the null hypothesis H0: β1 = 0. We thus check whether the parameter β^1 deviates significantly from this hypothetical value.
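A minimal sketch of this calculation (Python, invented numbers):

```python
# Sketch: test statistic T for the slope, with invented numbers.
beta1_hat = 310.0   # estimated slope from the sample
beta1_h0 = 0.0      # value of beta1 under the null hypothesis H0
se_beta1 = 120.0    # standard error SE(beta1_hat)

T = (beta1_hat - beta1_h0) / se_beta1
print(f"T = {T:.2f}")   # how many standard errors beta1_hat lies away from H0
```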

Since T depends on the parameters of the sample, T is also subject to random fluctuations and follows the so-called t-distribution:

Distribution of the test statistic T (t-distribution)

What are "critical values"? Critical values t are certain points in the t-distribution that define the limits for significance levels. For example, the critical values t correspond to the 2.5% quantile and the 97.5% quantile for a 5% significance level and a 95% confidence interval. Critical values t are therefore quantiles of the t-distribution and define the range in which the test statistic T will fall with a probability of (1-alpha), here corresponding to (1 - alpha) = 95%. Common significance levels, their associated confidence intervals and critical values look like this:

Significance level   Critical value t   Confidence interval
5%                   |1.96|             95%
1%                   |2.576|            99%
0.1%                 |3.291|            99.9%
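The critical values in the table can be reproduced with a few lines of Python (scipy assumed). For large samples the t-distribution approaches the standard normal distribution, which is where the familiar values 1.96, 2.576 and 3.291 come from:

```python
# Sketch: reproducing the critical values from the table (large-sample / normal approximation).
from scipy import stats

for alpha in (0.05, 0.01, 0.001):
    z = stats.norm.ppf(1 - alpha / 2)
    print(f"significance level {100 * alpha:>4.1f}% -> critical value {z:.3f} "
          f"-> {100 * (1 - alpha):.1f}% confidence interval")
```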


Calculating the test statistic T and comparing it with the critical values t basically checks where T lies within the t-distribution. If T lies outside the range defined by the critical values t, this indicates an unlikely result under the assumption of H0 . As a result, we reject the null hypothesis (with an error probability ≤ alpha, because extreme test statistics of T can also occur by chance).
The limits of the confidence interval are directly related to the test statistic T. They correspond to the range for which the test statistic T is within the critical values t of the t-distribution. Visually this is the blue striped area under the bell curve in the plot above. Therefore, the critical values from the table above can be inserted into the limits of the confidence interval to obtain the corresponding (1-alpha) confidence interval.
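A small sketch (invented numbers, Python with scipy) shows that the two decision rules - comparing T with the critical values, and checking whether β^1 lies outside the confidence interval around β1,H0 - lead to the same conclusion:

```python
# Sketch: equivalence of the two decision rules (invented numbers).
from scipy import stats

beta1_hat, beta1_h0, se, n, alpha = 310.0, 0.0, 120.0, 50, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

# rule 1: test statistic outside the critical values
T = (beta1_hat - beta1_h0) / se
reject_via_T = abs(T) > t_crit

# rule 2: estimate outside the confidence interval around the H0 value
ci_lower = beta1_h0 - t_crit * se
ci_upper = beta1_h0 + t_crit * se
reject_via_ci = beta1_hat < ci_lower or beta1_hat > ci_upper

print(reject_via_T, reject_via_ci)   # both rules give the same answer
```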

P-values and the Significance Level Alpha (or: Where is the star?)

In addition to confidence intervals and test statistics, there is a third way of testing null hypotheses and statistical significance, namely using p-values. The p-value is the probability, under H0, of obtaining a test statistic T' that is even more extreme than the test statistic T calculated from the actual sample. The p-value is therefore the smallest error probability at which H0 can still be rejected:

p-value and significance level alpha
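As a sketch (Python with scipy; the test statistic and degrees of freedom are invented), the two-tailed p-value can be computed from T like this:

```python
# Sketch: two-tailed p-value for a computed test statistic T (invented numbers).
from scipy import stats

T = 2.58    # test statistic from the sample
df = 48     # degrees of freedom, e.g. n - 2 in simple linear regression
p_value = 2 * (1 - stats.t.cdf(abs(T), df=df))
print(f"p = {p_value:.4f}")   # here p < 0.05, so H0 would be rejected at alpha = 5%
```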

In scientific regression tables, in addition to regression coefficients, you also occasionally see asterisks: *, **, ***. These stand for the p-values:

  • * means p ≤ 0.05 or p ≤ 5%
  • ** means p ≤ 0.01 or p ≤ 1%
  • *** means p ≤ 0.001 or p ≤ 0.1%
If asterisks appear in the table, this means that the corresponding variable in the model is statistically significant. The null hypothesis H0 was therefore rejected and the effect of the variable is considered real rather than due to chance (accepting the corresponding error probability alpha).
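In practice you rarely compute these quantities by hand. The sketch below (Python with numpy and statsmodels, simulated age/income data with invented numbers) runs a full regression; the summary reports exactly the quantities discussed above: coefficients, standard errors, test statistics, p-values and confidence intervals.

```python
# Sketch: full regression run on simulated age/income data (all numbers invented).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
age = rng.uniform(20, 65, size=200)
income = 20000 + 500 * age + rng.normal(0, 8000, size=200)

X = sm.add_constant(age)           # adds the intercept column
model = sm.OLS(income, X).fit()

print(model.params)    # beta0_hat, beta1_hat
print(model.bse)       # standard errors SE(beta0_hat), SE(beta1_hat)
print(model.tvalues)   # test statistics T for H0: coefficient = 0
print(model.pvalues)   # two-tailed p-values
print(model.summary()) # full regression table
```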

Conclusion

  • Chance plays a decisive role in the characteristics of a sample.
  • Accordingly, chance also influences calculated parameters such as regression coefficients β^0 or β^1 , which are based on the sample.
  • The null hypothesis H0 defines the assumption that we want to reject and states that a measured effect is random.
  • The alternative hypothesis H1 is the opposite of H0 and shows that there is a real effect in the data.
  • H0 can be rejected with a 5% error probability if |T| > 1.96, or β^1 lies outside the 95% confidence interval, or the p-value is less than 5% (0.05). The parameter is then statistically significant.
  • Alpha is the error probability that we accept to falsely reject H0 . It is the probability for type I errors.
  • The standard error SE is the dispersion of a parameter, e.g. of β^1, the estimated slope of the regression line.
  • A confidence interval specifies a range in which the true values of a parameter lie with a certain probability (e.g. 95%).
  • The test statistic T is a value that results from the sample data and is therefore also influenced by chance. This is shown by the t-distribution. T is the basis for our decision on H0 , whether to reject H0 or not.
  • The critical value t is a quantile of the distribution of the test statistic T. A calculated test statistic T that is more extreme than a critical value t only occurs with a probability ≤ alpha. As a result H0 is rejected with the error probability alpha.
  • The p-value is the probability of a T that is even more extreme than the measured T from a given sample.

Ready to use the linear regression calculator?

Use Regression Online and focus on what really matters: your area of expertise