Intro to Hypothesis Testing

In two minutes

Praveen Subramanian
2 min read · Oct 26, 2021

Hypothesis testing is a scientific way of using data to answer a question about a world constructed from a set of assumptions. Using fundamental ideas from statistics, we collect data to try to disprove those assumptions.

Here are the steps involved:

0. Create hypotheses: a null hypothesis (denoted H₀) and an alternative hypothesis (denoted H₁ or Hₐ).

  • The null hypothesis (H₀) states that the status quo is true (i.e. there is no deviation from the assumptions).
  • The alternative hypothesis (H₁) states the opposite of the null hypothesis (i.e. there is a deviation from the assumptions).

The goal of hypothesis testing is to show that the evidence at hand (read: the sample) does not support our assumptions (read: the null hypothesis).

  1. Collect a sample: The data is ideally collected using simple random sampling, which gives every member of the population an equal chance of being selected. This is very important: a non-random sample may not represent the population, and the probability calculations in the later steps rely on random sampling.
  2. Choose a threshold: The threshold is the probability below which we are comfortable saying that the evidence at hand is unlikely to have come from the world constructed using the null hypothesis. It is known as the significance level. A critical value, the cutoff for the test statistic, is calculated from the significance level.
  3. Calculate the p-value: The p-value is calculated from a test statistic computed on the evidence at hand.
  4. Conclude from the result: If the p-value is smaller than the significance level (equivalently, if the test statistic is more extreme than the critical value), we reject the null hypothesis. We conclude that the assumptions (and the null hypothesis) are unlikely to have produced evidence as extreme as, or more extreme than, what we saw in the sample.
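The steps above can be sketched in Python as a two-sided one-sample z-test. This is a minimal sketch under loudly stated assumptions: the population, its true mean of 52, the null mean `MU_0 = 50`, and the known population standard deviation `SIGMA = 10` are all hypothetical. A z-test (rather than, say, a t-test) is used only because it needs nothing beyond the standard library's `statistics.NormalDist`.

```python
import math
import random
from statistics import NormalDist, mean

# Hypothetical population of 100,000 values centred on 52.
# In a real study the population is unknown; we only ever see the sample.
rng = random.Random(42)
population = [rng.gauss(52, 10) for _ in range(100_000)]

# Step 0: hypotheses (hypothetical values).
MU_0 = 50   # H0: the population mean equals 50
SIGMA = 10  # assumed known population standard deviation (z-test assumption)
# H1: the population mean differs from 50 (two-sided test)

# Step 1: simple random sample; every member has an equal chance of selection.
sample = rng.sample(population, k=100)

# Step 2: significance level and the matching two-sided critical value.
alpha = 0.05
critical_value = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

# Step 3: test statistic from the sample, then the two-sided p-value.
n = len(sample)
z = (mean(sample) - MU_0) / (SIGMA / math.sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: conclude. Comparing p to alpha gives the same decision
# as comparing |z| to the critical value.
reject_h0 = p_value < alpha
print(f"sample mean={mean(sample):.2f}, z={z:.2f}, p={p_value:.4f}, reject H0: {reject_h0}")
```

The sample mean is only an estimate of the population mean, so a different random sample can lead to a different conclusion.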

The p-value is the probability of observing evidence at least as extreme as the evidence at hand in a world constructed using a set of assumptions.
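To make the definition concrete, here is a hypothetical example that is not from the original post: a coin lands heads 60 times in 100 flips, and the assumption (H₀) is that the coin is fair. The sketch computes the two-sided p-value exactly from the binomial distribution, then estimates the same probability by simulating many "fair-coin worlds". The function names `exact_p_value` and `simulated_p_value` are illustrative.

```python
import math
import random

N_FLIPS = 100
OBSERVED_HEADS = 60
P_FAIR = 0.5  # H0: the coin is fair

def exact_p_value(n: int, observed: int, p: float = 0.5) -> float:
    """Two-sided p-value: P(|X - n*p| >= |observed - n*p|) for X ~ Binomial(n, p)."""
    expected = n * p
    distance = abs(observed - expected)
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k - expected) >= distance
    )

def simulated_p_value(n: int, observed: int, p: float = 0.5,
                      trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated experiments under H0 at least as extreme as the observation."""
    rng = random.Random(seed)
    expected = n * p
    distance = abs(observed - expected)
    extreme = 0
    for _ in range(trials):
        heads = sum(rng.random() < p for _ in range(n))
        if abs(heads - expected) >= distance:
            extreme += 1
    return extreme / trials

print(exact_p_value(N_FLIPS, OBSERVED_HEADS, P_FAIR))      # roughly 0.057
print(simulated_p_value(N_FLIPS, OBSERVED_HEADS, P_FAIR))  # close to the exact value
```

Note that the p-value counts results "at least as extreme" in both directions (60 or more heads, or 40 or fewer), not just the exact result that was observed.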

A reminder: we do not know the true parameters of the population (its mean and standard deviation, for example); we can only estimate them from the sample we collected. Once we are done with the above steps, there are four possible outcomes, because we either reject or fail to reject H₀, and in reality H₀ is either true or false.

We either conclude correctly or make an error. An error is always possible because we are estimating reality from limited evidence.
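The four outcomes form a 2×2 grid: our decision about H₀ crossed with whether H₀ is actually true. A minimal sketch, using a hypothetical helper `classify_outcome` and the standard names for the two kinds of error:

```python
def classify_outcome(reject_h0: bool, h0_is_true: bool) -> str:
    """Label one outcome of a hypothesis test: decision vs. (unknown) reality."""
    if reject_h0 and h0_is_true:
        return "Type I error (false positive)"
    if not reject_h0 and not h0_is_true:
        return "Type II error (false negative)"
    return "correct conclusion"

# Print all four cells of the grid.
for reject in (True, False):
    for h0_true in (True, False):
        print(f"reject H0={reject!s:5}  H0 true={h0_true!s:5}  ->  {classify_outcome(reject, h0_true)}")
```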

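The significance level (also known as α) is the probability of a Type I error: rejecting H₀ when it is actually true. A Type II error is the opposite mistake: failing to reject H₀ when it is actually false.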
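One way to see what α means is to simulate many studies. In the hypothetical sketch below, every study draws 30 values from a normal population with a known standard deviation of 10 and runs the same two-sided z-test of H₀: μ = 50 at α = 0.05. When the population mean really is 50, the fraction of rejections estimates the Type I error rate and should land near α. When the true mean is 55 (an arbitrary assumed effect), the fraction of non-rejections estimates the Type II error rate. All numbers are illustrative assumptions, and the z-test is chosen only because it needs nothing beyond the standard library.

```python
import math
import random
from statistics import NormalDist, mean

MU_0, SIGMA, N, ALPHA = 50, 10, 30, 0.05  # hypothetical test setup
CRITICAL = NormalDist().inv_cdf(1 - ALPHA / 2)

def rejects_h0(sample: list[float]) -> bool:
    """Two-sided one-sample z-test of H0: mean == MU_0, assuming SIGMA is known."""
    z = (mean(sample) - MU_0) / (SIGMA / math.sqrt(len(sample)))
    return abs(z) > CRITICAL

def rejection_rate(true_mean: float, studies: int = 5_000, seed: int = 1) -> float:
    """Fraction of simulated studies, drawn from a population with true_mean, that reject H0."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_h0([rng.gauss(true_mean, SIGMA) for _ in range(N)])
        for _ in range(studies)
    )
    return rejections / studies

type_i_rate = rejection_rate(true_mean=50)       # H0 true: every rejection is a Type I error
type_ii_rate = 1 - rejection_rate(true_mean=55)  # H0 false: every non-rejection is a Type II error
print(f"Estimated Type I error rate:  {type_i_rate:.3f} (alpha = {ALPHA})")
print(f"Estimated Type II error rate: {type_ii_rate:.3f}")
```

Lowering α makes Type I errors rarer, but with the same sample size it makes Type II errors more common; collecting a larger sample is the usual way to reduce both.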

Hypothesis testing is one of the fundamental ideas of data analysis for decision-making and uses probability as its cornerstone.

