As data scientists, we have many methods for treating data. But how do we know whether those methods are working well or our model is just guessing?
We can use a series of statistical tests to decide whether a model is good, including the z-test, t-test and F-test.
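For a very large population, it is hard to get the distribution of the entire population. Fortunately, by the central limit theorem, whatever the distribution of the population is (as long as its variance is finite), the mean of a sufficiently large sample is approximately normally distributed.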
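A small simulation can make this concrete. The sketch below assumes an exponential population (mean 1, standard deviation 1), chosen only because it is clearly not normal. The sample size, number of trials, and the helper name `sample_means` are illustrative choices, not anything from the course.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Draw `trials` samples of size `n` from an exponential population
    (mean 1, standard deviation 1) and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]

means = sample_means(n=50, trials=5000)

# The sample means cluster around the population mean (1.0) ...
print(statistics.fmean(means))
# ... with a spread close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141.
print(statistics.stdev(means))
```

Even though the population is strongly skewed, a histogram of `means` would look roughly bell-shaped, and the spread shrinks as `n` grows.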
We can use statistics to test whether two things are the same.
For example, suppose we want to know whether a drug has an effect on patients.
We can compare treated patients with patients who were not treated.
If we know the mean and standard deviation of the population, we can
determine whether a sample mean falls within the expected range of the sampling distribution of the mean.
That is the Z-test.
The distribution of the Z-score looks like this.
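As a minimal sketch of the calculation: the Z-score is the distance between the sample mean and the population mean, measured in standard errors, where the standard error is the population standard deviation divided by the square root of the sample size. The function name `z_score` and the numbers below are made up for illustration.

```python
import math

def z_score(sample_mean, pop_mean, pop_sd, n):
    """Return how many standard errors the sample mean lies from the population mean."""
    standard_error = pop_sd / math.sqrt(n)  # spread of the sampling distribution of the mean
    return (sample_mean - pop_mean) / standard_error

# Hypothetical numbers: population mean 100, standard deviation 15, sample of 36 with mean 105.
z = z_score(sample_mean=105.0, pop_mean=100.0, pop_sd=15.0, n=36)
print(round(z, 2))  # 2.0
```

Here the standard error is 15 / 6 = 2.5, so a sample mean 5 units above the population mean gives a Z-score of 2.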
TODO: Image of Z Distribution.
If the Z-score is larger than the Z critical value, we can conclude that the sample differs significantly from the population at the chosen alpha level.
We normally use three alpha levels (the critical values shown are for a one-tailed test):
- alpha = 0.05 (5%) <-> z critical value = 1.65
- alpha = 0.01 (1%) <-> z critical value = 2.33
- alpha = 0.001 (0.1%) <-> z critical value = 3.09
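These critical values do not have to be memorized. They can be recovered from the inverse CDF of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+). The helper `z_critical` and its `two_tailed` flag are illustrative names, not part of any library.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=False):
    """Return the positive critical value that leaves `alpha` in the tail(s)
    of the standard normal distribution."""
    tail_area = alpha / 2 if two_tailed else alpha  # two-tailed tests split alpha across both tails
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01, 0.001):
    one = z_critical(alpha)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha={alpha}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

The one-tailed values come out near 1.645, 2.326 and 3.090. The two-tailed values are larger (about 1.960, 2.576 and 3.291) because each tail only gets half of alpha.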
As the Z-distribution shows, we can test the treatment in two ways.
We can test only one direction (a one-tailed test): for example, whether the treatment group is lower than the normal group, or whether it is higher.
We can also test both directions (a two-tailed test): whether the treatment group is different from the normal group, either higher or lower.
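The choice of direction can change the conclusion. The sketch below assumes we already have a Z-score and simply compares it with the appropriate critical value. The function `reject_null` and its `tail` argument ("upper", "lower" or "two") are hypothetical names chosen for this example.

```python
from statistics import NormalDist

def reject_null(z, alpha=0.05, tail="two"):
    """Decide whether a Z-score is beyond the critical value for the chosen test direction."""
    std_normal = NormalDist()
    if tail == "upper":   # is the treatment group higher?
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "lower":   # is the treatment group lower?
        return z < std_normal.inv_cdf(alpha)
    if tail == "two":     # is the treatment group different in either direction?
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError(f"unknown tail: {tail!r}")

z = 1.8  # illustrative Z-score
print(reject_null(z, tail="upper"))  # True: 1.8 is above about 1.645
print(reject_null(z, tail="lower"))  # False
print(reject_null(z, tail="two"))    # False: 1.8 is below about 1.960
```

With the same data, a one-tailed test at alpha = 0.05 finds an effect while a two-tailed test does not, so the direction should be chosen before looking at the results.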
Credit to Intro to Inferential Statistics