Introduction to Hypothesis Testing in Statistics, with Problems and Examples
* This article breaks the concept of Hypothesis Testing down into smaller
chunks. Working through it should give you a good grasp of the concept and of
how and when the different kinds of test can be put to use.
* The article is best read segment by segment, since each segment covers one
of the concepts discussed.
* One of the main reasons many students find Hypothesis Testing difficult is
that its case studies look very different from the other topics in Statistics.
The various kinds of test differ from one another, and many students have a
hard time understanding each of them and telling them apart: some cases involve
small samples, others involve large samples, and each test type uses its own
set of parameters.
* Each kind of hypothesis test also relies on a different distribution and a
different technique for arriving at its result.
* So we will begin to dig deeper into Hypothesis Testing by starting with the
idea of a hypothesis itself.
A hypothesis is a premise or claim that we want to test. Hypothesis Testing is
not the kind of test where one goes to a laboratory. Rather, the statistician
or analyst designs a sample survey, collects the data, and draws meaningful
conclusions from the sample by testing the hypothesis against it.
* Next, let us look at the Null Hypothesis. But first, let’s understand what
the word Null means. Null stands for something that is zero or empty, and the
term is widely used in computer science and programming.
* So, when someone talks about the Null Hypothesis, they are talking about the
default hypothesis, the one that states what is already established. The Null
Hypothesis is generally denoted by a capital H with a small zero as the
subscript, written H0 in this article. H0 states the currently accepted value
of a parameter.
* Whenever someone wants to challenge the Null Hypothesis and examine the "what
else" side of the question, they are interested in the Alternative (or
Alternate) Hypothesis. The Alternative Hypothesis is written H1 or Ha, where
the subscript a stands for alternative. In some statistics books it is also
called the Research Hypothesis, and it carries the claim to be tested.
* Let's take gravity as an illustration. Centuries ago, as the story goes,
Newton grasped the essence of gravity when an apple fell on his head. He did
his research and set out the postulates supporting his hypothesis about
gravity. Later, Einstein came into the picture and put forward another set of
postulates, an alternative to the views that Newton had provided. Einstein's
view of gravity was very different from Newton's and more complicated to
generalise, yet it was eventually accepted and gained stature in the science
journals of the time.
========================================================
* OK, now let's take a question to get a better perspective on Hypothesis
Testing through an example.
It is believed that a candy machine makes chocolate bars that weigh 5 g on
average. A worker claims that, after maintenance, the machine no longer
produces 5 g bars. Formulate H0, the Null Hypothesis, and H1, the Alternative
Hypothesis.
H0 = Null Hypothesis
H1 = Alternative Hypothesis
H0 : the mean weight of a chocolate bar is 5 grams (the established belief)
H1 : the mean weight of a chocolate bar is not equal to 5 grams (the worker's
claim)
From these two statements one can see that the null hypothesis and the
alternative hypothesis are mathematical opposites: if one is true, the other
must be false. This holds for every hypothesis test that anyone can perform.
* The test starts from the assumption that the null hypothesis is true, and
then asks whether the sample data are consistent with that assumption.
========================================================
* So what are the possible outcomes of a test of the Null Hypothesis?
1) Reject the Null Hypothesis
2) Fail to reject the Null Hypothesis
* If we reject the null hypothesis, we mean that the sample gives enough
evidence against it: we conclude that the mean weight of the chocolate bars is
not equal to 5 grams.
* If we fail to reject the null hypothesis, we mean that the sample does not
give enough evidence for the Alternative Hypothesis, so we keep working with a
mean weight of 5 grams. Note that failing to reject the null hypothesis does
not mean that we have proved or accepted it.
* The possible outcomes resemble a verdict in a court of law: one can either
reject the null hypothesis ( H0 ) in favour of the alternative ( H1 ), or fail
to reject the null hypothesis ( H0 ).
* With the possible outcomes listed, the next step is to compute the Test
Statistic on which the decision is based.
* Test Statistic :
The test statistic is calculated from sample data and is used to decide whether
to reject the null hypothesis or fail to reject it. In the Candy-Bar factory
example, one might sample 50 chocolate bars from the factory, run a descriptive
statistical analysis of the data, take the average weight of the bars in the
sample, and from it obtain the value of the test statistic.
One can then determine whether that value is statistically significant, which
is how one decides whether to reject the null hypothesis or fail to reject it.
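One common way to turn such a sample into a test statistic is the one-sample t
statistic, which divides the gap between the sample mean and the hypothesised
mean by the standard error of the sample mean. The article does not name a
specific statistic, so the choice of the t statistic, the function name
t_statistic, and the five example weights below are assumptions made purely
for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # Standard error of the mean, using the sample standard deviation s.
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error


# Hypothetical weights in grams; a real check would use all 50 sampled bars.
weights = [5.1, 4.9, 5.3, 5.0, 5.2]
t = t_statistic(weights, mu0=5.0)
print(f"t = {t:.3f}")
```

A value of t close to zero means the sample mean sits near the hypothesised 5
grams relative to the spread of the data; a value far from zero in either
direction counts as evidence against H0.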
Suppose one person draws a sample of 50 bars on Monday and finds an average of
5.12 grams, which is not exactly 5 grams. Another person draws a sample of 50
bars and finds an average of roughly 5.72 grams, and on Friday the average is
found to be 6.53 grams. The three averages differ from one another, and they
lie at different distances from the null hypothesis value of 5 grams per candy
bar.
Comparing the averages: Monday's 5.12 grams is fairly close to the
hypothesised mean of 5 grams, the next average of 5.72 grams is somewhat
further away, and the third reading of 6.53 grams is very distant from it. How
far is too far depends on the sample size and on how much the weights vary from
bar to bar, but taken together, and especially given the last two readings, it
is reasonable to reject the null hypothesis that the mean weight is 5 grams per
chocolate bar.
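The comparison above can be made concrete by expressing each average as a
number of standard errors away from 5 grams. The sketch below is only an
illustration: the article gives no standard deviation, so SIGMA = 0.5 grams is
an assumed value, the label "second sample" is a placeholder for the unnamed
second day, and CUTOFF = 1.96 is the conventional two-sided cutoff for a 5%
level of significance.

```python
from math import sqrt

MU0 = 5.0      # hypothesised mean weight in grams (H0)
N = 50         # bars per sample, as in the text
SIGMA = 0.5    # ASSUMED standard deviation in grams; not given in the article
CUTOFF = 1.96  # conventional two-sided cutoff at the 5% level of significance


def z_score(sample_mean, mu0=MU0, sigma=SIGMA, n=N):
    """Distance of a sample mean from mu0, measured in standard errors."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


daily_means = {"Monday": 5.12, "second sample": 5.72, "Friday": 6.53}

decisions = {}
for label, sample_mean in daily_means.items():
    z = z_score(sample_mean)
    # True means "reject H0"; False means "fail to reject H0".
    decisions[label] = abs(z) > CUTOFF
    verdict = "reject H0" if decisions[label] else "fail to reject H0"
    print(f"{label}: mean = {sample_mean} g, z = {z:.2f}, {verdict}")
```

Under these assumed numbers Monday's average alone would not justify rejecting
H0, while the other two averages would. A different assumed standard deviation
would move the boundary, which is why the spread of the data matters as much as
the raw distance from 5 grams.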
So, in statistical terms, by comparing the hypothesised value with the values
actually observed, an analyst should be able to decide when to reject the null
hypothesis and when not to reject it.
This, in general, is the purpose of a hypothesis test: collect the data,
summarise it, and obtain a test statistic that supports the decision. Doing so
means fixing the boundaries beyond which an observed value is too high or too
low to be consistent with the null hypothesis. Inside those boundaries we fail
to reject the null hypothesis, and outside them we reject it.
========================================================
Level of Confidence
The level of confidence is the complement of the level of significance.
Graphically, it determines where the null hypothesis is rejected and where it
is not. It tells us how confident we are in our decision, expressed as a
percentage.
For example, if the null hypothesis is rejected at a 99% level of confidence,
almost everyone would accept the decision. If the level of confidence is only
around 50%, rejecting the null hypothesis is not a sound decision under the
given circumstances, because the level of confidence is too low.
========================================================
Level of Significance
This is called alpha and is expressed numerically as ( 1 - C ): the level of
significance equals 1 minus the level of confidence.
For example, if the level of confidence is 95%, then
alpha = 1 - 0.95 = 0.05
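As a small sketch of how alpha is used in practice, the snippet below converts
a level of confidence into alpha and then into the two-sided cutoff on the
standard normal curve. The use of the normal distribution and the function
names significance_level and two_sided_critical_z are assumptions for
illustration. The article does not say which distribution applies.

```python
from statistics import NormalDist


def significance_level(confidence):
    """Return alpha = 1 - C for a level of confidence C given as a fraction."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1 - confidence


def two_sided_critical_z(confidence):
    """Cutoff z such that alpha is split equally between the two tails."""
    alpha = significance_level(confidence)
    return NormalDist().inv_cdf(1 - alpha / 2)


alpha = significance_level(0.95)
z_crit = two_sided_critical_z(0.95)
print(f"alpha = {alpha:.2f}, two-sided cutoff = {z_crit:.2f}")
```

A higher level of confidence gives a smaller alpha and a cutoff further from
zero, so stronger evidence is needed before the null hypothesis is rejected.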
========================================================
Together, the "level of confidence" and the "level of significance" let one
decide whether or not to reject the formulated hypothesis.
A hypothesis test is analogous to a criminal trial. When someone is accused of
a crime, the starting assumption is that the accused is innocent, and it is up
to the lawyers and the evidence to show that the accused is guilty. If the
lawyers and the evidence fail to prove the alternative assumption, that the
accused is guilty, then the assumption of innocence stands.
The same holds for our example. We assumed that the mean weight of a chocolate
bar is 5 grams, and we formulated the alternative assumption that the mean
weight is different from 5 grams. To test the assumption we drew samples and
found results that contradicted it, and after analysing those results we came
to a conclusion on whether or not to reject the null hypothesis.