Data Science and AI Quest: Introduction to Hypothesis Testing In Statistics with sample problems and explanations

Introduction to Hypothesis Testing

In Statistics with problems and examples

* This article breaks the concept of Hypothesis Testing down into smaller chunks. By working through them, the reader should come away with a good grasp of the concept and of how and when the different kinds of tests can be put to use.

* The best way to learn from this article is to go through it segment by segment, paying attention to each concept as it is introduced.

* Many students find Hypothesis Testing difficult because its case studies look very different from the other topics in Statistics. The various tests also differ from one another: some cases involve small sample sizes and others large ones, and each type of test works with its own set of parameters. As a result, students often struggle both to understand the individual tests and to tell them apart.

* Each type of test also relies on a different distribution and a different technique for arriving at a result.

* So we will begin to dig deeper into Hypothesis Testing by starting with what a hypothesis is.

A Hypothesis is a premise or claim that we want to test. Hypothesis Testing is not the kind of test for which one goes to a laboratory. Rather, the statistician or analyst designs a sample survey, collects data from it, and draws meaningful conclusions from the sample by testing the hypothesis.

* Next, let us get to know what the Null Hypothesis is. But first let’s understand what the word Null means. Null stands for something that is zero or empty, and it is widely used jargon in computer science and programming languages.

* So, when someone talks about the Null Hypothesis, they are talking about the default hypothesis, the one that describes what is already established. The Null Hypothesis is generally denoted by a capital H with a zero as the subscript: H0. H0 states the currently accepted value for a parameter.

* Whenever someone wants to challenge the Null Hypothesis and examine the "what else" side of the question, they are interested in the Alternate or Alternative Hypothesis. It is represented by the symbol H1 or Ha, where the subscript a stands for alternative. In some statistics books the Alternative Hypothesis is also called the Research Hypothesis, and it contains the claim to be tested.

* Let's take the principle of Gravity into consideration. Centuries ago, as the story goes, an apple fell on Newton's head; he went on to do his research and gave the postulates supporting his hypothesis on Gravity. Later Einstein came into the picture and gave another hypothesis, an alternative to the view that Newton had provided. Einstein's view of gravity was quite different from Newton's and more complicated to generalise, and it was eventually accepted and gained stature in the science journals of the time.

========================================================

* OK, now let's take a question to get a better perspective on the idea of Hypothesis Testing.

It is believed that a candy machine makes chocolate bars that weigh 5 g on average. A worker claims that, after maintenance, the machine no longer produces 5 g bars. Formulate H0, the Null Hypothesis, and H1, the Alternative Hypothesis.

H0 = Null Hypothesis

H1 = Alternative Hypothesis

H0 : the mean weight of the chocolate bars is 5 grams (the established belief)

H1 : the mean weight of the chocolate bars produced by the machine is not equal to 5 grams (the worker's claim)

So, one can see from the two statements above that the null hypothesis and the alternative hypothesis are mathematically opposite to each other. In every hypothesis test, the null hypothesis and the alternative hypothesis are always mathematically opposite to each other.

* We now want to test these two statements against data. The test always starts from the assumption that the null hypothesis is true.

========================================================

* So what are the possible outcomes of a hypothesis test?

1) Reject the Null Hypothesis

2) Fail to reject the Null Hypothesis

* If we reject the null hypothesis, we mean that the sample data provide enough evidence against it: we conclude that the mean weight of the chocolate bars is not equal to 5 grams.

* If we fail to reject the null hypothesis (which does not mean that we are accepting or proving the null hypothesis), we mean that the sample data do not provide enough evidence for the Alternative Hypothesis, so we cannot conclude that the mean weight of the chocolate bars differs from 5 grams.

* These outcomes are just like a verdict in a court of law: one can either reject the null hypothesis (H0) in favour of the alternative (H1), or fail to reject the null hypothesis (H0).

* Having listed the possible outcomes, the next step is to work out the Test Statistic, which is what the decision is based on.

* Test Statistic :

The test statistic is calculated from sample data and is used to decide whether to reject the null hypothesis or fail to reject it. In the candy-bar factory, for example, one might sample 50 chocolate bars, run a descriptive statistical analysis on them to get the average weight of the bars in the sample, and from that compute the value of the test statistic.

Then one can determine whether the result is statistically significant, that is, whether to reject the null hypothesis or fail to reject it.
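As a rough illustration of how a test statistic is obtained from sample data, here is a minimal Python sketch. It assumes the one-sample t statistic, t = (sample mean - hypothesised mean) / (sample standard deviation / sqrt(n)), which is one common choice for a test about a mean; the article itself does not name a specific statistic. The function name and the ten weights are hypothetical and invented purely for illustration (the example above uses 50 bars).

```python
import math
import statistics


def one_sample_t_statistic(sample, hypothesized_mean):
    """Return (x_bar - mu0) / (s / sqrt(n)) for a list of measurements."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    sample_sd = statistics.stdev(sample)
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Hypothetical bar weights in grams, made up for this sketch only.
weights = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4, 5.1, 5.0, 5.2]

t_stat = one_sample_t_statistic(weights, hypothesized_mean=5.0)
print(f"sample mean = {statistics.mean(weights):.2f} g, t statistic = {t_stat:.3f}")
```

The further the t statistic is from zero, in either direction, the less compatible the sample is with a true mean of 5 grams.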

Suppose one person draws a sample of 50 bars on Monday and finds an average of 5.12 grams, which is not exactly 5 grams. Another person draws a sample of 50 bars and finds an average of about 5.72 grams, and on Friday the average is 6.53 grams. The three sample means differ from one another, and they lie at different distances from the null hypothesis value of 5 grams per bar.

Comparing them, Monday's average of 5.12 grams is pretty close to the hypothesised 5 grams, 5.72 grams is somewhat further away, and 6.53 grams is very far from it. The further a sample mean lies from 5 grams, the stronger the evidence against the null hypothesis: a reading like 6.53 grams suggests that it is okay to reject the null hypothesis, whereas a reading like 5.12 grams may simply reflect sampling variation.

So, in statistical terms, by comparing the hypothesised value with the observed values, an analyst should be able to decide when to reject the null hypothesis and when not to.

This, in general, is the purpose of a hypothesis test: collect the data, summarise it, and obtain a test statistic that supports the decision. One needs to decide concretely where the boundaries lie, that is, when the observed value is too high or too low to be consistent with the null hypothesis and when it is not.

========================================================

Level of Confidence

The level of confidence is the complement of the level of significance. Graphically, it determines the region in which the null hypothesis is not rejected and the region in which it is rejected. It tells one how confident someone is in their decision, expressed as a percentage.

For example, if the level of confidence attached to rejecting a null hypothesis is 99%, then most people would accept the decision to reject. If the level of confidence is only around 50%, rejecting the null hypothesis is not a sound decision, because the level of confidence is too low.

========================================================

Level of Significance

This is called alpha, and it is numerically expressed as

alpha = 1 - C, that is, the level of significance equals 1 minus the level of confidence C.

For example, if the level of confidence is 95%, then alpha = 1 - 0.95 = 0.05.
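The relationship alpha = 1 - C can be sketched in a few lines of Python. As an extra step that the article does not spell out, the sketch also turns alpha into rejection boundaries, under the assumption that the test is two-sided and that the test statistic follows a standard normal distribution. The function names are hypothetical.

```python
from statistics import NormalDist


def significance_level(confidence_level):
    """alpha = 1 - C, with C given as a fraction (0.95 for 95%)."""
    return 1.0 - confidence_level


def two_sided_critical_z(alpha):
    """Boundary z such that P(|Z| > z) = alpha for a standard normal Z.

    Assumption: two-sided test with a normally distributed test statistic.
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for confidence in (0.90, 0.95, 0.99):
    alpha = significance_level(confidence)
    z = two_sided_critical_z(alpha)
    print(f"C = {confidence:.2f}  alpha = {alpha:.2f}  reject H0 if |z| > {z:.3f}")
```

A higher level of confidence gives a smaller alpha and therefore wider boundaries, so stronger evidence is needed before the null hypothesis is rejected.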

========================================================

Using the level of confidence and the level of significance, one can make an appropriate decision whether or not to reject the formulated hypothesis.

A hypothesis test is analogous to a criminal trial. When someone is accused of a crime, the first assumption, made in favour of the accused, is that the accused is innocent. It is up to the lawyers and the evidence to show that the accused is guilty. If they fail to prove the alternative assumption of guilt, the assumption of innocence continues to stand.

The same is true of our example. We assumed that the mean weight of a chocolate bar is 5 grams, and also formulated the alternative assumption that the mean weight is not 5 grams. To test the assumption we drew samples, found results that did not match the hypothesised value exactly, and after analysing them came to a conclusion whether to reject the null hypothesis or not.
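To tie the pieces together, the following Python sketch runs the decision for the three sample means from the chocolate-bar example (5.12, 5.72 and 6.53 grams, 50 bars each). The article gives no standard deviation and no significance level for this example, so the values below (a standard deviation of 2 grams and alpha = 0.05) are assumptions chosen only for illustration, as is the use of a two-sided z-test. With different assumed values the decisions could come out differently.

```python
import math
from statistics import NormalDist

HYPOTHESIZED_MEAN = 5.0  # grams, from H0
SAMPLE_SIZE = 50         # bars per sample, from the example
ASSUMED_SD = 2.0         # grams; ASSUMPTION, not given in the article
ALPHA = 0.05             # level of significance; ASSUMPTION


def z_statistic(sample_mean, mu0, sd, n):
    """Distance of the sample mean from mu0, in standard-error units."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))


# Two-sided rejection boundary for the chosen alpha.
critical_z = NormalDist().inv_cdf(1.0 - ALPHA / 2.0)

# Sample means quoted in the example; the middle sample's day is not stated.
sample_means = {"Monday": 5.12, "second sample": 5.72, "Friday": 6.53}

decisions = {}
for label, mean in sample_means.items():
    z = z_statistic(mean, HYPOTHESIZED_MEAN, ASSUMED_SD, SAMPLE_SIZE)
    if abs(z) > critical_z:
        decisions[label] = "reject H0"
    else:
        decisions[label] = "fail to reject H0"
    print(f"{label}: mean = {mean:.2f} g, z = {z:.2f}, {decisions[label]}")
```

Under these assumed values, the sample mean closest to 5 grams does not provide enough evidence to reject the null hypothesis, while the two more distant means do.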


Monday, March 1, 2021
