Data Science and AI Quest: statistics

Showing posts with label statistics. Show all posts

Monday, April 26, 2021

The Various Categories of Machine Learning Algorithms with their Interpretational learnings

Machine Learning has the three different ﬂavours depending on the algorithm and their objectives they serve . One can divide machine learning algorithms into three main groups based on the purpose :

01) Supervised Learning

02) Unsupervised Learning

03) Re-inforcement Learning

Now in this article we will learn more on each of the learning techniques in greater detail .

==================================

01) Supervised Learning

==================================

* Supervised Learning occurs when an algorithm learns from a given form of example data and associated target responses that consist of numeric values or string labels such as classes or tags , which can help in later prediction of correct responses when one is encountered with newer examples

* The supervised learning approach is similar to human learning under the guidance and mentorship of a teacher . This guided teaching and learning of a student under the aegis of a teacher is the basis for Supervised Learning

* In this process , a teacher provides good examples for the student to memorize and understand and then the student derives general rules from the speciﬁc examples

* One can distinguish between regression problems whose target is a numeric value and along with that one can make use of such regression problems whose target is a qualitative variable which is an indicator of a class or a tag as in the case of a selection criteria

* More on Supervised Learning Algorithms with examples would be discussed in later articles .

==================================

02) Unsupervisd Learning

==================================

* Unsupervised Learning occurs when an algorithm learns from plain examples without any associated response in the target variable , leaving it to the algorithm to determine the data patterns on their own

* This type of algorithm tends to restructure the data into something else , such as new features that may represent a class or a new series of uncorrelated values

* What is Unsupervised Learning ? It is a type of learning which tends to restructure the data into some new set of features which may represent a new class or a series of uncorrelated values within a data set

* Unsupervised Learning algorithms are quite useful in providing humans with insights into the meaning of the data as there are patterns which need to be found out

* Unsupervised Learning is quite useful in providing humans with insights into the meaning of the data and new useful inputs to supervised machine learning algorithms

* As a new kind of learning , Unsupervised Learning resembles the methods that humans use to ﬁgure out that certain objects or events are of the same class or characteristic or not , by observing the degree of similarity of the given objects

* Some of the recommendation systems that one may have come across over several retail websites or applications are in the form of marketing automation which are based on the type of learning

* The marketing automation algorithm derives its suggestions from what one has done in the past

* The recommendations are based on an estimation of what group of customers that one resembles the most and then inferring one's likely preferences based on that group

==================================

02) Reinforcement Learning

==================================

* Reinforcement Learning occurs when one would present the algorithm with examples that lack any form of labels as in the case of unsupervised learning .

* However , one can provide an example with some positive and negative feedback according to the solution of the algorithm proposed

* Reinforcement Learning is connected to the applications for which the algorithm must make decisions ( so the product is mostly prescriptive and not just descriptive as in the case of unsupervised learning ) and on top of that the decisions bear some consequences .

* In the human world , Reinforcement learning is mostly a process of learning by the application of trial and error method to the process of learning

* In this type of learning , initial errors and aftermath errors help a reader to learn because this type of learning is associated with a penalty and reward system which gets added each time whenever the following factors like cost , loss of time , regret , pain and so on get associated with the results that come in the form of output for any particular model upon which the set of reinforcement learning algorithms are applied

* One of the most interesting examples on reinforcement learning occurs when computers learn to play video games by themselves and then scaling up the ladders of various levels within the game on their own just by learning on their own the mechanism and the procedure to get through each of the level .

* The application lets the algorithm know the outcome of what sort of action would result in what type of result .

* One can come across a typical examplle of the implementation of a Reinforcement Learning program developed by Google's Deep Mind porgram which plays old Atari's videogames in a solo mode at https://www.youtube.com/watch?v=VieYniJORnk

* From the video , one can notice that the program is initially clumsy and unskilled but it steadily improves with better continuous training until the program becomes a champion at performance of the task

Friday, April 23, 2021

Describing the use of Statistics in Machine Learning - A full detailed article on some of the most important concepts in Statistics

Describing the use of Statistics

* Its important to skim through some of the basic statistical concepts related to probability and statistics . Along with that , we will also try to understand how these concepts can help someone to describe the information used by machine learning algorithms

* Some of the main concepts that I shall also try to cover in my articles leading to a stronger foothold over the various sections of Statistics are the following topics -- sampling, statistical distributions, statistical descriptive measures.. etc which in one way or the other are based on the concepts of algebra and probability in some or the other ways as they are nothing but more elaborate manifestations of the concepts and theorems of mathematics .

* The zist of all the learning of these concepts is not only about how to describe an event by counting the number of occurrences , but its about describing an event without counting every time how many times a particular event occurs .

* If there are some imprecision in a recording instrument that one uses , or simply because some error in the the recording procedure of a machine occurs , rather an imprecision occurs in the instrument that one uses or simply because of any random nuisance which disturbs the process of recording a given measure during the process of recording the measure occurs ... then a simple measure such as weight , will differ every time one would get a scale which would be slightly oscillating around the true weights and minimal variation scale . Now , if someone wants to perform such a small incident in a city and want to measure the weight of all the people in the city , then it is probably an impossible experiment to be conducted on such a large scale as it would involve taking the weight-wise reading of all the people in the city which is something that is practically not possible , because first of all if someone wants to perform this experiment in one go , then one has to create a big big gigantic weighing scale to mount all the people of the city in its weighing pans , which is completely an impossible task , and probably the scale may break once all the people have been mounted to the pan or otherwise the worst thing that may happen is that once all the people's weights have been measured , the experiment could render itself insignificant as the experiment once conducted would make the use of the weighing machine useless and hence the cost associated with building of such a big machine for carrying out just one task would become meaningless .

* So , the purpose of experiment might get achieved , but the cost of the built-up of the instrument would run so high that a big dent in the overall GDP of the city would get created which might cripple the city's finance and budget . On the other hand , if we take the measurement of the entire city's weight recording each person's weight one by one , then the effort and time taken for the entire activity to be completed might take some weeks or months of time . Because of the high amount of time and effort that would get consumed while managing the entire ruckus won’t suit the idea for adaptability and taking up the idea. And even if all the weights of the people residing in the city is successfully measured, there are a lot of chances that anyhow some amount for error would definitely popup making the idea of the entire process not so fruitful and fault-proof

* Having partial information is a quite complex process which is not a completely negative condition because one can use such smaller matrices for the purpose of efficient and less cumbersome calculations. Also on top of that, it is said that one cannot get a sample of what one wants to describe and learn because the event's complexity might be quite high and may probably feature a great variety of features . Another example that some users could consider while taking a case of a sample or a large case of data , is a case of Twitter tweets . All the tweets may be considered as some sample of data over where the same data could be treated as some experimental potions and minerals which are processed using several word processors , sentiment analyzers , business enhancers , spam , abusive data and all depending upon the sample of data associated with each of the text within the short frame of data that one can provide within the text section .

* Therefore it is also a good practice in sampling to sample similar data which has associated characteristics and features which will present the sample data in the form of a grouped cohesive data which fit into a proper sampling criteria. And when sampling is done carefully, one can get an idea that one can obtain a better global view of the data from the constituent samples

* In Statistics , a population refers to all the events and objects that one wants to measure and is a part of the given criteria which gives in detail the account of metrices of the population . Using the concept of random sampling , which is picking the events or objects one needs to choose one's examples according to the criteria which would determine how the data is collected ,assembled and synthesised. This is then used for feeding into machine learning algorithms which apply their inherent functions for determination of patterns and behaviour.

* Along with such determination , a probabilistic model of input data is built which is used for prediction of similar patterns from any newly input data or datasets ,Application of this concept of data generation from population's subsamples and mapping the identified patterns to map new use cases is one of subsamples the chief objectives of machine learning on the back of supported algorithms

* "Random Sampling" -- It is not the only approach for any sort of sampling . One can also apply an approach of "stratified sampling" through which one can control some aspects of the random sample in order to avoid picking too many or too few events of a certain kind .After all , it is said that a random sample is a random sample , the manner it gets picked is irrespective of the manner in which all samples would criterion themselves for picking up a sample , and there is no absolute assurance of always replicating an exact distribution of a population .

* A distribution is a statistical formulation which describes how to observe any event or a measure by ideating the probability of witnessing a certain value . Distributions are described in Mathematical formula and can be graphically described using charts such as histograms or distribution plots . The information that one wants to put over the matrix has a distribution , and one may find that the distributions of different features are related to each other . A normal distribution naturally implies variation and when dealing with numeric values , it is very important to figure out a center of variation which is essentially a value which corresponds to the statistical mean which can be calculated by summing all the values and then dividing the sum by the total number of values considered .

* Mean - This is specifically a descriptive measure which tells the users the values to expect the most from within dataset . as it is a general fact that most of the times , one can observe that the mean of a dataset is that data which generally hovers around a given data group or the entire dataset . The Mean of a dataset is the best suited data for any symmetrical and bell-shaped distribution . In cases , when the value is above the mean of the entire dataset , the distribution is similarly shaped for the values that lie below the mean . The normal distribution or the Gaussian distribution is shaped around the mean which one can find only when one is dealing with legible data which is not much skewed in any direction from the equally shaped domes of the normal distribution curve . In the real world , in most of the datasets one can find many skewed distributions that have extreme values on one side of the distribution , which influences the value of mean so much

* Median - The Median is a measure that takes the value in the middle after one orders all the observations from smallest to the largest values within the dataset . Based on the value order, the median is a less approximate measure of central approximation of data .

* Variance - The significance of mean and median data descriptors is that they describe a value within a data description around which there is some form of variation . In general, the significance of the mean and median descriptors is variation. In general , the significance of the mean and median descriptors is that they describe a value within the distribution around which there is a variation and machine learning algorithms generally do not care about such a form of variation . Most people generally refer to the term , variation as "variance" . And since , variance is a squared number there is also a root equivalent which is termed as "Standard Deviation" . Machine Learning takes into account the concept of variance in every single variable (univariate distributions) and in all the features together (multivariate distribution) to determine how such a variation impacts the response obtained .

* Statistics is an important matter in machine learning because it conveys the idea that features have a distribution pattern . Distribution of data implies variation and variation means quantification of information ... which means that more amount of variance is present in the features , then the more amount of Information can be matched to the response .

* One can use statistics to assess the quality of the feature matrix and then leverage statistical measures in order to draw a rule from the types of information to their purposes that they cater to .

Wednesday, April 21, 2021

An article on - Conditioning Chance and Probability by Bayes Theorem

Conditioning Chance & Probability by Bayes Theorem

* Probability is one of the most key important factors that takes into effect the condition of time and space but there are other measures which go hand in hand with the measures that go into calculation of probability values and that is Conditional Probability which takes into effect the chance of occurrence of one particular event with effect to occurrence of some other events that may also affect the possibility and probability of the other event .

* When one would like to estimate the probability of any given event , one may believe the probability of some value to be applicable to some values which one may calculate upon a set of possible events or situations . This term is used to express a belief of "apriori probability" which means general probability of any given event .

* For example , in the condition of a throw of a coin ... if the coin thrown is a fair coin , then it could be said that the apriori probability of occurrence of a head is around 50 percent . This means that when someone would go for tossing a coin , he already knows what is the probability of occurrence of a positive ( in other words .. desired outcome ) otherwise occurrence of a negative outcome ( in other words .. undesired outcome ) .

* Therefore , no matter how many times one would toss a coin .. whenever faced with a new toss the probability of occurrence of a heads is still 50 percent and the probability of occurrence of a tail is still 50 percent .

* But consider a situation where if someone wishes to change the context , then the subject of apriori probability is not valid anymore .. because something subtle has happened and changed the outcome as we all know there are some prerequisites and conditions that must satisfy so that the general experiment could be carried out and come to fruitition. In such a case , one can express the belief as a form of posteriori probability which is the priori probability after something has happened that would tend to modify the count or outcome of the event .

* For instance , gender estimation for a person being either a male or a female is the same which is about 50 percent in almost all of the cases . But this general assumption that any population taken into account would be having the same demography is wrong as I happened to come across my referenced article that what generally happens in a demographic population is that generally the women are the ones who tend to live longer and exceed their counterpart males in most of the cases in all of human existence .. as they are mostly the ones who tend to live longer and exceed their counterpart males in most of the factors that contribute to the general well being , and as a result of which the population demographic tilt is more towards the female gender .

Hence , putting all these factors into account that contribute to the general estimate of any population , one should not ideally take gender as a main parameter for determination of population data because this factor is tilted in age-brackets and hence an overall idea for generalisation of this factor should not be considered .

* Again , taking this factor of gender into account , the posteriori probability is different from the expected apriori one which in this example can consider gender to be the parameter for estimation of population data and thus estimate somebody's probability of gender on the belief that there are 50 percent males and 50 percent females in a given population data .

* One can view cases of conditional probability in the given manner P(y(x)) which in mathematical sense can be read as probability of the event y given the probability of occurrence of event x takes place . For the great relevance Conditional Probability plays in the concepts and studies of machine learning , learning and understanding the syntax of representation , expression and comprehension of the given equation is of great paramount importance to any newbie or virtuoso in the field of maths , statistics and machine learning . Hence , again if someone comes across a notation for conditional probability in the form P(y(x)) which can be read as the probability of event Y happening given X has already happened .

* As mentioned earlier in the above paragraph , because of its dependence on possibility of occurrence on single or multiple prior conditions , the role of conditional probability is of paramount importance for machine learning which takes into effect statistical conditions of occurrence of any event . If the apriori probability can change because of circumstances, knowing the possible circumstances can give a big push in one's chances of correctly predicting any event by observing the underlying examples - exactly what machine learning generally intends to do .

* Generally , the possibility of finding a random person's gender as a male or female is around 50 percent . But , in case one would like to take into consideration the mortal aspects and age factor of any population , we have seen that the demographic tilt is more in favour of females . If under all such conditions , one would take into consideration the female population , and then dictate a machine learning algorithm to find out the gender of the considered person on the basis of their apriori conditions like length of hair , mortality rate etc , the ML algorithm would be able to very well determine the solicited answer

An article on - Bayes Theorem application and usage

Bayes Theorem application and usage

Instance and example of usage of Bayes Theorem in Maths and Statistics :

P(B|E) = P(E|B)*P(B) / P(E)

If one reads the formula , then one will come across the following terms within the instance which can be elaborated with the help of an instance in the following manner :

* P( B | E ) - The probability of a belief(B) given a set of evidence(E) is called over here as Posterior Probability . Here , this statement tries to convey the underlying first condition that would be evaluated for going forth over to the next condition for sequential execution . In the given case , the hypothesis that is presented to the reader is whether a person is a female and given the length of her hair is sufficiently long , the subject in concern must be a girl

* P( E | B ) - In this conditional form of probability expression , it is expressed that one could be a female given the condition that the subject has sufficiently long hair . In this case , the equation translates to a form of conditional probability .

* P ( B ) - Here , the case B stands for the general probaility of being a female with a priori probability of the belief . In the given case , the probability is around 50 percent which could be also translated to a likelihood of occurrence of around 0.5 likelihood

* P(E) - This is the case of calculating the general probability of having long hair . As per general belief , in a conditional probability equation this term should be also treated as a case of priori probability which means the value for its probability estimate is available well in advance and therefore , the value is pivotal for formulation of the posterior probability

If one would be able to solve the previous problem using the Bayes Formula , then all the constituent values would be put in the given equation which would fill in the given values of the equation .

The same type of analogy is also required for estimation of a certain disease among a certain set of population where one would very likely take to calculate the presence of any particular disease within a given population . For this one needs to undergo a certain type of test which would result in producing a viable or a positive result .

Generally , it is perceived that most of the medical tests are not completely accurate and the laboratory would tell for the presence of a certain malignancy within a test which would convey a condensed result about the condition of within a test which would convey a condensed result about the condition of illness of the concerned case .

For the case , when one would like to see the number of people showing a positive response from a test is as follows :

1) Case -1 : Who is ill and who gets the correct answer from the test .

This is normally used for the case of estimation of true positives which amounts to 99 percent of the 1 percent of the population who get the illness

2) Case-2 : Who is not ill and who gets the wrong diagnosis result from the test . This group consists of 1 percent of the 99 percent of the population who would get a positive response , even though the illness hasn't been completely discovered or ascertained in the given cases . Again , this is a multiplication of 99 percent and 1 percent ; this group would correspond to the discovery of false positive cases among the given sample . In simple words , this category of grouping takes into its ambit , those patients who are actually not ill (may be fit and fine ) , but due to some aberrations or mistakes in the report which might be under the case of mis-diagnosis of a patient that , the patient is discovered

as a ill person . Under such circumstances, untoward cases of administration of wrong medicines might happen , which rather than curing the person of the given illness might inflict aggravations over the person rendering him more vulnerable to hazards , catastrophies and probably untimely death

* So going through the given cases of estimation of correct cases of Classification for a certain disease or illness could help in proper medicine administration which could help in recovery of the patient owing to right Classification of the case ; and if not then the patient would be wrongly classified in a wrong category and probably wrong medicines could get administered to the patient seeking medical assistance for his illness .

( I hope , there is some understanding clarity in the cases where the role of Bayesian Probability estimations could be put to use . As mentioned , the usage of this algorithm takes place in a wide-manner for the case of proper treatment and classification of illnesses and patients ; classification of fraudulent cases or credit card / debt card utilisation , productivity of employees at a given organisation by the management after evaluation of certain metrices :P ...... I shall try to extend the use case and applications of this theorem in later blogs and articles )

Tuesday, April 20, 2021

Exploring the World of Probability Theory in ML .. derived article with own interpretations

Exploring the World of Probability Theory in ML

* What is Probability and how can it be used? Probability is the likelihood of an event which means that Probability can help someone to determine the possibility of something to happen or not using the mathematical (Gannita Gyaana) where one can establish the possibility or likelihood of occurrence of an event in terms with the total number of possible events that could likely occur .

* The probability of an event is measured in the range from 0 (no probability that an event occurs) to the value of 1 ( a certainty that an event occurs ) which in relative terms says about the extent of any value towards the any of the extremes from the left most to the right most values .

* The probability of picking a certain suit from a deck of Cards (generally referred to as "Taash" in many Asian countries) is one of the most classic example on explanation of probabilities.

* The deck of cards contains 52 cards (joker cards excluded) which can be divided into four suits as clubs and spades which are black , and diamonds and hearts which are red in colour .

* Therefore , if one wants to determine whether the probability of picking the card is an ace , then one must consider that there are four aces of different suits .The probability of such an event can be calculated as p = 4/52 which is again evaluated to 0.077.

* Probabilities are between the values of 0 and 1 ; no probability can exceed such boundaries as everything's possibility of occurrence lies between nothing to everything and probability of not occurrence of something is always zero and the probability of occurrence of everything is always equal to 1 .

* If someone tries to do a Probability Possibility prediction for a given case of fraud detection in which one would like to see and find out the number of times a bank transaction related fraud has occurred over a given set of bank accounts or how many times fraud happens while conducting a banking transaction or how many times people get a certain disease in a particular country . So , after associating all the events , one can estimate the probability of occurrence of associating all the events , one can estimate the probability of occurrence of such forthcoming event with regards to the frequency of occurrence , mode of occurrence , time of occurrence , as well as the likely accounts which could be affected by the fraud and the conditions which are likely to affect the accounts .

The calculation for the estimation would take into consideration of counting the number of times a particular event occured and dividing the total number of events that could possibly occur for a set of operations and calculations.

* One can count the number of times the fraud happens using recorded data ( which are mostly taken from databases ) and then one would divide that figure by the total number of generic events or observations available

* Therefore , one should divide the total number of frauds by the number of transactions within a year or one can count the total number of people who fell ill during the year with respect to the population of a certain area . The result of this is a number ranging from 0 to 1 which one can use as baseline probability for a certain event under certain type of circumstances

* Counting all the occurrences of an event is not always possible for which one needs to know about the concept of sampling. Sampling is an act which is based on certain probability of expectations , which one can observe as a small part of a larger set of events or objects , yet one may not be able to infer correct probabilities for an event , as well as exact measures such as quantitative measurements or qualitative classes related to a set of objects

* Example - If one wants to track the sales of cars in a certain country , then one doesn't need to track all the sales that occur in that particular geography ... rather using a sample comprising of all the sales from new car sellers around the country , one can determine the quantitative measures such as average price of a car sold or qualitative measures such as the car model which were sold most often

Thursday, April 8, 2021

Testing MapReduce Programs - An introductory Article on Testing of MapReduce Programs for Load and Performance

Testing MapReduce Programs

* Mapper programs running on a cluster are usually complicated to debug

* The best way of debugging MapReduce programs is via usage of print statements over log setions in MapReduce programs

* But in a large application where thousands of programs may be running at any single point of time , running the execution jobs and programs over tens or thousands of nodes is preferred to be done in multiple stages

* Therefore , the most preferred mode of execution of any program is :

(a) To run the programs using small sample datasets ; this would ensure that what so ever program is running , the program is running in an efficient and robust manner . And for checking of the same , the tried and tested formula for applying the working proficiency of the program over a small dataset is done followed by applying the same over a bigger application / bigger dataset / more

number of testcases etc

(b) Expanding the unit tests to cover larger number of datasets and to run the programs over a bigger/larger cluster of network applications . As mentioned in the earlier point , the scope of execution of the testcases is enhanced by application of unit testcases over larger datasets in order to check the robustness

and performance of the system application software

(c) Ensuring that the Mapper and the Reducer functions can handle the inputs more efficiently . This means that the set of Mapper and Reducer functions created to work over the split input data would work in cohesion or in tandem with MapReduce programs desired working condition to produce serial output in desired format (text,key-value pair) etc

* Running the system application against a full dataset would likely expose more issues , which might lead to rise of undue errors , undiscovered issues , unpredictable results , undue fitting criterias and so on type of issues over the software because of which it might not be that conducive for the system analyst to put the entire full dataset to test over the software . But after all necessary

unit testcases have been checked and working criteria and pre-requisites have been fulfilled , one may put the program to be tested over bigger datasets ; by and by making the work of the MapReduce job easier to run , thereby also enhancing

speed and performance issues gradually

* it may be desirable to split the logic into many simpler Mapper and Reducer functions , chaining the programs into single Mapper functions using a facility like ChainMapper library class built within Hadoop (I am yet to explore all the scopes , ChainMapper library class built within Hadoop (I am yet to explore all the scopes ,

specifications and functionalities of the ChainMapper Library which I shall try to cover in a forthcoming session ) . This class can run a chain of Mappers followed by a Reducer function , followed again by a chain of Mapper functions within a single MapReduce Job

* More over testing of MapReduce Jobs / Execution of MapReduce Jobs / Analysis and Debugging of MapReduce Jobs would be done in later articles under appropriate headers and titles .

Thursday, April 1, 2021

Generic Summary Command for DataFrames / Matrices in R language

Generic Summary Command for Data Frames

* Below is a short guide to the results expected for the generic software commands in R

* Descriptive Summary Commands that can be applied to Dataframes are :

00) mat01

[,1] [,2] [,3] [,4]

[1,] 1 2 3 4

[2,] 5 6 7 8

[3,] 9 10 11 12

[4,] 13 14 15 16

01) max(mat01)

[1] 16

- The largest value in the entire dataframe

02) min(mat01)

[1] 1

- The smallest value in the entire dataframe

03) sum(mat01)

[1] 136

- The sum of all the values in the entire dataframe

04) fivenum(mat01)

[1] 1.0 4.5 8.5 12.5 16.0

- The summary values for the entire dataframe can be found out by using the "fivenum" command over a dataframe taken in as parameter

05) length(mat01)

[1] 16

- The length of all the columns within a dataframe can be found by using the length command over a dataframe

06) summary(mat01)

V1 V2 V3 V4

Min. : 1 Min. : 2 Min. : 3 Min. : 4

1st Qu.: 4 1st Qu.: 5 1st Qu.: 6 1st Qu.: 7

Median : 7 Median : 8 Median : 9 Median :10

Mean : 7 Mean : 8 Mean : 9 Mean :10

3rd Qu.: 10 3rd Qu.:11 3rd Qu.:12 3rd Qu.:13

Max. : 13 Max. :14 Max. :15 Max. :16

- It provides the summary for each of the columns present within a dataframe

* The list of all the summary / descriptive summary commands that work on a dataframe are listed and short

* One can always extract a single vector from a dataframe and perform a summary upon the data

* In general , it is better to use more specialised commands when dealing with the rows and columns of a dataframe