This technical blog is my own collection of notes, articles, implementations and interpretations of referred topics in coding, programming, data analytics, data science, data warehousing, Cloud Applications and Artificial Intelligence. Feel free to explore my blog and articles for reference and downloads. Do subscribe, like, share and comment ---- Vivek Dash
Wednesday, April 21, 2021
An article on - Conditioning Chance and Probability by Bayes Theorem
Conditioning Chance & Probability by Bayes Theorem
* Probability is one of the key measures that takes into account the conditions of time and space, but another measure goes hand in hand with it : Conditional Probability, which captures the chance of occurrence of one particular event given the occurrence of some other event that may affect its possibility.
* When one would like to estimate the probability of any given event, one may hold a belief about that probability over a set of possible events or situations before seeing any evidence. This belief is expressed by the term "apriori probability", which means the general probability of a given event, known in advance.
* For example, in the case of a coin toss : if the coin thrown is a fair coin, then the apriori probability of the occurrence of a head is 50 percent. This means that when someone goes to toss the coin, he already knows the probability of the occurrence of a positive (in other words, desired) outcome as well as of a negative (in other words, undesired) outcome.
* Therefore, no matter how many times one tosses the coin, whenever faced with a new toss the probability of the occurrence of a head is still 50 percent and the probability of the occurrence of a tail is still 50 percent.
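To see this in action, here is a minimal Python sketch (my own illustration, not from any referenced source) that simulates repeated tosses of a fair coin ; the running frequency of heads settles near the apriori value of 0.5 :
==================================
# simulating repeated tosses of a fair coin
import random

tosses = 10000
heads = sum(1 for _ in range(tosses) if random.random() < 0.5)

# each new toss is still a 50 percent chance , regardless of past tosses
print('frequency of heads after {} tosses : {:.4f}'.format(tosses, heads / tosses))
==================================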
* But consider a situation where the context changes ; then the apriori probability is not valid anymore, because something subtle has happened that changes the outcome. As we all know, there are prerequisites and conditions that must be satisfied for the general experiment to come to fruition. In such a case, one can express the belief as a posteriori probability, which is the apriori probability after something has happened that modifies the count or outcome of the event.
* For instance, the apriori estimate of a random person's gender being male or female is about 50 percent in almost all cases. But the general assumption that any population taken into account has this demography is wrong : as I happened to come across in my referenced article, in a demographic population women generally tend to live longer and exceed their counterpart males, and as a result the population demographic tilt is more towards the female gender.
Hence, putting into account all the factors that contribute to the general estimate of any population, one should not ideally take gender as a flat 50-50 parameter for the determination of population data, because this factor is tilted across age brackets and hence a blanket generalisation of this factor should not be considered.
* Again, taking this factor of gender into account, the posteriori probability is different from the expected apriori one, which in this example would estimate somebody's gender on the belief that there are 50 percent males and 50 percent females in the given population data.
* One can write conditional probability in the manner P(y|x), which in the mathematical sense reads as the probability of the event y given that the event x has taken place. Because of the great relevance Conditional Probability has in the concepts and studies of machine learning, learning the syntax of representing, expressing and comprehending this notation is of paramount importance to any newbie or virtuoso in the fields of maths, statistics and machine learning.
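As a small illustration of the notation, the sketch below (my own toy example with made-up counts) estimates P(y|x) from counted events as the ratio of the joint count to the count of x :
==================================
# estimating a conditional probability P(y|x) from counted events
# the counts below are made-up numbers purely for illustration
count_x = 400        # observations where event x occurred
count_x_and_y = 100  # observations where both x and y occurred

p_y_given_x = count_x_and_y / count_x   # P(y|x) = P(x and y) / P(x)
print('P(y|x) =', p_y_given_x)          # 0.25
==================================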
* As mentioned in the above paragraph, because of its dependence on the occurrence of single or multiple prior conditions, conditional probability is of paramount importance for machine learning, which works on the statistical conditions of occurrence of events. If the apriori probability can change because of circumstances, knowing the possible circumstances gives a big push to one's chances of correctly predicting an event by observing the underlying examples - exactly what machine learning generally intends to do.
* Generally, the chance of finding that a random person's gender is male or female is around 50 percent. But in case one takes into consideration the mortality aspects and age factor of a population, we have seen that the demographic tilt is more in favour of females. If under all such conditions one takes into consideration this population, and then directs a machine learning algorithm to determine the gender of the considered person on the basis of apriori conditions like length of hair, mortality rate etc, the ML algorithm would be able to determine the solicited answer far better than the flat 50-50 guess.
An article on - Bayes Theorem application and usage
Bayes Theorem application and usage
Instance and example of usage of Bayes Theorem in Maths and Statistics :
P(B|E) = P(E|B) * P(B) / P(E)
If one reads the formula, one will come across the following terms, which can be elaborated with the help of an instance in the following manner :
* P( B | E ) - The probability of a belief (B) given a set of evidence (E) is called the Posterior Probability. In the given case, the hypothesis presented to the reader is that, given that the length of the subject's hair is sufficiently long, the subject in question is a female.
* P( E | B ) - This conditional probability expresses the chance of observing the evidence given the belief : here, the probability that the subject has sufficiently long hair given that the subject is a female. This term is called the Likelihood.
* P( B ) - Here, B stands for the general probability of being a female, i.e. the apriori probability of the belief. In the given case, the probability is around 50 percent, which translates to a likelihood of occurrence of 0.5.
* P( E ) - This is the general probability of having long hair. In a conditional probability equation, this term is also treated as an apriori probability, which means the value of its probability estimate is available well in advance ; this value is pivotal for the formulation of the posterior probability.
If one solves the previous problem using the Bayes Formula, all the constituent values would be put into the given equation, as in the worked sketch below.
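Below is a minimal worked sketch in Python ; the prior of 0.5 comes from the discussion above, while the hair-length likelihoods (0.6 and 0.1) are assumed numbers purely for illustration :
==================================
# plugging assumed values into Bayes formula : P(B|E) = P(E|B) * P(B) / P(E)
p_b = 0.5               # apriori probability of being female
p_e_given_b = 0.6       # assumed : probability of long hair given female
p_e_given_not_b = 0.1   # assumed : probability of long hair given male

# P(E) : total probability of the evidence (long hair)
p_e = p_e_given_b * p_b + p_e_given_not_b * (1 - p_b)

p_b_given_e = p_e_given_b * p_b / p_e
print('P(female | long hair) = {:.3f}'.format(p_b_given_e))   # about 0.857
==================================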
The same type of analogy is also required for the estimation of a certain disease among a certain population, where one would very likely want to calculate the presence of a particular disease within the given population. For this, one needs to undergo a certain type of test which would produce a positive or a negative result. Generally, it is perceived that most medical tests are not completely accurate : the laboratory would tell of the presence of a certain malignancy via a test, which would convey a condensed result about the condition of illness of the concerned case.
For the case when one would like to see the number of people showing a positive response from such a test, the groups are as follows :
1) Case-1 : Who is ill and who gets the correct answer from the test. This is the case of estimation of true positives, which amounts to 99 percent of the 1 percent of the population who have the illness.
2) Case-2 : Who is not ill and who gets a wrong diagnosis result from the test. This group consists of 1 percent of the 99 percent of the population who would get a positive response even though the illness is not actually present. Again, this is a multiplication of 99 percent and 1 percent ; this group corresponds to the discovery of false positive cases among the given sample. In simple words, this category takes into its ambit those patients who are actually not ill (maybe fit and fine), but who, due to some aberration or mistake in the report - a case of mis-diagnosis - are discovered as ill persons. Under such circumstances, untoward cases of administration of wrong medicines might happen, which rather than curing the person of the given illness might inflict aggravations upon the person, rendering him more vulnerable to hazards, catastrophes and probably untimely death.
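Putting the two cases together, a short Python sketch (using exactly the percentages from Case-1 and Case-2 above) shows what Bayes Theorem says about a person who tests positive :
==================================
# Bayes theorem applied to the illness example from the two cases above
p_ill = 0.01                 # 1 percent of the population has the illness
p_pos_given_ill = 0.99       # Case-1 : correct positives among the ill
p_pos_given_healthy = 0.01   # Case-2 : false positives among the healthy

# overall probability of any positive test result
p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * (1 - p_ill)

p_ill_given_pos = p_pos_given_ill * p_ill / p_pos
print('P(ill | positive test) = {:.2f}'.format(p_ill_given_pos))   # 0.50
==================================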
* So, going through the given cases of estimation of correct Classification for a certain disease or illness could help in proper medicine administration, which would help in the recovery of the patient owing to the right Classification of the case ; if not, the patient would be classified into a wrong category and probably wrong medicines would get administered to the patient seeking medical assistance for his illness.
( I hope there is now some clarity about the cases where Bayesian Probability estimation could be put to use. As mentioned, this algorithm is used in a wide manner for the proper treatment and classification of illnesses and patients ; classification of fraudulent credit card / debit card utilisation ; evaluation of the productivity of employees at a given organisation by the management using certain metrics :P ...... I shall try to extend the use cases and applications of this theorem in later blogs and articles )
Friday, April 16, 2021
Static Methods in Python - Example of a Static Method in a Class in Python
Static Methods in Python
* Static Methods are written at the class level, but they do not involve the class or its constituting instances.
* Static Methods are used when one wants to process an element
in relation to a class but does not need the class or its instance to perform
any work .
* Example : writing the environment variables that go into the creation of a class, counting the number of instances of a class, or changing an attribute in another class are tasks related to a class .. such tasks can be handled by Static Methods
* Static Methods can be used to accept some values, process the values and then return the result.
* In such processing, neither the class nor its objects need to be involved.
* Static methods are written with a decorator @staticmethod
above the methods
* Static Methods are called in the form of classname.method()
* In the following example, one creates a static method "noObjects()" that counts the number of objects or instances created of MyClass . In MyClass, one writes a constructor that increments the class variable 'n' every time an instance of the class is created . This incremented value of 'n' gets displayed by the "noObjects()" method
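A minimal version of this example, written along the lines described above, could look like this :
==================================
# understanding static methods
class MyClass:

    # class variable that counts the instances created
    n = 0

    def __init__(self):
        # increment the count every time an instance is created
        MyClass.n = MyClass.n + 1

    @staticmethod
    def noObjects():
        print('number of instances created : {}'.format(MyClass.n))

# create a few instances of MyClass and display the count
obj1 = MyClass()
obj2 = MyClass()
obj3 = MyClass()
MyClass.noObjects()   # number of instances created : 3
==================================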
Class Methods in Python - An example of creation of a Class Method in Python along with sample code
Class Methods in Python
* These are the set of Methods that act at the class level. Class Methods are the methods which act on the class variables or static variables.
* The Class Methods can be written using @classmethod decorator
above them .
* Inside a class method, 'cls.var' is the format to refer to a class variable. Class Methods are generally called using the form classname.method()
* Processing which is commonly needed by all the instances of a class is handled by class methods
* In the given example below, one can see such common processing being handled by a class method, developed using a sample class in the following manner.
* In the example, one can refer to a sample Bird class for more insight into the description and elaboration of a Class Method. All the birds in nature have only 2 wings (as we mostly see, but there are aberrations of course). Here, one can take the Bird class : since all the Birds in Nature have 2 wings, one can take 'wings' as a class variable, and a copy of this class variable is available to all the instances of the Bird class. In this Bird class, we will create a hypothetical method which applies to the functions that a Bird can perform, and that is the "fly" method (... to fly above the sky ... fly rhymes well with sky .. please accept from me a satirical High of Hi ... I am poetic too you know :P)
* So where was I .. ya, I was at the forefront of the creation of a class which would take into its ambit a Bird with generic class variables applicable to the whole organism class of Birds : all Birds have a pair of wings .. that makes the count 2 . And birds fly .. which is a method attributed to birds. This class variable and class method would be made use of to instantiate a generic sample Bird class
* So let's create a Bird which has two wings and flies too (I am sorry to note that when God created a Bird like the Penguin, God forgot to add the instance function "fly" to its class genus ... therefore I shall also keep penguins and kiwis off the charts and take only those birds which can fly ... up above the sky )
* Without further ado .. let's get to this class creation, which would take into effect all the common features of all the instances of a Bird class
==================================
==================================
# understanding class methods
class Bird:

    # a class variable shared by all instances
    wings = 2

    # creating a class method
    @classmethod
    def fly(cls, name):
        print('{} flies with {} wings'.format(name, cls.wings))

# display information of the birds
Bird.fly('Garuda')
Bird.fly('Pigeon')
Bird.fly('Crow')
Bird.fly('HummingBird')
Bird.fly('Eagle')
==================================
==================================
Output
Garuda flies with 2 wings
Pigeon flies with 2 wings
Crow flies with 2 wings
HummingBird flies with 2 wings
Eagle flies with 2 wings
==================================
==================================
Monday, April 12, 2021
Working with Data in Machine Learning - An overview of methodology for working over Data/Datasets in Machine Learning using R and Python
Working with Data in Machine Learning
* Machine Learning is one of the most appealing subjects because it allows machines to learn from real world examples such as sales records, signals from sensors and textual data streaming from the internet, and then determine what such data implies
* Some of the common outputs that can come from machine learning algorithms are the following : prediction of the future, prescription to act on some given knowledge or information, and creation of new knowledge in terms of examples categorised into groups
* Some of the applications which are already in place and have become a reality thanks to leveraging such knowledge are the following :
01) Diagnosing hard to find diseases
02) Discovering criminal behaviour and detecting criminals in action
03) Recommending the right product to the right person
04) Filtering and classifying data from the internet at a big scale
05) Driving a car autonomously etc
* The mathematical and statistical basis of machine learning
makes outputting such useful results possible
* One can use Math and Statistics over such accumulated data
which could enable algorithms to understand anything with a numerical basis
* In order to begin the process of working with Data , one
should represent the solution to the problem in the form of a number .
* For example, if one wants to diagnose a disease using a machine learning algorithm, one can make the response to the particular learning problem a 1 or a 0 (binary response) which would inform about the illness of the person : a value of 1 would indicate that the person is ill, and a value of 0 that the person is not. Alternatively, one can use a number between the values 0 and 1 to convey an indication of how likely it is that the person is ill.
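As a minimal sketch of this idea (the diagnosis values below are made up purely for illustration), such responses can be turned into a numeric representation like so :
==================================
# representing a diagnosis as a number for a learning algorithm
# the diagnoses below are made-up values purely for illustration
diagnoses = ['ill', 'healthy', 'ill', 'healthy', 'healthy']

# binary response : 1 means ill , 0 means not ill
labels = [1 if d == 'ill' else 0 for d in diagnoses]
print(labels)   # [1, 0, 1, 0, 0]
==================================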
MapReduce Jobs Execution
* A MapReduce job is specified by a Map program and the Reduce
program along with the data sets associated with a MapReduce Job
* There is a master program that resides and runs endlessly on the NameNode, called the "Job Tracker", which tracks the progress of MapReduce jobs from beginning to completion
* Hadoop moves the Map and Reduce computation logic to all the
DataNodes which are hosting a fragment of data
* Communication between the nodes is accomplished using YARN ,
Hadoop's native resource manager
* The master machine (NameNode) is completely aware of the data
stored over each of the worker machines (DataNodes)
* The Master Machine schedules the "Map / Reduce jobs" to Task Trackers with full awareness of the data location, which means that the Task Trackers residing within the hierarchical monitoring architecture are thoroughly aware of the residing data and its location.
By this, the Job Tracker is able to fully address the issue of mapping the requisite jobs to their job queues across the Job/Task trackers.
For example , if "node A" contains data (x,y,z) and
"node B" contains data (a,b,c) , the job tracker schedules node B to
perform map or Reduce Tasks on (a,b,c) and node A would be scheduled to perform
Map or Reduce tasks on (x,y,z). This helps in reduction of the data traffic and
subsequent choking of the network .
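A toy Python sketch of this data-locality idea follows ; the node names and fragments are made up for illustration, and real scheduling is of course done internally by Hadoop's Job Tracker :
==================================
# toy illustration of data-locality scheduling by a job tracker
# node names and data fragments are made up for this sketch
data_location = {'node A': ['x', 'y', 'z'], 'node B': ['a', 'b', 'c']}

def schedule(fragment):
    # assign the task to whichever node already holds the fragment ,
    # so that computation moves to the data and not the other way round
    for node, fragments in data_location.items():
        if fragment in fragments:
            return node

for frag in ['a', 'x', 'c']:
    print('map task for fragment {} scheduled on {}'.format(frag, schedule(frag)))
==================================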
* Each DataNode within a MapReduce job runs its own tracking program, called the Task Tracker, which executes the tasks assigned to it and reports progress back to the Job Tracker
Math behind Machine Learning - An introductory article on the usage of mathematics and statistics as the foundation of Machine Learning
Math behind Machine Learning
* If one wants to implement existing machine learning algorithms
from scratch or if someone wants to devise newer machine learning algorithms ,
then one would require a profound knowledge of probability , linear algebra ,
linear programming and multivariable calculus
* Along with that one may also need to translate math into a
form of working code which means that one needs to have a good deal of
sophisticated computing skills
* This article is an introduction which would help someone understand the mechanics of machine learning and thereafter describe how to translate math basics into usable code
* If one would like to apply existing machine learning knowledge to practical purposes and practical projects, then one can leverage the best possibilities of machine learning over datasets using the R and Python languages' software libraries with some basic knowledge of math, statistics and programming, as machine learning's core foundation is built upon skills in all of these areas
* Some of the things that can be accomplished with a clearer understanding and grasp over these languages are the following :
1) Performing Machine Learning experiments using the R and Python languages
2) Knowledge upon Vectors , Variables and Matrices
3) Usage of Descriptive Statistics techniques
4) Knowledge of statistical methods like Mean, Median, Mode, Standard Deviation and other important parameters for judging or evaluating a model (a short sketch of these follows after this list)
5) Understanding the capabilities and methods in which Machine
Learning could be put to work which would help in making better predictions etc
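As a small illustration of point 4 above, here is a minimal sketch (my own toy example over made-up sample values) using Python's standard statistics module :
==================================
# a quick look at descriptive statistics with Python's standard library
import statistics

values = [4, 8, 8, 5, 9, 6, 7]   # made-up sample data

print('mean               :', round(statistics.mean(values), 3))
print('median             :', statistics.median(values))
print('mode               :', statistics.mode(values))
print('standard deviation :', round(statistics.stdev(values), 3))
==================================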
Thursday, April 8, 2021
Testing MapReduce Programs - An introductory Article on Testing of MapReduce Programs for Load and Performance
Testing MapReduce Programs
* Mapper programs running on a cluster are usually complicated
to debug
* The best way of debugging MapReduce programs is via the usage of print statements over log sections in MapReduce programs
* But in a large application, where thousands of programs may be running at any single point of time, running the execution jobs and programs over tens or thousands of nodes is preferred to be done in multiple stages
* Therefore, the most preferred mode of execution of any program is :
(a) To run the programs using small sample datasets ; this would ensure that whatever program is running, it runs in an efficient and robust manner. For checking the same, the tried and tested formula is to verify the working proficiency of the program over a small dataset first, followed by applying the same over a bigger application / bigger dataset / larger number of test cases etc
(b) Expanding the unit tests to cover a larger number of datasets and running the programs over a bigger/larger cluster. As mentioned in the earlier point, the scope of execution of the test cases is enhanced by applying the unit test cases over larger datasets in order to check the robustness and performance of the application software
(c) Ensuring that the Mapper and the Reducer functions can handle their inputs efficiently. This means that the set of Mapper and Reducer functions created to work over the split input data should work in cohesion or in tandem with the MapReduce program's desired working conditions to produce output in the desired format (text, key-value pair) etc ; a small sketch of unit-testing a mapper in isolation follows below
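As a minimal sketch of point (c), one could unit-test a mapper as a pure function over a tiny sample first ; the word_count_mapper below is a hypothetical stand-in of my own, not the Hadoop API :
==================================
# a minimal sketch of unit-testing a mapper in isolation
import unittest

def word_count_mapper(line):
    # Map step : emit a (word , 1) pair for every word in the input line
    return [(word, 1) for word in line.split()]

class TestMapper(unittest.TestCase):
    def test_small_sample_dataset(self):
        # start with a tiny sample dataset before scaling up
        expected = [('to', 1), ('be', 1), ('or', 1), ('not', 1), ('to', 1), ('be', 1)]
        self.assertEqual(word_count_mapper('to be or not to be'), expected)

if __name__ == '__main__':
    unittest.main()
==================================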
* Running the application against a full dataset would likely expose more issues - undue errors, undiscovered issues, unpredictable results, undue fitting criteria and similar problems - because of which it might not be conducive for the system analyst to put the entire dataset to test over the software straight away. But after all the necessary unit test cases have been checked and the working criteria and prerequisites have been fulfilled, one may put the program to test over bigger datasets, by and by making the work of the MapReduce job easier to run, thereby also addressing speed and performance issues gradually
* It may be desirable to split the logic into many simpler Mapper and Reducer functions, chaining the Mappers using a facility like the ChainMapper library class built within Hadoop (I am yet to explore all the scopes, specifications and functionalities of the ChainMapper library, which I shall try to cover in a forthcoming session). This class can run a chain of Mappers, followed by a Reducer function, followed again by a chain of Mapper functions, all within a single MapReduce job
* Moreover, testing of MapReduce jobs / execution of MapReduce jobs / analysis and debugging of MapReduce jobs will be taken up in later articles under appropriate headers and titles.
Map Reduce Data Types and Formats ( A short explanatory Article )
Map Reduce Data Types and Formats
Map : (K1,V1) -> list(K2,V2)
Reduce : (K2,list(V2)) -> list(K3,V3)
* The Map input key and value types (K1,V1) are different from the Map output types (K2,V2)
* The Reduce input takes in a key K2 and a list of values of type V2 (the same types that the Map function emits as output). The output of the Reduce process is again a list of key-value pairs (K3,V3), whose types may in turn differ from the Reduce input types.
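To make the above type signatures concrete, here is a toy word count sketch in Python - a conceptual illustration of my own, since real Hadoop Mappers and Reducers are written against the Hadoop API :
==================================
# word count written in the shape of the type signatures above
# Map : (K1 , V1) = (line offset , line text) -> list(K2 , V2) = list((word , 1))
def word_count_map(offset, line):
    return [(word, 1) for word in line.split()]

# Reduce : (K2 , list(V2)) = (word , [1 , 1 , ...]) -> list(K3 , V3) = [(word , count)]
def word_count_reduce(word, ones):
    return [(word, sum(ones))]

print(word_count_map(0, 'data moves to compute'))   # [('data', 1), ('moves', 1), ...]
print(word_count_reduce('data', [1, 1, 1]))         # [('data', 3)]
==================================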
* MapReduce can process many different types of data formats
which may range from text file formats to databases .
* An "input split" is a chunk of the input that is processed by a single Map function .
* Each Map process processes a single split ; each split is divided into records, and the map function processes each record in the form of a key-value pair
* Splits and Records are a part of the logical processing of the records ; a split can even map to a full file / part of a file / collection of different files etc
* In a database context , a split corresponds to a range of rows
from a table / record .