This technical blog is my own collection of notes, articles, implementations and interpretations of referred topics in coding, programming, data analytics, data science, data warehousing, Cloud Applications and Artificial Intelligence. Feel free to explore my blog and articles for reference and downloads. Do subscribe, like, share and comment ---- Vivek Dash
Wednesday, April 21, 2021
An article on - Conditioning Chance and Probability by Bayes Theorem
Conditioning Chance & Probability by Bayes Theorem
* Probability is one of the key measures that takes into account the conditions of time and space, but another measure goes hand in hand with it : Conditional Probability, which captures the chance of occurrence of one particular event given the occurrence of some other event that may affect its possibility.
* When one would like to estimate the probability of any given event, one may hold a belief about that probability over a set of possible events or situations before seeing any evidence. This belief is expressed by the term "apriori probability", which means the general probability of a given event, known in advance.
* For example, in the case of a coin toss : if the coin thrown is a fair coin, then the apriori probability of the occurrence of a head is 50 percent. This means that when someone goes to toss the coin, he already knows the probability of the occurrence of a positive (in other words, desired) outcome as well as of a negative (in other words, undesired) outcome.
* Therefore, no matter how many times one tosses the coin, whenever faced with a new toss the probability of the occurrence of a head is still 50 percent and the probability of the occurrence of a tail is still 50 percent.
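To see this in action, here is a minimal Python sketch (my own illustration, not from any referenced source) that simulates repeated tosses of a fair coin ; the running frequency of heads settles near the apriori value of 0.5 :
==================================
# simulating repeated tosses of a fair coin
import random

tosses = 10000
heads = sum(1 for _ in range(tosses) if random.random() < 0.5)

# each new toss is still a 50 percent chance , regardless of past tosses
print('frequency of heads after {} tosses : {:.4f}'.format(tosses, heads / tosses))
==================================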
* But consider a situation where the context changes ; then the apriori probability is not valid anymore, because something subtle has happened that changes the outcome. As we all know, there are prerequisites and conditions that must be satisfied for the general experiment to come to fruition. In such a case, one can express the belief as a posteriori probability, which is the apriori probability after something has happened that modifies the count or outcome of the event.
* For instance, the apriori estimate of a random person's gender being male or female is about 50 percent in almost all cases. But the general assumption that any population taken into account has this demography is wrong : as I happened to come across in my referenced article, in a demographic population women generally tend to live longer and exceed their counterpart males, and as a result the population demographic tilt is more towards the female gender.
Hence, putting into account all the factors that contribute to the general estimate of any population, one should not ideally take gender as a flat 50-50 parameter for the determination of population data, because this factor is tilted across age brackets and hence a blanket generalisation of this factor should not be considered.
* Again, taking this factor of gender into account, the posteriori probability is different from the expected apriori one, which in this example would estimate somebody's gender on the belief that there are 50 percent males and 50 percent females in the given population data.
* One can write conditional probability in the manner P(y|x), which in the mathematical sense reads as the probability of the event y given that the event x has taken place. Because of the great relevance Conditional Probability has in the concepts and studies of machine learning, learning the syntax of representing, expressing and comprehending this notation is of paramount importance to any newbie or virtuoso in the fields of maths, statistics and machine learning.
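As a small illustration of the notation, the sketch below (my own toy example with made-up counts) estimates P(y|x) from counted events as the ratio of the joint count to the count of x :
==================================
# estimating a conditional probability P(y|x) from counted events
# the counts below are made-up numbers purely for illustration
count_x = 400        # observations where event x occurred
count_x_and_y = 100  # observations where both x and y occurred

p_y_given_x = count_x_and_y / count_x   # P(y|x) = P(x and y) / P(x)
print('P(y|x) =', p_y_given_x)          # 0.25
==================================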
* As mentioned in the above paragraph, because of its dependence on the occurrence of single or multiple prior conditions, conditional probability is of paramount importance for machine learning, which works on the statistical conditions of occurrence of events. If the apriori probability can change because of circumstances, knowing the possible circumstances gives a big push to one's chances of correctly predicting an event by observing the underlying examples - exactly what machine learning generally intends to do.
* Generally, the chance of finding that a random person's gender is male or female is around 50 percent. But in case one takes into consideration the mortality aspects and age factor of a population, we have seen that the demographic tilt is more in favour of females. If under all such conditions one takes into consideration this population, and then directs a machine learning algorithm to determine the gender of the considered person on the basis of apriori conditions like length of hair, mortality rate etc, the ML algorithm would be able to determine the solicited answer far better than the flat 50-50 guess.
An article on - Bayes Theorem application and usage
Bayes Theorem application and usage
Instance and example of usage of Bayes Theorem in Maths and Statistics :
P(B|E) = P(E|B) * P(B) / P(E)
If one reads the formula, one will come across the following terms, which can be elaborated with the help of an instance in the following manner :
* P( B | E ) - The probability of a belief (B) given a set of evidence (E) is called the Posterior Probability. In the given case, the hypothesis presented to the reader is that, given that the length of the subject's hair is sufficiently long, the subject in question is a female.
* P( E | B ) - This conditional probability expresses the chance of observing the evidence given the belief : here, the probability that the subject has sufficiently long hair given that the subject is a female. This term is called the Likelihood.
* P( B ) - Here, B stands for the general probability of being a female, i.e. the apriori probability of the belief. In the given case, the probability is around 50 percent, which translates to a likelihood of occurrence of 0.5.
* P( E ) - This is the general probability of having long hair. In a conditional probability equation, this term is also treated as an apriori probability, which means the value of its probability estimate is available well in advance ; this value is pivotal for the formulation of the posterior probability.
If one solves the previous problem using the Bayes Formula, all the constituent values would be put into the given equation, as in the worked sketch below.
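Below is a minimal worked sketch in Python ; the prior of 0.5 comes from the discussion above, while the hair-length likelihoods (0.6 and 0.1) are assumed numbers purely for illustration :
==================================
# plugging assumed values into Bayes formula : P(B|E) = P(E|B) * P(B) / P(E)
p_b = 0.5               # apriori probability of being female
p_e_given_b = 0.6       # assumed : probability of long hair given female
p_e_given_not_b = 0.1   # assumed : probability of long hair given male

# P(E) : total probability of the evidence (long hair)
p_e = p_e_given_b * p_b + p_e_given_not_b * (1 - p_b)

p_b_given_e = p_e_given_b * p_b / p_e
print('P(female | long hair) = {:.3f}'.format(p_b_given_e))   # about 0.857
==================================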
The same type of analogy is also required for the estimation of a certain disease among a certain population, where one would very likely want to calculate the presence of a particular disease within the given population. For this, one needs to undergo a certain type of test which would produce a positive or a negative result. Generally, it is perceived that most medical tests are not completely accurate : the laboratory would tell of the presence of a certain malignancy via a test, which would convey a condensed result about the condition of illness of the concerned case.
For the case when one would like to see the number of people showing a positive response from such a test, the groups are as follows :
1) Case-1 : Who is ill and who gets the correct answer from the test. This is the case of estimation of true positives, which amounts to 99 percent of the 1 percent of the population who have the illness.
2) Case-2 : Who is not ill and who gets a wrong diagnosis result from the test. This group consists of 1 percent of the 99 percent of the population who would get a positive response even though the illness is not actually present. Again, this is a multiplication of 99 percent and 1 percent ; this group corresponds to the discovery of false positive cases among the given sample. In simple words, this category takes into its ambit those patients who are actually not ill (maybe fit and fine), but who, due to some aberration or mistake in the report - a case of mis-diagnosis - are discovered as ill persons. Under such circumstances, untoward cases of administration of wrong medicines might happen, which rather than curing the person of the given illness might inflict aggravations upon the person, rendering him more vulnerable to hazards, catastrophes and probably untimely death.
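Putting the two cases together, a short Python sketch (using exactly the percentages from Case-1 and Case-2 above) shows what Bayes Theorem says about a person who tests positive :
==================================
# Bayes theorem applied to the illness example from the two cases above
p_ill = 0.01                 # 1 percent of the population has the illness
p_pos_given_ill = 0.99       # Case-1 : correct positives among the ill
p_pos_given_healthy = 0.01   # Case-2 : false positives among the healthy

# overall probability of any positive test result
p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * (1 - p_ill)

p_ill_given_pos = p_pos_given_ill * p_ill / p_pos
print('P(ill | positive test) = {:.2f}'.format(p_ill_given_pos))   # 0.50
==================================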
* So, going through the given cases of estimation of correct Classification for a certain disease or illness could help in proper medicine administration, which would help in the recovery of the patient owing to the right Classification of the case ; if not, the patient would be classified into a wrong category and probably wrong medicines would get administered to the patient seeking medical assistance for his illness.
( I hope there is now some clarity about the cases where Bayesian Probability estimation could be put to use. As mentioned, this algorithm is used in a wide manner for the proper treatment and classification of illnesses and patients ; classification of fraudulent credit card / debit card utilisation ; evaluation of the productivity of employees at a given organisation by the management using certain metrics :P ...... I shall try to extend the use cases and applications of this theorem in later blogs and articles )
Friday, April 16, 2021
Static Methods in Python - Example of a Static Method in a Class in Python
Static Methods in Python
* Static Methods are written at the class level, but they do not involve the class or its constituting instances.
* Static Methods are used when one wants to process an element
in relation to a class but does not need the class or its instance to perform
any work .
* Example : writing the environment variables that go into the creation of a class, counting the number of instances of a class, or changing an attribute in another class are tasks related to a class .. such tasks can be handled by Static Methods
* Static Methods can be used to accept some values, process the values and then return the result.
* In such processing, neither the class nor its objects need to be involved.
* Static methods are written with a decorator @staticmethod
above the methods
* Static Methods are called in the form of classname.method()
* In the following example, one creates a static method "noObjects()" that counts the number of objects or instances created of MyClass . In MyClass, one writes a constructor that increments the class variable 'n' every time an instance of the class is created . This incremented value of 'n' gets displayed by the "noObjects()" method
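A minimal version of this example, written along the lines described above, could look like this :
==================================
# understanding static methods
class MyClass:

    # class variable that counts the instances created
    n = 0

    def __init__(self):
        # increment the count every time an instance is created
        MyClass.n = MyClass.n + 1

    @staticmethod
    def noObjects():
        print('number of instances created : {}'.format(MyClass.n))

# create a few instances of MyClass and display the count
obj1 = MyClass()
obj2 = MyClass()
obj3 = MyClass()
MyClass.noObjects()   # number of instances created : 3
==================================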
Class Methods in Python - An example of creation of a Class Method in Python along with sample code
Class Methods in Python
* These are the set of Methods that act at the class level. Class Methods are the methods which act on the class variables or static variables.
* The Class Methods can be written using @classmethod decorator
above them .
* Inside a class method, 'cls.var' is the format to refer to a class variable. Class Methods are generally called using the form classname.method()
* Processing which is commonly needed by all the instances of a class is handled by class methods
* In the given example below, one can see such common processing being handled by a class method, developed using a sample class in the following manner.
* In the example, one can refer to a sample Bird class for more insight into the description and elaboration of a Class Method. All the birds in nature have only 2 wings (as we mostly see, but there are aberrations of course). Here, one can take the Bird class : since all the Birds in Nature have 2 wings, one can take 'wings' as a class variable, and a copy of this class variable is available to all the instances of the Bird class. In this Bird class, we will create a hypothetical method which applies to the functions that a Bird can perform, and that is the "fly" method (... to fly above the sky ... fly rhymes well with sky .. please accept from me a satirical High of Hi ... I am poetic too you know :P)
* So where was I .. ya, I was at the forefront of the creation of a class which would take into its ambit a Bird with generic class variables applicable to the whole organism class of Birds : all Birds have a pair of wings .. that makes the count 2 . And birds fly .. which is a method attributed to birds. This class variable and class method would be made use of to instantiate a generic sample Bird class
* So let's create a Bird which has two wings and flies too (I am sorry to note that when God created a Bird like the Penguin, God forgot to add the instance function "fly" to its class genus ... therefore I shall also keep penguins and kiwis off the charts and take only those birds which can fly ... up above the sky )
* Without further ado .. let's get to this class creation, which would take into effect all the common features of all the instances of a Bird class
==================================
==================================
# understanding class methods
class Bird:

    # a class variable shared by all instances
    wings = 2

    # creating a class method
    @classmethod
    def fly(cls, name):
        print('{} flies with {} wings'.format(name, cls.wings))

# display information of the birds
Bird.fly('Garuda')
Bird.fly('Pigeon')
Bird.fly('Crow')
Bird.fly('HummingBird')
Bird.fly('Eagle')
==================================
==================================
Output
Garuda flies with 2 wings
Pigeon flies with 2 wings
Crow flies with 2 wings
HummingBird flies with 2 wings
Eagle flies with 2 wings
==================================
==================================
Monday, April 12, 2021
Working with Data in Machine Learning - An overview of methodology for working over Data/Datasets in Machine Learning using R and Python
Working with Data in Machine Learning
* Machine Learning is one of the most appealing subjects because it allows machines to learn from real world examples such as sales records, signals from sensors and textual data streaming from the internet, and then determine what such data implies
* Some of the common outputs that can come from machine learning algorithms are the following : prediction of the future, prescription to act on some given knowledge or information, and creation of new knowledge in terms of examples categorised into groups
* Some of the applications which are already in place and have become a reality thanks to leveraging such knowledge are the following :
01) Diagnosing hard to find diseases
02) Discovering criminal behaviour and detecting criminals in action
03) Recommending the right product to the right person
04) Filtering and classifying data from the internet at a big scale
05) Driving a car autonomously etc
* The mathematical and statistical basis of machine learning
makes outputting such useful results possible
* One can use Math and Statistics over such accumulated data
which could enable algorithms to understand anything with a numerical basis
* In order to begin the process of working with Data , one
should represent the solution to the problem in the form of a number .
* For example, if one wants to diagnose a disease using a machine learning algorithm, one can make the response to the particular learning problem a 1 or a 0 (binary response) which would inform about the illness of the person : a value of 1 would indicate that the person is ill, and a value of 0 that the person is not. Alternatively, one can use a number between the values 0 and 1 to convey an indication of how likely it is that the person is ill.
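As a minimal sketch of this idea (the diagnosis values below are made up purely for illustration), such responses can be turned into a numeric representation like so :
==================================
# representing a diagnosis as a number for a learning algorithm
# the diagnoses below are made-up values purely for illustration
diagnoses = ['ill', 'healthy', 'ill', 'healthy', 'healthy']

# binary response : 1 means ill , 0 means not ill
labels = [1 if d == 'ill' else 0 for d in diagnoses]
print(labels)   # [1, 0, 1, 0, 0]
==================================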
MapReduce Jobs Execution
* A MapReduce job is specified by a Map program and the Reduce
program along with the data sets associated with a MapReduce Job
* There is a master program that resides and runs endlessly on the NameNode, called the "Job Tracker", which tracks the progress of MapReduce jobs from beginning to completion
* Hadoop moves the Map and Reduce computation logic to all the
DataNodes which are hosting a fragment of data
* Communication between the nodes is accomplished using YARN ,
Hadoop's native resource manager
* The master machine (NameNode) is completely aware of the data
stored over each of the worker machines (DataNodes)
* The Master Machine schedules the "Map / Reduce jobs" to Task Trackers with full awareness of the data location, which means that the Task Trackers residing within the hierarchical monitoring architecture are thoroughly aware of the residing data and its location.
By this, the Job Tracker is able to fully address the issue of mapping the requisite jobs to their job queues across the Job/Task trackers.
For example , if "node A" contains data (x,y,z) and
"node B" contains data (a,b,c) , the job tracker schedules node B to
perform map or Reduce Tasks on (a,b,c) and node A would be scheduled to perform
Map or Reduce tasks on (x,y,z). This helps in reduction of the data traffic and
subsequent choking of the network .
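A toy Python sketch of this data-locality idea follows ; the node names and fragments are made up for illustration, and real scheduling is of course done internally by Hadoop's Job Tracker :
==================================
# toy illustration of data-locality scheduling by a job tracker
# node names and data fragments are made up for this sketch
data_location = {'node A': ['x', 'y', 'z'], 'node B': ['a', 'b', 'c']}

def schedule(fragment):
    # assign the task to whichever node already holds the fragment ,
    # so that computation moves to the data and not the other way round
    for node, fragments in data_location.items():
        if fragment in fragments:
            return node

for frag in ['a', 'x', 'c']:
    print('map task for fragment {} scheduled on {}'.format(frag, schedule(frag)))
==================================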
* Each DataNode within a MapReduce job runs its own tracking program, called the Task Tracker, which executes the tasks assigned to it and reports progress back to the Job Tracker
Math behind Machine Learning - An introductory article on the usage of mathematics and statistics as the foundation of Machine Learning
Math behind Machine Learning
* If one wants to implement existing machine learning algorithms
from scratch or if someone wants to devise newer machine learning algorithms ,
then one would require a profound knowledge of probability , linear algebra ,
linear programming and multivariable calculus
* Along with that one may also need to translate math into a
form of working code which means that one needs to have a good deal of
sophisticated computing skills
* This article is an introduction which would help someone understand the mechanics of machine learning and thereafter describe how to translate math basics into usable code
* If one would like to apply existing machine learning knowledge to practical purposes and practical projects, then one can leverage the best possibilities of machine learning over datasets using the R and Python languages' software libraries with some basic knowledge of math, statistics and programming, as machine learning's core foundation is built upon skills in all of these areas
* Some of the things that can be accomplished with a clearer understanding and grasp over these languages are the following :
1) Performing Machine Learning experiments using the R and Python languages
2) Knowledge upon Vectors , Variables and Matrices
3) Usage of Descriptive Statistics techniques
4) Knowledge of statistical methods like Mean, Median, Mode, Standard Deviation and other important parameters for judging or evaluating a model (a short sketch of these follows after this list)
5) Understanding the capabilities and methods in which Machine
Learning could be put to work which would help in making better predictions etc
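As a small illustration of point 4 above, here is a minimal sketch (my own toy example over made-up sample values) using Python's standard statistics module :
==================================
# a quick look at descriptive statistics with Python's standard library
import statistics

values = [4, 8, 8, 5, 9, 6, 7]   # made-up sample data

print('mean               :', round(statistics.mean(values), 3))
print('median             :', statistics.median(values))
print('mode               :', statistics.mode(values))
print('standard deviation :', round(statistics.stdev(values), 3))
==================================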
Thursday, April 8, 2021
Testing MapReduce Programs - An introductory Article on Testing of MapReduce Programs for Load and Performance
Testing MapReduce Programs
* Mapper programs running on a cluster are usually complicated
to debug
* The best way of debugging MapReduce programs is via the usage of print statements over log sections in MapReduce programs
* But in a large application, where thousands of programs may be running at any single point of time, running the execution jobs and programs over tens or thousands of nodes is preferred to be done in multiple stages
* Therefore, the most preferred mode of execution of any program is :
(a) To run the programs using small sample datasets ; this would ensure that whatever program is running, it runs in an efficient and robust manner. For checking the same, the tried and tested formula is to verify the working proficiency of the program over a small dataset first, followed by applying the same over a bigger application / bigger dataset / larger number of test cases etc
(b) Expanding the unit tests to cover a larger number of datasets and running the programs over a bigger/larger cluster. As mentioned in the earlier point, the scope of execution of the test cases is enhanced by applying the unit test cases over larger datasets in order to check the robustness and performance of the application software
(c) Ensuring that the Mapper and the Reducer functions can handle their inputs efficiently. This means that the set of Mapper and Reducer functions created to work over the split input data should work in cohesion or in tandem with the MapReduce program's desired working conditions to produce output in the desired format (text, key-value pair) etc ; a small sketch of unit-testing a mapper in isolation follows below
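As a minimal sketch of point (c), one could unit-test a mapper as a pure function over a tiny sample first ; the word_count_mapper below is a hypothetical stand-in of my own, not the Hadoop API :
==================================
# a minimal sketch of unit-testing a mapper in isolation
import unittest

def word_count_mapper(line):
    # Map step : emit a (word , 1) pair for every word in the input line
    return [(word, 1) for word in line.split()]

class TestMapper(unittest.TestCase):
    def test_small_sample_dataset(self):
        # start with a tiny sample dataset before scaling up
        expected = [('to', 1), ('be', 1), ('or', 1), ('not', 1), ('to', 1), ('be', 1)]
        self.assertEqual(word_count_mapper('to be or not to be'), expected)

if __name__ == '__main__':
    unittest.main()
==================================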
* Running the application against a full dataset would likely expose more issues - undue errors, undiscovered issues, unpredictable results, undue fitting criteria and similar problems - because of which it might not be conducive for the system analyst to put the entire dataset to test over the software straight away. But after all the necessary unit test cases have been checked and the working criteria and prerequisites have been fulfilled, one may put the program to test over bigger datasets, by and by making the work of the MapReduce job easier to run, thereby also addressing speed and performance issues gradually
* It may be desirable to split the logic into many simpler Mapper and Reducer functions, chaining the Mappers using a facility like the ChainMapper library class built within Hadoop (I am yet to explore all the scopes, specifications and functionalities of the ChainMapper library, which I shall try to cover in a forthcoming session). This class can run a chain of Mappers, followed by a Reducer function, followed again by a chain of Mapper functions, all within a single MapReduce job
* Moreover, testing of MapReduce jobs / execution of MapReduce jobs / analysis and debugging of MapReduce jobs will be taken up in later articles under appropriate headers and titles.
Map Reduce Data Types and Formats ( A short explanatory Article )
Map Reduce Data Types and Formats
Map : (K1,V1) -> list(K2,V2)
Reduce : (K2,list(V2)) -> list(K3,V3)
* The Map input key and value types (K1,V1) are different from the Map output types (K2,V2)
* The Reduce input takes in a key K2 and a list of values of type V2 (the same types that the Map function emits as output). The output of the Reduce process is again a list of key-value pairs (K3,V3), whose types may in turn differ from the Reduce input types.
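To make the above type signatures concrete, here is a toy word count sketch in Python - a conceptual illustration of my own, since real Hadoop Mappers and Reducers are written against the Hadoop API :
==================================
# word count written in the shape of the type signatures above
# Map : (K1 , V1) = (line offset , line text) -> list(K2 , V2) = list((word , 1))
def word_count_map(offset, line):
    return [(word, 1) for word in line.split()]

# Reduce : (K2 , list(V2)) = (word , [1 , 1 , ...]) -> list(K3 , V3) = [(word , count)]
def word_count_reduce(word, ones):
    return [(word, sum(ones))]

print(word_count_map(0, 'data moves to compute'))   # [('data', 1), ('moves', 1), ...]
print(word_count_reduce('data', [1, 1, 1]))         # [('data', 3)]
==================================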
* MapReduce can process many different types of data formats
which may range from text file formats to databases .
* An "input split" is a chunk of the input that is processed by a single Map function .
* Each Map process processes a single split ; each split is divided into records, and the map function processes each record in the form of a key-value pair
* Splits and Records are a part of the logical processing of the records ; a split can even map to a full file / part of a file / collection of different files etc
* In a database context , a split corresponds to a range of rows
from a table / record .