This technical blog is my own collection of notes, articles, implementations and interpretations of topics in coding, programming, data analytics, data science, data warehousing, cloud applications and artificial intelligence. Feel free to explore my blog and articles for reference and downloads. Do subscribe, like, share and comment ---- Vivek Dash
Monday, May 3, 2021
Linear Regression in One Variable ( Uni-variate ) expression with small short examples - short hand-written summary notes
Notes on Supervised Machine Learning - Handwritten infographic short-points and scenarios
Saturday, May 1, 2021
Machine Learning Revision Notes - part 01
* Every time you search for a word, term or statement on the internet, you can do so with ease thanks to Google or Microsoft, whose well-developed machine learning algorithms and infrastructure help users find the items and text they are looking for. This works not only for the exact keywords the user types, but also for associated terms, which helps in finding related terms and related text on the internet.
* Every time you access your friends' or your own photos on Facebook or any other social media site, the platform finds and fetches the right photos with the help of machine learning.
* Every time you access your mail, machine learning running in the background of the mail application filters the unrelated and less important mail in the inbox and flags it for you, which helps you judge which emails are more important and which are less or least important.
* Some of the basic and important tasks that machine learning algorithms perform these days are tasks such as finding the shortest path among a given set of points, which underlies many of the generic things that users want computers to do in the shortest time possible.
* Today, machine learning touches several segments of computing and basic science.
* Some of the common application areas of machine learning are the following:
01) Database Mining
* Here, the growth of the internet and the web has resulted in large datasets which can be mined for relevant results and information, as desired by the user or the keeper of the data. The role of the Database Administrator (DBA) differs from that of the database miner: the DBA only handles the overall functioning, storage and permissions of the database, whereas the database miner does the housekeeping, the relevant transformations and the processing of the data.
* The areas where database mining can be applied include web click data, medical records, biology and engineering. For instance, with medical records, database applications are used to look up historical records, find similar cases and predict certain medical problems. In medicine and biology, the field is also used to solve problems associated with genes, genetic engineering, mutations etc.
* The second use of machine learning is for applications that cannot be programmed by hand. One example is the autonomous helicopter, where the onboard computer learns to fly by itself and selects the paths best suited to its shortest-route problem. Several other algorithms may be used alongside the path-finding and path-tracing algorithms in this scenario, so that the autonomous helicopter can perform manoeuvres that even piloted helicopters cannot.
* Another use is handwriting recognition, where a machine learning algorithm recognizes patterns and words in any person's handwriting. ML is also used in Natural Language Processing, where algorithms read text and infer sentiments, patterns and other aspects of the users' words. Things have now evolved to such an extent that ML and AI can generate their own text and word sequences and can even write stories on their own.
* Machine learning is also used in computer vision, where algorithms learn to recognize faces and gestures and make inferences based on their assessment. The same is used for surveillance and security purposes as well.
* ML is also used in self-customizing programs, for example the product recommendations of Amazon, Netflix etc., which get tuned to the usage pattern of the particular user.
* Finally, machine learning algorithms are being used to understand human learning, which happens in the brain. The workings of the brain's real neural networks in turn inform the enhancement of artificial engines and networks.
* In the forthcoming articles, we will dig deeper into the main types of machine learning algorithms and their usages.
Last modified: 1 May 2021
Friday, April 30, 2021
Updating Machine Learning Algorithms by Mini-Batch and Batch Wise
* Machine learning boils down to an optimization problem in which one looks for the global minimum of a certain cost function.
* Working out an optimization algorithm using all the available data is an advantage, because it allows checking, iteration by iteration, the amount of minimization with respect to all the data.
* This is the main reason why machine learning algorithms prefer to use all the available data at once, held in the memory of the computer or in the memory of a GPU with plenty of secondary memory available to it.
* Learning techniques based on statistical algorithms use calculus and matrix algebra, and they need all the data within the memory.
* Simpler algorithms, such as those based on a step-by-step search for the next best solution, proceeding iteration by iteration through partial solutions (such as gradient descent), can also gain an advantage when developing a hypothesis based on all the data, because they can catch weaker signals on the spot and avoid getting fooled by the noise in the data. This helps the overall learning process in both supervised and unsupervised settings, whether or not the data contains noise.
* While operating within the limits of the computer's memory, one is said to be working in core memory. Put simply, all the computations take place within the memory of the computer, which could be either primary or secondary memory. Because computation takes precedence, primary memory is assigned to the task, and it springs into action as soon as an incoming process triggers it.
* An algorithm that executes with all of its data allocated in core memory in this way is called a "batch algorithm" because, as in a factory where machines process batches of materials, such algorithms learn to handle and predict one batch of data at a time. The incoming data is generally represented in the form of a data matrix.
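As a rough illustration of a batch algorithm, here is a minimal sketch of batch gradient descent for a straight-line fit. The function name, the toy data and the learning rate are assumptions made for this example; the point is only that every update looks at every row of the data matrix, which is why the whole matrix has to sit in core memory.

```python
def batch_gradient_descent(X, y, learning_rate=0.1, n_iterations=1000):
    """Fit y ~ w.x + b, using every row of the data matrix X at each step."""
    n_rows, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(n_iterations):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        # One full pass over the data matrix per update (the "batch").
        for row, target in zip(X, y):
            error = sum(wj * xj for wj, xj in zip(w, row)) + b - target
            for j in range(n_features):
                grad_w[j] += error * row[j]
            grad_b += error
        # Step against the average gradient of the squared-error cost.
        w = [wj - learning_rate * gj / n_rows for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b / n_rows
    return w, b


# Hypothetical toy data matrix generated from y = 2x + 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w, b = batch_gradient_descent(X, y)
print(w, b)
```

On this noise-free toy data the fitted slope and intercept should come out close to 2 and 1.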
* Sometimes data cannot fit into core memory because it is too big. Data derived from the web is a typical example of information that cannot fit easily into memory. Since such data may be homogeneous or heterogeneous in form, and cannot always be boiled down to one particular format (XML, JSON, SQL, NoSQL, big data stores etc.), the derived data is relatively hard to decipher and fit.
* A few strategies can help when the data is too big to fit into the standard memory of a single computer. A first solution that one can try is to subsample the data into smaller samples.
* Here, the data is reshaped, by a selection of cases (and sometimes of features) based on statistical sampling, into a more manageable yet reduced data matrix. Reduced data cannot always provide the same results as an analysis of the full data. Another problem with working on less data is that it can produce less powerful models. But if the subsampling is executed properly, the approach can generate reliable and good results. A successful subsampling must therefore correctly use statistical sampling, by employing random or stratified sample drawings.
* Now we will take a bird's-eye view of the sampling methods used in the process of reshaping and reducing data:
1) Random Sampling
* In random sampling, one creates a sample by randomly choosing examples from any part of the data. Here, the larger the sample, the more likely it is to resemble the original structure and variety of the data.
2) Stratified Sampling
* In stratified sampling, one can control the final distribution of the target variable, or of certain features within the data that one deems critical for successfully replicating the characteristics of the complete data.
* A classic example of stratified sampling is drawing a sample from a classroom made up of different proportions of males and females in order to estimate the average height of the class.
* If the females of the class are, on average, shorter than the males, then one would want to draw a sample that replicates the same proportion of males and females as the class in order to obtain a reliable estimate of the average height.
* If one sampled only the males by mistake, one would overestimate the average height, because the sample would be tilted towards the taller group. Since the males as a sub-sample pull the average height of the class upwards, leaving out the shorter sub-sample of females negates their contribution and leads to an over-estimation.
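The two sampling methods and the classroom example can be sketched in a few lines of Python. This is only an illustration: the function names, the heights and the 50% sampling fraction are all made up for the example, and only the standard library is used.

```python
import random
from collections import defaultdict


def random_sample(rows, sample_size, seed=0):
    """Plain random sampling: every row has the same chance of being picked."""
    rng = random.Random(seed)
    return rng.sample(rows, sample_size)


def stratified_sample(rows, stratum_of, fraction, seed=0):
    """Draw the same fraction from every stratum, keeping the proportions."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


def mean_height(rows):
    return sum(height for _, height in rows) / len(rows)


# Hypothetical classroom: 12 males and 18 females, heights in cm (made up).
males = [("male", 170.0 + i) for i in range(12)]
females = [("female", 158.0 + i) for i in range(18)]
classroom = males + females

simple = random_sample(classroom, 15)
stratified = stratified_sample(classroom, lambda row: row[0], fraction=0.5)
males_only = random_sample(males, 6)  # the "sampled only the males" mistake

print("whole class :", mean_height(classroom))
print("random      :", mean_height(simple))
print("stratified  :", mean_height(stratified))
print("males only  :", mean_height(males_only))
```

The stratified sample always contains 6 males and 9 females, the same 2:3 proportion as the class, whereas the males-only sample is bound to overestimate the class average on this data.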
================
Sampling Strategy
================
* In order to avoid the problems that can come up during random sampling and stratified sampling, one has to draw a sub-sample with enough examples, guided by a clear idea of the requirement one is trying to fulfil, so that the sampling strategy captures the variety of the data.
* Even after one has chosen a proper sampling strategy to represent the variety of the data, memory limitations remain. Data with high dimensionality, characterised by many cases and many features, is more difficult to sub-sample because it needs a much larger sample, which may itself not fit into core memory.
=========
Network Parallelism
=========
* Beyond sub-sampling, a second possible solution to fitting the data in memory is to leverage "network parallelism", which splits the data across multiple computers connected over a network. Each computer handles part of the data for the optimization. After each computer has done its own computation, the parallel optimizations are reduced into a single result, so that no single machine's core memory has to hold all of the data.
* To understand how this solution works, one can compare it to building a car piece by piece, from its framework as per the blueprint to the complete body, which can be done by a line of assembly workers and robotic manufacturing arms. Apart from having a faster assembly, one does not have to keep all the parts within the factory at the same time. In a very similar manner, one doesn't have to keep all the data within a single computer or computing device; one can take advantage of a distributed architecture, in which different computers work in a distributed and parallel manner, thereby overcoming some of the core memory limitations.
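A small sketch may make the idea concrete. Below, a toy dataset is split into shards, each shard computes its own partial gradient (as a separate machine would), and the partial results are combined into one. Everything runs in a single process here; the function names, the number of workers and the data are assumptions for illustration only.

```python
def partial_gradient(shard, w, b):
    """Gradient sums of the squared error for y ~ w*x + b over one shard."""
    grad_w = sum((w * x + b - y) * x for x, y in shard)
    grad_b = sum((w * x + b - y) for x, y in shard)
    return grad_w, grad_b, len(shard)


def split_into_shards(data, n_workers):
    """Deal the rows out to the workers in round-robin fashion."""
    return [data[i::n_workers] for i in range(n_workers)]


# Hypothetical toy data generated from y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(12)]
shards = split_into_shards(data, n_workers=3)

# On a real cluster each call below would run on a different machine.
partials = [partial_gradient(shard, w=0.0, b=0.0) for shard in shards]

# Combine the partial results into a single average gradient.
n = sum(p[2] for p in partials)
combined = (sum(p[0] for p in partials) / n, sum(p[1] for p in partials) / n)

# Same computation on one machine holding all the data, for comparison.
full = partial_gradient(data, w=0.0, b=0.0)
print(combined, (full[0] / full[2], full[1] / full[2]))
```

The combined gradient matches the one computed over the full data, which is what lets each machine keep only its own share in memory.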
* This approach serves as the basis of MapReduce technology and cluster-computing frameworks such as Apache Spark. A quick recap of the underlying technology: MapReduce is a programming model that takes numerous data files, splits them into chunks and applies a map step to each chunk, emitting key-value pairs (in the classic word-count example, each word with a count of one). The pairs are then sorted and grouped by key, and a reduce step combines the values of each key into the final result, which is kept as mapped key-value pairs in the storage architecture. Clustered computers and parallel servers follow a similar structure, with a master node coordinating worker (slave) nodes for data storage and data access. The exact workings of such storage systems, which facilitate high-level data storage, are worth exploring further.
* All these technologies focus on mapping a problem onto multiple machines and then reducing their outputs into the desired solution. This means that machine learning computations over large-scale data are not done by a single server or computer; they are done in a parallel and distributed manner across many nodes (the work is spread over child nodes and finally assembled back at a root node) before the final result of the learning process is returned to the user reading the results at the root node.
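The map and reduce steps can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not the API of Hadoop or Spark; the function names and the sample text are made up, and on a real cluster each chunk would be mapped on a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical chunks, standing in for pieces of data held by child nodes.
chunks = ["big data big models", "data beats models", "big data"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)  # {'big': 3, 'data': 3, 'models': 2, 'beats': 1}
```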
* But even with such a sophisticated and complex system in place, not all machine learning algorithms can be split into separable processes, and this limits the usability of the approach. More importantly, one encounters a significant cost and time overhead in the setup and maintenance of a network of computers kept ready for this kind of data processing. As computation and infrastructure at this scale is beyond the reach of individuals with less funding and smaller application setups, it is mainly hosted and run by a slew of large organisations that can afford big-scale infrastructure for implementing, organising and running the chain.
* The third solution is to rely on out-of-core algorithms, which work by keeping the data on the storage device and feeding it into the computer memory in chunks for processing. The feeding process is called streaming. Because the data chunks are smaller than the core memory, the algorithm can handle them properly and use them to update the machine learning algorithm's optimization. After the update, the system discards them in favour of new chunks, which the algorithm uses for further learning. This process repeats until there are no more chunks left. The chunks can be small (depending on the core memory), in which case the process is called mini-batch learning, or they can consist of just a single example, which is called online learning.
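The streaming loop described above can be sketched as a generator that hands out one chunk at a time. In this sketch an in-memory `io.StringIO` stands in for a large CSV file on disk, and a running mean stands in for the model update; both are assumptions made to keep the example self-contained.

```python
import csv
import io


def stream_chunks(file_obj, chunk_size):
    """Yield lists of at most chunk_size numeric rows from a CSV stream."""
    chunk = []
    for row in csv.reader(file_obj):
        chunk.append([float(value) for value in row])
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []  # the old chunk is discarded in favour of a new one
    if chunk:
        yield chunk  # last, possibly smaller, chunk


def make_fake_file():
    """Stand-in for a big file on the storage device (hypothetical data)."""
    return io.StringIO("\n".join(f"{i},{2 * i + 1}" for i in range(10)))


chunk_sizes = [len(c) for c in stream_chunks(make_fake_file(), chunk_size=4)]

# Update a running statistic chunk by chunk; only one chunk is in memory.
running_sum, running_count = 0.0, 0
for chunk in stream_chunks(make_fake_file(), chunk_size=4):
    running_sum += sum(row[1] for row in chunk)
    running_count += len(chunk)
streamed_mean = running_sum / running_count

print(chunk_sizes, streamed_mean)  # [4, 4, 2] 10.0
```

Setting `chunk_size=1` in this sketch would correspond to online learning, where each chunk is a single example.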
* The previously described gradient descent, like other iterative algorithms, can work fine with such an approach; however, reaching an optimum takes longer because the gradient's path is more erratic and non-linear than in a batch approach. Even so, the algorithm can reach a solution using fewer computations than its in-memory versions, since each update looks at only a small part of the data.
* When the parameter updates are based on mini-batches or single examples, the gradient descent algorithm takes the name stochastic gradient descent. It will reach a proper optimization solution given two prerequisites:
1) The examples streamed are randomly extracted (hence the name stochastic, recalling the idea of a random extraction)
2) A proper learning rate is defined, either as a fixed value or as a flexible value that changes according to the number of observations or other criteria
* The learning rate can make a great difference in the quality of the optimization: a high learning rate, though faster in the optimization, can constrain the parameters to the effects of noisy or erroneous examples seen at the beginning of the stream.
* A learning rate that starts high and is then lowered over the stream also renders the algorithm insensitive to the later streamed observations, which can prove to be a problem when the algorithm is learning from sources that are naturally evolving and mutable, such as data from the digital advertising sector, where new advertising campaigns keep changing the level of attention and response of the targeted individuals.
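To tie the two prerequisites together, here is a minimal sketch of stochastic gradient descent on a toy straight-line problem. The examples are shuffled before every pass (random extraction) and the learning rate shrinks with the number of examples seen. The inverse-scaling schedule, the parameter values and the function name are assumptions chosen for this illustration, not a recommendation.

```python
import random


def sgd_linear(examples, initial_rate=0.05, decay=0.001, n_epochs=500, seed=0):
    """Fit y ~ w*x + b with one parameter update per streamed example."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    step = 0
    for _ in range(n_epochs):
        order = examples[:]
        rng.shuffle(order)  # prerequisite 1: random extraction
        for x, target in order:
            # Prerequisite 2: a flexible rate that decays with the number
            # of observations seen so far (assumed inverse scaling).
            rate = initial_rate / (1.0 + decay * step)
            error = w * x + b - target
            w -= rate * error * x
            b -= rate * error
            step += 1
    return w, b


# Hypothetical noise-free stream generated from y = 2x + 1.
examples = [(float(x), 2.0 * x + 1.0) for x in range(4)]
w, b = sgd_linear(examples)
print(w, b)
```

With a decaying rate like this, later examples move the parameters less than earlier ones; for a source that keeps evolving, a fixed or more slowly decaying rate may be the better choice.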
Last modified: 12:23