Data Science and AI Quest: Validation of Machine Learning Algorithms and Scenarios

Friday, April 30, 2021

Validation of Machine Learning Algorithms and Scenarios - A short article

Validation of Machine Learning Codes

* It is widely accepted that merely having some examples in the form of datasets, and a machine learning algorithm at hand, does not assure that a machine learning problem can be solved or that the results will provide the desired solution.

* For example, if one wants a computer to distinguish a photo of a dog from a photo of a cat, one can do it with good examples of dogs and cats. One can then train a dog-versus-cat classifier, based on some machine learning algorithm, that outputs the probability that a given photo is that of a dog or a cat. Over a set of photos, the output can be summarised as a validation quantity: a number that reflects how well the classifier was able to perform those computations, and with what level of alacrity and accuracy. I am using the word "alacrity" to convey the performance and speed of the identification process when the algorithm is run over a batch of photos, including whatever structuring steps it relies on, such as segmentation, clustering or KNN. Accuracy, in turn, can be thought of as how closely, in percentage terms, the predictions match the reference samples against which they are compared.

* Based on the probability, which can be expressed as a percentage, one can then decide the class (that is, whether the photo shows a dog or a cat) from the estimate calculated by the algorithm.

* Whenever the obtained probability is higher for a dog, one can minimize the risk of making a wrong assessment by choosing the more probable class, which in this case is a dog.

* The greater the difference between the probability of a dog and that of a cat, the higher the confidence one can have in the chosen result.

* And in case the difference between the probability of a dog and that of a cat is small, it can be assumed that the picture of the subject is not clear, or that the subjects in the pictures bear much resemblance in features. Some of the cats may look similar to dogs, or the dogs in the concerned pictures may look rather cattish, and so confusion may arise.
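
The decision rule described in the points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `decide` and the `min_margin` threshold of 0.2 are hypothetical choices made for this example, and the probabilities passed in are invented rather than produced by a real classifier.

```python
def decide(p_dog, min_margin=0.2):
    """Pick the more probable class and report how decisive the choice is.

    p_dog is the estimated probability that the photo shows a dog.
    min_margin is a hypothetical threshold below which the result is
    treated as too close to call.
    """
    if not 0.0 <= p_dog <= 1.0:
        raise ValueError("p_dog must be a probability in [0, 1]")
    p_cat = 1.0 - p_dog
    # Choosing the more probable class minimizes the risk of a wrong call.
    label = "dog" if p_dog >= p_cat else "cat"
    # The gap between the two probabilities acts as a confidence measure.
    margin = abs(p_dog - p_cat)
    confident = margin >= min_margin
    return label, margin, confident


# Invented probabilities, for illustration only.
for p in (0.95, 0.55, 0.10):
    label, margin, confident = decide(p)
    print(f"P(dog)={p:.2f} -> {label}, margin={margin:.2f}, confident={confident}")
```

A small margin, as in the middle case, is the situation where the picture may be unclear or the dog may simply look cattish.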

* On the point of training a classiﬁer :

You pose a problem and offer the examples, with each example carefully marked with the label or class that the algorithm should learn. The computer then trains the algorithm for a while, and the training process over the dataset finally yields a resulting model.

* Here, the resulting model is what provides one with an answer or a probability when it is shown a new example.

* Labelling is another associated activity that can be carried out, but in the end a probability is just an opportunity to propose a solution and get an answer.
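
To make the training step concrete, here is a minimal sketch of a classifier trained on labelled examples that then returns a probability. It is not the method of any particular library: it assumes a plain logistic regression fitted by batch gradient descent, and the two features (imagined as a "snout length" and a "body size" score scaled to the range 0 to 1), the data values, and the names `EXAMPLES`, `train` and `predict_dog_probability` are all hypothetical, invented for this illustration.

```python
import math

# Hypothetical labelled examples: two made-up features scaled to [0, 1],
# with label 1 = dog and 0 = cat. The values are invented for illustration.
EXAMPLES = [
    ((0.90, 0.80), 1), ((0.80, 0.90), 1), ((0.70, 0.75), 1), ((0.85, 0.70), 1),
    ((0.20, 0.30), 0), ((0.30, 0.20), 0), ((0.25, 0.35), 0), ((0.15, 0.25), 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples, epochs=3000, lr=1.0):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x1, x2), y in examples:
            # Difference between the predicted probability and the true label.
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - y
            grad_w[0] += err * x1
            grad_w[1] += err * x2
            grad_b += err
        # Step against the average gradient of the log loss.
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def predict_dog_probability(model, features):
    """Return the model's estimate of P(dog) for one example."""
    w, b = model
    return sigmoid(w[0] * features[0] + w[1] * features[1] + b)


model = train(EXAMPLES)
for features in [(0.80, 0.80), (0.25, 0.25)]:
    p = predict_dog_probability(model, features)
    print(f"features={features} -> P(dog)={p:.3f}")
```

The labelled examples play the role of the carefully marked photos, `train` is the "trains for a while" step, and the returned `model` is what answers with a probability for a new example.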

* At such a point, one may have addressed all the issues and might guess that the work is finished. One should still validate the results, to ensure first that they are comprehensible to a human, and that the user has a clear understanding of the background processes involved, together with a break-up analysis of the code and the results that enables other readers to understand the code along with the numbers.

* Moreover, this will be elaborated in the forthcoming sessions / articles, where we will look into the various ways in which machine learning results can be validated and made comprehensible to users.
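
As a first taste of such validation, the sketch below measures the two quantities mentioned earlier, accuracy and alacrity (speed), on examples the model has not seen during training. Everything here is an assumption made for illustration: `evaluate`, `toy_classifier` and `HELD_OUT` are hypothetical names, the toy classifier simply returns its single made-up feature as P(dog), and the held-out data is invented.

```python
import time


def evaluate(classifier, held_out):
    """Return accuracy, confusion counts and mean seconds per prediction."""
    if not held_out:
        raise ValueError("held_out must contain at least one example")
    # Keys are (true label, predicted label) pairs.
    counts = {
        ("dog", "dog"): 0, ("dog", "cat"): 0,
        ("cat", "dog"): 0, ("cat", "cat"): 0,
    }
    start = time.perf_counter()
    for features, true_label in held_out:
        predicted = "dog" if classifier(features) >= 0.5 else "cat"
        counts[(true_label, predicted)] += 1
    elapsed = time.perf_counter() - start
    correct = counts[("dog", "dog")] + counts[("cat", "cat")]
    accuracy = correct / len(held_out)
    return accuracy, counts, elapsed / len(held_out)


def toy_classifier(features):
    """Hypothetical stand-in for a trained model: returns P(dog)."""
    return features[0]


# Hypothetical held-out examples that were not used for training.
HELD_OUT = [
    ((0.9,), "dog"), ((0.7,), "dog"), ((0.4,), "dog"),
    ((0.2,), "cat"), ((0.6,), "cat"), ((0.1,), "cat"),
]

accuracy, counts, seconds_each = evaluate(toy_classifier, HELD_OUT)
print(f"accuracy: {accuracy:.1%}")
print(f"dogs mistaken for cats: {counts[('dog', 'cat')]}")
print(f"cats mistaken for dogs: {counts[('cat', 'dog')]}")
print(f"mean time per prediction: {seconds_each:.2e} s")
```

Breaking the result down into which class was mistaken for which, rather than reporting a single number, is one simple way to make the outcome more comprehensible to a reader.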

Last modified: 16:39
