Validation of Machine Learning Codes
* For example, to have a computer distinguish a photo of a dog from a photo of a cat, one can start from good examples of dogs and cats. One can then train a dog-versus-cat classifier, based on some machine learning algorithm, that outputs the probability that a given photo is that of a dog or a cat. For a set of photos, the output is accompanied by a validation quantity that expresses how well the classifier performed, in terms of both alacrity and accuracy. Alacrity here refers to the performance and speed of the identification process when the algorithm is run over a batch of photos to find resemblances among classes of photos, using structuring techniques such as segmentation, clustering, or KNN. Accuracy refers to the degree, expressed as a percentage, to which the reference sample resembles the sample it is being matched against.
* Based on the probability, expressed as a percentage, one can then decide the class (that is, dog or cat) from the estimate calculated by the algorithm.
* Whenever the obtained probability is higher for a dog, one can minimize the risk of a wrong assessment by choosing the more probable outcome, which in this case is a dog.
* The greater the difference between the probability of a dog and that of a cat, the higher the confidence one can have in the choice.
* If the difference between the probability of a dog and that of a cat is small, it can be assumed that the picture is not clear, or that the subjects bear a close resemblance in their features. Some pictures of cats look similar to pictures of dogs, which causes confusion and may even raise the question of whether the dogs in those pictures are cat-like.
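The decision rule described in the points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `decide` and the `min_margin` threshold of 0.2 are hypothetical, and the classifier is assumed to return a single probability for "dog" in a two-class problem.

```python
def decide(p_dog, min_margin=0.2):
    """Pick the more probable class and report how far apart the two classes are.

    p_dog is the classifier's estimated probability that the photo shows a dog.
    min_margin is a hypothetical threshold chosen for this sketch.
    """
    p_cat = 1.0 - p_dog              # two-class case: the probabilities sum to 1
    margin = abs(p_dog - p_cat)      # gap between the two likelihoods
    label = "dog" if p_dog >= p_cat else "cat"
    # A small gap suggests an unclear photo or a look-alike subject.
    confident = margin >= min_margin
    return label, margin, confident


clear_case = decide(0.90)       # large gap between dog and cat
ambiguous_case = decide(0.55)   # the two classes are nearly tied
```

In the ambiguous case the function still returns the more probable label, but flags the result as not confident, which is the point at which one might look again at the photo itself.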
* On the point of training a classifier:
You pose a problem and offer examples, each carefully marked with the label or class that the algorithm should learn. The computer then trains the algorithm for a while, and the training process over the dataset yields a resulting model.
* Here, your computer trains the algorithm for a while, and the resulting model then provides an answer in the form of a probability.
* Labelling is another associated activity that can be carried out, but in the end a probability is just an opportunity to propose a solution and get an answer.
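To make the training step concrete, here is a minimal sketch using a k-nearest-neighbours (KNN) classifier written with the standard library only. Everything in it is an assumption made for illustration: the two numeric features stand in for whatever would be extracted from a photo, the six labelled examples are made-up toy values, and the names `train` and `predict_proba_dog` are hypothetical. KNN is used because it keeps the sketch short; its "training" merely stores the labelled examples, whereas most algorithms do real work at this stage.

```python
import math

# Toy labelled examples: (feature vector, label). The values are invented
# for this sketch and do not come from real photos.
examples = [
    ((8.0, 9.0), "dog"),
    ((9.0, 8.5), "dog"),
    ((7.5, 9.5), "dog"),
    ((2.0, 3.0), "cat"),
    ((2.5, 2.0), "cat"),
    ((3.0, 3.5), "cat"),
]


def train(labelled_examples):
    """'Training' a KNN model just stores the labelled examples."""
    return list(labelled_examples)


def predict_proba_dog(model, features, k=3):
    """Estimate P(dog) as the fraction of the k nearest examples labelled 'dog'."""
    nearest = sorted(model, key=lambda ex: math.dist(ex[0], features))[:k]
    return sum(1 for _, label in nearest if label == "dog") / k


model = train(examples)
p_dog = predict_proba_dog(model, (7.0, 8.0))        # close to the dog examples
p_cat_like = predict_proba_dog(model, (2.2, 2.8))   # close to the cat examples
```

The model's answer is a probability, as described above: a value near 1 favours "dog" and a value near 0 favours "cat".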
* At this point one may have addressed all the issues and might guess that the work is finished. One should still validate the results: first, to ensure that the generated results are comprehensible to a human; and second, to make sure that the user has a clear understanding of the background processes involved and of a breakdown of the code and its results, so that other readers can understand the code along with the numbers.
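One simple way to turn results into a number that a human can read is to compare the predicted classes with the known labels on photos that were held out from training. The sketch below assumes such held-out labels exist; the label lists are made-up values and the helper name `accuracy` is hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the known labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one labelled example is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Invented held-out labels and predictions, for illustration only.
true_labels = ["dog", "dog", "cat", "cat", "dog"]
predicted_labels = ["dog", "cat", "cat", "cat", "dog"]

score = accuracy(true_labels, predicted_labels)
report = f"{score:.0%} of {len(true_labels)} held-out photos classified correctly"
```

A sentence such as the one stored in `report` is easier for other readers to interpret than a bare number.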
* This will be elaborated in forthcoming articles, where we will look at the various ways in which machine learning results can be validated and made comprehensible to users.
Last modified: 16:39