Data Science and AI Quest

Wednesday, March 3, 2021

Cardinality Ratio concept in "Database and Management Systems" with explanatory figurative example

Cardinality Ratio concept in DBMS

Cardinality Ratio

The cardinality ratio is the number of relationship instances that an entity can participate in. Take the relationship between a Student entity and a Guide entity: it can be described with an entity-relationship (E-R) diagram.

The same relationship can be understood with the help of an E-R diagram together with a set diagram.

Here, the following scenarios emerge:

Case-1

One guide (G1) can guide two students (S1 and S2), whereas S1 can be guided only by guide G1.

Case-2

The third student (S3) is guided by guide G2.

This is a restriction imposed by the relationship that associates the two entity sets with each other. The number of instances of one set that can be related to an instance of the other set is limited, and the diagram shows how each particular instance is mapped to another through the relationship, that is, how the entities can participate in it.

From the above, one gets a good understanding of what a cardinality ratio is:

Definition of Cardinality Ratio

The number of relationship instances that an entity can participate in is called the cardinality ratio.

From the above diagram it can be noted that one department has only one HOD (Head of Department). So, in this case, the relationship is 1:1.

From the diagram we deduce that :

Relationship R1 exists from Department D1 to HOD H1.

Relationship R2 exists from Department D2 to HOD H2.

Relationship R3 exists from Department D3 to HOD H3.

A relationship in which each instance of one set is mapped to at most one instance of the other set, and vice versa, is called a One to One relationship.

One can get a better understanding of this from the E-R diagram shown at the bottom of the above figure.

===================================================================

The second type of relationship is the One to Many relationship

In the given figure, one can notice a relationship between one department and many students. Each department has multiple students, so multiple relationship instances lead from one department to students in the other set, but each student is associated with only one department.

Read from the department side, this is a One to Many relationship, written in ratio form as 1:M. Read from the student side, the same relationship is Many to One (M:1), which is simply the cardinality ratio stated in the opposite direction.

This means that many instances of the Student entity type participate in the "Has" relationship with a single instance of the Department entity type.

Many to Many Relationship

In the given example, one can see that there is a Many to Many relationship between the Student set and the Subject set.

This is depicted with Student on the left-hand side and Subject on the right-hand side of the relationship diagram. Multiple instances of the "Student" set are related to multiple instances of the other entity set, "Subject": a student can take many subjects, and a subject can be taken by many students.

The relationship is also depicted with the help of an E-R diagram.
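The three cardinality ratios can also be checked mechanically. The following is a minimal Python sketch, not a standard algorithm: the function name cardinality_ratio and the student and subject instances are made up for illustration, and the function only classifies the relationship instances it is given, whereas a cardinality ratio in an E-R design is a rule about all instances that may ever exist.

```python
from collections import Counter


def cardinality_ratio(pairs):
    """Classify a binary relationship given as (left, right) instance pairs."""
    pairs = set(pairs)  # a repeated pair is still one relationship instance
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    # Does some left instance relate to several right instances, and vice versa?
    left_fans_out = any(count > 1 for count in left_counts.values())
    right_fans_out = any(count > 1 for count in right_counts.values())
    if left_fans_out and right_fans_out:
        return "M:N"
    if left_fans_out:
        return "1:M"
    if right_fans_out:
        return "M:1"
    return "1:1"


# Department -> HOD, as in relationships R1..R3 of the article
heads = [("D1", "H1"), ("D2", "H2"), ("D3", "H3")]
# Guide -> Student, as in Case-1 and Case-2 of the article
guides = [("G1", "S1"), ("G1", "S2"), ("G2", "S3")]
# Student -> Subject (subject names are hypothetical)
studies = [("S1", "Maths"), ("S1", "Physics"), ("S2", "Maths")]

print(cardinality_ratio(heads))
print(cardinality_ratio(guides))
print(cardinality_ratio(studies))
```

Swapping the two sides of each pair turns a 1:M result into M:1, which is the same relationship read in the opposite direction.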

===================================================================

With this, one can understand how these relationships behave, that is, the manner in which instances of one set are related to instances of another set or of several other sets.

Tuesday, March 2, 2021

Basic Domain Types Supported by SQL

SQL - Structured Query Language ( a basic course overview & revision on the paradigms and features of SQL as a Relational Database Management Language )

SQL - Structured Query Language

( a basic overview )

· The backbone of any query language is relational algebra, which provides a concise, formal notation for representing queries. However, commercial database systems required a more user-friendly query language, and for this purpose SQL, short for Structured Query Language, was created. It became the most influential commercially marketed query language.

· SQL uses a combination of relational algebra and relational calculus constructs.

· Although we refer to SQL as a "query language", SQL can do much more than just query a database. It can define the structure of the data, modify the data in the database and specify security constraints. SQL can be used to define a database and the tables that a user wants to create and maintain in it, and to modify the structure of a database that already holds data. It can also specify security constraints that restrict access to the data to the users it is meant for, and integrity constraints that keep the stored data consistent, which is the "C" in the ACID properties of a relational database management system.

· This article is not a complete user's guide to installing or implementing SQL. Its aim is to present SQL's fundamental constructs and concepts, both for users who are new to SQL and for intermediate or highly skilled users who want a quick revision.

· Background of SQL - IBM developed the original version of SQL, originally called "Sequel", as part of the System R project in the early 1970s. (Present-day readers may think of the word as some sort of movie sequel, like the unlimited sequels of the Star Wars series .. seems a poor joke.) The Sequel language has evolved since then, and its name has changed to SQL (Structured Query Language). Many standalone and server-based products now implement SQL as part of their database engine, and over the years SQL has established itself as the standard relational database language.

· In 1986, the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) published an SQL standard called SQL-86 (the suffix 86 comes from the publication year, 1986). ANSI then published an extended standard, SQL-89, in 1989. The next version of the standard was published in 1992 and was therefore called SQL-92, and later versions followed, such as SQL:1999. The most recent version that I have worked with, and which is installed on my system, is based on SQL:2003, which serves most purposes. Apart from some server connection issues and trouble importing data from incompatible transactional data formats, I have not faced any major hurdles with my standalone SQL installation. If one wants the full bibliographic notes on these standards, one may find them in the documentation that accompanies the installation CD or executable.

· The SQL language has several parts, namely :

1) Data Definition Language (DDL)

The SQL DDL provides commands for defining relation schemas, deleting relations and modifying relation schemas.

2) Interactive Data Manipulation Language (DML)

The SQL DML includes a query language based on both the relational algebra and the tuple relational calculus, which means that one can run queries over a table and retrieve the necessary information from the database. It also includes commands to insert tuples into, delete tuples from and modify tuples in the database.

3) Integrity

The SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate the integrity constraints are disallowed.

4) View Definition

The SQL DDL includes commands for defining views over a database. A view is a virtual table defined by a query over one or more base tables; it exposes some or all of their columns, with filtered or unfiltered data. Views can be queried like tables, but data-modifying DML statements work only on views that the database system treats as updatable.

5) Transaction Control

SQL includes commands for specifying the beginning and ending of transactions. In practice, this means that a group of statements can be made permanent together (commit) or undone together (rollback) at the desired point of time.

6) Embedded SQL and Dynamic SQL

Embedded SQL and dynamic SQL define how SQL statements can be embedded within general-purpose programming languages such as C, C++, Java, PL/I, Cobol, Pascal and Fortran.

7) Authorization

The SQL DDL includes commands for specifying access rights to relations and views. This means that users can be granted specific rights to access the tables and views of a given database. Many database systems support most of the SQL-92 standard and some of the new constructs of SQL:1999 and SQL:2003. However, many systems also provide non-standard features that differ from those described in the standard, so the manual of the specific system should be consulted.
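Several of the parts listed above can be tried out in a few lines. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database; this is only one possible engine, its dialect differs from the SQL standard in places, and the account table with its rows is a hypothetical example rather than anything taken from a real system. Calling SQL from Python in this way is also a small illustration of using SQL from a general-purpose host language.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so transactions are started and ended explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()

# DDL: define a relation schema (hypothetical table)
cur.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " branch_name TEXT NOT NULL,"
    " balance NUMERIC NOT NULL)"
)

# DML: insert tuples
cur.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400), ("A-201", "Downtown", 900)],
)

# View definition: a virtual table defined by a query, then queried like a table
cur.execute(
    "CREATE VIEW downtown_account AS"
    " SELECT account_number, balance FROM account WHERE branch_name = 'Downtown'"
)
view_rows = cur.execute(
    "SELECT account_number, balance FROM downtown_account ORDER BY account_number"
).fetchall()

# Transaction control: begin a transaction, change data, then undo the change
cur.execute("BEGIN")
cur.execute("UPDATE account SET balance = balance - 100 WHERE account_number = 'A-101'")
cur.execute("ROLLBACK")
balance_after_rollback = cur.execute(
    "SELECT balance FROM account WHERE account_number = 'A-101'"
).fetchone()[0]

print(view_rows)
print(balance_after_rollback)
```

Replacing ROLLBACK with COMMIT would make the update permanent instead of undoing it.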

Article on - Data Definition Feature of SQL and its specifications

· The set of relations in a database must be specified to the system by means of a data definition language. In other words, the structure and schema of a database and its relations are defined with a standard language, here the DDL part of SQL.

· The SQL DDL allows the specification of not only a set of relations but also information about each relation, including the following:

1) The schema for each relation within the database

This means that SQL can be used to create the schema definition of each relation within the database one is working on. For example, suppose a database contains the following relational tables:

branch (branch_name,branch_city,assets)

customer(customer_name,customer_street,customer_city)

Here, the database consists of two tables, branch and customer, whose attributes are given in the parentheses. These two relational tables can be created in SQL with the create table command, which takes the attribute names, together with their domain types, as part of the schema definition.

2) The domain of values associated with each attribute

The picture above shows the essential domain types that SQL provides for data definition within a relation. Whatever data is stored in the columns/attributes of a relational table should adhere to these domain types; values of the wrong datatype or format are not accepted.

3) The Integrity Constraints

The purpose of defining integrity constraints on a database is to make sure that the stored data adheres to the rules declared for it. For example, wherever not null is declared, the attribute cannot hold a null value;

wherever the primary key constraint is declared, the attribute (or set of attributes) cannot contain duplicate or null values, whereas columns / attributes that are not declared as a primary key or as unique can contain non-unique values, and so on.

4) The set of indices to be maintained for each relation

Indices are maintained on relations for faster searching and easier access to the rows of relational tables, and as such indexing is one of the key considerations when defining the data in a database.

5) The security and authorization information for each relation

This aspect of data definition is mandatory for maintaining the security of the database; otherwise unauthorised users could access the data and make unregulated transactions or tamper with it, which would defeat the purpose of the database.

6) The physical storage structure of each relation on disk

Sometimes, when a relational table and its associated tables are created, physical storage parameters are specified so that space problems can be handled in advance. Size limits are also set on attributes in the table creation commands, for example a maximum length for a character column, so that oversized values are caught when the data is defined and stored rather than causing exceptions afterwards.
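The data definition items above (schema, domain types, integrity constraints and indices) can be seen together in one create table statement. The sketch below again uses Python's sqlite3 module as an assumed engine; the column types, the index name branch_city_idx and the sample rows are illustrative choices, not part of the article's schema. One caveat: SQLite accepts a declaration such as CHAR(15) but does not enforce the length, whereas many other systems do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema, domain types and integrity constraints for the branch relation
conn.execute(
    """
    CREATE TABLE branch (
        branch_name CHAR(15) NOT NULL,
        branch_city CHAR(30),
        assets      INTEGER,
        PRIMARY KEY (branch_name)
    )
    """
)

# An index to speed up searches by city (hypothetical index name)
conn.execute("CREATE INDEX branch_city_idx ON branch (branch_city)")

conn.execute("INSERT INTO branch VALUES ('Perryridge', 'Horseneck', 1700000)")


def try_insert(row):
    """Insert a row and report whether an integrity constraint rejected it."""
    try:
        conn.execute("INSERT INTO branch VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_key = try_insert(("Perryridge", "Brooklyn", 900000))  # repeats the primary key
null_key = try_insert((None, "Brooklyn", 900000))               # violates NOT NULL
new_row = try_insert(("Brighton", "Brooklyn", 7100000))         # satisfies all constraints

print(duplicate_key, null_key, new_row)
```

The customer relation could be declared in the same way, with customer_name as its primary key if customer names are assumed to be unique.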

Monday, March 1, 2021

Introduction to Hypothesis Testing In Statistics with sample problems and explanations

Introduction to Hypothesis Testing

In Statistics with problems and examples

* This article tries to break down the concept of hypothesis testing into smaller chunks. Going through it, one should get a good hold on the concept and on how and when the different kinds of tests can be put to use.

* The best way to learn from this article is to go through it segment by segment, noting each of the concepts as it is discussed.

* One of the main reasons many students find hypothesis testing difficult is that its case studies look very different from the other topics in statistics. The individual tests differ from one another, and students have a hard time both understanding each test and telling the tests apart: some cases involve small samples, others large samples, and each test type uses its own set of parameters.

* Each test also relies on a different distribution and a different technique for arriving at its result.

* So we will begin to dig deeper by starting with what a hypothesis is.

A hypothesis is a premise or claim that we want to test. This is not the kind of test for which one goes to a laboratory; rather, one designs statistical sample surveys, and the task of the statistician or analyst is to collect the data and draw meaningful conclusions from the sample by testing the hypothesis.

* Next, let us get to know what the null hypothesis is. But first let's understand what "null" means here. Null stands for something that is zero or empty, a term widely used in computer science and programming literature. In statistics it conveys the idea of "no change" or "no difference".

* So, when someone talks about the null hypothesis, they are talking about the default hypothesis, which states something that is already established. The null hypothesis is generally denoted by a capital H with a small zero as subscript, H0. It states the currently accepted value of a parameter.

* Whenever someone wants to challenge the null and examine the "what else" side of the question, they are interested in the alternative (or alternate) hypothesis. It is represented by the symbol H1 or Ha, where the subscript a stands for alternative. In some statistics books the alternative hypothesis is also called the research hypothesis, and it contains the claim to be tested.

* Let's take gravity as an example. Centuries ago, as the story goes, Newton was prompted to think about gravity by a falling apple; he did his research and gave the postulates supporting his hypothesis about gravity. Later Einstein came into the picture and proposed another set of hypotheses, an alternative to the views that Newton had provided. His view of gravity was much different from Newton's and more complicated to generalise, and this alternative hypothesis was eventually accepted and gained stature in the science journals of that time.

========================================================

* OK, now let's take a question to get a better perspective on the idea of hypothesis testing.

It is believed that a candy machine makes chocolate bars that weigh 5 g on average. A worker claims that after maintenance the machine no longer produces 5 g bars. Formulate H0, the null hypothesis, and H1, the alternative hypothesis.

H0 = Null Hypothesis

H1 = Alternative Hypothesis

H0 : the mean weight of the chocolate bars is 5 grams (μ = 5 g), the established belief

H1 : the mean weight of the chocolate bars produced by the machine is not equal to 5 grams (μ ≠ 5 g)

So, one can see from the two statements that the null hypothesis and the alternative hypothesis are mathematical opposites. In every hypothesis test, the null and alternative hypotheses are mutually exclusive: both cannot be true at once.

* The test itself starts by assuming that the null hypothesis is true, and then asks whether the sample data give enough evidence against it.

========================================================

* So what are the possible outcomes of a test of the null hypothesis?

1) Reject the Null Hypothesis

2) Fail to reject the Null Hypothesis

* If we reject the null hypothesis, we mean that the sample gives enough evidence that the statement in H0 is false, that is, that the mean weight of the chocolate bars is not equal to 5 grams.

* If we fail to reject the null hypothesis, we mean that the sample does not give enough evidence to support the alternative hypothesis, so we have no grounds to conclude that the mean weight differs from 5 grams. (Failing to reject the null hypothesis does not mean that we have proved it true.)

* The possible outcomes are just like the verdict in a court of law: one can either reject the null hypothesis H0 (a "guilty" verdict, in favour of H1) or fail to reject H0 (a "not guilty" verdict).

* To decide between these outcomes, one computes a test statistic in the following manner.

* Test Statistic :

The test statistic is calculated from sample data and is used to decide whether to reject the null hypothesis or fail to reject it. In the candy-bar factory, for example, one might sample 50 chocolate bars, carry out a descriptive statistical analysis of the data, obtain the average weight of the bars in the sample, and from it compute the value of the test statistic.

One can then determine whether the result is statistically significant, which is how one arrives at the decision to reject the null hypothesis or fail to reject it.

Suppose one person draws a sample of 50 bars on Monday and finds an average of 5.12 grams, which is not exactly 5 grams; another person draws a sample of 50 bars and finds an average of roughly 5.72 grams; and on Friday the average is found to be 6.53 grams. The three sample means differ from one another, and they lie at increasing distances from the null hypothesis value of 5 grams per candy bar.

Comparing them: Monday's 5.12 grams is fairly close to the hypothesised mean of 5 grams, 5.72 grams is a bit more distant, and 6.53 grams is very far from it. The further a sample mean lies from 5 grams, relative to the natural variation between samples, the stronger the evidence against the null hypothesis. A reading like 6.53 grams therefore points towards rejecting the null hypothesis, whereas a reading like 5.12 grams may well be explained by sampling variation alone.

So, in statistical terms, by comparing the hypothesised value with the observed values, an analyst should be able to decide when to reject the null hypothesis and when not to.

This, in general, is the purpose of a hypothesis test: collect the data, summarise it, and obtain a test statistic that supports the decision. One needs to fix in advance the boundaries beyond which an observed value counts as too high or too low, so that the null hypothesis is rejected when the statistic falls outside them and not rejected otherwise.
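To make "too far from 5 grams" concrete, the sample means above can be turned into a test statistic. The sketch below is only an illustration under stated assumptions: the article gives no standard deviation, so a sample standard deviation of 0.5 g is assumed (ASSUMED_SD is a made-up value), and because each sample has 50 bars a z statistic with the normal approximation is used rather than an exact t test. The function name z_test and the label "second sample" are likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sample_sd, n):
    """Return the z statistic and two-sided p-value for H0: mean == mu0."""
    standard_error = sample_sd / sqrt(n)      # spread of the sample mean
    z = (sample_mean - mu0) / standard_error  # distance from mu0 in standard errors
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


MU0 = 5.0          # mean weight claimed by the null hypothesis, in grams
N = 50             # bars per sample, as in the article
ASSUMED_SD = 0.5   # hypothetical sample standard deviation, in grams

sample_means = {"Monday": 5.12, "second sample": 5.72, "Friday": 6.53}
results = {day: z_test(mean, MU0, ASSUMED_SD, N) for day, mean in sample_means.items()}

for day, (z, p) in results.items():
    verdict = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{day}: z = {z:.2f}, p = {p:.4f} -> {verdict}")
```

Under this assumed standard deviation, Monday's sample is not far enough from 5 grams to reject the null hypothesis at the 5% level, while the other two samples are. A different standard deviation would move that boundary, which is why the spread of the data matters as much as the sample mean.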

========================================================

Level of Confidence

The level of confidence is closely tied to the level of significance: together they determine, graphically, the region where the null hypothesis is rejected and the region where it is not. The level of confidence tells how confident someone is in their decision, expressed as a percentage.

For example, if the level of confidence for rejecting a null hypothesis is 99%, everyone would accept the decision to reject it. But if the level of confidence is only around 50%, rejecting the null hypothesis is not a sound decision, as that level of confidence is far too low.

========================================================

Level of Significance

This is called alpha (α) and is numerically expressed as

α = 1 - C, that is: the level of significance equals 1 minus the level of confidence C.

For example, if the level of confidence is 95%, then the level of significance is alpha = 1 - 0.95 = 0.05.
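The relation between the level of confidence and the level of significance can be computed directly, and alpha in turn fixes the rejection boundary of a test. The following sketch assumes a two-sided test based on the standard normal distribution; the function names are illustrative and not taken from any particular library.

```python
from statistics import NormalDist


def significance_level(confidence):
    """alpha = 1 - C, with the confidence level C given as a fraction."""
    return 1 - confidence


def two_sided_critical_z(confidence):
    """Boundary beyond which |z| leads to rejecting H0 in a two-sided z test."""
    alpha = significance_level(confidence)
    # alpha is split equally between the two tails of the distribution
    return NormalDist().inv_cdf(1 - alpha / 2)


alpha_95 = significance_level(0.95)
z_95 = two_sided_critical_z(0.95)
z_99 = two_sided_critical_z(0.99)

print(f"alpha = {alpha_95:.2f}")
print(f"critical z at 95% confidence = {z_95:.2f}")
print(f"critical z at 99% confidence = {z_99:.2f}")
```

A higher level of confidence gives a smaller alpha and a larger critical value, so stronger evidence is needed before the null hypothesis is rejected.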

========================================================

From the two terms "level of confidence" and "level of significance", one can make an appropriate decision whether or not to reject the formulated hypothesis.

A useful analogy for a test of hypothesis is a criminal trial. When someone is accused of a crime, the first assumption, made in favour of the accused, is that the accused is innocent. It is up to the lawyers and the evidence to show that the accused is guilty; if they fail to prove this alternative assumption, the assumption of innocence stands.

The same holds in our example. We assumed that the mean / average weight of a chocolate bar is 5 grams, and also formulated the alternative assumption that the mean weight is different from 5 grams. To test our assumption we ran an experiment, obtained sample values that differed from 5 grams, and after analysing the results came to a conclusion whether or not to reject the null hypothesis.